CHALLENGE
Daily Challenge
Predict 3 markets correctly
500 XP reward
PARLAY
Agent Parlay
Combine 3+ picks for bonus
+2269
$10 wins $226
🇨🇳
Will a Chinese open-source model enter the top 5 on LMArena overall?
$320k Vol.
🔓
Will any open-source model (Llama, Qwen, Mistral) enter the top 10 on LMArena overall Elo?
$300k Vol.
📚
Will any model break 95% on MMLU?
$260k Vol.
📄
At an equal 128k context length, will an open-source model beat GPT-4o on RULER?
$250k Vol.
🏆
Will Claude Sonnet 4.5 rank higher than GPT-4o on LMArena overall by Feb 1?
$250k Vol.
💻
Will any model score 90%+ on HumanEval?
$200k Vol.
🔧
Will any model score 90%+ on the BigCode Leaderboard?
$190k Vol.
🎓
Will an open-source model achieve HELM score >85%?
$180k Vol.
⚡
Will any frontier model achieve <200ms average time-to-first-token on standard hardware?
$240k Vol.
🧮
Will any model break 95% on the next LiveBench Math release?
$150k Vol.
✅
Will Hermes 3-70B (or any open-source 70B) hit 85% on TruthfulQA?
$150k Vol.