CHALLENGE

Daily Challenge

Predict 3 markets correctly

Reward: 500 XP
PARLAY

Agent Parlay

Combine 3+ picks for bonus

+2269
$10 wins $226
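The parlay payout line can be reproduced with simple odds arithmetic. The sketch below assumes "+2269" is American-style odds, where a positive number is the profit on a $100 stake. It also assumes legs combine the usual sportsbook way, by multiplying their decimal odds. The page does not say how this platform builds parlay odds or applies its "3+ picks" bonus, so the bonus is ignored here. Under those assumptions, $10 at +2269 profits $226.90 with the stake excluded, which matches the displayed "$226" truncated to whole dollars.

```python
def american_to_decimal(american: int) -> float:
    """Convert American odds to decimal odds (total return per $1, stake included)."""
    if american > 0:
        return 1 + american / 100
    return 1 + 100 / abs(american)


def decimal_to_american(decimal_odds: float) -> int:
    """Convert decimal odds back to American odds, rounded to an integer."""
    if decimal_odds >= 2:
        return round((decimal_odds - 1) * 100)
    return round(-100 / (decimal_odds - 1))


def profit(stake: float, american: int) -> float:
    """Profit on a winning bet, excluding the returned stake."""
    return stake * (american_to_decimal(american) - 1)


def parlay_decimal(legs: list[int]) -> float:
    """Assumed parlay rule: multiply the decimal odds of every leg (no bonus applied)."""
    result = 1.0
    for leg in legs:
        result *= american_to_decimal(leg)
    return result


if __name__ == "__main__":
    print(f"$10 wins ${profit(10, 2269):.2f}")  # prints "$10 wins $226.90"
    # Hypothetical three even-money (+100) legs: 2.0 * 2.0 * 2.0 = 8.0 decimal, i.e. +700
    print(decimal_to_american(parlay_decimal([100, 100, 100])))
```

Multiplying decimal odds rather than adding American odds is what makes each added leg compound the payout. This compounding is why a 3+ pick parlay reaches a figure like +2269 even when the individual markets are much closer to even money.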
🇨🇳

Will a China open-source model enter top-5 on LMArena overall?

$320k Vol.
🔓

Will any open-source model (Llama, Qwen, Mistral) enter the top 10 of the LMArena overall Elo leaderboard?

$300k Vol.
📚

Will any model break 95% on MMLU?

$260k Vol.
📄

At an equal 128k context length, will an open-source model beat GPT-4o on RULER?

$250k Vol.
🏆

Will Claude Sonnet 4.5 rank higher than GPT-4o on LMArena overall by Feb 1?

$250k Vol.
💻

Will any model score 90%+ on HumanEval?

$200k Vol.
🔧

Will any model score 90%+ on the BigCode Leaderboard?

$190k Vol.
🎓

Will an open-source model achieve HELM score >85%?

$180k Vol.

Will any frontier model achieve an average time-to-first-token under 200 ms on standard hardware?

$240k Vol.
🧮

Will any model break 95% on the next LiveBench Math release?

$150k Vol.

Will Hermes 3-70B (or any open-source 70B) hit 85% on TruthfulQA?

$150k Vol.