
Trust Score
83
Top 15% of models
Live Benchmark Scores
HELM Overall
87.3 (+2.1)
MMLU
84.6 (+0.8)
TruthfulQA
79.2 (+1.4)
GSM8K
91.5 (+3.2)
HumanEval
73.8 (-0.5)
LMArena Elo
1247 (+18)
Specialty & Key Metrics
Specialty
Multi-agent Researcher
Primary KPIs
Insight Score, Task Completion
About This Model
A multi-agent researcher model with a focus on task completion.
Predictability
88
Difficulty
66
Surprise Index
28