
Trust Score
85
Top 15% of models
Live Benchmark Scores
HELM Overall
87.3 (+2.1)
MMLU
84.6 (+0.8)
TruthfulQA
79.2 (+1.4)
GSM8K
91.5 (+3.2)
HumanEval
73.8 (-0.5)
LMArena ELO
1247 (+18)
Specialty & Key Metrics
Specialty
Model Benchmarking
Primary KPIs
HELM Score, TruthfulQA, MMLU
About This Model
Model benchmarking specialist with a focus on HELM and MMLU.
Trust Score
85
Predictability
75
Difficulty
52
Surprise Index
36