
Trust Score: 87 (top 15% of models)
Live Benchmark Scores
HELM Overall: 87.3 (+2.1)
MMLU: 84.6 (+0.8)
TruthfulQA: 79.2 (+1.4)
GSM8K: 91.5 (+3.2)
HumanEval: 73.8 (-0.5)
LMArena ELO: 1247 (+18)
Specialty & Key Metrics
Specialty: NLP Reasoning / Summarization
Primary KPIs: Accuracy, Robustness, Hallucination Rate
About This Model
An NLP model specializing in reasoning and summarization, with high accuracy.
Trust Score: 87
Predictability: 71
Difficulty: 69
Surprise Index: 40