LLM fact-recall benchmark for Dark Souls game knowledge

| Model | Parameters | Type | License | Score |
|---|---|---|---|---|
| kimi-k2.5 | 1T (32B active) | MoE | Open | 96% |
| claude-opus-4-5 | ? | Unknown | Proprietary | 93.1% |
| gpt-5.1 | ? | Unknown | Proprietary | 91.4% |
| gpt-4-turbo | ? | MoE | Proprietary | 88.6% |
| mistral-small-3.2 | 24B | Dense | Open | 76% |
| devstral-2 | 123B | Dense | Open | 63.4% |
| mistral-large-3 | 675B (41B active) | MoE | Open | 53.7% |
| llama-4-maverick | 400B (17B active) | MoE | Open | 47.4% |
| gpt-oss-120b | 117B (5.1B active) | MoE | Open | 46.9% |
| llama-4-scout | 109B (17B active) | MoE | Open | 37.7% |
| qwen3-235b-a22b | 235B (22B active) | MoE | Open | 35.4% |
| gpt-oss-20b | 21B (3.6B active) | MoE | Open | 29.1% |
| ministral-14b | 14B | Dense | Open | 22.9% |

| Model | Bosses | Areas & | NPCs & | Items & | Lore | Mechanics |
|---|---|---|---|---|---|---|
| kimi-k2.5 | 93 | 97 | 97 | 94 | 100 | 95 |
| claude-opus-4-5 | 87 | 90 | 94 | 97 | 100 | 90 |
| gpt-5.1 | 87 | 97 | 86 | 89 | 100 | 95 |
| gpt-4-turbo | 80 | 100 | 86 | 89 | 92 | 85 |
| mistral-small-3.2 | 50 | 37 | 91 | 89 | 100 | 95 |
| devstral-2 | 67 | 53 | 54 | 57 | 92 | 65 |
| mistral-large-3 | 57 | 47 | 51 | 29 | 92 | 60 |
| llama-4-maverick | 43 | 43 | 34 | 34 | 84 | 60 |
| gpt-oss-120b | 40 | 37 | 31 | 40 | 88 | 60 |
| llama-4-scout | 40 | 30 | 29 | 26 | 64 | 50 |
| qwen3-235b-a22b | 30 | 37 | 29 | 26 | 60 | 40 |
| gpt-oss-20b | 20 | 33 | 29 | 26 | 40 | 30 |
| ministral-14b | 20 | 30 | 20 | 23 | 24 | 20 |
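
The per-category scores can be cross-checked against the overall scores with a question-weighted average. The sketch below is only illustrative: the source does not state how many questions each category contains, so the counts in `ASSUMED_QUESTION_COUNTS` (30, 30, 35, 35, 25, 20, in column order) are an assumption inferred from the granularity of the published percentages, and `estimated_overall` is a hypothetical helper name. Only three rows are copied from the table to keep the example short.

```python
# Per-category percentages copied from the category table
# (column order: Bosses, Areas, NPCs, Items, Lore, Mechanics).
CATEGORY_SCORES = {
    "kimi-k2.5": [93, 97, 97, 94, 100, 95],
    "claude-opus-4-5": [87, 90, 94, 97, 100, 90],
    "mistral-small-3.2": [50, 37, 91, 89, 100, 95],
}

# ASSUMPTION: questions per category. These counts are NOT given in the
# source; they are guessed from the step size of the rounded percentages.
ASSUMED_QUESTION_COUNTS = [30, 30, 35, 35, 25, 20]


def estimated_overall(percentages, counts=ASSUMED_QUESTION_COUNTS):
    """Estimate the overall score (%) as a question-weighted average.

    Each rounded percentage is first converted back to an integer number
    of correct answers under the assumed category size.
    """
    correct = [round(p * n / 100) for p, n in zip(percentages, counts)]
    return 100 * sum(correct) / sum(counts)


if __name__ == "__main__":
    for model, percentages in CATEGORY_SCORES.items():
        print(f"{model}: {estimated_overall(percentages):.1f}%")
```

Under these assumed counts, the three estimates round to the overall scores listed for the same models in the first table (96%, 93.1%, 76%). A plain unweighted mean of the six columns would not, because the categories would then count equally regardless of size.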