filed / technical / template published / 2026/05/25

Project Pidgen: v5 to v7

Three versions of Project Pidgen.

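Project Pidgen aims to improve the model’s “thought”: its ability to anticipate the “future” consequences of its present action. PidgenV6 improves on PidgenV5 on every benchmark both versions report and lowers validation perplexity from 9.91 to 8.17. PidgenV7, at 865M parameters, trails on four of the five shared benchmarks but leads on Winogrande and on long-context retrieval.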

Overview

	PidgenV5	PidgenV6	PidgenV7
Params	9.34B	7.31B	865M
Tokens seen	8B	12B	5B
GPU memory	34.4 GB	27.2 GB	6.7 GB
Val perplexity	9.91	8.17	8.94

Benchmarks

These five are the benchmarks all three versions report.

Figure 1. Accuracy (%) on the five shared benchmarks (PIQA, ARC-Easy, Winogrande, HellaSwag, ARC-Challenge) for PidgenV5 · 9.34B, PidgenV6 · 7.31B, and PidgenV7 · 865M.

Full results below. Dashes mark benchmarks a given version did not report.

Benchmark	PidgenV5	PidgenV6	PidgenV7
PIQA	69.1	70.7	64.0
ARC-Easy	54.9	57.8	48.0
Winogrande	51.4	53.4	56.5
HellaSwag	41.6	46.9	40.0
ARC-Challenge	32.8	34.4	29.5
MMLU (5-shot)	26.0	27.3	—
SciQ	—	—	74.5
OpenBookQA	—	—	30.5
LAMBADA	—	—	22.5
Val perplexity	9.91	8.17	8.94

Reading the results

These are early checkpoints. The numbers are best read against chance and the token budget, not against finished models. All three versions were trained on 5 to 12 billion tokens, far short of what models of these sizes would normally train on.
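As a rough illustration of reading the numbers "against chance and the token budget", the sketch below computes each version's margin over chance on the shared benchmarks and its tokens seen per parameter. The accuracy, parameter, and token figures come from the tables above. The chance levels are assumptions based on the usual answer formats of these benchmarks (two options for PIQA and Winogrande, four for HellaSwag and, for most questions, ARC); they are not stated in this report. The function names are hypothetical.

```python
# Chance accuracy (%) per benchmark, assumed to be 100 / (number of answer options).
# ASSUMPTION: these reflect the usual task formats and are not taken from the report.
CHANCE = {
    "PIQA": 50.0,           # 2 options
    "ARC-Easy": 25.0,       # mostly 4 options
    "Winogrande": 50.0,     # 2 options
    "HellaSwag": 25.0,      # 4 options
    "ARC-Challenge": 25.0,  # mostly 4 options
}

VERSIONS = ("PidgenV5", "PidgenV6", "PidgenV7")

# Accuracy (%) from the results table, in the order given by VERSIONS.
ACCURACY = {
    "PIQA": (69.1, 70.7, 64.0),
    "ARC-Easy": (54.9, 57.8, 48.0),
    "Winogrande": (51.4, 53.4, 56.5),
    "HellaSwag": (41.6, 46.9, 40.0),
    "ARC-Challenge": (32.8, 34.4, 29.5),
}

# Parameter counts and tokens seen from the overview table.
PARAMS = {"PidgenV5": 9.34e9, "PidgenV6": 7.31e9, "PidgenV7": 865e6}
TOKENS = {"PidgenV5": 8e9, "PidgenV6": 12e9, "PidgenV7": 5e9}


def margin_over_chance(benchmark):
    """Return accuracy minus chance (percentage points) for each version."""
    chance = CHANCE[benchmark]
    return {
        version: round(acc - chance, 1)
        for version, acc in zip(VERSIONS, ACCURACY[benchmark])
    }


def tokens_per_param(version):
    """Return training tokens seen divided by parameter count."""
    return TOKENS[version] / PARAMS[version]


if __name__ == "__main__":
    for benchmark in ACCURACY:
        print(benchmark, margin_over_chance(benchmark))
    for version in VERSIONS:
        print(version, f"{tokens_per_param(version):.2f} tokens per parameter")
```

Under these assumptions, the Winogrande scores sit 1.4, 3.4, and 6.5 points above chance, and the three versions have seen roughly 0.86, 1.64, and 5.78 tokens per parameter. For scale, a commonly cited compute-optimal rule of thumb is on the order of 20 tokens per parameter; that figure is an outside reference point, not something this report claims.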

Long-context retrieval

Version	Params	Passes
PidgenV5	9.34B	0/10
PidgenV6	7.31B	1/10
PidgenV7	865M	10/10

Figure 2. Needle-in-haystack pass rate, 10 trials per version, context ~2k–5k tokens.
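With only 10 trials per version, each pass rate carries wide uncertainty. The sketch below puts a 95% Wilson score interval around each result. The Wilson interval is a standard choice for small-sample proportions, but using it here is this sketch's assumption, not a method the report describes, and `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(passes, trials, z=1.96):
    """Return the (low, high) Wilson score interval for a pass rate.

    z=1.96 corresponds to an approximately 95% interval.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = passes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point drift at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# (passes, trials) per version, from the needle-in-haystack results above.
RESULTS = {
    "PidgenV5": (0, 10),
    "PidgenV6": (1, 10),
    "PidgenV7": (10, 10),
}

if __name__ == "__main__":
    for version, (passes, trials) in RESULTS.items():
        low, high = wilson_interval(passes, trials)
        print(f"{version}: {passes}/{trials}, 95% interval {low:.2f} to {high:.2f}")
```

Under this assumption, 0/10 is compatible with a true pass rate of up to about 28%, 1/10 with roughly 2% to 40%, and 10/10 with anything from about 72% upward. The intervals for PidgenV7 and the two larger versions do not overlap, so the gap holds even at this sample size.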
