Project Pidgen: v5 to v7
Three versions of Project Pidgen.
Project Pidgen is an ambitious project that aims to improve the model’s “thought”: its ability to perceive the “future” of its present actions. The model has improved significantly over its previous version.
Overview
| | PidgenV5 | PidgenV6 | PidgenV7 |
|---|---|---|---|
| Params | 9.34B | 7.31B | 865M |
| Tokens seen | 8B | 12B | 5B |
| GPU memory | 34.4 GB | 27.2 GB | 6.7 GB |
| Val perplexity | 9.91 | 8.17 | 8.94 |
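This is a rough aid to reading the perplexity row. Under the usual definition, perplexity is the exponential of the mean per-token cross-entropy, so taking a logarithm recovers the loss. The sketch below assumes that definition with natural logarithms. The source does not state the tokenizer or validation set, and perplexities are only directly comparable between versions if both are shared. `ppl_to_nats` and `ppl_to_bits` are hypothetical helper names.

```python
import math

# Validation perplexities copied from the overview table.
VAL_PPL = {"PidgenV5": 9.91, "PidgenV6": 8.17, "PidgenV7": 8.94}


def ppl_to_nats(ppl):
    """Mean per-token cross-entropy in nats, assuming ppl = exp(loss)."""
    return math.log(ppl)


def ppl_to_bits(ppl):
    """The same loss expressed in bits per token (base-2 logarithm)."""
    return math.log2(ppl)


if __name__ == "__main__":
    for model, ppl in VAL_PPL.items():
        print(f"{model}: {ppl_to_nats(ppl):.3f} nats/token, "
              f"{ppl_to_bits(ppl):.3f} bits/token")
```

Because the logarithm is monotonic, the ranking does not change. PidgenV6 has the lowest implied loss and PidgenV5 the highest.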
Benchmarks
These five are the benchmarks all three versions report.
Chart: PIQA, ARC-Easy, Winogrande, HellaSwag and ARC-Challenge scores for PidgenV5 (9.34B), PidgenV6 (7.31B) and PidgenV7 (865M).
Full results below. Dashes mark benchmarks a given version did not report.
| Benchmark | PidgenV5 | PidgenV6 | PidgenV7 |
|---|---|---|---|
| PIQA | 69.1 | 70.7 | 64.0 |
| ARC-Easy | 54.9 | 57.8 | 48.0 |
| Winogrande | 51.4 | 53.4 | 56.5 |
| HellaSwag | 41.6 | 46.9 | 40.0 |
| ARC-Challenge | 32.8 | 34.4 | 29.5 |
| MMLU (5-shot) | 26.0 | 27.3 | — |
| SciQ | — | — | 74.5 |
| OpenBookQA | — | — | 30.5 |
| LAMBADA | — | — | 22.5 |
| Val perplexity | 9.91 | 8.17 | 8.94 |
Reading the results
These are early checkpoints. The numbers are best read against chance-level accuracy and the token budget, not against finished models. The versions were trained on only 5 to 12 billion tokens, far fewer than models of this size are normally trained on.
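One way to read the numbers this way is to subtract a chance baseline from each score and to compare each version's tokens seen against a reference budget. The sketch below is illustrative and is not Project Pidgen's own analysis. The scores and sizes are copied from the tables above. The following values are assumptions:

* chance levels of 50% for the two-way PIQA and Winogrande, and 25% for the nominally four-way tasks;
* a reference of roughly 20 training tokens per parameter, which is the Chinchilla compute-optimal heuristic.

LAMBADA is left out because it is open-ended word prediction with no fixed chance level. The helper names are hypothetical.

```python
# Accuracy (%) copied from the full results table; None marks "not reported".
SCORES = {
    "PIQA":          {"PidgenV5": 69.1, "PidgenV6": 70.7, "PidgenV7": 64.0},
    "ARC-Easy":      {"PidgenV5": 54.9, "PidgenV6": 57.8, "PidgenV7": 48.0},
    "Winogrande":    {"PidgenV5": 51.4, "PidgenV6": 53.4, "PidgenV7": 56.5},
    "HellaSwag":     {"PidgenV5": 41.6, "PidgenV6": 46.9, "PidgenV7": 40.0},
    "ARC-Challenge": {"PidgenV5": 32.8, "PidgenV6": 34.4, "PidgenV7": 29.5},
    "MMLU (5-shot)": {"PidgenV5": 26.0, "PidgenV6": 27.3, "PidgenV7": None},
    "SciQ":          {"PidgenV5": None, "PidgenV6": None, "PidgenV7": 74.5},
    "OpenBookQA":    {"PidgenV5": None, "PidgenV6": None, "PidgenV7": 30.5},
}

# ASSUMPTION: chance accuracy (%) from the nominal number of answer options.
# PIQA and Winogrande are 2-way; the others are treated as 4-way.
CHANCE = {
    "PIQA": 50.0, "ARC-Easy": 25.0, "Winogrande": 50.0, "HellaSwag": 25.0,
    "ARC-Challenge": 25.0, "MMLU (5-shot)": 25.0, "SciQ": 25.0,
    "OpenBookQA": 25.0,
}

# (parameters, training tokens) copied from the overview table.
BUDGET = {
    "PidgenV5": (9.34e9, 8e9),
    "PidgenV6": (7.31e9, 12e9),
    "PidgenV7": (865e6, 5e9),
}

# ASSUMPTION: reference of ~20 training tokens per parameter (Chinchilla).
REFERENCE_TOKENS_PER_PARAM = 20.0


def margin_over_chance(benchmark, model):
    """Percentage points above assumed chance, or None if not reported."""
    score = SCORES[benchmark][model]
    if score is None:
        return None
    return round(score - CHANCE[benchmark], 1)


def budget_fraction(model):
    """Fraction of the reference token budget the model actually saw."""
    params, tokens = BUDGET[model]
    return tokens / (params * REFERENCE_TOKENS_PER_PARAM)


if __name__ == "__main__":
    for model, (params, tokens) in BUDGET.items():
        print(f"{model}: {tokens / params:.1f} tokens/param, "
              f"{budget_fraction(model):.0%} of reference budget")
    for bench in SCORES:
        margins = {m: margin_over_chance(bench, m) for m in BUDGET}
        print(bench, margins)
```

Under these assumptions, PidgenV5 saw fewer than one token per parameter, which is about 4% of the reference budget. PidgenV7 saw about 5.8 tokens per parameter, or about 29%. For PidgenV5 and PidgenV6, Winogrande and MMLU sit only a few points above chance.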
Long-context retrieval
| Model | Params | Retrieval score |
|---|---|---|
| PidgenV5 | 9.34B | 0/10 |
| PidgenV6 | 7.31B | 1/10 |
| PidgenV7 | 865M | 10/10 |
Inspired by
- Geoffrey Hinton (deep learning)
- Richard Sutton (reinforcement learning)
- Yann LeCun (convolutional networks)
- Jürgen Schmidhuber (recurrent networks)
- Yoshua Bengio (neural language models)