The Pythia Scaling Suite comprises 16 models, two at each of eight sizes ranging from 70M to 12B parameters, all trained on the Pile dataset. The models are designed for interpretability research and match or exceed the performance of similarly sized models. Each model is released with 154 intermediate checkpoints.
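As a rough illustration of where the figure of 154 checkpoints comes from, the sketch below enumerates checkpoint names under an assumed schedule: one checkpoint at step 0, ten log-spaced checkpoints at steps 1 through 512, and 143 evenly spaced checkpoints from step 1000 to step 143000. The schedule, the `step<N>` naming, and the helper name `pythia_checkpoint_revisions` are assumptions for illustration and are not stated in the summary above.

```python
def pythia_checkpoint_revisions():
    """Return the assumed checkpoint names in training order.

    Assumed schedule (not stated in the summary itself):
      - step 0 (initialization)
      - log-spaced steps 1, 2, 4, ..., 512
      - evenly spaced steps 1000, 2000, ..., 143000
    """
    log_spaced = [2 ** i for i in range(10)]         # 1, 2, 4, ..., 512
    evenly_spaced = list(range(1000, 143001, 1000))  # 1000, 2000, ..., 143000
    steps = [0] + log_spaced + evenly_spaced
    return [f"step{s}" for s in steps]


revisions = pythia_checkpoint_revisions()
print(len(revisions))  # 154 = 1 + 10 + 143
```

If the checkpoints are hosted as branches on the Hugging Face Hub, one of these names could plausibly be passed as the `revision` argument of `from_pretrained` in the `transformers` library to load a specific intermediate checkpoint; the exact repository names would need to be confirmed against the model cards.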
3fef353ace0849cccae3f4d5b45a4a962217be9d
2023-05-04T23:43:58+00:00