GPT-J 6B is a 6 billion parameter transformer model trained using Ben Wang's Mesh Transformer JAX. It was trained on the Pile, a large-scale curated dataset created by EleutherAI. The model consists of 28 layers with a model dimension of 4096 and a feedforward dimension of 16384.
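
As a minimal sketch of how the model can be used, the snippet below loads GPT-J 6B through the Hugging Face `transformers` library and checks the architecture figures quoted above. The model ID `EleutherAI/gpt-j-6b` and the config attribute names (`n_layer`, `n_embd`) are assumptions based on the public `transformers` API, not part of this card.

```python
# Sketch: load GPT-J 6B and inspect its configuration (assumes the
# "EleutherAI/gpt-j-6b" checkpoint on the Hugging Face Hub).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6b"  # assumed Hub model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The config should reflect the dimensions described above.
print(model.config.n_layer)  # expected: 28 layers
print(model.config.n_embd)   # expected: model dimension of 4096

# Simple text-generation example.
inputs = tokenizer("EleutherAI created the Pile, which is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```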