
EleutherAI/gpt-neo-1.3B

We present GPT-Neo 1.3B, a transformer model designed using EleutherAI's replication of the GPT-3 architecture. With 1.3B parameters, this model was trained on the Pile, a large-scale curated dataset, for 380 billion tokens over 362,000 steps. It is intended for text generation: it learns an inner representation of the English language and can generate text from a prompt.


Input

Prompt

Text to generate from.

Max New Tokens

Maximum length of the newly generated text. If not set or None, defaults to the model's max context length minus the input length. (Default: 512, 1 ≤ max_new_tokens ≤ 100000)

Temperature

Temperature to use for sampling. 0 means the output is deterministic; values greater than 1 encourage more diversity. (Default: 0.7, 0 ≤ temperature ≤ 100)

Top P

Sample from the smallest set of tokens whose cumulative probability exceeds p. Lower values focus on the most probable tokens; higher values sample more low-probability tokens. (Default: 0.9, 0 < top_p ≤ 1)

Top K

Sample from the k most probable tokens. 0 means off. (Default: 0, 0 ≤ top_k < 100000)

Repetition Penalty

Value of 1 means no penalty; values greater than 1 discourage repetition, values smaller than 1 encourage it. (Default: 1, 0.01 ≤ repetition_penalty ≤ 5)

Stop Sequences

Up to 16 strings that will terminate generation immediately. Separate items with commas.

Num Responses

Number of output sequences to return. Incompatible with streaming. (Default: 1, 1 ≤ num_responses ≤ 2)

Response Format

How to format the response.

Presence Penalty

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. (Default: 0, -2 ≤ presence_penalty ≤ 2)

Frequency Penalty

Positive values penalize new tokens based on how many times they appear in the text so far, increasing the model's likelihood to talk about new topics. (Default: 0, -2 ≤ frequency_penalty ≤ 2)
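The sampling parameters above can be sketched in plain Python. This is a simplified illustration, not the service's actual implementation; the repetition-penalty rule follows the divide/multiply transform used by common implementations, which is an assumption about this deployment.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_temperature(logits, temperature):
    """temperature == 0 is treated as greedy decoding (all mass on the argmax)."""
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    return softmax([x / temperature for x in logits])

def top_k_filter(probs, k):
    """Keep only the k most probable tokens; k == 0 disables the filter."""
    if k == 0:
        return probs
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability exceeds p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for i in order:
        keep.add(i)
        cum += probs[i]
        if cum > p:
            break
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]

def apply_repetition_penalty(logits, generated_ids, penalty):
    """Divide positive logits (multiply negative ones) for already-seen tokens."""
    out = list(logits)
    for i in set(generated_ids):
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out
```

In a real decoder these transforms run at every step: penalties adjust the logits, temperature turns them into probabilities, and top-k/top-p prune the distribution before a token is drawn.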


Output

I have this dream about the day I got a job at a tech company. I just woke up on a plane. I sat down on the floor and started getting work done. After getting up around 6 p.m., I looked around and

GPT-Neo 1.3B

Model Description

GPT-Neo 1.3B is a transformer model designed using EleutherAI's replication of the GPT-3 architecture. GPT-Neo refers to the class of models, while 1.3B represents the number of parameters of this particular pre-trained model.

Training data

GPT-Neo 1.3B was trained on the Pile, a large scale curated dataset created by EleutherAI for the purpose of training this model.

Training procedure

This model was trained on the Pile for 380 billion tokens over 362,000 steps. It was trained as an autoregressive language model, using cross-entropy loss.
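The objective can be illustrated with a minimal cross-entropy calculation: the loss is the average negative log-probability the model assigns to each actual next token. This is a toy sketch in plain Python, not the actual training code.

```python
import math

def next_token_cross_entropy(probs_per_step, target_ids):
    """Average negative log-likelihood assigned to each actual next token."""
    losses = [-math.log(step_probs[target])
              for step_probs, target in zip(probs_per_step, target_ids)]
    return sum(losses) / len(losses)
```

A model that guesses uniformly over a vocabulary of size V incurs a loss of log V per token, while a model that always puts probability 1 on the correct token incurs zero loss.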

Intended Use and Limitations

Through this training, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks. However, the model is best at what it was pretrained for, which is generating text from a prompt.

Limitations and Biases

GPT-Neo was trained as an autoregressive language model. This means that its core functionality is taking a string of text and predicting the next token. While language models are widely used for tasks other than this, there remain many unknowns about how they behave outside that core task.
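Next-token prediction can be illustrated with a toy model. The sketch below uses a hypothetical bigram counter, vastly simpler than GPT-Neo, but the greedy generation loop has the same shape: condition on the text so far, pick a next token, append, repeat.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each other token."""
    model = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        model[prev][nxt] += 1
    return model

def generate(model, start, n_new):
    """Greedy autoregressive loop: repeatedly append the most likely next token."""
    out = [start]
    for _ in range(n_new):
        followers = model.get(out[-1])
        if not followers:
            break  # no continuation observed for this token
        out.append(followers.most_common(1)[0][0])
    return out
```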

GPT-Neo was trained on the Pile, a dataset known to contain profanity, lewdness, and otherwise abrasive language. Depending on your use case, GPT-Neo may produce socially unacceptable text. See Sections 5 and 6 of the Pile paper for a more detailed analysis of the biases in the Pile.

As with all language models, it is hard to predict in advance how GPT-Neo will respond to particular prompts and offensive content may occur without warning. We recommend having a human curate or filter the outputs before releasing them, both to censor undesirable content and to improve the quality of the results.
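One minimal form of such filtering is a blocklist check that routes matching outputs to a human reviewer. This is a hypothetical sketch; `flag_for_review` and the blocklist contents are illustrative assumptions, and real moderation pipelines are considerably more involved.

```python
def flag_for_review(text, blocklist):
    """Return True when any blocklisted term occurs, so a human can review the output."""
    lowered = text.lower()
    return any(term in lowered for term in blocklist)
```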

Eval results

Linguistic Reasoning

| Model and Size | Pile BPB | Pile PPL | Wikitext PPL | Lambada PPL | Lambada Acc | Winogrande | Hellaswag |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-Neo 1.3B | 0.7527 | 6.159 | 13.10 | 7.498 | 57.23% | 55.01% | 38.66% |
| GPT-2 1.5B | 1.0468 | --- | 17.48 | 10.634 | 51.21% | 59.40% | 40.03% |
| GPT-Neo 2.7B | 0.7165 | 5.646 | 11.39 | 5.626 | 62.22% | 56.50% | 42.73% |
| GPT-3 Ada | 0.9631 | --- | --- | 9.954 | 51.60% | 52.90% | 35.93% |

Physical and Scientific Reasoning

| Model and Size | MathQA | PubMedQA | Piqa |
| --- | --- | --- | --- |
| GPT-Neo 1.3B | 24.05% | 54.40% | 71.11% |
| GPT-2 1.5B | 23.64% | 58.33% | 70.78% |
| GPT-Neo 2.7B | 24.72% | 57.54% | 72.14% |
| GPT-3 Ada | 24.29% | 52.80% | 68.88% |

Down-Stream Applications

TBD

BibTeX entry and citation info

To cite this model, please use:

@software{gpt-neo,
  author       = {Black, Sid and
                  Leo, Gao and
                  Wang, Phil and
                  Leahy, Connor and
                  Biderman, Stella},
  title        = {{GPT-Neo: Large Scale Autoregressive Language
                   Modeling with Mesh-Tensorflow}},
  month        = mar,
  year         = 2021,
  note         = {{If you use this software, please cite it using
                   these metadata.}},
  publisher    = {Zenodo},
  version      = {1.0},
  doi          = {10.5281/zenodo.5297715},
  url          = {https://doi.org/10.5281/zenodo.5297715}
}

@article{gao2020pile,
  title={The Pile: An 800GB Dataset of Diverse Text for Language Modeling},
  author={Gao, Leo and Biderman, Stella and Black, Sid and Golding, Laurence and Hoppe, Travis and Foster, Charles and Phang, Jason and He, Horace and Thite, Anish and Nabeshima, Noa and others},
  journal={arXiv preprint arXiv:2101.00027},
  year={2020}
}