GPT-J 6B is a 6 billion parameter transformer model trained using Ben Wang's Mesh Transformer JAX. It was trained on the Pile, a large-scale curated dataset created by EleutherAI. The model consists of 28 layers with a model dimension of 4096, and a feedforward dimension of 16384.


Visibility: Public

Price: $0.0005/sec

Prompt

Text to generate from.

Max New Tokens

Maximum length of the newly generated text. (Default: 2048, 1 ≤ max_new_tokens ≤ 100000)

Temperature

Temperature to use for sampling. 0 means the output is deterministic; values greater than 1 encourage more diversity. (Default: 0.7, 0 ≤ temperature ≤ 100)

Top P

Sample from the set of tokens with highest probability such that the sum of their probabilities exceeds p. Lower values focus on the most probable tokens; higher values sample more low-probability tokens. (Default: 0.9, 0 < top_p ≤ 1)

Top K

Sample from the k most probable tokens. 0 means off. (Default: 0, 0 ≤ top_k < 100000)
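To make the interaction between temperature, top_p, and top_k concrete, here is a minimal sketch of how a single next token could be drawn from raw logits under these parameters. It is an illustration of the standard technique, not the service's actual implementation; the function name and structure are our own.

```python
import math
import random

def sample_next_token(logits, temperature=0.7, top_p=0.9, top_k=0, rng=None):
    """Pick a next-token id from raw logits (illustrative sketch only).

    temperature == 0 means greedy, deterministic decoding;
    top_k == 0 disables the top-k filter, matching the defaults above.
    """
    rng = rng or random.Random()
    if temperature == 0:
        # Deterministic: always take the single most likely token.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Softmax with temperature: higher temperature flattens the distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token ids by probability, most likely first.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])

    # top_k: keep only the k most probable tokens (0 = disabled).
    if top_k > 0:
        order = order[:top_k]

    # top_p (nucleus): keep the smallest prefix whose total mass exceeds p.
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalize over the surviving tokens and sample one.
    mass = sum(probs[i] for i in kept)
    r = rng.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

Note that top-k is applied before top-p here, so both filters can shrink the candidate set; with temperature 0 the filters never run at all.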

Repetition Penalty

Repetition penalty. A value of 1 means no penalty; values greater than 1 discourage repetition, and values smaller than 1 encourage it. (Default: 1.2, 0.01 ≤ repetition_penalty ≤ 5)
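A common way to apply such a penalty (the CTRL-style approach used by many open-source decoders) is to shrink the logits of tokens that have already been generated. This sketch assumes that approach; the actual server-side implementation may differ.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Discourage tokens that already appeared in the output.

    penalty > 1 makes repeated tokens less likely, < 1 more likely,
    and 1.0 is a no-op. Illustrative sketch only.
    """
    out = list(logits)
    for i in set(generated_ids):
        # Positive scores are divided and negative scores multiplied,
        # so either way the token loses probability when penalty > 1.
        if out[i] > 0:
            out[i] /= penalty
        else:
            out[i] *= penalty
    return out
```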

Stop Sequences

Up to 4 strings that will terminate generation immediately. Separate items with commas.
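Stop sequences are typically applied by truncating the output at the earliest occurrence of any stop string. A minimal sketch of that behavior, assuming truncation happens before the stop string itself:

```python
def truncate_at_stop(text, stop_sequences):
    """Cut generated text at the earliest of up to 4 stop strings.

    Returns the text unchanged if no stop sequence occurs.
    Illustrative sketch of the usual client-visible behavior.
    """
    cut = len(text)
    for s in stop_sequences[:4]:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```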

Num Responses

Number of output sequences to return. Incompatible with streaming. (Default: 1, 1 ≤ num_responses ≤ 2)


I have this dream about the day I got a job at a tech company. I just woke up on a plane. I sat down on the floor and started getting work done. After getting up around 6 p.m., I looked around and