meta-llama/Llama-2-7b-hf

Meta developed and publicly released the Llama 2 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Llama 2 models outperform open-source chat models on most benchmarks tested and are optimized for dialogue use cases. The model is intended for commercial and research use in English, and the pretrained models can be adapted for various natural language generation tasks.

Public
$0.0005/sec

HTTP/cURL API
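
The endpoint accepts a JSON body whose keys are the input fields documented below. A minimal request-building sketch in Python, assuming the endpoint path follows Deep Infra's `/v1/inference/<model>` pattern; the token and prompt are placeholders:

```python
import json
import urllib.request

# Assumed endpoint, following the /v1/inference/<model> pattern.
API_URL = "https://api.deepinfra.com/v1/inference/meta-llama/Llama-2-7b-hf"

def build_request(prompt, token, **options):
    """Build a POST request for the inference endpoint.

    `options` may include any of the documented input fields
    (max_new_tokens, temperature, top_p, top_k, ...).
    """
    body = {"input": prompt, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # placeholder auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("Once upon a time", token="YOUR_TOKEN", max_new_tokens=128)
# urllib.request.urlopen(req) would send the request; omitted here.
```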


Input fields

input (string)

Text to generate from.


max_new_tokens (integer)

Maximum number of newly generated tokens.

Default value: 2048

Range: 1 ≤ max_new_tokens ≤ 100000


temperature (number)

Temperature to use for sampling. 0 means the output is deterministic; values greater than 1 encourage more diversity.

Default value: 0.7

Range: 0 ≤ temperature ≤ 100


top_p (number)

Sample from the smallest set of tokens whose cumulative probability exceeds p. Lower values focus on the most probable tokens; higher values also sample low-probability tokens.

Default value: 0.9

Range: 0 < top_p ≤ 1


top_k (integer)

Sample only from the k most probable tokens. 0 disables top-k filtering.

Default value: 0

Range: 0 ≤ top_k < 100000
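
To see how `top_k` and `top_p` restrict the pool of candidate tokens, here is a toy illustration; the probabilities are invented for demonstration and not tied to any real model output:

```python
# Invented toy distribution over next tokens.
probs = {"the": 0.5, "a": 0.25, "cat": 0.15, "dog": 0.07, "xylophone": 0.03}

def top_k_filter(probs, k):
    """Keep only the k most probable tokens (k == 0 disables the filter)."""
    if k == 0:
        return dict(probs)
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = {}, 0.0
    for tok, pr in ranked:
        kept[tok] = pr
        total += pr
        if total >= p:
            break
    return kept

print(sorted(top_k_filter(probs, 2)))   # ['a', 'the']
print(sorted(top_p_filter(probs, 0.9)))  # ['a', 'cat', 'the']
```

In a real sampler the surviving probabilities would be renormalized before drawing a token; the filters here only show which tokens remain eligible.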


repetition_penalty (number)

Penalty for repeated tokens. A value of 1 means no penalty; values greater than 1 discourage repetition, values smaller than 1 encourage it.

Default value: 1.2

Range: 0.01 ≤ repetition_penalty ≤ 5


stop (array)

Up to 4 strings that will terminate generation immediately.


num_responses (integer)

Number of output sequences to return. Incompatible with streaming.

Default value: 1

Range: 1 ≤ num_responses ≤ 2


webhook (file)

The webhook to call when inference is done. By default, the output is returned in the response of the inference request.


stream (boolean)

Whether to stream tokens. Defaults to false and is currently only supported for Llama 2 text generation models; token-by-token updates are sent over SSE.

Default value: false
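
When stream is true, updates arrive as server-sent events. A minimal sketch of parsing such a stream, assuming each event's `data:` line carries a JSON payload; the payload shape used here (`{"token": {"text": ...}}`) is an assumption, not a documented schema:

```python
import json

def iter_sse_events(lines):
    """Yield the parsed data payload of each SSE event from an iterable of lines."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())

# Hypothetical captured stream, for illustration only.
sample = [
    'data: {"token": {"text": "Hello"}}',
    "",
    'data: {"token": {"text": " world"}}',
    "",
]
text = "".join(ev["token"]["text"] for ev in iter_sse_events(sample))
print(text)  # Hello world
```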

Input Schema

Output Schema


© 2023 Deep Infra. All rights reserved.
