Databricks’ dolly-v2-12b is an instruction-following large language model trained on the Databricks machine learning platform and licensed for commercial use. Based on pythia-12b, Dolly is trained on ~15k instruction/response fine-tuning records.

You can use cURL or any other HTTP client to run inferences:

```bash
curl -X POST \
    -d '{"input": "I have this dream"}'  \
    -H "Authorization: bearer $(deepctl auth token)"  \
    -H 'Content-Type: application/json'  \
    'https://api.deepinfra.com/v1/inference/databricks/dolly-v2-12b'
```

which will give you back something similar to:

```json
{
  "results": [
    {
      "generated_text": "I have this dream about the day I got a job at a tech company. I just woke up on a plane. I sat down on the floor and started getting work done. After getting up around 6 p.m., I looked around and"
    }
  ],
  "num_tokens": 42,
  "num_input_tokens": 100,
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0
  }
}

```
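As a minimal sketch of consuming this response, the snippet below pulls the `generated_text` entries out of the `results` array using only the Python standard library. The helper name `extract_texts` is hypothetical, and the sample body is an abbreviated copy of the response shown above; real responses may carry additional fields.

```python
import json


def extract_texts(response_body: str) -> list[str]:
    """Return the generated_text of every entry in the "results" array."""
    data = json.loads(response_body)
    # A missing "results" key is treated as an empty result list.
    return [item["generated_text"] for item in data.get("results", [])]


# Abbreviated copy of the sample response shown above.
sample = """
{
  "results": [
    {"generated_text": "I have this dream about the day I got a job at a tech company."}
  ],
  "num_tokens": 42,
  "num_input_tokens": 100
}
"""

texts = extract_texts(sample)
print(texts[0])
```

Returning a list rather than a single string keeps the helper usable when more than one output sequence is requested.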


You can also use our command-line tool [deepctl](/docs/getting-started) to run
inferences:

```bash
deepctl infer \
    -m 'databricks/dolly-v2-12b'  \
    -i 'input=I have this dream'
```

which will give you back something similar to:

```json
{
  "results": [
    {
      "generated_text": "I have this dream about the day I got a job at a tech company. I just woke up on a plane. I sat down on the floor and started getting work done. After getting up around 6 p.m., I looked around and"
    }
  ],
  "num_tokens": 42,
  "num_input_tokens": 100,
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0
  }
}

```


The request body accepts the following fields:

`input`: The prompt text to generate from.

`max_new_tokens`: Maximum length of the newly generated text. If not set or None, defaults to the model's max context length minus the input length.

`temperature`: Temperature to use for sampling. 0 means the output is deterministic. Values greater than 1 encourage more diversity.

`top_p`: Sample from the set of tokens with the highest probability such that the sum of their probabilities is higher than p. Lower values focus on the most probable tokens. Higher values sample more low-probability tokens.

`top_k`: Sample from the best k (number of) tokens. 0 means off.

`repetition_penalty`: Repetition penalty. A value of 1 means no penalty, values greater than 1 discourage repetition, and values smaller than 1 encourage repetition.

`stop`: Up to 16 strings that will terminate generation immediately.

`num_responses`: Number of output sequences to return. Incompatible with streaming.

`presence_penalty`: Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

`frequency_penalty`: Positive values penalize new tokens based on how many times they appear in the text so far, increasing the model's likelihood to talk about new topics.

`webhook`: The webhook to call when inference is done. By default, you get the output in the response of your inference request.

`stream`: Whether to stream tokens. Defaults to false. Currently only supported for Llama 2 text generation models. Token-by-token updates are sent over SSE.
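The fields above can be combined into a single JSON request body. The following is a rough sketch, not an official client: `build_payload` and `run_inference` are hypothetical helper names, the sampling values (64, 0.7, 0.9) are purely illustrative, and the token is assumed to be the string printed by `deepctl auth token`. The endpoint and headers mirror the cURL example. `run_inference` is defined but not called here, since it needs network access and a valid token.

```python
import json
import urllib.request

API_URL = "https://api.deepinfra.com/v1/inference/databricks/dolly-v2-12b"

# Optional field names taken from the parameter list above.
ALLOWED = {
    "max_new_tokens", "temperature", "top_p", "top_k", "repetition_penalty",
    "stop", "num_responses", "presence_penalty", "frequency_penalty",
    "webhook", "stream",
}


def build_payload(prompt, **params):
    """Build the request body, rejecting names that are not documented fields."""
    unknown = set(params) - ALLOWED
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    stop = params.get("stop")
    if stop is not None and len(stop) > 16:
        raise ValueError("stop accepts at most 16 strings")
    if params.get("stream") and params.get("num_responses", 1) > 1:
        raise ValueError("num_responses is incompatible with streaming")
    return {"input": prompt, **params}


def run_inference(payload, token):
    """POST the payload to the inference endpoint and return the parsed JSON."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Illustrative values only; tune them for your use case.
payload = build_payload(
    "I have this dream",
    max_new_tokens=64,
    temperature=0.7,
    top_p=0.9,
    stop=["\n\n"],
)
print(json.dumps(payload, indent=2))
```

Validating the field names locally catches typos such as `max_tokens` before a request is sent.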


HTTP/cURL API

Input fields

`input` string

`max_new_tokens` integer

`temperature` number

`top_p` number

`top_k` integer

`repetition_penalty` number

`stop` array

`num_responses` integer

`presence_penalty` number

`frequency_penalty` number

`webhook` file

`stream` boolean
