JSON Mode

In addition to responding in text, the DeepInfra API can be asked to return responses in JSON format. To learn more, read our blog.

We provide JSON mode in both our inference API and our OpenAI-compatible API, and it is supported for all models.

Using JSON Mode

Activating a JSON response in any of DeepInfra's text APIs, including /v1/inference, /v1/openai/completions, and /v1/openai/chat/completions, works the same way: add a response_format parameter and set its value to {"type": "json_object"}.
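
For example, with the raw inference API you can pass the parameter directly in the request body. Here is a minimal sketch using the requests library; the "input" field is the usual prompt field for text models, but check your model's page for its exact schema:

import requests

resp = requests.post(
    "https://api.deepinfra.com/v1/inference/mistralai/Mistral-7B-Instruct-v0.1",
    headers={"Authorization": "Bearer <Your-DeepInfra-API-Key>"},
    json={
        # Prompt field for text models; see your model's schema.
        "input": "[INST] Give me a JSON object describing the planet Mars. [/INST]",
        # Ask for a JSON response.
        "response_format": {"type": "json_object"},
    },
)
print(resp.json())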

Example

Let's go through a simple example of learning about scientific discoveries.

This is how you set up our endpoint:

import openai
import json

client = openai.OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key="<Your-DeepInfra-API-Key>",
)

Here is an example of using the OpenAI chat API to invoke a model with JSON mode:

messages = [
    {
        "role": "user",
        "content": "Provide a JSON list of 3 famous scientific breakthroughs in the past century, all of the countries which contributed, and in what year."
    }
]

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=messages,
    response_format={"type": "json_object"},
)

The resulting response.choices[0].message.content will contain a string with JSON:

{
  "breakthroughs": [
    {
      "name": "Penicillin",
      "country": "UK",
      "year": 1928
    },
    {
      "name": "The Double Helix Structure of DNA",
      "country": "US",
      "year": 1953
    },
    {
      "name": "Artificial Heart",
      "country": "US",
      "year": 2008
    }
  ]
}
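
Since the content is a string, parse it before use; this is what the json import in the setup is for:

data = json.loads(response.choices[0].message.content)
for item in data["breakthroughs"]:
    print(item["name"], item["country"], item["year"])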

Caveats and warnings

It is highly recommended to prompt the model to produce JSON. While not strictly necessary, skipping this can occasionally produce nonsensical responses, because the model may misunderstand your intent. For example, a model unaware it is producing JSON may mismatch a quote, causing stray : characters to appear inside strings; the result is still technically valid JSON, but the quality of the response degrades.
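
One simple way to do this is an explicit instruction in the messages (a sketch; any wording that states the expected shape works):

messages = [
    {
        # Tell the model up front that JSON is expected.
        "role": "system",
        "content": "You always answer with a single valid JSON object.",
    },
    {
        "role": "user",
        "content": "List 3 famous scientific breakthroughs with their country and year.",
    },
]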

Currently, the API does not guarantee that the resulting JSON object is complete at the end of a response.

If the model stops due to length, the JSON object in the response will be improperly terminated, for example in the middle of a string or object.
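
A defensive pattern is to check the finish reason and guard the parse (a sketch):

choice = response.choices[0]
if choice.finish_reason == "length":
    # The model ran out of tokens; the JSON is likely cut off.
    # Retry with a larger max_tokens or a shorter prompt.
    pass
try:
    data = json.loads(choice.message.content)
except json.JSONDecodeError:
    # Truncated or malformed JSON; handle or retry.
    data = None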

A note about JSON and model alignment and accuracy

A big caveat of JSON mode is that it interferes with a model's alignment, or "self-control". In particular, when forced to produce a JSON response, the model is more likely to make up information rather than explain that it does not know, and more likely to behave in ways that fall outside of its training, producing undesirable output rather than objecting.

Let's take a really simple prompt:

messages = [
    {
        "role": "user",
        "content": "What is the weather in San Francisco?"
    }
]
response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=messages,
)

This prompt, using the default "text" response_format, will give a reasonable canned response:

" I don't have real-time updates or location tracking capabilities, so I can't provide current weather information for San Francisco. Please check a reliable weather website or app for this information."

However, now let's add response_format={"type": "json_object"}. The model merrily produces a made-up weather forecast with no objection:

{
  "location":"San Francisco",
  "weather":[
    {
      "timestamp":163856000,
      "description":"Mostly cloudy",
      "temperature":25,
      "feels_like":26.2,
      "humidity":80,
      "wind":{
        "speed":4.7,
        "degrees":0
      }
    }
  ]
}

Because this output format constrains the model so tightly that it cannot voice alignment warnings, it instead responds with the most probable tokens: a wildly inaccurate guess at today's weather.
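
If you need JSON but still want the model to be able to decline, one workaround is to give it a JSON-shaped escape hatch in the prompt (a sketch; the "error" field is just a convention invented for this example):

messages = [
    {
        "role": "user",
        "content": (
            "What is the weather in San Francisco? "
            # Let the model refuse while still emitting valid JSON.
            'Answer in JSON. If you cannot answer, respond with {"error": "<reason>"} instead.'
        ),
    }
]
response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=messages,
    response_format={"type": "json_object"},
)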