

Guaranteed JSON output on Open-Source LLMs.
Published on 2024.03.08 by Patrick Reiter Horn

DeepInfra is proud to announce that we have released "JSON mode" across all of our text language models. It is available through the "response_format" parameter, which currently supports only {"type": "json_object"}.

Our JSON mode guarantees that all tokens returned in the output of a language model completion or chat response conform to valid JSON (JavaScript Object Notation).

The JSON format carries no performance overhead, and the feature is already available on all of our models for free. Please try it out!

Using JSON mode

Activating a JSON response in any of DeepInfra's text APIs, including /v1/inference, /v1/openai/completions, and /v1/openai/chat/completions, is done in the same way: add a response_format parameter and set its value to {"type": "json_object"}.
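Outside the OpenAI SDK, the same parameter can be sent over raw HTTP. A minimal sketch using only the standard library (the base URL and Bearer-token header follow DeepInfra's OpenAI-compatible convention; the API key is a placeholder):

```python
import json
import urllib.request

# Endpoint path as listed above; base URL assumes DeepInfra's standard
# OpenAI-compatible host.
API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"

def build_json_mode_request(api_key, model, user_prompt):
    """Assemble headers and payload for a JSON-mode chat completion."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "response_format": {"type": "json_object"},  # the only supported value
    }
    return headers, payload

def send(api_key, model, user_prompt):
    """POST the request and return the parsed JSON body (requires network)."""
    headers, payload = build_json_mode_request(api_key, model, user_prompt)
    req = urllib.request.Request(
        API_URL, data=json.dumps(payload).encode("utf-8"), headers=headers
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The response body has the same shape as the SDK's: the JSON string lives at choices[0].message.content.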

For the best quality responses, it is also recommended to prompt the model to produce JSON, perhaps also indicating which fields to include in the resulting object.
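For example, a system message that spells out the desired fields might look like this (a sketch; the field names "answer" and "confidence" are purely illustrative):

```python
# A system message that both requests JSON and names the expected fields
# tends to yield more consistent objects than relying on JSON mode alone.
messages = [
    {
        "role": "system",
        "content": (
            "Respond only with a JSON object containing the fields "
            "'answer' (a string) and 'confidence' (a number between 0 and 1)."
        ),
    },
    {"role": "user", "content": "What is the capital of France?"},
]
```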

Example of JSON mode

Here is an example of using the openai chat API to invoke a model with JSON mode:

from openai import OpenAI

# Point the OpenAI client at DeepInfra's OpenAI-compatible endpoint
client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key="<YOUR_DEEPINFRA_API_KEY>",
)

messages = [
    {
        "role": "user",
        "content": "Provide a JSON list of 3 famous scientific breakthroughs in the past century, all of the countries which contributed, and in what year."
    }
]

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=messages,
    response_format={"type": "json_object"},
)

The resulting response.choices[0].message.content will contain a string with JSON:

{
  "breakthroughs": [
    {
      "name": "Penicillin",
      "country": "UK",
      "year": 1928
    },
    {
      "name": "The Double Helix Structure of DNA",
      "country": "US",
      "year": 1953
    },
    {
      "name": "Artificial Heart",
      "country": "US",
      "year": 2008
    }
  ]
}
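Because the output is guaranteed to be valid JSON, the content string can be parsed directly with the standard library. A minimal sketch, using the example response above as a stand-in for response.choices[0].message.content:

```python
import json

# Stand-in for response.choices[0].message.content from the example above
content = '''
{
  "breakthroughs": [
    {"name": "Penicillin", "country": "UK", "year": 1928},
    {"name": "The Double Helix Structure of DNA", "country": "US", "year": 1953},
    {"name": "Artificial Heart", "country": "US", "year": 2008}
  ]
}
'''

data = json.loads(content)  # JSON mode guarantees this will not raise
for item in data["breakthroughs"]:
    print(f'{item["year"]}: {item["name"]} ({item["country"]})')
```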

Why JSON?

JSON is an ideal fit for language models due to the combination of its concise structure and the flexibility of the structured data it can store. Language models pick up on the fact that JSON is being output and structure their responses accordingly, often producing more data-driven answers with fewer tokens wasted on unwanted explanations or fluff.

JSON support will also open the door to more reliable function calling. Expect to see more improvements as we continue to iterate on this capability.

Like every aspect of inference, it is not without its tradeoffs.

Pros:

  • Jumps straight to the desired information, skipping boilerplate text. Expect meaningful output within the first 10 tokens.
  • Responses will be very data oriented, ideal for things like dates or lists.

Cons:

  • No text before or after the JSON response: models are unlikely to explain their reasoning.
  • JSON formatted responses have a greater tendency to make up information when a question is asked that the model cannot possibly answer.

Conclusion

We're excited to finally launch JSON output on our platform. Read our JSON Mode Documentation.

There is still a lot unexplored, and we'd love to hear your feedback and use-cases for JSON or other structured output. Join our Discord or follow us on Twitter for future updates.

Have fun!
