
DeepInfra is proud to announce that we have released "JSON mode" across all of our text language models. It is available through the "response_format" object, which currently supports only {"type": "json_object"}.
Our JSON mode guarantees that the output of a language model completion or chat response is valid JSON (JavaScript Object Notation).
JSON mode carries no performance overhead, and the feature is already available on all of our models for free. Please try it out!
Activating a JSON response in any of DeepInfra's text APIs, including /v1/inference, /v1/openai/completions, and /v1/openai/chat/completions, is performed the same way: add a response_format parameter and set its value to {"type": "json_object"}.
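For instance, here is a minimal sketch of passing that parameter over raw HTTP with Python's requests library. The DEEPINFRA_API_KEY environment variable and the prompt text are assumptions for illustration:

import os
import requests

# The same response_format parameter works on any of the text endpoints;
# this sketch targets the OpenAI-compatible chat completions endpoint.
resp = requests.post(
    "https://api.deepinfra.com/v1/openai/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['DEEPINFRA_API_KEY']}"},
    json={
        "model": "mistralai/Mistral-7B-Instruct-v0.1",
        "messages": [{"role": "user", "content": "Reply with a JSON object describing yourself."}],
        "response_format": {"type": "json_object"},
    },
)
print(resp.json()["choices"][0]["message"]["content"])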
For the best quality responses, it is also recommended to prompt the model to produce JSON, perhaps also indicating which fields to include in the resulting object.
Here is an example of using the OpenAI-compatible chat API to invoke a model with JSON mode:
from openai import OpenAI

# Point the official openai client at DeepInfra's OpenAI-compatible base URL.
client = OpenAI(
    api_key="<your DeepInfra API token>",
    base_url="https://api.deepinfra.com/v1/openai",
)

messages = [
    {
        "role": "user",
        "content": "Provide a JSON list of 3 famous scientific breakthroughs in the past century, all of the countries which contributed, and in what year."
    }
]

# response_format={"type": "json_object"} activates JSON mode.
response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=messages,
    response_format={"type": "json_object"},
)
The resulting response.choices[0].message.content will contain a string with JSON:
{
  "breakthroughs": [
    {
      "name": "Penicillin",
      "country": "UK",
      "year": 1928
    },
    {
      "name": "The Double Helix Structure of DNA",
      "country": "US",
      "year": 1953
    },
    {
      "name": "Artificial Heart",
      "country": "US",
      "year": 2008
    }
  ]
}
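Because JSON mode guarantees the string parses, it can be loaded directly with the standard library. Note that the "breakthroughs" key matches the sample response above, but the exact field names still depend on your prompt and the model:

import json

# The guarantee means json.loads will not raise on the model's output.
data = json.loads(response.choices[0].message.content)
for item in data["breakthroughs"]:
    print(f'{item["name"]} ({item["country"]}, {item["year"]})')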
JSON is an ideal fit for language models thanks to the combination of its concise structure and the flexibility of the structured data it can hold. Models pick up on the fact that they are emitting JSON and shape their answers accordingly, often producing more data-driven responses with fewer tokens wasted on unwanted explanations or fluff.
JSON support will also open the door to more reliable function calling. Expect to see more improvements as we continue to iterate on this capability.
Like every aspect of inference, it is not without its tradeoffs.
Pros:
- Output is guaranteed to parse as JSON, so brittle regex extraction and retry loops can go away.
- Responses tend to be more concise and data-driven, with fewer tokens spent on fluff.
- There is no performance overhead, and it works on all of our models for free.
Cons:
- Only the syntax is guaranteed; the model may still pick unexpected field names or values, so validate the schema you expect (see the sketch after this list).
- Quality degrades if the prompt does not also ask for JSON, which is why we recommend prompting for it explicitly.
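Here is a minimal sketch of that validation step, assuming the breakthroughs shape from the example above:

def validate_breakthroughs(data: dict) -> list[dict]:
    # JSON mode guarantees parseable output, not this particular shape,
    # so verify the fields you prompted for before trusting them.
    items = data.get("breakthroughs")
    if not isinstance(items, list):
        raise ValueError("expected a 'breakthroughs' list")
    for item in items:
        if not isinstance(item, dict) or not {"name", "country", "year"} <= item.keys():
            raise ValueError(f"unexpected item shape: {item!r}")
    return items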
We're excited to finally launch JSON output on our platform. Read our JSON Mode Documentation.
There is still a lot left unexplored, and we'd love to hear feedback about your thoughts and use cases for JSON or other structured output. Join our Discord or follow us on Twitter for future updates.
Have fun!