Modern large language models (LLMs) are incredibly powerful at understanding and generating text, but until recently they were largely static: they could only respond based on patterns in their training data. Function calling changes that. It lets language models interact with external logic — your own code, APIs, utilities, or business systems — while still serving answers in natural language.
DeepInfra now supports function calling across its inference APIs, enabling developers to build intelligent applications that go far beyond static responses. In this post, we’ll break down what function calling is, how it works with DeepInfra, and how you can start using it today.
What Is Function Calling?
At a high level, function calling lets language models decide when and how to call real functions you define — for example, a function that fetches real-time weather, retrieves a user profile, triggers an action in your system, or performs any external computation your application needs.
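Traditionally, LLMs only output text. With function calling, the model can instead return a structured request that names one of your functions and the arguments to pass; your application runs that function and hands the result back to the model.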
This enables powerful AI workflows that combine natural language reasoning with real-world logic — without you having to hard-code responses.
Why Function Calling Matters
Function calling unlocks a new class of AI applications.
It’s a bridge from static text responses to dynamic, intelligent behaviour.
How DeepInfra Implements Function Calling
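DeepInfra provides function calling support as part of its inference endpoints with an OpenAI-compatible API. You simply supply your function definitions (a name, a description, and a JSON schema for the parameters) together with the chat messages.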
When you send these to the DeepInfra API, the model may return a function call response instead of normal text if it decides a function should be invoked.
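As a rough sketch of how an application might tell the two kinds of reply apart, the snippet below assumes the assistant message follows the OpenAI-style shape, where requested calls appear under a `tool_calls` key. The helper name `extract_tool_calls`, the sample replies, and the `call_0` id are made up for illustration and are not taken from a real response.

```python
def extract_tool_calls(message):
    """Return the function calls requested in a reply, or [] for plain text."""
    return message.get("tool_calls") or []

# Hypothetical replies in the assumed OpenAI-style message shape.
text_reply = {"role": "assistant", "content": "Hello! How can I help?"}
call_reply = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_0",  # made-up id for illustration
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "arguments": "{\"location\": \"San Francisco\"}",
            },
        }
    ],
}

for reply in (text_reply, call_reply):
    calls = extract_tool_calls(reply)
    if calls:
        print("function call requested:", calls[0]["function"]["name"])
    else:
        print("text reply:", reply["content"])
```

Checking for requested calls before reading `content` keeps the text path and the function path clearly separated in application code.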
A Concrete Example: Weather Lookup
Here’s a simplified example of how function calling works once set up with DeepInfra.
You tell the model about your function — its name, what it does, and what inputs it needs:
{
  "type": "function",
  "function": {
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "The city and state, e.g. San Francisco, CA"
        }
      },
      "required": ["location"]
    }
  }
}
You send a normal chat request, including your function definitions:
messages = [
    {"role": "user", "content": "What is the weather in San Francisco?"}
]
Instead of text, the model replies with a structured call:
{
  "name": "get_current_weather",
  "arguments": "{\"location\": \"San Francisco\"}"
}
You can now parse this and run your function.
Your application runs the get_current_weather function and returns a real result from your system or an API.
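A minimal sketch of that parse-and-run step might look like the following. It works from the simplified call shape shown above, where `arguments` is a JSON-encoded string. The `AVAILABLE_FUNCTIONS` registry, the `run_function_call` helper, and the hard-coded weather result are assumptions for illustration; a real `get_current_weather` would query your own system or a weather API.

```python
import json

def get_current_weather(location):
    """Stand-in implementation that returns a hard-coded reading."""
    return {"location": location, "temperature": 60}

# Map the function names the model may request to local callables.
AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}

def run_function_call(call):
    """Decode the model's JSON arguments and invoke the matching function."""
    func = AVAILABLE_FUNCTIONS.get(call["name"])
    if func is None:
        raise ValueError(f"Model requested an unknown function: {call['name']}")
    arguments = json.loads(call["arguments"])  # arguments arrive as a JSON string
    return func(**arguments)

call = {
    "name": "get_current_weather",
    "arguments": "{\"location\": \"San Francisco\"}",
}
result = run_function_call(call)
print(result)
```

Looking functions up in an explicit registry, rather than evaluating the name directly, means the model can only trigger code you have chosen to expose.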
You extend the conversation with the function output and send it back to the model, which completes the answer:
The current temperature in San Francisco, CA is 60 degrees.
This pattern lets the model reason independently about when and how to use your logic. Please find the full example, including the code, in our documentation.
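The step of extending the conversation can be sketched as below. This assumes the OpenAI-style convention of echoing the assistant's call back and then adding a `tool` message that carries the function output; the helper name `append_function_result` and the `call_0` id are hypothetical, so check the documentation for the exact message shape your model expects.

```python
import json

def append_function_result(messages, tool_call, result):
    """Return a new message list with the model's call and the function output added."""
    return messages + [
        {"role": "assistant", "content": None, "tool_calls": [tool_call]},
        {
            "role": "tool",
            "tool_call_id": tool_call["id"],  # links the output to the call
            "content": json.dumps(result),    # function output, serialized as text
        },
    ]

messages = [
    {"role": "user", "content": "What is the weather in San Francisco?"}
]
tool_call = {
    "id": "call_0",  # made-up id for illustration
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "arguments": "{\"location\": \"San Francisco\"}",
    },
}
followup = append_function_result(
    messages, tool_call, {"location": "San Francisco", "temperature": 60}
)
print(json.dumps(followup, indent=2))
```

Sending `followup` as the `messages` of the next request gives the model what it needs to phrase the final answer.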
To get the most value out of function calling, it’s important to think carefully about how you define and expose your functions to the model. The language model relies heavily on the function descriptions you provide to understand what each function does and when it should be used. Clear, concise, and descriptive explanations significantly improve the model’s ability to select the correct function for a given user request.
Equally important is the structure of your JSON schemas. Well-defined parameter schemas help the model generate valid and usable arguments when invoking a function. Ambiguous or overly permissive schemas can lead to malformed inputs or unexpected behavior, while precise definitions guide the model toward correct and consistent function calls.
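To illustrate what a tighter schema buys you, here is a sketch that restricts the weather parameters and checks the model's arguments before they reach your function. The `unit` field and the `check_arguments` helper are hypothetical additions for this example, and the checker covers only the few keywords used here; it is not a full JSON Schema validator.

```python
import json

# A stricter variant of the weather parameters (the "unit" field is hypothetical).
STRICT_PARAMETERS = {
    "type": "object",
    "properties": {
        "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA",
        },
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location"],
    "additionalProperties": False,
}

def check_arguments(raw_arguments, schema):
    """Return a list of problems found in the model's arguments (empty if none)."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]
    problems = []
    props = schema["properties"]
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required field: {name}")
    for name, value in args.items():
        if name not in props:
            if schema.get("additionalProperties") is False:
                problems.append(f"unexpected field: {name}")
            continue
        spec = props[name]
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{name} must be a string")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
    return problems

print(check_arguments('{"location": "San Francisco, CA"}', STRICT_PARAMETERS))
print(check_arguments('{"unit": "kelvin"}', STRICT_PARAMETERS))
```

Returning a list of problems instead of raising on the first one makes it easy to report every issue back to the model in a single follow-up message.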
When designing your functions, it’s best to keep them small and focused. Functions that try to do too much or handle multiple responsibilities can confuse the model and reduce accuracy. Instead, narrowly scoped functions with a single, clear purpose tend to produce more reliable results and are easier to reason about and maintain.
Model configuration also plays a role in successful function calling. Using a lower temperature setting (typically below 1.0) helps reduce randomness in the model’s output, which is especially important when generating structured data like function arguments. Lower temperature values encourage more deterministic and predictable behavior, improving the overall stability of your application.
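As a small sketch of how that setting fits into a request, the helper below assembles an OpenAI-style request body with a low temperature and rejects values of 1.0 or above. The helper name `build_request`, the default of 0.2, and the `your-model-name` placeholder are illustrative choices, not recommendations from DeepInfra.

```python
def build_request(model, messages, tools, temperature=0.2):
    """Assemble a chat request body with a deliberately low temperature."""
    if not 0.0 <= temperature < 1.0:
        raise ValueError("use a temperature in [0.0, 1.0) for function calling")
    return {
        "model": model,
        "messages": messages,
        "tools": tools,
        "temperature": temperature,
    }

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}

payload = build_request(
    "your-model-name",  # placeholder, substitute a model that supports function calling
    [{"role": "user", "content": "What is the weather in San Francisco?"}],
    [weather_tool],
)
print(payload["temperature"])
```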
Finally, it’s worth being aware of current limitations. For example, nested function calls are not supported in DeepInfra’s current implementation. While the model can decide to call a function, the execution flow must remain linear, with results explicitly passed back to the model between calls.
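One way to picture that linear flow is a loop that asks the model, runs any requested function, appends the result, and asks again until a plain text answer arrives. The sketch below uses a stand-in `fake_model` in place of a real API call so it can run offline; `fake_model`, `run_conversation`, the `max_rounds` cap, and the OpenAI-style message shape are all assumptions for illustration.

```python
import json

def get_current_weather(location):
    """Stand-in implementation that returns a hard-coded reading."""
    return {"location": location, "temperature": 60}

FUNCTIONS = {"get_current_weather": get_current_weather}

def fake_model(messages):
    """Stand-in for the API call: requests the weather once, then answers in text."""
    last = messages[-1]
    if last["role"] == "tool":
        data = json.loads(last["content"])
        text = f"The current temperature in {data['location']} is {data['temperature']} degrees."
        return {"role": "assistant", "content": text}
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_0",  # made-up id for illustration
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "arguments": json.dumps({"location": "San Francisco, CA"}),
            },
        }],
    }

def run_conversation(messages, model=fake_model, max_rounds=5):
    """Alternate between model turns and function execution, one step at a time."""
    messages = list(messages)  # avoid mutating the caller's list
    for _ in range(max_rounds):
        reply = model(messages)
        messages.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:
            return reply["content"]  # plain text means the model is done
        for call in calls:
            func = FUNCTIONS[call["function"]["name"]]
            result = func(**json.loads(call["function"]["arguments"]))
            # Each result is passed back explicitly before the next model turn.
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })
    raise RuntimeError("model kept requesting functions")

answer = run_conversation(
    [{"role": "user", "content": "What is the weather in San Francisco?"}]
)
print(answer)
```

The `max_rounds` cap is a defensive choice so a model that never stops requesting functions cannot keep the loop running indefinitely.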
Once you’re comfortable with the basics of function calling, the possibilities expand quickly. You can build AI agents that orchestrate workflows across multiple services, making decisions and triggering actions based on natural language input. Smart assistants can fetch live data, update systems, or perform operational tasks on behalf of users. Dynamic chatbots can query databases, call internal APIs, and combine multiple data sources to generate meaningful, context-aware responses.
DeepInfra provides the scalable infrastructure needed to run these applications reliably in production. By handling model inference and function calling at scale, DeepInfra allows you to focus on what matters most: designing the logic, integrations, and user experiences that make your product unique.
Function calling is a transformative feature that takes language models from static answer machines to dynamic, interactive components within your software. With DeepInfra’s support for this workflow — and an API that’s compatible with tools you already know — you can seamlessly add intelligent function invocation to your applications.
Whether you’re building assistants, automation tools, or intelligent integrations, function calling opens the door to a new class of AI-powered experiences.
© 2026 Deep Infra. All rights reserved.