
Modern large language models (LLMs) are incredibly powerful at understanding and generating text, but until recently they were largely static: they could only respond based on patterns in their training data. Function calling changes that. It lets language models interact with external logic — your own code, APIs, utilities, or business systems — while still serving answers in natural language.
DeepInfra now supports function calling across its inference APIs, enabling developers to build intelligent applications that go far beyond static responses. In this post, we’ll break down what function calling is, how it works with DeepInfra, and how you can start using it today.
What Is Function Calling?
At a high level, function calling lets language models decide when and how to call real functions you define — for example, a function that fetches real-time weather, retrieves a user profile, triggers an action in your system, or performs any external computation your application needs.
Traditionally, LLMs only output text. With function calling, the model can instead return a structured request to invoke one of your functions, your application executes it, and the result is handed back so the model can finish its answer in natural language.
This enables powerful AI workflows that combine natural language reasoning with real-world logic — without you having to hard-code responses.
Why Function Calling Matters
Function calling unlocks a new class of AI applications: assistants that fetch live data, agents that orchestrate actions across your services, and chatbots that pull answers from databases and internal APIs.
It’s a bridge from static text responses to dynamic, intelligent behavior.
How DeepInfra Implements Function Calling
DeepInfra provides function calling support as part of its inference endpoints with an OpenAI-compatible API. You simply supply your chat messages together with a list of function definitions, each with a name, a description, and a JSON schema for its parameters.
When you send these to the DeepInfra API, the model may return a function call response instead of normal text if it decides a function should be invoked.
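To make the shape of such a request concrete, here is a minimal sketch using the official openai Python client pointed at DeepInfra's OpenAI-compatible endpoint. The get_user_profile function, its user_id parameter, and the model name are illustrative assumptions, not part of DeepInfra's API:

# Minimal sketch of a tool-enabled request through DeepInfra's
# OpenAI-compatible endpoint. The function and model are examples only.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_user_profile",  # hypothetical function in your own system
        "description": "Look up a user's profile by their internal user ID",
        "parameters": {
            "type": "object",
            "properties": {
                "user_id": {"type": "string", "description": "Internal user ID"}
            },
            "required": ["user_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",  # any tool-capable model hosted on DeepInfra
    messages=[{"role": "user", "content": "What's the email on file for user 42?"}],
    tools=tools,
)

# If the model decided a function should be invoked, the structured call
# appears here instead of a plain text answer.
print(response.choices[0].message.tool_calls)

Because the API is OpenAI-compatible, the tools and messages parameters behave just as they do with the openai client elsewhere.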
A Concrete Example: Weather Lookup
Here’s a simplified example of how function calling works once set up with DeepInfra.
You tell the model about your function — its name, what it does, and what inputs it needs:
{
  "type": "function",
  "function": {
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "The city and state, e.g. San Francisco, CA"
        }
      },
      "required": ["location"]
    }
  }
}

You send a normal chat request, including your function definitions:
messages = [
    {"role": "user", "content": "What is the weather in San Francisco?"}
]

Instead of text, the model replies with a structured call:
{
  "name": "get_current_weather",
  "arguments": "{\"location\": \"San Francisco\"}"
}

You can now parse this and run your function.
Your application runs the get_current_weather function and returns a real result from your system or an API.
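A minimal sketch of that step, assuming response holds the chat completion returned for the request above (issued the same way as the earlier sketch) and that the hard-coded weather result stands in for a real lookup:

import json

# The structured call corresponds to the JSON shown above.
tool_call = response.choices[0].message.tool_calls[0]
arguments = json.loads(tool_call.function.arguments)   # {"location": "San Francisco"}

# Your own implementation; the fixed result here is purely illustrative.
def get_current_weather(location: str) -> str:
    return json.dumps({"location": location, "temperature": "60", "unit": "fahrenheit"})

result = get_current_weather(**arguments)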
You extend the conversation with the function output and send it back to the model, which completes the answer:
The current temperature in San Francisco, CA is 60 degrees.

This pattern lets the model reason independently about when and how to use your logic. Please find the full example, including the code, in our documentation.
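For completeness, here is a minimal sketch of that final round trip, assuming client, messages, and tools from the original request and tool_call and result from the step above:

# Append the model's tool call and your function's output, then ask the model
# to finish the answer in natural language.
messages.append(response.choices[0].message)   # the assistant turn containing the tool call
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": result,
})

final = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",  # same illustrative model as before
    messages=messages,
    tools=tools,
)
print(final.choices[0].message.content)
# e.g. "The current temperature in San Francisco, CA is 60 degrees."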
To get the most value out of function calling, it’s important to think carefully about how you define and expose your functions to the model. The language model relies heavily on the function descriptions you provide to understand what each function does and when it should be used. Clear, concise, and descriptive explanations significantly improve the model’s ability to select the correct function for a given user request.
Equally important is the structure of your JSON schemas. Well-defined parameter schemas help the model generate valid and usable arguments when invoking a function. Ambiguous or overly permissive schemas can lead to malformed inputs or unexpected behavior, while precise definitions guide the model toward correct and consistent function calls.
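For instance, a tighter version of the weather parameters might look like the sketch below; the unit enum is an illustrative addition, not part of the example above:

# Explicit types, descriptions, an enum, and a required list leave the model
# little room to produce malformed arguments. (Illustrative only.)
weather_parameters = {
    "type": "object",
    "properties": {
        "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA",
        },
        "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"],
            "description": "Temperature unit to use in the result",
        },
    },
    "required": ["location"],
}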
When designing your functions, it’s best to keep them small and focused. Functions that try to do too much or handle multiple responsibilities can confuse the model and reduce accuracy. Instead, narrowly scoped functions with a single, clear purpose tend to produce more reliable results and are easier to reason about and maintain.
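As an illustration, two narrowly scoped definitions like these hypothetical ones give the model a much clearer choice than a single catch-all function:

# Hypothetical example: two focused tools instead of one "manage_order" tool
# that both looks up and cancels orders.
order_tools = [
    {
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Look up the shipping status of a single order by its ID",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "cancel_order",
            "description": "Cancel a single order by its ID, if it has not shipped yet",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    },
]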
Model configuration also plays a role in successful function calling. Using a lower temperature setting (typically below 1.0) helps reduce randomness in the model’s output, which is especially important when generating structured data like function arguments. Lower temperature values encourage more deterministic and predictable behavior, improving the overall stability of your application.
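In practice that just means passing a low temperature with the request; this sketch reuses the client, messages, and tools from the examples above, and the value of 0.2 is only an illustration:

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",  # illustrative model choice
    messages=messages,
    tools=tools,
    temperature=0.2,  # low temperature -> more deterministic, well-formed arguments
)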
Finally, it’s worth being aware of current limitations. For example, nested function calls are not supported in DeepInfra’s current implementation. While the model can decide to call a function, the execution flow must remain linear, with results explicitly passed back to the model between calls.
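In code, that linear flow is typically a simple loop: execute whatever the model asked for, append the result, and call the model again until it answers in plain text. A rough sketch, assuming the client, messages, and tools from above and a hypothetical local_functions registry mapping names to your own implementations:

import json

local_functions = {"get_current_weather": get_current_weather}  # hypothetical registry

while True:
    response = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-70B-Instruct",
        messages=messages,
        tools=tools,
    )
    message = response.choices[0].message

    if not message.tool_calls:
        print(message.content)  # the model answered in plain text; we're done
        break

    # Execute each requested call locally and pass the result back to the model.
    messages.append(message)
    for tool_call in message.tool_calls:
        fn = local_functions[tool_call.function.name]
        result = fn(**json.loads(tool_call.function.arguments))
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": result,
        })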
Once you’re comfortable with the basics of function calling, the possibilities expand quickly. You can build AI agents that orchestrate workflows across multiple services, making decisions and triggering actions based on natural language input. Smart assistants can fetch live data, update systems, or perform operational tasks on behalf of users. Dynamic chatbots can query databases, call internal APIs, and combine multiple data sources to generate meaningful, context-aware responses.
DeepInfra provides the scalable infrastructure needed to run these applications reliably in production. By handling model inference and function calling at scale, DeepInfra allows you to focus on what matters most: designing the logic, integrations, and user experiences that make your product unique.
Function calling is a transformative feature that takes language models from static answer machines to dynamic, interactive components within your software. With DeepInfra’s support for this workflow — and an API that’s compatible with tools you already know — you can seamlessly add intelligent function invocation to your applications.
Whether you’re building assistants, automation tools, or intelligent integrations, function calling opens the door to a new class of AI-powered experiences.