Function Calling in DeepInfra: Extend Your AI with Real-World Logic

Published on 2026.02.02 by DeepInfra

Modern large language models (LLMs) are incredibly powerful at understanding and generating text, but until recently they were largely static: they could only respond based on patterns in their training data. Function calling changes that. It lets language models interact with external logic — your own code, APIs, utilities, or business systems — while still serving answers in natural language.

DeepInfra now supports function calling across its inference APIs, enabling developers to build intelligent applications that go far beyond static responses. In this post, we’ll break down what function calling is, how it works with DeepInfra, and how you can start using it today.

What Is Function Calling?

At a high level, function calling lets language models decide when and how to call real functions you define — for example, a function that fetches real-time weather, retrieves a user profile, triggers an action in your system, or performs any external computation your application needs.

Traditionally, LLMs only output text. With function calling:

The model detects when a query would be best answered with an external action.
Instead of replying in natural language, the model returns a structured function call with the function name and arguments.
Your application executes the function.
You send the result back to the model.
The model uses the result to generate its final output.

This enables powerful AI workflows that combine natural language reasoning with real-world logic — without you having to hard-code responses.
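The five steps above can be sketched in a few lines of Python. This is a minimal illustration rather than DeepInfra client code: `fake_model` is a hypothetical stand-in for the real API call, and the weather value is hard-coded.

```python
import json

def get_current_weather(location):
    # Hypothetical stand-in for a real weather lookup.
    return {"location": location, "temperature": 60}

def fake_model(messages):
    # Hypothetical stand-in for the model; a real app would call the API here.
    last = messages[-1]
    if last["role"] == "tool":
        # Step 5: the model turns the function result into natural language.
        data = json.loads(last["content"])
        return {"role": "assistant",
                "content": f"It is {data['temperature']} degrees in {data['location']}."}
    # Steps 1-2: the model answers with a structured call instead of text.
    arguments = json.dumps({"location": "San Francisco, CA"})
    return {"role": "assistant", "content": None,
            "tool_calls": [{"id": "call_1", "type": "function",
                            "function": {"name": "get_current_weather",
                                         "arguments": arguments}}]}

def run_turn(user_text):
    messages = [{"role": "user", "content": user_text}]
    reply = fake_model(messages)
    messages.append(reply)
    if reply.get("tool_calls"):
        for call in reply["tool_calls"]:
            args = json.loads(call["function"]["arguments"])
            result = get_current_weather(**args)  # Step 3: your code runs the function.
            # Step 4: the result goes back into the conversation.
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": json.dumps(result)})
        reply = fake_model(messages)
    return reply["content"]

print(run_turn("What is the weather in San Francisco?"))
```

The important design point is that the model never runs anything itself: it only proposes a call, and your application stays in control of execution.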

Why Function Calling Matters

Function calling unlocks a new class of AI applications:

Real-time data: Get live prices, weather, stock quotes, or system status without hard-coding knowledge into your model.
Actionable assistants: Build AI that doesn’t just say what to do — it does things (e.g., sending emails, scheduling tasks, triggering workflows).
System integration: Connect to your backend systems, databases, or services directly through natural language prompts.
Custom business logic: Let the model reason about how and when to call your logic — reducing boilerplate application code.

It’s a bridge from static text responses to dynamic, intelligent behavior.

How DeepInfra Implements Function Calling

DeepInfra provides function calling support as part of its inference endpoints with an OpenAI-compatible API. You simply supply:

Function definitions — name, description, and a JSON schema for parameters.
User messages — the prompt or conversation.
Optional tool settings — guiding the model about how and when to use functions.

When you send these to the DeepInfra API, the model may return a function call response instead of normal text if it decides a function should be invoked.
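In an OpenAI-compatible chat completion response, that decision typically shows up as a `tool_calls` list on the assistant message, with `content` left empty. A minimal check, assuming that response shape (the sample dictionaries and the call ID are illustrative, not captured API output):

```python
def requested_calls(message):
    """Return the function calls in an assistant message, or [] for plain text."""
    return message.get("tool_calls") or []

text_reply = {"role": "assistant", "content": "Hello! How can I help?"}
call_reply = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc",  # illustrative ID
        "type": "function",
        "function": {"name": "get_current_weather",
                     "arguments": '{"location": "San Francisco"}'},
    }],
}

for reply in (text_reply, call_reply):
    calls = requested_calls(reply)
    if calls:
        print("function call:", calls[0]["function"]["name"])
    else:
        print("text:", reply["content"])
```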

A Concrete Example: Weather Lookup

Here’s a simplified example of how function calling works once set up with DeepInfra.

Define the Function

You tell the model about your function — its name, what it does, and what inputs it needs:

{
  "type": "function",
  "function": {
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "The city and state, e.g. San Francisco, CA"
        }
      },
      "required": ["location"]
    }
  }
}

Send a User Query

You send a normal chat request, including your function definitions:

messages = [
    {"role": "user", "content": "What is the weather in San Francisco?"}
]
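Put together, the request body might look like the sketch below. It assumes the OpenAI-compatible chat completions format; the model name is a placeholder rather than a recommendation, and `tool_choice` is the optional setting that lets the model decide for itself whether to call a function. The code only builds and prints the JSON body. Sending it requires your DeepInfra API key and endpoint, as described in the documentation.

```python
import json

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                }
            },
            "required": ["location"],
        },
    },
}]

messages = [
    {"role": "user", "content": "What is the weather in San Francisco?"}
]

payload = {
    "model": "your-model-name",  # placeholder: choose a model that supports function calling
    "messages": messages,
    "tools": tools,
    "tool_choice": "auto",       # optional: let the model decide when to call
}

print(json.dumps(payload, indent=2))
```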

Model Returns a Function Call

Instead of text, the model replies with a structured call:

{
  "name": "get_current_weather",
  "arguments": "{\"location\": \"San Francisco\"}"
}

You can now parse this and run your function.
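Note that `arguments` arrives as a JSON-encoded string rather than an object, so it has to be decoded before use. A small sketch of that parsing step, where `parse_call` is a hypothetical helper name:

```python
import json

def parse_call(call):
    """Return (name, args) from a function call, raising ValueError on bad input."""
    name = call["name"]
    try:
        args = json.loads(call["arguments"])
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON arguments for {name}: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError(f"expected a JSON object for {name}, got {type(args).__name__}")
    return name, args

call = {"name": "get_current_weather",
        "arguments": '{"location": "San Francisco"}'}
name, args = parse_call(call)
print(name, args)
```

Treating malformed arguments as an ordinary, recoverable error keeps a single bad generation from crashing the application.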

Execute the Function

Your application runs the get_current_weather function and returns a real result from your system or an API.
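One common way to wire this up is a registry that maps function names to Python callables, so the model can only trigger functions you have explicitly exposed. In this sketch the weather data is hard-coded and `execute` is a hypothetical helper; a real implementation would query your system or an external API.

```python
import json

def get_current_weather(location):
    # Hard-coded for illustration only.
    return {"location": location, "temperature": 60, "unit": "fahrenheit"}

# Only functions listed here can be invoked on the model's behalf.
AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}

def execute(name, args):
    """Run a registered function and return its result as a JSON string."""
    func = AVAILABLE_FUNCTIONS.get(name)
    if func is None:
        return json.dumps({"error": f"unknown function: {name}"})
    return json.dumps(func(**args))

print(execute("get_current_weather", {"location": "San Francisco"}))
```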

Feed the Result Back

You extend the conversation with the function output and send it back to the model, which completes the answer:

The current temperature in San Francisco, CA is 60 degrees.
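In the OpenAI-compatible message format, extending the conversation usually means appending two messages: the assistant message that contained the call, and a `tool` message carrying the function output together with the matching call ID. A sketch of that step, using an illustrative ID and result and a hypothetical helper named `append_results`:

```python
import json

messages = [{"role": "user", "content": "What is the weather in San Francisco?"}]

assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc",  # illustrative ID
        "type": "function",
        "function": {"name": "get_current_weather",
                     "arguments": '{"location": "San Francisco"}'},
    }],
}

result = {"location": "San Francisco", "temperature": 60, "unit": "fahrenheit"}

def append_results(messages, assistant_message, results_by_id):
    """Return a new message list extended with the call and its results."""
    extended = messages + [assistant_message]
    for call in assistant_message["tool_calls"]:
        extended.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(results_by_id[call["id"]]),
        })
    return extended

follow_up = append_results(messages, assistant_message, {"call_abc": result})
print([m["role"] for m in follow_up])
```

Sending `follow_up` as the `messages` of a second request is what lets the model produce the final sentence shown above.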

This pattern lets the model reason independently about when and how to use your logic. The full example, including code, is available in our documentation.

Tips for Effective Function Calling

To get the most value out of function calling, it’s important to think carefully about how you define and expose your functions to the model. The language model relies heavily on the function descriptions you provide to understand what each function does and when it should be used. Clear, concise, and descriptive explanations significantly improve the model’s ability to select the correct function for a given user request.

Equally important is the structure of your JSON schemas. Well-defined parameter schemas help the model generate valid and usable arguments when invoking a function. Ambiguous or overly permissive schemas can lead to malformed inputs or unexpected behavior, while precise definitions guide the model toward correct and consistent function calls.
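Even with a precise schema, it is worth checking the model's arguments before running anything. The sketch below is a deliberately small validator that covers only `required` fields, unknown keys, and primitive `type` checks. It is not a full JSON Schema implementation, and it treats unknown keys as errors, which is stricter than JSON Schema's default. A dedicated validation library is a better fit for production.

```python
JSON_TYPES = {"string": str, "integer": int, "number": (int, float),
              "boolean": bool, "object": dict, "array": list}

def validate_args(args, schema):
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    properties = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")
    for key, value in args.items():
        if key not in properties:
            problems.append(f"unexpected argument: {key}")
            continue
        type_name = properties[key].get("type")
        expected = JSON_TYPES.get(type_name)
        # In Python, bool is a subclass of int, so reject it for numeric types.
        bool_as_number = isinstance(value, bool) and type_name in ("integer", "number")
        if expected and (bool_as_number or not isinstance(value, expected)):
            problems.append(f"{key} should be of type {type_name}")
    return problems

schema = {
    "type": "object",
    "properties": {"location": {"type": "string"}},
    "required": ["location"],
}

print(validate_args({"location": "San Francisco"}, schema))
print(validate_args({"city": 5}, schema))
```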

When designing your functions, it’s best to keep them small and focused. Functions that try to do too much or handle multiple responsibilities can confuse the model and reduce accuracy. Instead, narrowly scoped functions with a single, clear purpose tend to produce more reliable results and are easier to reason about and maintain.

Model configuration also plays a role in successful function calling. Using a lower temperature setting (typically below 1.0) helps reduce randomness in the model’s output, which is especially important when generating structured data like function arguments. Lower temperature values encourage more deterministic and predictable behavior, improving the overall stability of your application.

Finally, it’s worth being aware of current limitations. For example, nested function calls are not supported in DeepInfra’s current implementation. While the model can decide to call a function, the execution flow must remain linear, with results explicitly passed back to the model between calls.
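In practice this means a loop: call the model, run whatever function it asks for, append the result, and call the model again until it answers in text. The sketch below uses a scripted stand-in (`scripted_model`) in place of a real API call and two hypothetical functions, arranged so that the second call depends on the first call's result. The `max_rounds` cap is an added safeguard in this example, not a DeepInfra requirement.

```python
import json

def find_city(user_id):
    return {"city": "San Francisco"}  # hypothetical lookup, hard-coded

def get_current_weather(location):
    return {"location": location, "temperature": 60}  # hard-coded for illustration

FUNCTIONS = {"find_city": find_city, "get_current_weather": get_current_weather}

def make_call(call_id, name, args):
    # Build an assistant message that requests one function call.
    return {"role": "assistant", "content": None,
            "tool_calls": [{"id": call_id, "type": "function",
                            "function": {"name": name,
                                         "arguments": json.dumps(args)}}]}

def scripted_model(messages):
    # Hypothetical stand-in for the API: picks the next step from results so far.
    results = [json.loads(m["content"]) for m in messages if m["role"] == "tool"]
    if len(results) == 0:
        return make_call("call_1", "find_city", {"user_id": "u1"})
    if len(results) == 1:
        return make_call("call_2", "get_current_weather",
                         {"location": results[0]["city"]})
    weather = results[1]
    return {"role": "assistant",
            "content": f"It is {weather['temperature']} degrees in {weather['location']}."}

def run(messages, max_rounds=5):
    for _ in range(max_rounds):
        reply = scripted_model(messages)
        messages.append(reply)
        if not reply.get("tool_calls"):
            return reply["content"]  # plain text: the conversation is done
        for tc in reply["tool_calls"]:
            func = FUNCTIONS[tc["function"]["name"]]
            output = func(**json.loads(tc["function"]["arguments"]))
            # Each result is passed back explicitly before the next model call.
            messages.append({"role": "tool", "tool_call_id": tc["id"],
                             "content": json.dumps(output)})
    raise RuntimeError("model did not finish within max_rounds")

print(run([{"role": "user", "content": "What's the weather where I live?"}]))
```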

Putting It Into Practice

Once you’re comfortable with the basics of function calling, the possibilities expand quickly. You can build AI agents that orchestrate workflows across multiple services, making decisions and triggering actions based on natural language input. Smart assistants can fetch live data, update systems, or perform operational tasks on behalf of users. Dynamic chatbots can query databases, call internal APIs, and combine multiple data sources to generate meaningful, context-aware responses.

DeepInfra provides the scalable infrastructure needed to run these applications reliably in production. By handling model inference and function calling at scale, DeepInfra allows you to focus on what matters most: designing the logic, integrations, and user experiences that make your product unique.

Final Thoughts

Function calling is a transformative feature that takes language models from static answer machines to dynamic, interactive components within your software. With DeepInfra’s support for this workflow — and an API that’s compatible with tools you already know — you can seamlessly add intelligent function invocation to your applications.

Whether you’re building assistants, automation tools, or intelligent integrations, function calling opens the door to a new class of AI-powered experiences.
