
In this blog post, we will use the Tavily search tool together with the ChatDeepInfra model to look up details about a recently released expansion pack for Elden Ring, the critically acclaimed game from 2022.
Using this boilerplate, you can automate searching the web for information and turning the results into well-written responses. This is a great starting point for a chatbot that interacts with users and provides them with up-to-date information.
First, let's create a virtual environment and activate it:
python3 -m venv venv
source venv/bin/activate
Next, install the required packages:
pip install python-dotenv langchain langchain-community
Before we start, we need to load our DeepInfra and Tavily API keys. You can get your DeepInfra API key from the DeepInfra dashboard and your Tavily API key from the Tavily website. After obtaining both keys, create a .env file in the root directory of your project and add the following lines:
DEEPINFRA_API_TOKEN=YOUR_DEEPINFRA_API_KEY
TAVILY_API_KEY=YOUR_TAVILY_API_KEY
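Before running anything that calls the APIs, it can help to verify that both keys actually made it into the environment. The helper below is a small sketch (not part of the LangChain setup): missing_keys is a hypothetical helper name, and in your real script you would call it on os.environ after load_dotenv().

```python
import os

def missing_keys(env, required):
    """Return the required variable names that are absent or empty in env."""
    return [name for name in required if not env.get(name)]

# Example with a fake environment in which only the DeepInfra key is set:
fake_env = {"DEEPINFRA_API_TOKEN": "di-..."}
print(missing_keys(fake_env, ["DEEPINFRA_API_TOKEN", "TAVILY_API_KEY"]))
# → ['TAVILY_API_KEY']

# In the real script, after load_dotenv(), you would check os.environ instead:
# missing_keys(os.environ, ["DEEPINFRA_API_TOKEN", "TAVILY_API_KEY"])
```

If the list is non-empty, fix your .env file before moving on; failing fast here is much easier to debug than an authentication error buried inside an agent run.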
After installing the required packages and setting up the environment, we can create a LangChain agent that uses the ChatDeepInfra model and the Tavily tool to search for information on the web. The ChatDeepInfra model is a powerful conversational model that can generate human-like responses to user queries. The Tavily tool allows us to search the web for information and retrieve the search results.
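To make the moving parts concrete, here is a toy sketch of the loop that create_tool_calling_agent and AgentExecutor implement for us: the model either answers directly or requests a tool call, the executor runs the tool, and the tool's result is fed back into the next model turn. Everything here (fake_model, search_tool, the message format) is a simplified stand-in for illustration, not LangChain's actual internals:

```python
# A toy agent loop: not LangChain's real implementation, just the idea behind it.

def search_tool(query):
    # Stand-in for TavilySearchResults: a real tool would call the Tavily API.
    return f"Top result for {query!r}: Shadow of the Erdtree is Elden Ring's expansion."

def fake_model(messages):
    # Stand-in for ChatDeepInfra. On the first turn it requests a tool call;
    # once a tool result appears in the history, it produces a final answer.
    if any(m["role"] == "tool" for m in messages):
        return {"type": "final", "content": "The DLC is called Shadow of the Erdtree."}
    return {"type": "tool_call", "tool": "search", "args": messages[-1]["content"]}

def run_agent(question):
    messages = [{"role": "human", "content": question}]
    while True:
        step = fake_model(messages)
        if step["type"] == "final":          # model is done: return its answer
            return step["content"]
        result = search_tool(step["args"])   # the executor runs the requested tool
        messages.append({"role": "tool", "content": result})

print(run_agent("What is the new Elden Ring DLC?"))
# → The DLC is called Shadow of the Erdtree.
```

LangChain handles this loop (plus prompt formatting, tool-call parsing, and error handling) for us, which is exactly what the script below sets up.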
Here's the complete Python script to create and run a LangChain agent using the ChatDeepInfra model:
from dotenv import load_dotenv, find_dotenv

from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_community.chat_models import ChatDeepInfra
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_core.prompts import ChatPromptTemplate

# Load DEEPINFRA_API_TOKEN and TAVILY_API_KEY from the .env file
_ = load_dotenv(find_dotenv())

model_name = "meta-llama/Meta-Llama-3-70B-Instruct"

if __name__ == "__main__":
    # The Tavily tool returns the single best search result for each query
    tools = [TavilySearchResults(max_results=1)]
    llm = ChatDeepInfra(model=model_name)

    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                "You are a helpful assistant. Make sure to use the tavily_search_results_json tool for information.",
            ),
            ("placeholder", "{chat_history}"),
            ("human", "{input}"),
            ("placeholder", "{agent_scratchpad}"),
        ]
    )

    # Wire the model, tools, and prompt into a tool-calling agent
    agent = create_tool_calling_agent(llm, tools, prompt)
    agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, stream_runnable=False)

    question = "Why is the hype for Shadow of the Erdtree so high?"
    result = agent_executor.invoke({"input": question})
    print(result["output"])
    # According to the search results, the new DLC for Elden Ring is called "Shadow of the Erdtree".
One crucial point is to pass stream_runnable=False to the AgentExecutor; otherwise the executor streams the model's output, which can interfere with parsing the tool calls the model emits.
The stage is yours now! You can further extend the agent with more tools and models to improve your workflows, which will be the topic of another blog post.
Stay tuned for more updates and happy coding!