

The easiest way to build AI applications with Llama 2 LLMs.
Published on 2023.08.02 by Nikola Borisov

The long-awaited Llama 2 models are finally here, and we are excited to show you how to use them with DeepInfra. This collection of models represents the state of the art in open-source language models. The models are released by Meta AI, and the license allows you to use them for commercial purposes. So now is the time to build your next AI application with Llama 2 hosted by DeepInfra, and save a ton of money compared to OpenAI's API.

Picking the right model

Llama 2 comes in three sizes (7B, 13B, and 70B parameters), and each size also has a chat variant fine-tuned for dialogue.

Depending on the application you are building, you might want a different model. Smaller models are faster and cheaper to run per generated token. Larger models are slower and cost more per generated token, but they generally produce more accurate output.
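Switching sizes only changes the model ID in the request URL. The helper below is a hypothetical sketch that maps a size to a chat model ID and builds the inference URL used in the curl request later in this post. Only `meta-llama/Llama-2-7b-chat-hf` appears in this post; the 13B and 70B IDs are an assumption that they follow the same naming pattern, so check each model's documentation page for the exact ID.

```sh
#!/bin/sh
# Hypothetical helper: map a Llama 2 size to a chat model ID.
# ASSUMPTION: the 13b and 70b IDs follow the same pattern as
# meta-llama/Llama-2-7b-chat-hf, the only ID confirmed in this post.
llama2_chat_model() {
  case "$1" in
    7b|13b|70b) printf 'meta-llama/Llama-2-%s-chat-hf\n' "$1" ;;
    *) echo "unknown Llama 2 size: $1 (expected 7b, 13b or 70b)" >&2; return 1 ;;
  esac
}

# Build the inference endpoint URL for a model ID
# (same URL pattern as the curl example in this post).
inference_url() {
  printf 'https://api.deepinfra.com/v1/inference/%s\n' "$1"
}
```

For example, `inference_url "$(llama2_chat_model 13b)"` prints the endpoint for the 13B chat model under that naming assumption.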

Getting started

Simply create an account on DeepInfra and get yourself an API key.

# set the API key as an environment variable
export AUTH_TOKEN=<your-api-key>
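A plain `AUTH_TOKEN=...` assignment is enough for `$AUTH_TOKEN` to expand in the same shell, but only an exported variable is visible to scripts you launch from that shell. If you call the API from a script, a small guard can stop it before it sends a request with an empty key. The `require_token` function below is a hypothetical sketch, not part of DeepInfra's tooling.

```sh
#!/bin/sh
# Hypothetical guard: fail fast with a readable message when AUTH_TOKEN
# is unset or empty, instead of sending "Authorization: bearer " with no key.
require_token() {
  if [ -z "${AUTH_TOKEN:-}" ]; then
    echo "AUTH_TOKEN is not set; run: export AUTH_TOKEN=<your-api-key>" >&2
    return 1
  fi
}
```

Call `require_token || exit 1` near the top of a script, before the first request.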

Each model has a detailed API documentation page that guides you through using it. For example, see the API documentation page for the llama-2-7b-chat model.

Running inference

Making an inference request is as easy as making a POST request to our API.

curl -X POST \
    -d '{"input": "Who is Bill Clinton?"}'  \
    -H "Authorization: bearer $AUTH_TOKEN"  \
    -H 'Content-Type: application/json'  \
    'https://api.deepinfra.com/v1/inference/meta-llama/Llama-2-7b-chat-hf'
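Writing the prompt directly inside the `-d` string works for the example question, but breaks as soon as a prompt contains a double quote or a newline. One way around that is to wrap the same request in a shell function and build the body with a real JSON encoder. This is a sketch under stated assumptions: it relies on `python3` being on your PATH (a tool like `jq` would work equally well), and the function names `build_body` and `llama_infer` are hypothetical.

```sh
#!/bin/sh
# Build the request body {"input": PROMPT}. ASSUMPTION: python3 is on PATH;
# json.dumps escapes quotes, backslashes, newlines and other control characters.
build_body() {
  python3 -c 'import json, sys; print(json.dumps({"input": sys.argv[1]}))' "$1"
}

# Hypothetical wrapper around the curl request shown above.
# Usage: llama_infer MODEL_ID PROMPT   (expects AUTH_TOKEN to be set)
llama_infer() {
  curl -sS -X POST \
    -d "$(build_body "$2")" \
    -H "Authorization: bearer $AUTH_TOKEN" \
    -H 'Content-Type: application/json' \
    "https://api.deepinfra.com/v1/inference/$1"
}
```

For example, `llama_infer meta-llama/Llama-2-7b-chat-hf 'Who is Bill Clinton?'` sends the same request as the curl command above, while a prompt such as `Say "hi"` is now escaped correctly.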

And you will get output like this:

{
   "inference_status" : {
      "cost" : 0.00454849982634187,
      "runtime_ms" : 9097,
      "status" : "succeeded"
   },
   "request_id" : "RKQsJyO5n7ZLif------",
   "results" : [
      {
         "generated_text" : "Who is Bill Clinton?\n\nAnswer: Bill Clinton is an American politician who served as the 42nd President of the United States from 1993 to 2001. He was born on August 19, 1946, in Hope, Arkansas, and grew up in a poor family. Clinton graduated from Georgetown University and received a Rhodes Scholarship to study at Oxford University. He later attended Yale Law School and became a professor of law at the University of Arkansas.\n\nClinton entered politics in the 1970s and served as Attorney General of Arkansas from 1979 to 1981. He was elected Governor of Arkansas in 1982 and served four terms, from 1983 to 1992. In 1992, Clinton was elected President of the United States, defeating incumbent President George H.W. Bush.\n\nDuring his presidency, Clinton implemented several notable policies, including the Don't Ask, Don't Tell Repeal Act, which allowed LGBT individuals to serve openly in the military, and the North American Free"
      }
   ]
}
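In a script you usually want only the generated text, and perhaps the reported cost. The sketch below reads a response like the one above from stdin and prints `results[0].generated_text`, exiting non-zero unless `inference_status.status` is `"succeeded"`. It assumes `python3` is on your PATH for JSON parsing; `extract_text` and `extract_cost` are hypothetical names.

```sh
#!/bin/sh
# Print results[0].generated_text from a JSON response on stdin.
# Exits non-zero (message on stderr) unless inference_status.status
# is "succeeded". ASSUMPTION: python3 is on PATH.
extract_text() {
  python3 -c '
import json, sys
resp = json.load(sys.stdin)
status = resp["inference_status"]["status"]
if status != "succeeded":
    sys.exit("inference status: " + status)
print(resp["results"][0]["generated_text"])
'
}

# Print the cost value reported in inference_status.
extract_cost() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["inference_status"]["cost"])'
}
```

Append `| extract_text` to the curl command above to print just the text. As the sample output shows, `generated_text` begins with the prompt itself, so strip it if you only need the model's continuation.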

It is easy to build AI applications with Llama 2 models hosted by DeepInfra.

If you need any help, just reach out to us on our Discord server.
