Developed by Meta, Llama (Large Language Model Meta AI) is a family of state-of-the-art open-weight models designed for efficiency and performance. The latest versions (Llama 4) use Mixture-of-Experts (MoE) architectures, which make inference more cost-effective by activating only a subset of the parameters for each token. With support for multimodal inputs (text + images) and extended context windows (up to 10M tokens for Llama 4 Scout, although the hosted deployments listed below use smaller windows), Llama performs well on tasks like code generation, multilingual understanding, and long-form reasoning. The models support FP8 quantization and batch inference, which help deliver low-latency, high-throughput performance for production workloads. With permissive licensing and robust tooling (e.g., Llama Guard for safety), Llama suits developers seeking powerful, customizable AI with minimal overhead.
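The idea of "activating only a subset of parameters" can be illustrated with a toy top-k gate. This is a generic sketch of MoE routing, not Llama 4's actual router: the experts are plain Python functions, the input is a single float, and the router scores are passed in directly instead of being produced by a learned layer from the input. The point it shows is that only the `k` selected experts are evaluated, so compute per token grows with `k` rather than with the total number of experts.

```python
import math
from typing import Callable, List, Sequence

def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(
    x: float,
    router_logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int = 1,
) -> float:
    """Toy top-k MoE layer: only the k highest-scoring experts are evaluated."""
    # Rank experts by router score and keep the top k; the rest are never called.
    top = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:k]
    # Renormalize the gate weights over the selected experts only.
    gates = softmax([router_logits[i] for i in top])
    # Output is the gate-weighted sum of the chosen experts' outputs.
    return sum(g * experts[i](x) for g, i in zip(gates, top))

# Four toy "experts" that scale their input by 1, 2, 3 and 4.
# In a real model the router logits would be computed from x by a learned layer.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
print(moe_forward(1.0, [0.1, 2.0, 0.3, 1.0], experts, k=1))  # 2.0
```

With `k=1` only expert 1 runs; raising `k` trades extra compute for a blend of several experts' outputs.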
The Llama 4 models are natively multimodal, supporting both text-only and multimodal (text + image) experiences. They use a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Scout has 17 billion active parameters and 16 experts.
Property | Value |
---|---|
Price per 1M input tokens | $0.08 |
Price per 1M output tokens | $0.30 |
Release date | 04/05/2025 |
Context size | 327,680 tokens |
Quantization | bfloat16 |
```python
# Assumes openai>=1.0.0 and a DeepInfra API token in the DEEPINFRA_TOKEN environment variable
import os

from openai import OpenAI

# Create an OpenAI client pointed at DeepInfra's OpenAI-compatible endpoint
client = OpenAI(
    api_key=os.environ["DEEPINFRA_TOKEN"],
    base_url="https://api.deepinfra.com/v1/openai",
)

chat_completion = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "Hello"}],
)

print(chat_completion.choices[0].message.content)
print(chat_completion.usage.prompt_tokens, chat_completion.usage.completion_tokens)
# Example output (the reply text varies between runs):
# Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
# 11 25
```
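Because prices are quoted per 1M tokens, the cost of a single call can be estimated from the `usage` token counts printed by the example. The sketch below assumes billing is exactly linear in tokens at the listed Scout rates; `request_cost` is a hypothetical helper, and actual invoices may apply their own rounding.

```python
# Per-1M-token prices for Llama-4-Scout-17B-16E-Instruct, as listed above.
SCOUT_INPUT_PER_M = 0.08
SCOUT_OUTPUT_PER_M = 0.30

def request_cost(prompt_tokens: int, completion_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Estimated USD cost of one request, assuming linear per-token pricing."""
    return (prompt_tokens * input_per_m + completion_tokens * output_per_m) / 1_000_000

# Token counts from the example output (11 prompt tokens, 25 completion tokens).
cost = request_cost(11, 25, SCOUT_INPUT_PER_M, SCOUT_OUTPUT_PER_M)
print(f"${cost:.8f}")  # $0.00000838
```

Output tokens cost almost four times as much as input tokens here, so long generations dominate the bill even for short prompts.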
Llama 4 Maverick has 17 billion active parameters and 128 experts.
Property | Value |
---|---|
Price per 1M input tokens | $0.50 |
Price per 1M output tokens | $0.50 |
Release date | 05/16/2025 |
Context size | 8,192 tokens |
Quantization | fp8 |
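An 8,192-token window leaves limited room for long prompts. A minimal budget check could look like the following, assuming (as is usual for decoder-only models) that the prompt and the generated tokens share one window. The helper names are hypothetical; the prompt token count would come from `usage.prompt_tokens` or from a tokenizer.

```python
MAVERICK_TURBO_CONTEXT = 8_192  # "Context size" listed above for the Turbo deployment

def max_completion_budget(prompt_tokens: int, context_size: int = MAVERICK_TURBO_CONTEXT) -> int:
    """Largest number of new tokens that still fits (0 if the prompt alone fills the window)."""
    return max(context_size - prompt_tokens, 0)

def fits(prompt_tokens: int, max_tokens: int, context_size: int = MAVERICK_TURBO_CONTEXT) -> bool:
    """True if the prompt plus the requested completion length fit in the window."""
    return prompt_tokens + max_tokens <= context_size

print(max_completion_budget(6_000))  # 2192
print(fits(6_000, 4_096))            # False
```

The value from `max_completion_budget` is a natural upper bound for the `max_tokens` argument of `chat.completions.create`.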
```python
# Assumes openai>=1.0.0 and a DeepInfra API token in the DEEPINFRA_TOKEN environment variable
import os

from openai import OpenAI

# Create an OpenAI client pointed at DeepInfra's OpenAI-compatible endpoint
client = OpenAI(
    api_key=os.environ["DEEPINFRA_TOKEN"],
    base_url="https://api.deepinfra.com/v1/openai",
)

chat_completion = client.chat.completions.create(
    model="meta-llama/Llama-4-Maverick-17B-128E-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello"}],
)

print(chat_completion.choices[0].message.content)
print(chat_completion.usage.prompt_tokens, chat_completion.usage.completion_tokens)
# Example output (the reply text varies between runs):
# Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
# 11 25
```
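Since Llama 4 accepts images as well as text, one user message can carry both. The sketch below builds such a message in the OpenAI-style format, with a text part and an `image_url` part holding an inline base64 data URL. That this endpoint and this particular deployment accept image parts is an assumption to verify; `image_message` is a hypothetical helper, and the sample bytes are only a PNG signature, not a real image.

```python
import base64
from typing import Any, Dict

def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> Dict[str, Any]:
    """Build one OpenAI-style user message with a text part and an inline image part."""
    # Encode the raw bytes as a base64 data URL (inline form used by OpenAI-style vision APIs).
    data_url = f"data:{mime};base64,{base64.b64encode(image_bytes).decode('ascii')}"
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }

msg = image_message("What is in this picture?", b"\x89PNG\r\n\x1a\n")
print(msg["content"][1]["image_url"]["url"])  # data:image/png;base64,iVBORw0KGgo=
```

The returned dict would be passed as `messages=[msg]` to `chat.completions.create(...)`, as in the examples above, with the bytes of a real image file in place of the signature.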
The Llama 4 collection consists of natively multimodal AI models that support both text-only and multimodal experiences.
Model | Context | $ per 1M input tokens | $ per 1M output tokens |
---|---|---|---|
Llama-4-Scout-17B-16E | 320k | $0.08 | $0.30 |
Llama-4-Maverick-17B-128E | 1024k | $0.15 | $0.60 |
Llama-4-Maverick-17B-128E-Turbo | 8k | $0.50 | $0.50 |
Llama-Guard-4-12B | 160k | $0.05 | $0.05 |
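Llama-Guard-4-12B is a safety classifier rather than a chat model, so its reply has to be parsed into a decision. Assuming it follows the format documented for earlier Llama Guard releases (a first line of `safe` or `unsafe`, and for unsafe content a second line of comma-separated hazard codes such as `S1`), a minimal parser could look like this. `GuardVerdict` and `parse_guard_output` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class GuardVerdict:
    """Parsed classifier decision; categories are hazard codes such as 'S1'."""
    safe: bool
    categories: List[str] = field(default_factory=list)

def parse_guard_output(text: str) -> GuardVerdict:
    """Parse a Llama Guard-style reply (output format assumed, see above)."""
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or lines[0].lower() not in ("safe", "unsafe"):
        raise ValueError(f"unexpected guard output: {text!r}")
    if lines[0].lower() == "safe":
        return GuardVerdict(safe=True)
    # For unsafe content, hazard codes are assumed to be on the second line, comma-separated.
    codes = lines[1].split(",") if len(lines) > 1 else []
    return GuardVerdict(safe=False, categories=[c.strip() for c in codes if c.strip()])

print(parse_guard_output("safe"))            # GuardVerdict(safe=True, categories=[])
print(parse_guard_output("unsafe\nS1,S10"))  # GuardVerdict(safe=False, categories=['S1', 'S10'])
```

Raising on anything unexpected, rather than defaulting to `safe`, keeps a malformed classifier reply from silently letting content through.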
Meta Llama 3 is a collection of pretrained and instruction-tuned generative text models in 8B, 70B, and 405B sizes; the Llama 3.2 releases listed below add smaller 1B and 3B text models and an 11B vision model.
Model | Context | $ per 1M input tokens | $ per 1M output tokens |
---|---|---|---|
Llama-3.3-70B-Instruct | 128k | $0.23 | $0.40 |
Llama-3.3-70B-Instruct-Turbo | 128k | $0.038 | $0.12 |
Llama-3.2-11B-Vision-Instruct | 128k | $0.049 | $0.049 |
Llama-3.2-3B-Instruct | 128k | $0.003 | $0.006 |
Llama-3.2-1B-Instruct | 128k | $0.005 | $0.01 |
Meta-Llama-3.1-405B-Instruct | 32k | $0.80 | $0.80 |
Meta-Llama-3.1-70B-Instruct | 128k | $0.23 | $0.40 |
Meta-Llama-3.1-70B-Instruct-Turbo | 128k | $0.10 | $0.28 |
Meta-Llama-3.1-8B-Instruct | 128k | $0.03 | $0.05 |
Meta-Llama-3.1-8B-Instruct-Turbo | 128k | $0.015 | $0.02 |
Meta-Llama-3-70B-Instruct | 8k | $0.30 | $0.40 |
Meta-Llama-3-8B-Instruct | 8k | $0.03 | $0.06 |
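The pricing table can also be used to compare options for a planned workload. The sketch below copies the rows into a dict, drops models whose context window is smaller than the requests you plan to send, and returns the cheapest remaining one for a given token volume. It assumes that "k" means 1,024 tokens (consistent with 320k = 327,680 and 8k = 8,192 on the Llama 4 cards) and that billing is linear per token. It ignores quality differences between model sizes, so `candidates` should list only models you would actually accept; `workload_cost` and `cheapest` are hypothetical helpers.

```python
from typing import Dict, Iterable, Optional, Tuple

K = 1024  # assumption: "128k" in the table means 128 * 1024 tokens

# name -> (context window in tokens, $ per 1M input tokens, $ per 1M output tokens)
LLAMA3_MODELS: Dict[str, Tuple[int, float, float]] = {
    "Llama-3.3-70B-Instruct": (128 * K, 0.23, 0.40),
    "Llama-3.3-70B-Instruct-Turbo": (128 * K, 0.038, 0.12),
    "Llama-3.2-11B-Vision-Instruct": (128 * K, 0.049, 0.049),
    "Llama-3.2-3B-Instruct": (128 * K, 0.003, 0.006),
    "Llama-3.2-1B-Instruct": (128 * K, 0.005, 0.01),
    "Meta-Llama-3.1-405B-Instruct": (32 * K, 0.80, 0.80),
    "Meta-Llama-3.1-70B-Instruct": (128 * K, 0.23, 0.40),
    "Meta-Llama-3.1-70B-Instruct-Turbo": (128 * K, 0.10, 0.28),
    "Meta-Llama-3.1-8B-Instruct": (128 * K, 0.03, 0.05),
    "Meta-Llama-3.1-8B-Instruct-Turbo": (128 * K, 0.015, 0.02),
    "Meta-Llama-3-70B-Instruct": (8 * K, 0.30, 0.40),
    "Meta-Llama-3-8B-Instruct": (8 * K, 0.03, 0.06),
}

def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a workload, assuming linear per-token billing."""
    _, in_price, out_price = LLAMA3_MODELS[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

def cheapest(candidates: Iterable[str], min_context: int,
             input_tokens: int, output_tokens: int) -> Optional[Tuple[str, float]]:
    """Cheapest (model, cost) among candidates whose window is at least min_context."""
    fitting = [m for m in candidates if LLAMA3_MODELS[m][0] >= min_context]
    if not fitting:
        return None
    costs = [(m, workload_cost(m, input_tokens, output_tokens)) for m in fitting]
    return min(costs, key=lambda pair: pair[1])

seventy_b = [
    "Llama-3.3-70B-Instruct", "Llama-3.3-70B-Instruct-Turbo",
    "Meta-Llama-3.1-70B-Instruct", "Meta-Llama-3.1-70B-Instruct-Turbo",
    "Meta-Llama-3-70B-Instruct",
]
# 20k-token requests rule out the 8k model; 10M input + 2M output tokens in total.
name, cost = cheapest(seventy_b, 20_000, 10_000_000, 2_000_000)
print(f"{name}: ${cost:.2f}")  # Llama-3.3-70B-Instruct-Turbo: $0.62
```

For the same workload, the non-Turbo 70B models come to $3.10 and Meta-Llama-3.1-70B-Instruct-Turbo to $1.56, so the choice of deployment matters more than the choice between Llama 3.1 and 3.3 at this size.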