Browse DeepInfra models:

All categories and models you can try out and use directly on DeepInfra:

Featured

Our most popular AI models used by thousands of users in their apps and research. What will you create today?

Qwen/Qwen2-72B-Instruct cover image
$0.59 in / $0.79 out per Mtoken
  • text-generation

The 72 billion parameter Qwen2 excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning.
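Text-generation models like this one are typically called over HTTP. A minimal sketch, assuming DeepInfra exposes an OpenAI-compatible chat-completions endpoint (the URL and payload shape below are assumptions; check the official docs for the current path):

```python
import json
import os
import urllib.request

# Assumption: DeepInfra serves chat models through an OpenAI-compatible
# chat-completions endpoint at this URL.
API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"

def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completion request for a text-generation model."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "Qwen/Qwen2-72B-Instruct",
    "Summarize mixture-of-experts in one sentence.",
    os.environ.get("DEEPINFRA_API_KEY", ""),
)
# With a valid key: resp = urllib.request.urlopen(req); print(resp.read())
```

The request is only constructed here, not sent, so the sketch runs without an account; swap in any other text-generation model name from this page.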

microsoft/Phi-3-medium-4k-instruct cover image
$0.14 / Mtoken
  • text-generation

The Phi-3-Medium-4K-Instruct is a powerful and lightweight language model with 14 billion parameters, trained on high-quality data to excel in instruction following and safety measures. It demonstrates exceptional performance across benchmarks, including common sense, language understanding, and logical reasoning, outperforming models of similar size.

openchat/openchat-3.6-8b cover image
$0.08 / Mtoken
  • text-generation

OpenChat 3.6 is a Llama-3-8B fine-tune that outperforms the base model on multiple benchmarks.

mistralai/Mistral-7B-Instruct-v0.3 cover image
$0.07 / Mtoken
  • text-generation

Mistral-7B-Instruct-v0.3 is an instruction-tuned model and the next iteration of Mistral 7B; it has a larger vocabulary, a newer tokenizer, and supports function calling.

meta-llama/Meta-Llama-3-70B-Instruct cover image
$0.59 in / $0.79 out per Mtoken
  • text-generation

Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes.
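Entries like this one list separate input and output rates per Mtoken (million tokens), so the cost of a single request is a weighted sum of the two token counts. A small sketch using the $0.59 / $0.79 rates shown above:

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      in_price: float = 0.59, out_price: float = 0.79) -> float:
    """Cost in USD for a model priced per Mtoken (million tokens),
    with separate rates for input (prompt) and output (completion) tokens."""
    return (input_tokens / 1e6) * in_price + (output_tokens / 1e6) * out_price

# Example: a 2,000-token prompt producing a 500-token answer.
cost = estimate_cost_usd(2_000, 500)  # a fraction of a cent
```

Models that list a single price (e.g. $0.08 / Mtoken) charge the same rate for both directions, which is the same formula with `in_price == out_price`.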

meta-llama/Meta-Llama-3-8B-Instruct cover image
$0.08 / Mtoken
  • text-generation

Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes.

mistralai/Mixtral-8x22B-Instruct-v0.1 cover image
$0.65 / Mtoken
  • text-generation

This is the instruction fine-tuned version of Mixtral-8x22B, the latest and largest mixture-of-experts (MoE) large language model from Mistral AI. This state-of-the-art model uses eight 22B-parameter expert networks, of which two are selected per token during inference. This architecture allows large models to be fast and cheap at inference.
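The speed/cost claim follows from simple arithmetic: only the routed experts' weights are used for each token. A rough back-of-the-envelope sketch (this ignores shared attention and router parameters, so real active-parameter counts differ):

```python
# Naive MoE arithmetic for an "8x22B" model: 8 experts, 2 routed per token.
experts_total = 8
experts_active = 2          # experts selected per token during inference
params_per_expert = 22e9    # the "22B" in "8x22B"

# Fraction of expert weights that participate in any one token's forward pass.
active_fraction = experts_active / experts_total          # 0.25

# Expert parameters actually used per token vs. stored in total.
active_expert_params = experts_active * params_per_expert  # 44e9
total_expert_params = experts_total * params_per_expert    # 176e9
```

So per-token compute scales with roughly a quarter of the expert weights, while the full set still has to fit in memory.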

microsoft/WizardLM-2-8x22B cover image
$0.65 / Mtoken
  • text-generation

WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models.

microsoft/WizardLM-2-7B cover image
$0.07 / Mtoken
  • text-generation

WizardLM-2 7B is the smaller variant of Microsoft AI's latest Wizard model. It is the fastest and achieves performance comparable to leading open-source models that are 10x larger.

google/gemma-1.1-7b-it cover image
$0.07 / Mtoken
  • text-generation

Gemma is an open-source model designed by Google. This is Gemma 1.1 7B (IT), an update over the original instruction-tuned Gemma release. Gemma 1.1 was trained using a novel RLHF method, leading to substantial gains on quality, coding capabilities, factuality, instruction following and multi-turn conversation quality.

mistralai/Mixtral-8x7B-Instruct-v0.1 cover image
$0.24 / Mtoken
  • text-generation

Mixtral is a mixture-of-experts (MoE) large language model from Mistral AI. This state-of-the-art model uses eight 7B-parameter expert networks, of which two are selected per token during inference. This architecture allows large models to be fast and cheap at inference. Mixtral-8x7B outperforms Llama 2 70B on most benchmarks.

mistralai/Mistral-7B-Instruct-v0.2 cover image
$0.07 / Mtoken
  • text-generation

The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruction fine-tuned version of the Mistral-7B-v0.2 generative text model, trained on a variety of publicly available conversation datasets.

cognitivecomputations/dolphin-2.6-mixtral-8x7b cover image
$0.24 / Mtoken
  • text-generation

The Dolphin 2.6 Mixtral 8x7b model is a finetune of the Mixtral-8x7b model, trained on a variety of data including coding data for 3 days on 4 A100 GPUs. The dataset was filtered for alignment and bias, so the model is uncensored and highly compliant with user requests; it requires trust_remote_code. It is very obedient and good at coding, but not DPO-tuned, and can be used for purposes such as generating code or general chat.

lizpreciatior/lzlv_70b_fp16_hf cover image
$0.59 in / $0.79 out per Mtoken
  • text-generation

A Mythomax/MLewd_13B-style, multi-model merge of several LLaMA2 70B finetunes for roleplaying and creative work. The goal was to create a model that combines creativity with intelligence for an enhanced experience.

openchat/openchat_3.5 cover image
$0.07 / Mtoken
  • text-generation

OpenChat is a library of open-source language models that have been fine-tuned with C-RLFT, a strategy inspired by offline reinforcement learning. These models can learn from mixed-quality data without preference labels and have achieved exceptional performance comparable to ChatGPT. The developers of OpenChat are dedicated to creating a high-performance, commercially viable, open-source large language model and are continuously making progress towards this goal.

llava-hf/llava-1.5-7b-hf cover image
$0.34 / Mtoken
  • text-generation

LLaVA is a multimodal model that combines a vision encoder with a language model.

stability-ai/sdxl cover image
$0.0005 / sec
  • text-to-image

SDXL consists of an ensemble of experts pipeline for latent diffusion: In a first step, the base model is used to generate (noisy) latents, which are then further processed with a refinement model (available here: https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/) specialized for the final denoising steps. Note that the base model can be used as a standalone module.
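Non-chat models such as SDXL are not served through a chat-completions interface. A hedged sketch, assuming DeepInfra exposes a generic per-model inference endpoint that accepts a JSON body with an "input" object (both the URL pattern and the payload schema here are guesses; consult the model's own page for the real interface):

```python
import json
import urllib.request

# Assumption: a generic inference endpoint keyed by model name.
API_URL = "https://api.deepinfra.com/v1/inference/stability-ai/sdxl"

def build_sdxl_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a text-to-image request for SDXL."""
    payload = {"input": {"prompt": prompt}}  # assumed schema
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_sdxl_request("a lighthouse at dawn, oil painting", "YOUR_API_KEY")
# With a valid key: urllib.request.urlopen(req) would return the generated image data.
```

Note that this model (and Whisper below) is billed per second of inference time rather than per token, so cost depends on runtime, not prompt length.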

openai/whisper-large cover image
$0.0005 / sec
  • automatic-speech-recognition

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.