

Browse DeepInfra models:

All categories and models you can try out and use directly in DeepInfra:

text-generation

automatic-speech-recognition

zero-shot-image-classification

featured

text-generation

moonshotai/Kimi-K2.5

Kimi K2.5 is an open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base. It seamlessly integrates vision and language understanding with advanced agentic capabilities, instant and thinking modes, as well as conversational and agentic paradigms.

$0.07 cached, $0.45 in, $2.25 out / 1M
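The price line above gives three rates per 1M tokens. As a rough illustration, the sketch below estimates the cost of one request from those rates. It assumes that cached input tokens are billed at the cached rate instead of the regular input rate, and that no rounding or minimum charge applies. The function and parameter names are hypothetical and are not part of any DeepInfra API.

```python
def estimate_token_cost(cached_tokens, input_tokens, output_tokens,
                        cached_rate=0.07, input_rate=0.45, output_rate=2.25):
    """Estimate the cost of one request in USD.

    Rates are USD per 1M tokens; the defaults are the listed
    Kimi-K2.5 prices. Assumption: cached tokens are billed only
    at the cached rate, not additionally at the input rate.
    """
    tokens_per_unit = 1_000_000
    total = (cached_tokens * cached_rate
             + input_tokens * input_rate
             + output_tokens * output_rate)
    return total / tokens_per_unit


if __name__ == "__main__":
    # Hypothetical request: 2k cached, 8k fresh input, 1k output tokens.
    cost = estimate_token_cost(cached_tokens=2_000,
                               input_tokens=8_000,
                               output_tokens=1_000)
    print(f"Estimated cost: ${cost:.6f}")
```

The same function can be reused for the other text-generation entries by passing their listed rates; for a model with no cached price, pass `cached_tokens=0`.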

featured

text-generation

zai-org/GLM-4.7-Flash

GLM-4.7-Flash is a 30B-A3B MoE model. As the strongest model in the 30B class, GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency.

$0.01 cached, $0.06 in, $0.40 out / 1M

featured

text-generation

nvidia/Nemotron-3-Nano-30B-A3B

NVIDIA Nemotron 3 Nano is an open reasoning model optimized for fast, cost-efficient inference. Built with a hybrid MoE and Mamba architecture and trained on NVIDIA-curated synthetic reasoning data, it delivers strong multi-step reasoning with stable latency and predictable performance for agentic and production workloads.

$0.05 in, $0.20 out / 1M

featured

text-generation

deepseek-ai/DeepSeek-V3.2

DeepSeek-V3.2 is a large language model designed to harmonize high computational efficiency with strong reasoning and agentic tool-use performance. It introduces DeepSeek Sparse Attention (DSA), a fine-grained sparse attention mechanism that reduces training and inference cost while preserving quality in long-context scenarios. A scalable reinforcement learning post-training framework further improves reasoning, with reported performance in the GPT-5 class, and the model has demonstrated gold-medal results on the 2025 IMO and IOI. V3.2 also uses a large-scale agentic task synthesis pipeline to better integrate reasoning into tool-use settings, boosting compliance and generalization in interactive environments.

$0.13 cached, $0.26 in, $0.38 out / 1M

featured

Bria/fibo_edit

🥳 For a limited time, Fibo Edit is free on DeepInfra 🥳 YOUR AI, YOUR RULES. Production-grade visual generation. FIBO Edit is an open-source image editing model with native masking and a lightweight 8B architecture.

featured

Bria/video_eraser

Removes unwanted objects or regions from video using a mask and reconstructs the background with content-aware fill.

featured

Bria/video_foreground_mask

Automatically identify and segment foreground objects across video frames and generate a mask. No prompts, just a video.

featured

Bria/video_increase_resolution

Increase video resolution up to 8K with advanced AI upscaling. Bring your videos to the big screen, ready for the screens of tomorrow.

featured

Bria/video_mask_by_key_points

Identify and segment objects across video frames using specific coordinate points. Just point in the right direction and the model will figure out by itself which object should be masked.

featured

Bria/video_mask_by_prompt

Identify and segment objects across video frames using a text prompt. The easiest way to create a mask to modify your videos.

featured

Bria/video_remove_background

Light and fast. Remove the background of your videos to bring the foreground elements to focus. No more unwanted distractions.

featured

PrunaAI/p-image

P-Image is a state-of-the-art real-time generation model with exceptional text rendering, fine-detail accuracy, and rock-solid prompt adherence. It is built for instant creativity, producing high-fidelity images in about one second at a fraction of typical model costs.

featured

PrunaAI/p-image-Edit

P-Image-Edit is a high-precision image editing model that applies complex transformations, insertions, removals, and style adjustments in under a second. It delivers state-of-the-art accuracy, clean boundaries, and reliable prompt alignment, making multi-step edits fast, consistent, and production-ready.

featured

bosonai/HiggsAudioV2.5

HiggsAudioV2.5 is a high-quality neural text-to-speech (TTS) model designed for natural-sounding voice generation across a wide range of use cases. It focuses on clarity, stable prosody, and consistent pacing, making it suitable for both short prompts and longer narration.

$20.00 per 1M characters

featured

ResembleAI/chatterbox-turbo

Chatterbox is a family of three state-of-the-art, open-source text-to-speech models by Resemble AI. We are excited to introduce Chatterbox-Turbo, our most efficient model yet. Built on a streamlined 350M parameter architecture, Turbo delivers high-quality speech with less compute and VRAM than our previous models. We have also distilled the speech-token-to-mel decoder, previously a bottleneck, reducing generation from 10 steps to just one while retaining high-fidelity audio output. Paralinguistic tags are now native to the Turbo model, allowing you to use [cough], [laugh], [chuckle], and more to add distinct realism. While Turbo was built primarily for low-latency voice agents, it also excels at narration and creative workflows. If you like the model but need to scale or tune it for higher accuracy, check out our competitively priced TTS service.

$1.00 per 1M characters
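Text-to-speech entries are priced per character rather than per token. The sketch below estimates the cost of synthesizing a piece of text at a per-1M-character rate. It assumes every character in the input counts toward billing, including spaces and bracketed tags such as [laugh]; the listing does not say how tags are counted, so treat that as a guess. The function name is hypothetical.

```python
def estimate_tts_cost(text, rate_per_million_chars=1.00):
    """Estimate the cost in USD of synthesizing `text`.

    The default rate is the listed chatterbox-turbo price.
    Assumption: billing counts every character of the input,
    including whitespace and bracketed paralinguistic tags.
    """
    return len(text) * rate_per_million_chars / 1_000_000


if __name__ == "__main__":
    script = "Well, that went better than expected. [laugh] Let's do it again."
    print(f"{len(script)} characters, about ${estimate_tts_cost(script):.8f}")
    # The same text at the listed HiggsAudioV2.5 rate of $20.00 per 1M characters.
    print(f"about ${estimate_tts_cost(script, rate_per_million_chars=20.00):.8f}")
```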

featured

black-forest-labs/FLUX-2-klein-4b

The fastest model of the FLUX.2 family. Frontier visual intelligence: state-of-the-art image generation and editing from Black Forest Labs.

$0.014 x (width / 1024) x (height / 1024)

featured

black-forest-labs/FLUX-2-klein-9b

The FLUX.2 family model with the best quality-to-latency ratio, aimed at production apps. Frontier visual intelligence: state-of-the-art image generation and editing from Black Forest Labs.

$0.015 x (width / 1024) x (height / 1024)
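The image price formula scales a base rate by the image area relative to 1024 x 1024 pixels. The sketch below evaluates it for a few sizes. It assumes the formula is applied exactly as written, with no rounding, minimum charge, or per-step surcharge; the function name is hypothetical.

```python
def estimate_image_cost(width, height, base_rate=0.015):
    """Evaluate base_rate x (width / 1024) x (height / 1024) in USD.

    The default base rate is the listed FLUX-2-klein-9b price;
    pass base_rate=0.014 for FLUX-2-klein-4b.
    Assumption: no rounding or minimum charge is applied.
    """
    return base_rate * (width / 1024) * (height / 1024)


if __name__ == "__main__":
    for width, height in [(512, 512), (1024, 1024), (2048, 1024)]:
        cost = estimate_image_cost(width, height)
        print(f"{width}x{height}: ${cost:.5f}")
```

Because the cost is proportional to pixel count under this reading, halving both dimensions cuts the estimated price to a quarter.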

BAAI/bge-base-en-v1.5

BGE embedding is a general embedding model. It is pre-trained using RetroMAE and trained on large-scale pair data using contrastive learning. Note that the goal of pre-training is to reconstruct the text, so the pre-trained model cannot be used for similarity calculation directly; it needs to be fine-tuned.

$0.005 / 1M tokens

BAAI/bge-en-icl

An LLM-based embedding model with in-context learning capabilities that achieves SOTA performance on BEIR and AIR-Bench. It leverages few-shot examples to enhance task performance.

$0.010 / 1M tokens

BAAI/bge-large-en-v1.5

BGE embedding is a general embedding model. It is pre-trained using RetroMAE and trained on large-scale pair data using contrastive learning. Note that the goal of pre-training is to reconstruct the text, so the pre-trained model cannot be used for similarity calculation directly; it needs to be fine-tuned.

$0.010 / 1M tokens

BAAI/bge-m3

BGE-M3 is a versatile text embedding model that supports multi-functionality, multi-linguality, and multi-granularity, allowing it to perform dense retrieval, multi-vector retrieval, and sparse retrieval in over 100 languages and with input sizes up to 8192 tokens. The model can be used in a retrieval pipeline with hybrid retrieval and re-ranking to achieve higher accuracy and stronger generalization capabilities. BGE-M3 has shown state-of-the-art performance on several benchmarks, including MKQA, MLDR, and NarrativeQA, and can be used as a drop-in replacement for other embedding models like DPR and BGE-v1.5.

$0.010 / 1M tokens

BAAI/bge-m3-multi

BGE-M3 is a multilingual text embedding model developed by BAAI, distinguished by its Multi-Linguality (supporting 100+ languages), Multi-Functionality (unified dense, multi-vector, and sparse retrieval), and Multi-Granularity (handling inputs from short queries to 8192-token documents). It achieves state-of-the-art retrieval performance across diverse benchmarks while maintaining a single model for multiple retrieval modes.

$0.010 / 1M tokens

Bria/Bria-3.2

Bria 3.2 is the next-generation commercial-ready text-to-image model. With just 4 billion parameters, it provides exceptional aesthetics and text rendering, evaluated to be on par with leading open-source models and to outperform other licensed models.

Bria/Bria-3.2-vector

Bria 3.2 is the next-generation commercial-ready text-to-image model. With just 4 billion parameters, it provides exceptional aesthetics and text rendering, evaluated to be on par with leading open-source models and to outperform other licensed models.


© 2026 Deep Infra. All rights reserved.
