DeepStart | Production-Ready Machine Learning Models

DeepStart

Deep Infra Startup program

1,000,000,000 free tokens*

To qualify, your startup needs to meet the following:

Have raised between 250K and 10M USD

Founded in the last 2 years

* at Mixtral 8x7b prices
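
The footnote suggests the grant is pegged to a reference price rather than being a fixed token count on every model. A minimal sketch of that conversion follows, assuming the grant behaves like a flat dollar credit. The Mixtral 8x7b price used here is a placeholder (the page does not state it), and the function names are hypothetical.

```python
FREE_TOKENS = 1_000_000_000  # grant size stated by the program


def credit_usd(reference_price_per_mtoken: float) -> float:
    """Dollar value of the grant at a reference price (USD per million tokens)."""
    return FREE_TOKENS / 1_000_000 * reference_price_per_mtoken


def tokens_for_credit(credit: float, price_per_mtoken: float) -> int:
    """Tokens the same credit would buy at a different per-million-token price."""
    return int(credit / price_per_mtoken * 1_000_000)


# ASSUMPTION: placeholder reference price, NOT taken from the page.
assumed_mixtral_price = 0.25
credit = credit_usd(assumed_mixtral_price)

# At a model priced twice as high, the same credit covers half as many tokens.
tokens_at_double_price = tokens_for_credit(credit, 2 * assumed_mixtral_price)
```

Under this reading, the number of tokens the grant covers shrinks in proportion to how much more expensive the chosen model is than the reference model.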

Featured models:

What we loved, used and implemented the most last month:

$0.09/$0.45 in/out Mtoken

openai/

gpt-oss-120b

text-generation

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.
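
Prices in this list are quoted as input/output dollars per million tokens ("Mtoken"). As a rough sketch of how a single request would be billed under that scheme, the hypothetical helper below uses the gpt-oss-120b prices listed above; the token counts are made up for illustration.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     in_price: float, out_price: float) -> float:
    """Cost of one request given per-million-token input and output prices."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# gpt-oss-120b is listed at $0.09 in / $0.45 out per Mtoken.
# ASSUMPTION: 2,000 prompt tokens and 500 completion tokens, for illustration only.
cost = request_cost_usd(2_000, 500, in_price=0.09, out_price=0.45)
print(f"${cost:.6f}")
```

Because output tokens are priced several times higher than input tokens for most models here, long completions tend to dominate the bill.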

$0.50/$1.70 in/out Mtoken

zai-org/

GLM-4.5V

text-generation

GLM-4.5V is based on ZhipuAI’s next-generation flagship text foundation model GLM-4.5-Air (106B parameters, 12B active). It continues the technical approach of GLM-4.1V-Thinking, achieving SOTA performance among models of the same scale on 42 public vision-language benchmarks. It covers common tasks such as image, video, and document understanding, as well as GUI agent operations.

$0.04/$0.16 in/out Mtoken

openai/

gpt-oss-20b

text-generation

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference. The model is trained in OpenAI’s Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs.

$0.30/$1.20 in/out Mtoken

Qwen/

Qwen3-Coder-480B-A35B-Instruct-Turbo

text-generation

Qwen3-Coder-480B-A35B-Instruct is Qwen3's most agentic code model, with strong performance on agentic coding, agentic browser use, and other foundational coding tasks, achieving results comparable to Claude Sonnet.

$0.55/$2.00 in/out Mtoken

zai-org/

GLM-4.5

text-generation

The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.
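
To make the total-versus-active distinction concrete, the sketch below computes the share of parameters that are active for the two models, using only the figures quoted above. The helper name is hypothetical.

```python
def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of a Mixture-of-Experts model's parameters that are active."""
    return active_b / total_b


# (total, active) parameter counts in billions, as stated in the description.
models = {
    "GLM-4.5": (355, 32),
    "GLM-4.5-Air": (106, 12),
}

for name, (total, active) in models.items():
    print(f"{name}: {active_fraction(total, active):.1%} of parameters active")
```

Both models activate roughly a tenth of their weights, which is why compute per token is far lower than the total parameter count suggests.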

$0.50/$2.00 in/out Mtoken

moonshotai/

Kimi-K2-Instruct

text-generation

Kimi K2 is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks.

$0.30/$1.80 in/out Mtoken

allenai/

olmOCR-7B-0725-FP8

text-generation

olmOCR is a specialized AI tool that converts PDF documents into clean, structured text while preserving important formatting and layout information. What makes olmOCR particularly valuable for developers is its ability to handle challenging PDFs that traditional OCR tools struggle with—including complex layouts, poor-quality scans, handwritten text, and documents with mixed content types. Built on a fine-tuned 7B vision-language model, olmOCR provides enterprise-grade PDF processing at a fraction of the cost of proprietary solutions.

$0.13/$0.60 in/out Mtoken

Qwen/

Qwen3-235B-A22B-Thinking-2507

text-generation

Qwen3-235B-A22B-Thinking-2507 is a new Qwen3 model that scales up the thinking capability of Qwen3-235B-A22B, improving both the quality and depth of reasoning.

$0.40/$1.60 in/out Mtoken

Qwen/

Qwen3-Coder-480B-A35B-Instruct

text-generation

$0.20/$1.10 in/out Mtoken

zai-org/

GLM-4.5-Air

text-generation

$0.00300 / minute

mistralai/

Voxtral-Small-24B-2507

automatic-speech-recognition

Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding.
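
Speech-recognition models are priced per minute of audio rather than per token. A minimal sketch of the resulting cost, assuming billing is pro rata by duration with no rounding up to whole minutes (the page does not say how partial minutes are handled):

```python
def transcription_cost_usd(audio_seconds: float, price_per_minute: float) -> float:
    """Cost of transcribing audio, assuming pro-rata per-minute billing."""
    return audio_seconds / 60 * price_per_minute


# Voxtral-Small-24B-2507 is listed at $0.00300 per minute.
one_hour = transcription_cost_usd(3_600, price_per_minute=0.003)
print(f"One hour of audio: ${one_hour:.2f}")
```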

$0.00100 / minute

mistralai/

Voxtral-Mini-3B-2507

automatic-speech-recognition

Voxtral Mini is an enhancement of Ministral 3B, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding.

$1.00/$3.00 in/out Mtoken

deepseek-ai/

DeepSeek-R1-0528-Turbo

text-generation

The DeepSeek R1 0528 Turbo model is a state-of-the-art reasoning model that can generate very quick responses.

$0.13/$0.60 in/out Mtoken

Qwen/

Qwen3-235B-A22B-Instruct-2507

text-generation

Qwen3-235B-A22B-Instruct-2507 is the updated version of Qwen3-235B-A22B in non-thinking mode, featuring significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage.

$0.08/$0.29 in/out Mtoken

Qwen/

Qwen3-30B-A3B

text-generation

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support.

$0.10/$0.30 in/out Mtoken

Qwen/

Qwen3-32B

text-generation

$0.06/$0.24 in/out Mtoken

Qwen/

Qwen3-14B

text-generation

$1.00/$3.00 in/out Mtoken

deepseek-ai/

DeepSeek-V3-0324-Turbo

text-generation

$0.50 / Mtoken

meta-llama/

Llama-4-Maverick-17B-128E-Instruct-Turbo

text-generation

The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Maverick is a 17 billion active parameter model with 128 experts.

$0.15/$0.60 in/out Mtoken

meta-llama/

Llama-4-Maverick-17B-128E-Instruct-FP8

text-generation

$0.08/$0.30 in/out Mtoken

meta-llama/

Llama-4-Scout-17B-16E-Instruct

text-generation

The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Scout is a 17 billion active parameter model with 16 experts.

$0.50/$2.15 in/out Mtoken

deepseek-ai/

DeepSeek-R1-0528

text-generation

The DeepSeek R1 model has undergone a minor version upgrade, with the current version being DeepSeek-R1-0528.

$0.28/$0.88 in/out Mtoken

deepseek-ai/

DeepSeek-V3-0324

text-generation

DeepSeek-V3-0324 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. It is an improved iteration of DeepSeek-V3.

$0.07/$0.28 in/out Mtoken

mistralai/

Devstral-Small-2507

text-generation

Devstral is an agentic LLM for software engineering tasks, making it a great choice for software engineering agents.

$0.05/$0.10 in/out Mtoken

mistralai/

Mistral-Small-3.2-24B-Instruct-2506

text-generation

Mistral-Small-3.2-24B-Instruct is a drop-in upgrade over the 3.1 release, with markedly better instruction following, roughly half the infinite-generation errors, and a more robust function-calling interface—while otherwise matching or slightly improving on all previous text and vision benchmarks.

$0.18 / Mtoken

meta-llama/

Llama-Guard-4-12B

text-generation

Llama Guard 4 is a natively multimodal safety classifier with 12 billion parameters trained jointly on text and multiple images. Llama Guard 4 is a dense architecture pruned from the Llama 4 Scout pre-trained model and fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It itself acts as an LLM: it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated.
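
Because the classifier returns its verdict as generated text, callers have to parse it. The sketch below assumes the output is the single word `safe`, or `unsafe` followed by a line of comma-separated category codes such as `S1,S10`. That layout is an assumption made here for illustration; the description above only says that violated categories are listed. The function name is hypothetical.

```python
def parse_guard_output(text: str) -> tuple[bool, list[str]]:
    """Parse a safety verdict into (is_safe, violated_categories).

    ASSUMED format: "safe", or "unsafe" followed by a line like "S1,S10".
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty classifier output")

    verdict = lines[0].lower()
    if verdict == "safe":
        return True, []
    if verdict == "unsafe":
        # Category codes, if present, are expected on the second line.
        raw = lines[1].split(",") if len(lines) > 1 else []
        return False, [c.strip() for c in raw if c.strip()]
    raise ValueError(f"unrecognized verdict: {lines[0]!r}")


print(parse_guard_output("unsafe\nS1,S10"))
```

Raising on an unrecognized verdict, rather than defaulting to safe, keeps a malformed response from silently passing moderation.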

$0.075/$0.15 in/out Mtoken

Qwen/

QwQ-32B

text-generation

QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, which is capable of achieving competitive performance against state-of-the-art reasoning models, e.g., DeepSeek-R1, o1-mini.

$16.50/$82.50 in/out Mtoken

anthropic/

claude-4-opus

text-generation

Anthropic’s most powerful model yet and the state-of-the-art coding model. It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve. Claude Opus 4 is ideal for powering frontier agent products and features.

$3.30/$16.50 in/out Mtoken

anthropic/

claude-4-sonnet

text-generation

Anthropic's mid-size model with superior intelligence for high-volume uses in coding, in-depth research, agents, & more.

$0.21/$1.75 in/out Mtoken

google/

gemini-2.5-flash

text-generation

Gemini 2.5 Flash is Google's latest thinking model, designed to tackle increasingly complex problems. It is capable of reasoning through its thoughts before responding, resulting in enhanced performance and improved accuracy. Gemini 2.5 Flash is best for balancing reasoning and speed.

$0.875/$7.00 in/out Mtoken

google/

gemini-2.5-pro

text-generation

Gemini 2.5 Pro is Google's most advanced thinking model, designed to tackle increasingly complex problems. Gemini 2.5 Pro leads common benchmarks by meaningful margins and showcases strong reasoning and code capabilities. Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy. The Gemini 2.5 Pro model is now available on DeepInfra.

$0.09/$0.17 in/out Mtoken

google/

gemma-3-27b-it

text-generation

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open source model and the successor to Gemma 2.

$0.05/$0.10 in/out Mtoken

google/

gemma-3-12b-it

text-generation

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 12B is Google's latest open source model and the successor to Gemma 2.

$0.02/$0.04 in/out Mtoken

google/

gemma-3-4b-it

text-generation

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 4B is Google's latest open source model and the successor to Gemma 2.

$0.62 per M characters

hexgrad/

Kokoro-82M

text-to-speech

Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.

$7.00 per M characters

canopylabs/

orpheus-3b-0.1-ft

text-to-speech

Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been fine-tuned to deliver human-level speech synthesis, achieving exceptional clarity, expressiveness, and real-time streaming performance.

$7.00 per M characters

sesame/

csm-1b

text-to-speech

CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs. The model architecture employs a Llama backbone and a smaller audio decoder that produces Mimi audio codes.

$0.10/$0.40 in/out Mtoken

deepseek-ai/

DeepSeek-R1-Distill-Llama-70B

text-generation

DeepSeek-R1-Distill-Llama-70B is a highly efficient language model that leverages knowledge distillation to achieve state-of-the-art performance. This model distills the reasoning patterns of larger models into a smaller, more agile architecture, resulting in exceptional results on benchmarks like AIME 2024, MATH-500, and LiveCodeBench. With 70 billion parameters, DeepSeek-R1-Distill-Llama-70B offers a unique balance of accuracy and efficiency, making it an ideal choice for a wide range of natural language processing tasks.

$0.38/$0.89 in/out Mtoken

deepseek-ai/

DeepSeek-V3

text-generation

DeepSeek-V3 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2.

$0.038/$0.12 in/out Mtoken

meta-llama/

Llama-3.3-70B-Instruct-Turbo

text-generation

Llama 3.3-70B Turbo is a highly optimized version of the Llama 3.3-70B model, utilizing FP8 quantization to deliver significantly faster inference speeds with a minor trade-off in accuracy. The model is designed to be helpful, safe, and flexible, with a focus on responsible deployment and mitigating potential risks such as bias, toxicity, and misinformation. It achieves state-of-the-art performance on various benchmarks, including conversational tasks, language translation, and text generation.

$0.23/$0.40 in/out Mtoken

meta-llama/

Llama-3.3-70B-Instruct

text-generation

Llama 3.3-70B is a multilingual LLM trained on a massive dataset of 15 trillion tokens, fine-tuned for instruction-following and conversational dialogue. The model is designed to be helpful, safe, and flexible, with a focus on responsible deployment and mitigating potential risks such as bias, toxicity, and misinformation. It achieves state-of-the-art performance on various benchmarks, including conversational tasks, language translation, and text generation.

$0.07/$0.14 in/out Mtoken

microsoft/

phi-4

text-generation

Phi-4 is a model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning.

$0.00020 / minute

openai/

whisper-large-v3-turbo

automatic-speech-recognition

Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper "Robust Speech Recognition via Large-Scale Weak Supervision" by Alec Radford et al. from OpenAI. Trained on >5M hours of labeled data, Whisper demonstrates a strong ability to generalise to many datasets and domains in a zero-shot setting. Whisper large-v3-turbo is a fine-tuned version of a pruned Whisper large-v3. In other words, it is the same model, except that the number of decoding layers has been reduced from 32 to 4. As a result, the model is much faster, at the expense of a minor quality degradation.

View all models

Unlock the most affordable AI hosting

Run models at scale with our fully managed GPU infrastructure, delivering enterprise-grade uptime at the industry's best rates.

Latest Models

openai/

whisper-tiny

openchat/

openchat_3.5

bigcode/

starcoder2-15b

Phind/

Phind-CodeLlama-34B-v2

Gryphe/

MythoMax-L2-13b
