We use essential cookies to make our site work. With your consent, we may also use non-essential cookies to improve user experience and analyze website traffic…

Browse deepinfra models:

All categories and models you can try out and directly use in deepinfra:

Viewing all

featured

text-generation

automatic-speech-recognition

text-to-speech

embeddings

text-to-video

text-to-image

reranker

zero-shot-image-classification

multimodal

Category/text-generation

Text generation AI models can generate coherent and natural-sounding human language text, making them useful for a variety of applications from language translation to content creation.

There are several types of text generation AI models, including rule-based, statistical, and neural models. Neural models, and in particular transformer-based models like GPT, have achieved state-of-the-art results in text generation tasks. These models use artificial neural networks to analyze large text corpora and learn the patterns and structures of language.

While text generation AI models offer many exciting possibilities, they also present some challenges. For example, it's essential to ensure that the generated text is ethical, unbiased, and accurate, to avoid potential harm or negative consequences.

fp8

40k

$0.13/$0.60 in/out Mtoken

Qwen/

Qwen3-235B-A22B

text-generation

Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support

fp8

Replaced

Sao10K/

L3-70B-Euryale-v2.1

text-generation

Euryale 70B v2.1 is a model focused on creative roleplay from Sao10k

bfloat16

Replaced

Sao10K/

L3-8B-Lunaris-v1

text-generation

A generalist / roleplaying model merge based on Llama 3. Sao10K has carefully selected the values based on extensive personal experimentation and has fine-tuned them to create a customized recipe.

fp8

$0.02/$0.05 in/out Mtoken

Sao10K/

L3-8B-Lunaris-v1-Turbo

text-generation

Sao10K/L3.1-70B-Euryale-v2.2 cover image

fp8

128k

$0.65/$0.75 in/out Mtoken

Sao10K/

L3.1-70B-Euryale-v2.2

text-generation

Euryale 3.1 - 70B v2.2 is a model focused on creative roleplay from Sao10k

Sao10K/L3.3-70B-Euryale-v2.3 cover image

fp8

128k

$0.65/$0.75 in/out Mtoken

Sao10K/

L3.3-70B-Euryale-v2.3

text-generation

L3.3-70B-Euryale-v2.3 is a model focused on creative roleplay from Sao10k

195k

$3.30/$16.50 in/out Mtoken

anthropic/

claude-3-7-sonnet-latest

text-generation

bigcode/starcoder2-15b-instruct-v0.1 cover image

fp16

Replaced

bigcode/

starcoder2-15b-instruct-v0.1

text-generation

We introduce StarCoder2-15B-Instruct-v0.1, the very first entirely self-aligned code Large Language Model (LLM) trained with a fully permissive and transparent pipeline. Our open-source pipeline uses StarCoder2-15B to generate thousands of instruction-response pairs, which are then used to fine-tune StarCoder-15B itself without any human annotations or distilled data from huge and proprietary LLMs.

cognitivecomputations/dolphin-2.6-mixtral-8x7b cover image

bfloat16

32k

Replaced

cognitivecomputations/

dolphin-2.6-mixtral-8x7b

text-generation

The Dolphin 2.6 Mixtral 8x7b model is a finetuned version of the Mixtral-8x7b model, trained on a variety of data including coding data, for 3 days on 4 A100 GPUs. It is uncensored and requires trust_remote_code. The model is very obedient and good at coding, but not DPO tuned. The dataset has been filtered for alignment and bias. The model is compliant with user requests and can be used for various purposes such as generating code or engaging in general chat.

cognitivecomputations/dolphin-2.9.1-llama-3-70b cover image

bfloat16

Replaced

cognitivecomputations/

dolphin-2.9.1-llama-3-70b

text-generation

Dolphin 2.9.1, a fine-tuned Llama-3-70b model. The new model, trained on filtered data, is more compliant but uncensored. It demonstrates improvements in instruction, conversation, coding, and function calling abilities.

fp16

Replaced

deepinfra/

airoboros-70b

text-generation

Latest version of the Airoboros model fine-tunned version of llama-2-70b using the Airoboros dataset. This model is currently running jondurbin/airoboros-l2-70b-2.2.1

deepseek-ai/DeepSeek-Prover-V2-671B cover image

fp8

160k

Replaced

deepseek-ai/

DeepSeek-Prover-V2-671B

text-generation

DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning.

fp4

160k

$0.45/$2.15 in/out Mtoken

deepseek-ai/

DeepSeek-R1

text-generation

We introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

deepseek-ai/DeepSeek-R1-Distill-Qwen-32B cover image

fp8

128k

$0.075/$0.15 in/out Mtoken

deepseek-ai/

DeepSeek-R1-Distill-Qwen-32B

text-generation

DeepSeek R1 Distill Qwen 32B is a distilled large language model based on Qwen 2.5 32B, using outputs from DeepSeek R1. It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. Other benchmark results include: AIME 2024: 72.6 | MATH-500: 94.3 | CodeForces Rating: 1691.

deepseek-ai/DeepSeek-R1-Turbo cover image

fp4

160k

$1.00/$3.00 in/out Mtoken

deepseek-ai/

DeepSeek-R1-Turbo

text-generation

We introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

fp16

Replaced

google/

codegemma-7b-it

text-generation

CodeGemma is a collection of lightweight open code models built on top of Gemma. CodeGemma models are text-to-text and text-to-code decoder-only models and are available as a 7 billion pretrained variant that specializes in code completion and code generation tasks, a 7 billion parameter instruction-tuned variant for code chat and instruction following and a 2 billion parameter pretrained variant for fast code completion.

976k

Deprecated

google/

gemini-1.5-flash

text-generation

Gemini 1.5 Flash is Google's foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots. Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter.