Browse DeepInfra models:

All categories and models you can try out and directly use on DeepInfra are listed below.
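Every text-generation model below can be queried the same way. A minimal sketch, assuming DeepInfra's OpenAI-compatible chat-completions endpoint and the official openai Python client; the base URL and API-key environment variable shown here are assumptions, so check the DeepInfra docs for the exact values for your account:

```python
# Minimal sketch: querying a listed model through an OpenAI-compatible
# chat-completions endpoint. The base URL and env var are assumptions;
# consult the DeepInfra docs for the exact values.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPINFRA_API_KEY"],          # your DeepInfra API token (assumed env var name)
    base_url="https://api.deepinfra.com/v1/openai",   # assumed OpenAI-compatible base URL
)

resp = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-70B-Instruct",     # any text-generation model ID from the list below
    messages=[{"role": "user", "content": "Give me a one-line summary of mixture-of-experts models."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```

Any text-generation model ID from the list below can be dropped into the model field unchanged.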

meta-llama/Meta-Llama-3-70B-Instruct
featured
bfloat16
8k
$0.52/$0.75 in/out Mtoken
  • text-generation

Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes.
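Pricing such as "$0.52/$0.75 in/out Mtoken" is per million input and output tokens, billed separately. A quick back-of-the-envelope estimate (illustrative request sizes only):

```python
# Rough cost estimate for one request to Meta-Llama-3-70B-Instruct
# at $0.52 per million input tokens and $0.75 per million output tokens.
input_tokens = 4_000
output_tokens = 1_000
cost = input_tokens / 1e6 * 0.52 + output_tokens / 1e6 * 0.75
print(f"${cost:.4f}")  # -> $0.0028 for this example request
```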

google/gemma-2-27b-it
featured
4k
$0.27 / Mtoken
  • text-generation

Gemma is a family of lightweight, state-of-the-art open models from Google. Gemma-2-27B delivers the best performance for its size class, and even offers competitive alternatives to models more than twice its size.

google/gemma-2-9b-it
featured
4k
$0.09 / Mtoken
  • text-generation

Gemma is a family of lightweight, state-of-the-art open models from Google. The 9B Gemma 2 model delivers class-leading performance, outperforming Llama 3 8B and other open models in its size category.

nvidia/Nemotron-4-340B-Instruct
featured
bfloat16
4k
$4.20 / Mtoken
  • text-generation

Nemotron-4-340B-Instruct is a chat model intended for English-language use and designed for synthetic data generation.

Qwen/Qwen2-72B-Instruct
featured
bfloat16
32k
$0.56/$0.77 in/out Mtoken
  • text-generation

The 72-billion-parameter Qwen2 excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning.

microsoft/Phi-3-medium-4k-instruct
featured
bfloat16
4k
$0.14 / Mtoken
  • text-generation

The Phi-3-Medium-4K-Instruct is a powerful and lightweight language model with 14 billion parameters, trained on high-quality data to excel in instruction following and safety measures. It demonstrates exceptional performance across benchmarks, including common sense, language understanding, and logical reasoning, outperforming models of similar size.

openchat/openchat-3.6-8b
featured
bfloat16
8k
$0.064 / Mtoken
  • text-generation

Openchat 3.6 is a Llama-3-8B fine-tune that outperforms the base model on multiple benchmarks.

mistralai/Mistral-7B-Instruct-v0.3
featured
bfloat16
32k
$0.06 / Mtoken
  • text-generation

Mistral-7B-Instruct-v0.3 is an instruction-tuned model and the next iteration of Mistral 7B; it has a larger vocabulary, a newer tokenizer, and support for function calling.
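Because the model supports function calling, here is a hedged sketch of what a tool call can look like through the same assumed OpenAI-compatible endpoint; the get_weather tool is hypothetical and the standard OpenAI tools schema is assumed:

```python
# Sketch of function calling with Mistral-7B-Instruct-v0.3 via an
# OpenAI-compatible endpoint. "get_weather" is a hypothetical tool.
import os, json
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPINFRA_API_KEY"],
                base_url="https://api.deepinfra.com/v1/openai")  # assumed base URL

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

call = resp.choices[0].message.tool_calls[0]  # assumes the model decided to call the tool
print(call.function.name, json.loads(call.function.arguments))
```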

meta-llama/Meta-Llama-3-8B-Instruct
featured
bfloat16
8k
$0.06 / Mtoken
  • text-generation

Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes.

mistralai/Mixtral-8x22B-Instruct-v0.1
featured
bfloat16
64k
$0.65 / Mtoken
  • text-generation

This is the instruction fine-tuned version of Mixtral-8x22B, the latest and largest mixture-of-experts (MoE) large language model from Mistral AI. This state-of-the-art model combines eight 22B expert models, of which two are selected during inference. This architecture allows large models to be fast and cheap at inference.
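For intuition only, a minimal NumPy sketch of the top-2 routing idea described above (an illustration of the technique, not Mixtral's actual implementation): a gating network scores all eight experts, only the two highest-scoring experts are evaluated, and their outputs are combined with renormalized gate weights.

```python
# Toy illustration of top-2 mixture-of-experts routing (not Mixtral's real code).
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 16, 8, 2

x = rng.normal(size=d)                          # one token's hidden state
W_gate = rng.normal(size=(n_experts, d))        # router / gating weights
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # toy "expert" layers

scores = W_gate @ x                             # router scores for all 8 experts
top = np.argsort(scores)[-top_k:]               # keep only the 2 best experts
weights = np.exp(scores[top]) / np.exp(scores[top]).sum()  # renormalized softmax

# Only the selected experts are evaluated, so compute scales with top_k,
# not with the total number of experts.
y = sum(w * (experts[i] @ x) for w, i in zip(weights, top))
print(y.shape)  # (16,)
```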

microsoft/WizardLM-2-8x22B
featured
bfloat16
64k
$0.63 / Mtoken
  • text-generation

WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models.

microsoft/WizardLM-2-7B
featured
fp16
32k
$0.07 / Mtoken
  • text-generation

WizardLM-2 7B is the smaller variant of Microsoft AI's latest Wizard model. It is the fastest Wizard model and achieves performance comparable to leading open-source models 10x its size.

google/gemma-1.1-7b-it
featured
bfloat16
8k
$0.07 / Mtoken
  • text-generation

Gemma is an open-source model designed by Google. This is Gemma 1.1 7B (IT), an update over the original instruction-tuned Gemma release. Gemma 1.1 was trained using a novel RLHF method, leading to substantial gains in quality, coding capabilities, factuality, instruction following, and multi-turn conversation quality.

mistralai/Mixtral-8x7B-Instruct-v0.1
featured
bfloat16
32k
$0.24 / Mtoken
  • text-generation

Mixtral is a mixture-of-experts (MoE) large language model from Mistral AI. This state-of-the-art model combines eight 7B expert models, of which two are selected during inference. This architecture allows large models to be fast and cheap at inference. Mixtral-8x7B outperforms Llama 2 70B on most benchmarks.

lizpreciatior/lzlv_70b_fp16_hf
featured
fp16
4k
$0.59/$0.79 in/out Mtoken
  • text-generation

A MythoMax/MLewd_13B-style multi-model merge of several LLaMA2 70B finetunes for roleplaying and creative work. The goal was to create a model that combines creativity with intelligence for an enhanced experience.

llava-hf/llava-1.5-7b-hf
featured
fp16
4k
$0.34 / Mtoken
  • text-generation

LLaVA is a multimodal model that combines a vision encoder with a large language model for joint visual and language understanding.
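A minimal local sketch with the Hugging Face transformers image-to-text pipeline; the example image URL is illustrative, and LLaVA 1.5 expects the USER: <image> ... ASSISTANT: prompt template:

```python
# Sketch: image question answering with LLaVA 1.5 7B via the transformers pipeline.
# The image URL is just an example COCO picture.
from transformers import pipeline

pipe = pipeline("image-to-text", model="llava-hf/llava-1.5-7b-hf")

prompt = "USER: <image>\nWhat is shown in this picture? ASSISTANT:"
out = pipe(
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    prompt=prompt,
    generate_kwargs={"max_new_tokens": 100},
)
print(out[0]["generated_text"])
```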

stability-ai/sdxl
featured
$0.0005 / sec
  • text-to-image

SDXL consists of an ensemble of experts pipeline for latent diffusion: In a first step, the base model is used to generate (noisy) latents, which are then further processed with a refinement model (available here: https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/) specialized for the final denoising steps. Note that the base model can be used as a standalone module.
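A sketch of that two-stage pipeline with the diffusers library, assuming the public base and refiner checkpoints, a CUDA GPU, and an example 80%/20% split of the denoising steps:

```python
# Sketch of SDXL's ensemble-of-experts pipeline with diffusers:
# the base model denoises the first 80% of steps and hands its latents
# to the refiner, which handles the final 20% of denoising.
import torch
from diffusers import DiffusionPipeline

base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
).to("cuda")
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2, vae=base.vae,   # share components to save memory
    torch_dtype=torch.float16, variant="fp16", use_safetensors=True,
).to("cuda")

prompt = "a photograph of an astronaut riding a horse"
latents = base(prompt=prompt, num_inference_steps=40,
               denoising_end=0.8, output_type="latent").images
image = refiner(prompt=prompt, num_inference_steps=40,
                denoising_start=0.8, image=latents).images[0]
image.save("astronaut.png")
```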

openai/whisper-large
featured
$0.0005 / sec
  • automatic-speech-recognition

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.
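A minimal local sketch with the transformers automatic-speech-recognition pipeline; the audio filename is a placeholder, and the same checkpoint can translate non-English speech into English by switching the task:

```python
# Sketch: transcription and speech translation with Whisper via transformers.
# "meeting.wav" is a placeholder for your own audio file.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition",
               model="openai/whisper-large",
               chunk_length_s=30)              # split long audio into 30-second windows

print(asr("meeting.wav")["text"])              # multilingual transcription

# Translate non-English speech into English instead of transcribing it:
print(asr("meeting.wav", generate_kwargs={"task": "translate"})["text"])
```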