Browse DeepInfra models:

All categories and models you can try out and use directly on DeepInfra:
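Every text-generation model below is served behind the same HTTP API. As a minimal sketch of calling one of the listed chat models through DeepInfra's OpenAI-compatible endpoint (the base URL and the DEEPINFRA_API_KEY environment variable name are assumptions; check your account documentation for the exact values):

    # Minimal sketch: querying a listed chat model through DeepInfra's
    # OpenAI-compatible endpoint. Base URL and env var name are assumptions.
    import os
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["DEEPINFRA_API_KEY"],         # assumed env var name
        base_url="https://api.deepinfra.com/v1/openai",  # assumed endpoint
    )

    response = client.chat.completions.create(
        model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # any model id from this list
        messages=[{"role": "user", "content": "Explain MoE in one sentence."}],
        max_tokens=256,
    )
    print(response.choices[0].message.content)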

HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1
featured
64k context
$0.65 / Mtoken
  • text-generation

Zephyr 141B-A35B is an instruction-tuned (assistant) version of Mixtral-8x22B. It was fine-tuned on a mix of publicly available, synthetic datasets. It achieves strong performance on chat benchmarks.

mistralai/Mixtral-8x22B-v0.1
featured
64k context
$0.65 / Mtoken
  • text-generation

Mixtral-8x22B is the latest and largest mixture-of-experts (MoE) large language model (LLM) from Mistral AI. It is a state-of-the-art model composed of 8 experts of 22B parameters each; during inference, 2 experts are selected per token (a toy routing sketch follows below). This architecture lets large models stay fast and cheap at inference. This model is not instruction-tuned.
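The routing idea can be shown in a few lines. The sketch below is a toy illustration with random weights and a single token, not Mistral's implementation: only the 2 selected experts do any work, which is why per-token compute stays far below what the total parameter count suggests.

    # Toy illustration of top-2 mixture-of-experts routing (not Mistral's code).
    import numpy as np

    n_experts, d = 8, 16
    experts = [np.random.randn(d, d) for _ in range(n_experts)]  # stand-in expert FFNs
    router = np.random.randn(d, n_experts)                       # learned gating weights

    def moe_layer(x):                       # x: one token's hidden state, shape (d,)
        logits = x @ router                 # one score per expert
        top2 = np.argsort(logits)[-2:]      # pick the 2 highest-scoring experts
        gates = np.exp(logits[top2])
        gates /= gates.sum()                # softmax over the selected experts only
        return sum(g * (x @ experts[i]) for g, i in zip(gates, top2))

    print(moe_layer(np.random.randn(d)).shape)  # (16,) -- only 2 of 8 experts ran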

google/gemma-1.1-7b-it
featured
8k context
$0.13 / Mtoken
  • text-generation

Gemma is an open-source model designed by Google. This is Gemma 1.1 7B (IT), an update over the original instruction-tuned Gemma release. Gemma 1.1 was trained using a novel RLHF method, leading to substantial gains on quality, coding capabilities, factuality, instruction following and multi-turn conversation quality.

databricks/dbrx-instruct
featured
32k context
$0.60 / Mtoken
  • text-generation

DBRX is an open-source LLM created by Databricks. It uses a mixture-of-experts (MoE) architecture with 132B total parameters, of which 36B are active on any given input. It outperforms existing open-source LLMs like Llama 2 70B and Mixtral-8x7B on standard industry benchmarks for language understanding, programming, math, and logic.

mistralai/Mixtral-8x7B-Instruct-v0.1
featured
32k context
$0.27 / Mtoken
  • text-generation

Mixtral is a mixture-of-experts (MoE) large language model (LLM) from Mistral AI. It is a state-of-the-art model composed of 8 experts of 7B parameters each; during inference, 2 experts are selected per token (see the routing sketch above). This architecture lets large models stay fast and cheap at inference. Mixtral-8x7B outperforms Llama 2 70B on most benchmarks.

mistralai/Mistral-7B-Instruct-v0.2
featured
32k context
$0.13 / Mtoken
  • text-generation

The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruction-fine-tuned version of the Mistral-7B-v0.2 generative text model, trained on a variety of publicly available conversation datasets.

meta-llama/Llama-2-70b-chat-hf
featured
4k context
$0.70 in / $0.90 out per Mtoken
  • text-generation

Llama 2 is a collection of LLMs trained by Meta. This is the 70B chat-optimized version. The endpoint bills input and output tokens at separate per-token rates (a cost sketch follows below).
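As a back-of-the-envelope sketch of what separate in/out per-token pricing means, using the rates copied from the entry above and a hypothetical request size:

    # Cost estimate for per-token in/out pricing ($0.70 in / $0.90 out per Mtoken).
    PRICE_IN_PER_MTOKEN = 0.70   # USD per 1M input (prompt) tokens
    PRICE_OUT_PER_MTOKEN = 0.90  # USD per 1M output (completion) tokens

    def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
        return (prompt_tokens * PRICE_IN_PER_MTOKEN
                + completion_tokens * PRICE_OUT_PER_MTOKEN) / 1_000_000

    # e.g. a 1,500-token prompt with a 500-token reply:
    print(f"${request_cost(1_500, 500):.6f}")  # $0.001500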

cognitivecomputations/dolphin-2.6-mixtral-8x7b
featured
32k context
$0.27 / Mtoken
  • text-generation

The Dolphin 2.6 Mixtral 8x7b model is a fine-tuned version of the Mixtral-8x7b model, trained on a variety of data, including coding data, for 3 days on 4 A100 GPUs. It is uncensored and requires trust_remote_code. The model is very obedient and good at coding, but is not DPO-tuned. The dataset was filtered to remove alignment and bias, so the model is compliant with user requests and can be used for purposes such as generating code or general chat.

lizpreciatior/lzlv_70b_fp16_hf
featured
4k context
$0.70 in / $0.90 out per Mtoken
  • text-generation

A Mythomax/MLewd_13B-style merge of selected 70B models: a multi-model merge of several LLaMA2 70B finetunes for roleplaying and creative work. The goal was to create a model that combines creativity with intelligence for an enhanced experience.

google/gemma-7b-it
featured
8k context
$0.13 / Mtoken
  • text-generation

Gemma is an open-source model designed by Google. Gemma 7B is a really strong model, with performance comparable to the best models in the 7B weight class, including Mistral 7B. Gemma is provided under and subject to the Gemma Terms of Use found at ai.google.dev/gemma/terms

openchat/openchat_3.5
featured
8k context
$0.13 / Mtoken
  • text-generation

OpenChat is a library of open-source language models that have been fine-tuned with C-RLFT, a strategy inspired by offline reinforcement learning. These models can learn from mixed-quality data without preference labels and have achieved exceptional performance comparable to ChatGPT. The developers of OpenChat are dedicated to creating a high-performance, commercially viable, open-source large language model and are continuously making progress towards this goal.

llava-hf/llava-1.5-7b-hf
featured
4k context
$0.34 / Mtoken
  • text-generation

LLaVA is a multimodal model that combines a vision encoder with a language model for combined visual and language understanding.

bigcode/starcoder2-15b
featured
16k context
$0.40 / Mtoken
  • text-generation

StarCoder2-15B is a 15B-parameter model trained on 600+ programming languages. It specializes in code completion.

DeepInfra/pygmalion-13b-4bit-128g
featured
2k context
$0.22 / Mtoken
  • text-generation

A model for fictional writing and entertainment purposes.

codellama/CodeLlama-70b-Instruct-hf
featured
4k context
$0.70 in / $0.90 out per Mtoken
  • text-generation

CodeLlama-70b is the largest and latest code generation model in the Code Llama collection.

deepinfra/airoboros-70b
featured
4k context
$0.70 in / $0.90 out per Mtoken
  • text-generation

The latest version of the Airoboros model, a fine-tuned version of Llama-2-70b trained on the Airoboros dataset. This endpoint currently runs jondurbin/airoboros-l2-70b-2.2.1.

stability-ai/sdxl
featured
$0.0005 / sec
  • text-to-image

SDXL consists of an ensemble-of-experts pipeline for latent diffusion: in a first step, the base model generates (noisy) latents, which are then further processed with a refinement model (available at https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/) specialized for the final denoising steps (a sketch of this two-stage pipeline follows below). Note that the base model can be used as a standalone module.
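A sketch of that two-stage pipeline using Hugging Face diffusers (the checkpoint names, the CUDA device, and the 0.8 hand-off point are illustrative assumptions):

    # Sketch of SDXL's two-stage ensemble-of-experts pipeline with diffusers:
    # the base model produces noisy latents, the refiner finishes denoising.
    import torch
    from diffusers import DiffusionPipeline

    base = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    refiner = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16
    ).to("cuda")

    prompt = "a lighthouse on a cliff at sunset"

    # Base handles the first 80% of denoising and returns latents, not pixels.
    latents = base(prompt, denoising_end=0.8, output_type="latent").images

    # Refiner takes over for the final 20% of the denoising steps.
    image = refiner(prompt, denoising_start=0.8, image=latents).images[0]
    image.save("lighthouse.png")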

meta-llama/Llama-2-7b-chat-hf
featured
4k context
$0.13 / Mtoken
  • text-generation

Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format.