Browse deepinfra models:

All categories and models you can try out and directly use in deepinfra:
​Search

Category/all

deepinfra/airoboros-70b cover image
fp16
4k
Replaced
  • text-generation

Latest version of the Airoboros model fine-tunned version of llama-2-70b using the Airoboros dataset. This model is currently running jondurbin/airoboros-l2-70b-2.2.1

google/codegemma-7b-it cover image
fp16
8k
Replaced
  • text-generation

CodeGemma is a collection of lightweight open code models built on top of Gemma. CodeGemma models are text-to-text and text-to-code decoder-only models and are available as a 7 billion pretrained variant that specializes in code completion and code generation tasks, a 7 billion parameter instruction-tuned variant for code chat and instruction following and a 2 billion parameter pretrained variant for fast code completion.

google/gemma-1.1-7b-it cover image
bfloat16
8k
Replaced
  • text-generation

Gemma is an open-source model designed by Google. This is Gemma 1.1 7B (IT), an update over the original instruction-tuned Gemma release. Gemma 1.1 was trained using a novel RLHF method, leading to substantial gains on quality, coding capabilities, factuality, instruction following and multi-turn conversation quality.

intfloat/e5-base-v2 cover image
512
$0.005 / Mtoken
  • embeddings

Text Embeddings by Weakly-Supervised Contrastive Pre-training. Model has 24 layers and 1024 out dim.

intfloat/e5-large-v2 cover image
512
$0.010 / Mtoken
  • embeddings

Text Embeddings by Weakly-Supervised Contrastive Pre-training. Model has 24 layers and 1024 out dim.

intfloat/multilingual-e5-large cover image
fp32
512
$0.010 / Mtoken
  • embeddings

The Multilingual-E5-large model is a 24-layer text embedding model with an embedding size of 1024, trained on a mixture of multilingual datasets and supporting 100 languages. The model achieves state-of-the-art results on the Mr. TyDi benchmark, outperforming other models such as BM25 and mDPR. The model is intended for use in text retrieval and semantic similarity tasks, and should be used with the "query: " and "passage: " prefixes for input texts to achieve optimal performance.

mattshumer/Reflection-Llama-3.1-70B cover image
bfloat16
8k
Replaced
  • text-generation

Reflection Llama-3.1 70B is trained with a new technique called Reflection-Tuning that teaches a LLM to detect mistakes in its reasoning and correct course. The model was trained on synthetic data.

meta-llama/Llama-2-13b-chat-hf cover image
fp16
4k
Replaced
  • text-generation

Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format.

meta-llama/Llama-2-70b-chat-hf cover image
fp16
4k
Replaced
  • text-generation

LLaMa 2 is a collections of LLMs trained by Meta. This is the 70B chat optimized version. This endpoint has per token pricing.

meta-llama/Llama-2-7b-chat-hf cover image
fp16
4k
Replaced
  • text-generation

Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format.

microsoft/Phi-3-medium-4k-instruct cover image
bfloat16
4k
Replaced
  • text-generation

The Phi-3-Medium-4K-Instruct is a powerful and lightweight language model with 14 billion parameters, trained on high-quality data to excel in instruction following and safety measures. It demonstrates exceptional performance across benchmarks, including common sense, language understanding, and logical reasoning, outperforming models of similar size.

mistralai/Mistral-7B-Instruct-v0.1 cover image
fp16
32k
Replaced
  • text-generation

The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is a instruct fine-tuned version of the Mistral-7B-v0.1 generative text model using a variety of publicly available conversation datasets.

mistralai/Mistral-7B-Instruct-v0.2 cover image
fp16
32k
Replaced
  • text-generation

The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is a instruct fine-tuned version of the Mistral-7B-v0.2 generative text model using a variety of publicly available conversation datasets.

mistralai/Mixtral-8x22B-Instruct-v0.1 cover image
bfloat16
64k
Replaced
  • text-generation

This is the instruction fine-tuned version of Mixtral-8x22B - the latest and largest mixture of experts large language model (LLM) from Mistral AI. This state of the art machine learning model uses a mixture 8 of experts (MoE) 22b models. During inference 2 experts are selected. This architecture allows large models to be fast and cheap at inference.

mistralai/Mixtral-8x22B-v0.1 cover image
fp16
64k
Replaced
  • text-generation

Mixtral-8x22B is the latest and largest mixture of expert large language model (LLM) from Mistral AI. This is state of the art machine learning model using a mixture 8 of experts (MoE) 22b models. During inference 2 expers are selected. This architecture allows large models to be fast and cheap at inference. This model is not instruction tuned.

nvidia/Nemotron-4-340B-Instruct cover image
bfloat16
4k
Replaced
  • text-generation

Nemotron-4-340B-Instruct is a chat model intended for use for the English language, designed for Synthetic Data Generation

openai/clip-vit-base-patch32 cover image
$0.0005 / sec
  • zero-shot-image-classification

The CLIP model was developed by OpenAI to investigate the robustness of computer vision models. It uses a Vision Transformer architecture and was trained on a large dataset of image-caption pairs. The model shows promise in various computer vision tasks but also has limitations, including difficulties with fine-grained classification and potential biases in certain applications.

openai/clip-vit-large-patch14-336 cover image
$0.0005 / sec
  • zero-shot-image-classification

A zero-shot-image-classification model released by OpenAI. The clip-vit-large-patch14-336 model was trained from scratch on an unknown dataset and achieves unspecified results on the evaluation set. The model's intended uses and limitations, as well as its training and evaluation data, are not provided. The training procedure used an unknown optimizer and precision, and the framework versions included Transformers 4.21.3, TensorFlow 2.8.2, and Tokenizers 0.12.1.