Long Context models incoming
Published on 2023.11.21 by Iskren Chernev

Many users requested longer context models to help them summarize bigger chunks of text or write novels with ease.

We're proud to announce our long context model selection, which will keep growing in the coming weeks.

Models

Mistral-based models have a context size of 32k, and Amazon recently released a model fine-tuned specifically on longer contexts.

We also recently released the highly praised Yi models. Keep in mind they don't support chat, just the old-school text completion (new models are in the works):

Context FAQ

  • What does context mean? - Context is the number of tokens the model can look at at the same time. In practice, this is the limit on the sent tokens plus the max new tokens (2k by default) that can be generated. There is no 1:1 correspondence between tokens and words; a general rule of thumb is that 100 tokens is about 75 words.
  • Can I go above the context? - Technically, no. Any API to a 4k model that lets you pass more will, one way or another, truncate the tokens given to the LLM to its context size. You (as a user) can also try shortening a long input (by removing some text from the middle, for example) and resubmitting.
  • My input fits the context size, but the model doesn't take it into account - The context size of a model is a hard limit fixed when the model is created. Just because a model is listed as having a certain context size doesn't mean you can cram it with information and expect excellent results. Models differ in many ways, and one of them is how far back they can recall information. Check the MistralLite page on Hugging Face for some metrics shared by Amazon.
  • What can I do to make the model take into account more of the data I sent? - There is no single answer to this question, because recall capacity varies by model. The best you can do is test a particular model on a single task (like summarization or question answering) with different context lengths and find what works for you. Also try placing the system prompt at the start and/or the end. For the most control, you can use the text-completion endpoint (not the chat one).
  • Is a longer context model guaranteed to understand more context than a short-context model? - Unfortunately, no. For example, a very bad model with a huge context size may fail to act on a single sentence, whereas a good model with a shorter context length could comprehend a few paragraphs or even pages.
  • I can't find a good model for my large context needs! - Don't lose hope! This is a rapidly evolving ecosystem with models being released every day. We try our best to provide the best open-source models to you, as quickly as possible. Let us know which models you like by emailing feedback@deepinfra.com or dropping us a message on Discord.
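
To make the budgeting in the FAQ concrete, here is a minimal sketch of how you might check whether an input fits and, if not, cut some text from the middle before resubmitting. It leans on the 100 tokens ≈ 75 words rule of thumb, so it only approximates what a real tokenizer would report. The function names are hypothetical, and the 2048 default for max new tokens is an assumption standing in for the "2k" default mentioned above.

```python
# Rough context budgeting based on the rule of thumb: 100 tokens ~ 75 words.
# All names here are illustrative; this is not a real tokenizer or API client.

ASSUMED_MAX_NEW_TOKENS = 2048  # assumption: "2k default" taken as 2048


def estimate_tokens(text: str) -> int:
    """Estimate the token count from a whitespace word count (rounded up)."""
    words = len(text.split())
    return (words * 100 + 74) // 75  # integer ceil(words * 100 / 75)


def fits_context(text: str, context_size: int,
                 max_new_tokens: int = ASSUMED_MAX_NEW_TOKENS) -> bool:
    """True if estimated input tokens plus max new tokens fit the context."""
    return estimate_tokens(text) + max_new_tokens <= context_size


def truncate_middle(text: str, context_size: int,
                    max_new_tokens: int = ASSUMED_MAX_NEW_TOKENS) -> str:
    """Drop words from the middle until the estimate fits the context."""
    budget_tokens = context_size - max_new_tokens
    if budget_tokens <= 0:
        raise ValueError("max_new_tokens leaves no room for any input")
    words = text.split()
    max_words = budget_tokens * 75 // 100  # words allowed by the token budget
    if len(words) <= max_words:
        return text  # already fits, leave the input untouched
    head = max_words // 2
    tail = max_words - head
    # Keep the beginning and the end, remove the middle.
    kept = words[:head] + (words[-tail:] if tail else [])
    return " ".join(kept)


if __name__ == "__main__":
    long_input = " ".join(["word"] * 3000)
    print(fits_context(long_input, context_size=4096))
    shorter = truncate_middle(long_input, context_size=4096)
    print(len(shorter.split()), fits_context(shorter, context_size=4096))
```

Keeping the start and end of the input is only one choice. Whether dropping the middle hurts depends on the task and the model, so it is worth testing against your own data, and a model's real tokenizer will give more accurate counts than the word-based estimate.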