Long Context models incoming

Published on 2023.11.21 by Iskren Chernev

Many users requested longer context models to help them summarize bigger chunks of text or write novels with ease.

We're proud to announce our long context model selection, which will grow in the coming weeks.

Models

Mistral-based models have a context size of 32k, and Amazon recently released a model fine-tuned specifically for longer contexts.

Mistral-7B -- 32k context
MistralLite -- 32k context, fine-tuned for longer context

We also recently released the highly praised Yi models. Keep in mind they don't support chat, just the old-school text completion (new models are in the works):

Yi-6B-200K -- 200K context
Yi-34B-200K -- 200K context

Context FAQ

What does context mean? - Context is the number of tokens the model can look at at the same time. In practice, this is the limit on the sent tokens plus the max new tokens (2k by default) that can be generated. There is no 1:1 correspondence between tokens and words; a general rule of thumb is that 100 tokens is about 75 words.
Can I go above the context? - Technically -- no. Any API to a 4k model that lets you pass more will, one way or another, truncate the tokens given to the LLM to its context size. You (as a user) can also try shortening a long input (by removing some text from the middle, for example) and resubmitting it.
My input fits the context size, but the model doesn't take it into account - The context size of a model is a hard limit fixed when the model is created. Just because a model is listed as having a certain context size doesn't mean you can cram it with information and expect excellent results. Models differ in many parameters, and one such parameter is how far back they can recall information. Check the MistralLite page on HF for some metrics shared by Amazon.
What can I do to make the model take into account more of the data I sent? - There is no single answer to this question: it varies by model, and each model's capacity is different. The best you can do is test a particular model on a single task (like summarization or question answering) with different context lengths and find what works for you. Also try placing the system prompt at the start and/or end. For the most control, you can use the text-completion endpoint (not the chat one).
Is a longer context model guaranteed to understand more context than a short-context model? - Unfortunately -- no. For example, a very bad model with a huge context size may fail to act on a single sentence, whereas a good model with a shorter context length could comprehend a few paragraphs or even pages.
I can't find a good model for my large context needs! - Don't lose hope! This is a rapidly evolving ecosystem with models being released every day. We try our best to provide the best open-source models to you, as quickly as possible. Let us know which models you like by emailing feedback@deepinfra.com or dropping us a message on Discord.
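
To make the token budget from the FAQ concrete, here is a minimal Python sketch. It only uses the rule of thumb above (100 tokens for every 75 words), so the counts are rough estimates rather than real tokenizer output. The function names, the exact default of 2000 for "2k", and the "[...]" marker are assumptions made for illustration and are not part of any API.

```python
# Rough context budgeting based on the FAQ's rule of thumb.
# All names here are hypothetical helpers, not part of any real API.

WORDS_PER_100_TOKENS = 75        # rule of thumb: 100 tokens is about 75 words
DEFAULT_MAX_NEW_TOKENS = 2000    # assumption: "2k default" taken as exactly 2000


def estimate_tokens(text: str) -> int:
    """Estimate the token count from the word count, rounding up."""
    words = len(text.split())
    return (words * 100 + WORDS_PER_100_TOKENS - 1) // WORDS_PER_100_TOKENS


def fits_context(text: str, context_size: int,
                 max_new_tokens: int = DEFAULT_MAX_NEW_TOKENS) -> bool:
    """Check that sent tokens + max new tokens stay within the context size."""
    return estimate_tokens(text) + max_new_tokens <= context_size


def truncate_middle(text: str, context_size: int,
                    max_new_tokens: int = DEFAULT_MAX_NEW_TOKENS,
                    marker: str = "[...]") -> str:
    """Drop words from the middle so the estimated budget fits.

    Text that already fits is returned unchanged. Otherwise the result is
    rebuilt from whitespace-separated words, so original spacing is lost.
    """
    budget_tokens = context_size - max_new_tokens
    max_words = budget_tokens * WORDS_PER_100_TOKENS // 100
    if max_words < 1:
        raise ValueError("context size leaves no room for input")

    words = text.split()
    if len(words) <= max_words:
        return text

    keep = max_words - 1          # reserve one word for the marker
    head = (keep + 1) // 2        # keep slightly more of the beginning
    tail = keep - head
    tail_words = words[-tail:] if tail > 0 else []
    return " ".join(words[:head] + [marker] + tail_words)


if __name__ == "__main__":
    long_input = " ".join(f"word{i}" for i in range(3000))
    print(fits_context(long_input, context_size=4096))
    shortened = truncate_middle(long_input, context_size=4096)
    print(fits_context(shortened, context_size=4096))
```

For anything beyond a quick estimate, count tokens with the tokenizer of the model you are calling, since the words-to-tokens ratio varies by model and by language.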
