intfloat/ · $0.010 / 1M tokens
The Multilingual-E5 models, initialized from XLM-RoBERTa, support up to 512 tokens per input; any longer text is silently truncated. For best results, always prefix inputs with "query: " or "passage: ", since the model was trained with exactly this format and omitting the prefix degrades embedding quality.
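A minimal sketch of applying the required prefixes. The helper name `e5_format` is hypothetical, and the commented `sentence_transformers` usage with the `intfloat/multilingual-e5-large` model id is an assumption, not confirmed by this page:

```python
def e5_format(text: str, kind: str = "query") -> str:
    """Prefix an input the way Multilingual-E5 expects.

    kind: "query" for search queries, "passage" for documents.
    Note: the 512-token limit is enforced by the model's tokenizer
    at inference time; this helper only adds the prefix.
    """
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return f"{kind}: {text}"


# Illustrative use with sentence-transformers (assumed model id):
#
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("intfloat/multilingual-e5-large")
# embeddings = model.encode(
#     [e5_format("how much protein should a female eat"),
#      e5_format("As a general guideline, adults need ...", kind="passage")],
#     normalize_embeddings=True,
# )
```

Normalizing the embeddings lets cosine similarity be computed as a plain dot product, which is the usual retrieval setup for E5-style models.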
© 2026 DeepInfra. All rights reserved.