
MiMo-V2.5 is a native omnimodal model developed by XiaomiMiMo, designed to process and understand text, image, video, and audio through a unified architecture rather than relying on “bolted-on” components for each modality.
Built on a 310-billion-parameter Sparse Mixture of Experts (MoE) architecture, of which only 15 billion parameters are activated per token during inference, MiMo-V2.5 balances high-tier reasoning with computational efficiency. With a 1-million-token context window and agentic capabilities, it is designed for complex multimodal perception, long-context reasoning, and autonomous workflows.
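To make the efficiency claim concrete, the sketch below works through the arithmetic for the two parameter counts given above. It assumes the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter per token; that approximation ignores attention and routing overhead and is not a figure published for MiMo-V2.5.

```python
# Rough per-token compute for a sparse MoE model, using the rule of thumb
# that a forward pass costs about 2 FLOPs per *active* parameter.
TOTAL_PARAMS = 310e9   # total parameters (from the article)
ACTIVE_PARAMS = 15e9   # parameters activated per token (from the article)


def active_fraction(total: float, active: float) -> float:
    """Share of the model's weights used to process each token."""
    return active / total


def approx_forward_flops_per_token(active: float) -> float:
    """Approximate forward-pass cost: ~2 FLOPs per active parameter."""
    return 2 * active


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"active fraction: {frac:.1%}")
print(f"approx. FLOPs/token (MoE, 15B active):  {approx_forward_flops_per_token(ACTIVE_PARAMS):.2e}")
print(f"approx. FLOPs/token (dense, 310B):      {approx_forward_flops_per_token(TOTAL_PARAMS):.2e}")
```

Under this approximation, each token touches roughly 4.8% of the weights, so per-token compute is about 1/20 that of a dense 310B model. All 310B parameters still have to be held in memory, which is why the savings show up mainly as compute and latency rather than as a smaller memory footprint.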
MiMo-V2.5 is a meaningful step forward from its predecessor, MiMo-V2-Flash. It uses native, dedicated encoders for each data type, so every modality is handled inside one model rather than stitched on afterward, a degree of integration not commonly seen in large-scale models.
Key Technical Features
Configuration Notice: Developers who downloaded the model prior to recent repository updates should re-pull the config.json and tokenizer_config.json files to ensure optimal performance and avoid degraded behavior.
MiMo-V2.5 demonstrates competitive performance against frontier closed-source models, particularly in coding, temporal video reasoning, and agentic decision-making.
Training with reinforcement learning (RL) places the model near the Pareto frontier for everyday agentic tasks.
| Benchmark | Category | MiMo-V2.5 Score | Claude Opus 4.6 | Gemini 3.1 Pro |
|---|---|---|---|---|
| Coding (General) | Programming/Logic | 71.8 | 77.1 | 67.8 |
| Claw-Eval Text | General Agentic | 65.8 | 70.8 | 68.5 |
| Terminal-Bench 2.0 | CLI Operations | 56.1 | 57.3 | 54.2 |
On perception benchmarks, MiMo-V2.5 matches or approaches industry leaders in video and image understanding, with particularly strong temporal reasoning over video.
| Benchmark | Modality | MiMo-V2.5 Score | Gemini 3 Pro | Kimi K2.6 |
|---|---|---|---|---|
| Image Understanding | Vision-Language | 81.0 | 81.4 | 80.4 |
| Video-MME | Video | 83.5 | 84.2 | — |
| MMMU-Pro | Multi-discipline | 88.5 | — | — |
| CharXiv RQ | Chart/Diagram | 77.9 | 81.0 | 79.4 |
The model supports a context window of up to 1,000,000 tokens, validated on long-context benchmarks such as Graphwalks, which tests path-finding and retrieval. A learnable attention sink bias helps keep reasoning accuracy stable even at the 1M-token limit.
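The article does not say how MiMo-V2.5 implements its attention sink bias. One common formulation, used in some open-weight models, adds a learnable per-head logit to the softmax denominator: the sink competes for attention mass but contributes no value vector, so a head is not forced to spread all of its probability over real tokens. The sketch below assumes that formulation, for a single query and a single head; `softmax_with_sink` and `sink_logit` are illustrative names, not MiMo's code.

```python
import math


def softmax_with_sink(scores, sink_logit):
    """Attention weights when an extra learnable 'sink' logit joins the softmax.

    The sink takes part in normalization but carries no value vector, so the
    weights over real tokens sum to less than 1; the remainder is absorbed by
    the sink instead of being forced onto some token.
    """
    m = max(max(scores), sink_logit)           # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps) + math.exp(sink_logit - m)
    return [e / denom for e in exps]


def plain_softmax(scores):
    """Standard softmax, for comparison."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.5]                       # query-key scores for three tokens
with_sink = softmax_with_sink(scores, sink_logit=1.5)
print("with sink:   ", [round(w, 3) for w in with_sink], "sum:", round(sum(with_sink), 3))
print("plain softmax:", [round(w, 3) for w in plain_softmax(scores)])
```

Setting `sink_logit` to negative infinity recovers the plain softmax. In a trained model the logit is a learned parameter per head, which lets each head decide how much attention to "park" when no token is relevant, a property that matters more as the context grows toward 1M tokens.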
MiMo-V2.5 is hosted on DeepInfra, providing high-performance, low-latency inference via an OpenAI-compatible API.
Retrieve your API key from your DeepInfra Dashboard and include it in your HTTP headers:
Authorization: Bearer <YOUR_DEEPINFRA_API_KEY>
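Because MiMo-V2.5 is omnimodal, a request can combine text and images in one message. The sketch below builds the headers and a mixed text-and-image payload without sending it. It assumes the endpoint accepts OpenAI-style `image_url` content parts for this model, which this article does not confirm; the helper names and the image URL are hypothetical.

```python
import json
import os

API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"
MODEL = "XiaomiMiMo/MiMo-V2.5"


def build_headers(api_key: str) -> dict:
    """Bearer-token auth (the header shown above) plus a JSON content type."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def build_image_question(question: str, image_url: str) -> dict:
    """Chat payload mixing text and an image in OpenAI content-part format.

    Assumption: the endpoint accepts `image_url` parts for this model.
    """
    return {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


headers = build_headers(os.getenv("DEEPINFRA_API_KEY", "<YOUR_DEEPINFRA_API_KEY>"))
payload = build_image_question(
    "What trend does this chart show?",
    "https://example.com/chart.png",  # hypothetical image URL
)
print(json.dumps(payload, indent=2))
```

The resulting `payload` can be POSTed to `API_URL` with `headers`, in the same way as the text-only requests in this section.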
Using cURL
```bash
curl -X POST https://api.deepinfra.com/v1/openai/chat/completions \
  -H "Authorization: Bearer $DEEPINFRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "XiaomiMiMo/MiMo-V2.5",
    "messages": [
      {
        "role": "user",
        "content": "Explain the advantages of a hybrid attention architecture in 2 sentences."
      }
    ]
  }'
```

Using Python
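```python
import os

import requests

url = "https://api.deepinfra.com/v1/openai/chat/completions"
api_key = os.environ["DEEPINFRA_API_KEY"]  # fail fast if the key is not set

payload = {
    "model": "XiaomiMiMo/MiMo-V2.5",
    "messages": [{"role": "user", "content": "Explain the advantages of a hybrid attention architecture."}],
}

response = requests.post(url, headers={"Authorization": f"Bearer {api_key}"}, json=payload, timeout=60)
response.raise_for_status()  # surface HTTP errors such as 401 for an invalid key

# OpenAI-compatible responses put the reply text under choices[0].message.content
print(response.json()["choices"][0]["message"]["content"])
```

Pricing is usage-based and calculated per 1 million tokens. DeepInfra offers two tiers that trade off cost against priority.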
| Tier | Input (per 1M tokens) | Output (per 1M tokens) | Cached Input (per 1M tokens) |
|---|---|---|---|
| Standard | $0.40 | $2.00 | $0.08 |
| Priority (1.5×) | $0.60 | $3.00 | $0.12 |
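To estimate what a workload will cost, the per-1M-token rates can be applied directly to token counts. The sketch below copies the prices from the table above. It assumes that cached prompt tokens are billed only at the cached rate and are counted separately from uncached input tokens; the article does not spell out how the two are split, and `estimate_cost` is a hypothetical helper.

```python
# Prices in USD per 1M tokens, copied from the pricing table above.
PRICES = {
    "standard": {"input": 0.40, "output": 2.00, "cached_input": 0.08},
    "priority": {"input": 0.60, "output": 3.00, "cached_input": 0.12},
}


def estimate_cost(tier, input_tokens, output_tokens, cached_input_tokens=0):
    """Estimated USD cost of one request.

    `input_tokens` counts uncached prompt tokens only; cached prompt tokens
    are passed separately and billed at the cached rate (an assumption).
    """
    p = PRICES[tier]
    return (
        input_tokens * p["input"]
        + output_tokens * p["output"]
        + cached_input_tokens * p["cached_input"]
    ) / 1_000_000


# Example: a long-context request with 200k fresh prompt tokens,
# 800k cached prompt tokens, and 5k output tokens.
print(f"standard: ${estimate_cost('standard', 200_000, 5_000, 800_000):.4f}")
print(f"priority: ${estimate_cost('priority', 200_000, 5_000, 800_000):.4f}")
```

Because every Priority rate in the table is exactly 1.5× its Standard counterpart, the Priority cost of any request is 1.5× its Standard cost. The example also shows why caching matters for long contexts: the 800k cached tokens cost less than the 200k uncached ones.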
XiaomiMiMo’s MiMo-V2.5 is a capable, versatile model for multimodal and agentic applications. By combining a 1M-token context window with native omnimodal understanding and an efficient MoE architecture, it gives developers performance competitive with frontier models at a comparatively low resource cost.
Whether you are building agentic workflows, analyzing hour-long videos, or processing large document sets, MiMo-V2.5 offers the performance and flexibility needed for production deployment.
© 2026 DeepInfra. All rights reserved.