DeepInfra raises $107M Series B to scale the inference cloud — read the announcement

MiMo-V2.5 is worth paying attention to because it puts three things developers usually have to trade off into the same conversation: open weights, a 1 million-token model design, and pricing that can be unusually low depending on where you buy it. On Xiaomi’s first-party API, Artificial Analysis lists MiMo-V2.5 at $0.14 per 1M input tokens and $0.28 per 1M output tokens, with a $0.003 per 1M cache-hit price and a blended rate of $0.06 per 1M tokens under a 7:2:1 cache/input/output mix. That is the kind of pricing profile that makes engineers stop treating “large multimodal reasoning model” as automatically expensive.
At a basic level, MiMo-V2.5 is a Xiaomi model released on April 22, 2026, and published as open weights under the MIT license. It is a sparse Mixture of Experts model with 310 billion total parameters and 15 billion active parameters per inference. The model family is positioned as multimodal: the research consistently supports text and image input, while DeepInfra’s technical page describes it as a native omnimodal system with text, image, video, and audio support. The headline context number is 1 million tokens for the MiMo-V2.5 model itself, though provider implementations can differ — DeepInfra’s API spec lists a 262,144-token context window on its endpoint.
What makes MiMo-V2.5 interesting is not just that it is big, but that it looks unusually practical. Artificial Analysis estimates an Intelligence Index score of 40, versus a median of 25 for comparable open-weight models, and reports 87.2 tokens per second output speed on Xiaomi’s API, above the comparable-model median of 68.7 t/s. It also supports reasoning-style extended thinking, and because the weights are openly available on Hugging Face under MIT, teams can evaluate hosted access against self-hosting without changing model family.
If you are evaluating this model for production, the real question is not whether MiMo-V2.5 is “good” in the abstract. It is whether its price, context length, multimodal support, and deployment options line up with your workload. Teams optimizing raw token economics will care about Xiaomi’s first-party rates and OpenRouter’s lower listed input price; teams that need managed endpoints, private deployment, JSON mode, function calling, and a straightforward way to operationalize the model will probably care more about what DeepInfra offers than about chasing the absolute cheapest token.
MiMo-V2.5 spans a wide pricing range depending on provider: Xiaomi’s first-party API is listed by Artificial Analysis at $0.14 input / $0.28 output per 1M tokens with a $0.06 blended rate in a cache-heavy mix; OpenRouter lists $0.105 input / $0.28 output per 1M tokens; and DeepInfra prices it at $0.40 input / $2.00 output on standard tier. In practice, this model is best suited for developers who want an open-weight Xiaomi model with long-context and multimodal capabilities. DeepInfra is the stronger fit when deployment control and platform features matter more than minimum token price, while Xiaomi and OpenRouter are the better pure-cost benchmarks.
| Best For | Provider | Why |
|---|---|---|
| Proprietary or managed model access | DeepInfra Private Endpoint | DeepInfra offers private endpoint deployment via its dashboard and supports JSON mode, function calling, and multimodal features. |
| RAG, document-heavy, or high-throughput use cases | DeepInfra Standard Endpoint | DeepInfra exposes MiMo-V2.5 on a public endpoint with cached input pricing and a 262,144-token API context window, useful for long prompts and structured production workloads. |
| Lowest price / cost-sensitive workloads | Xiaomi first-party API | Artificial Analysis lists the lowest fully detailed cost structure here: $0.14 input, $0.28 output, $0.003 cache-hit pricing, and a $0.06 blended rate under a cache-heavy usage mix. |
| Easiest onboarding / fastest time-to-first-call | OpenRouter | OpenAI-compatible API; integration can be as simple as swapping the model slug to xiaomi/mimo-v2.5. |
| Lowest listed input price | OpenRouter | Lists input at $0.105 per 1M tokens, lower than Xiaomi’s $0.14 and DeepInfra’s $0.40 standard-tier price. |
Token pricing is where model comparisons get deceptively messy. “Cheap per million” does not always mean “cheap for my workload,” especially once long prompts, repeated context, and output-heavy tasks get involved.
For MiMo-V2.5, there are four token buckets to think about:
| Token type | What it is | Why it matters |
|---|---|---|
| Input tokens | The tokens you send in the request: system prompt, user prompt, tool schemas, documents, chat history, and multimodal text-side payload | This is your baseline prompt cost. Long-context RAG, agent state, and big instruction blocks push this up fast. |
| Output tokens | The tokens the model generates back | Reasoning-heavy or verbose tasks can make output cost dominate, especially on providers with a large output/input price gap. |
| Cached input tokens | Prompt tokens the provider can reuse from earlier requests instead of billing as full fresh input | This is where repeated prefixes get much cheaper. It matters a lot for chat apps, agents, and document workflows with stable context. |
| Context window tokens | The maximum total tokens the model can attend to in one request | Not a separate billing category, but it controls whether you can actually use the giant prompts you are planning to pay for. Provider limits matter here. |
Different providers make MiMo-V2.5 look like a different economic proposition. The model is the same family. The bill usually is not.
| Provider | Token cost advantages | Token cost disadvantages |
|---|---|---|
| Xiaomi first-party API | Lowest fully detailed pricing in the sources: $0.14/M input, $0.28/M output, $0.003/M cache hit. Artificial Analysis reports a $0.06/M blended rate under a 7:2:1 cache/input/output mix. Strong fit for repeated-context workloads where caching does real work. | Cheapest only if your usage pattern matches the pricing strengths. If your workload is output-heavy and not cache-friendly, the blended number stops being useful and you are back to raw input/output rates. |
| OpenRouter | Lowest listed input price at $0.105/M. Effective pricing after prompt caching can be 60–80% cheaper than listed provider price based on rolling 30-day averages. Good option when you want low entry cost and OpenAI-compatible access. | Pricing is routing-dependent and less explicit than Xiaomi’s full breakdown. No separate cache-hit line item in publicly listed rates, so forecasting repeated-prefix savings is less clean. |
| DeepInfra Standard | Clear public pricing and platform-friendly billing: $0.40/M input, $2.00/M output, $0.08/M cached input. Useful if you care more about managed deployment, API features, and predictable integration than absolute token minimums. | Much more expensive on raw tokens, especially output. The 5x output-to-input ratio punishes verbose generations, reasoning traces, and agent loops. |
| DeepInfra Priority | Same operational model as standard tier with higher service priority. Straightforward pricing: $0.60/M input, $3.00/M output, $0.12/M cached input. | Most expensive option in the set. The 1.5× multiplier over standard tier compounds fast on output-heavy traffic. |
If you want MiMo-V2.5 with more operational control, DeepInfra is the power-user option. It runs on bare-metal infrastructure, which matters because cutting out virtualization overhead can help with more predictable performance and better cost efficiency at scale. DeepInfra is also typically 50–80% cheaper than major cloud competitors, which is exactly why it tends to appeal to developers, high-volume API users, and cost-conscious teams that still want managed deployment instead of rolling everything themselves. For teams that care about throughput, platform features, and production readiness — not just the lowest sticker price — it is an easy provider to shortlist. The broader multimodal model catalog is worth scanning to see how MiMo-V2.5 fits alongside related options.
| Model Name | Best Use Case | Context Window | Input ($/1M) | Output ($/1M) |
|---|---|---|---|---|
| MiMo-V2.5 (Standard) | Long-context multimodal production workloads on a public endpoint | 262,144 tokens | $0.40 | $2.00 |
| MiMo-V2.5 (Priority) | Higher-priority traffic where you want the same model with faster service tiering | 262,144 tokens | $0.60 | $3.00 |
DeepInfra lists MiMo-V2.5 at $0.40 per 1M input tokens and $2.00 per 1M output tokens on standard tier. That is a much stronger cost story than GPT-4o-class pricing for teams that need to run large volumes of requests, especially when paired with DeepInfra’s managed deployment options. If your workload is big enough, this is the kind of pricing gap that can materially change what is feasible in production.
Below are practical scenarios where DeepInfra makes sense for MiMo-V2.5 — not because it is the absolute cheapest token source, but because it pairs managed deployment, multimodal support, JSON mode, function calling, and private endpoint options with still-reasonable model economics.
An internal support copilot that reads policy docs, ticket history, and tool schemas on every call, then returns structured JSON for downstream automation. This is the kind of workload where DeepInfra’s managed endpoint and JSON mode are more valuable than chasing the lowest raw token rate.
| Metric | Value |
|---|---|
| Volume | 5,000 requests/month |
| Model | MiMo-V2.5 |
| Provider | DeepInfra Standard |
| Input Tokens | 250M |
| Output Tokens | 25M |
| Monthly Cost | $150.00 |
Cost breakdown:
DeepInfra Priority would cost $225.00/month — $75.00 more.
Why DeepInfra fits: JSON mode helps when the output needs to land in ticketing or workflow systems cleanly; function calling helps when the copilot needs to trigger internal tools; the 262,144-token API context window is a practical fit for document-heavy prompts without self-hosting complexity.
An agent that reviews product text plus images, classifies issues, and generates short action summaries. This plays directly into DeepInfra’s multimodal positioning for MiMo-V2.5, while keeping deployment simple through one managed API.
| Metric | Value |
|---|---|
| Volume | 2,000,000 items/month |
| Model | MiMo-V2.5 |
| Provider | DeepInfra Standard |
| Input Tokens | 400M |
| Output Tokens | 40M |
| Monthly Cost | $240.00 |
Cost breakdown:
DeepInfra Priority would cost $360.00/month — $120.00 more.
Why DeepInfra fits: DeepInfra describes MiMo-V2.5 as supporting text, image, video, and audio on its platform. One provider for multimodal inference is simpler than stitching together separate model endpoints.
Legal intake, insurance ops, or enterprise document review where teams want MiMo-V2.5 behind a private endpoint rather than on a shared public path.
| Metric | Value |
|---|---|
| Volume | 20,000 document runs/month |
| Model | MiMo-V2.5 |
| Provider | DeepInfra Standard (Private Endpoint) |
| Input Tokens | 600M |
| Output Tokens | 60M |
| Monthly Cost | $360.00 |
Cost breakdown:
DeepInfra Priority would cost $540.00/month — $180.00 more.
Why DeepInfra fits: Private endpoint availability is the real differentiator here. You keep the MIT-licensed MiMo-V2.5 model family while moving toward a more controlled production setup, with a clear path to compare managed hosting vs. self-hosting later without changing models.
A production agent stack with stable system prompts, tool definitions, and reused context across many requests. DeepInfra is not as cheap as Xiaomi on cache economics, but it still offers a meaningful cached-input discount versus fresh input while keeping the API operationally straightforward.
| Metric | Value |
|---|---|
| Volume | 50M cached input + 100M fresh input + 20M output tokens/month |
| Model | MiMo-V2.5 |
| Provider | DeepInfra Standard |
| Monthly Cost | $84.00 |
Cost breakdown:
DeepInfra Priority would cost $126.00/month — $42.00 more.
Why DeepInfra fits: Cached input is still much cheaper than fresh input; function calling support matters for agent loops; good option when you want a managed endpoint and predictable integration, not just lowest possible token pricing.
A small team starts with a public endpoint for fast iteration, then moves to a private deployment path later if the product sticks. DeepInfra lets you operationalize MiMo-V2.5 early without committing to self-hosting from day one.
| Metric | Value |
|---|---|
| Volume | 10M input + 5M output tokens/month |
| Model | MiMo-V2.5 |
| Provider | DeepInfra Standard |
| Monthly Cost | $14.00 |
Cost breakdown:
DeepInfra Priority would cost $21.00/month — $7.00 more.
Why DeepInfra fits: Easy path from prototype to managed production. Same provider supports public access and private endpoint deployment. Strong fit for teams that value platform features and deployment flexibility more than shaving every last cent off token costs.
Choosing a provider for MiMo-V2.5 is not really a question of which number looks smallest in a pricing table. It is a question of which cost structure fits your actual usage pattern — how much of your context is reused, how verbose your outputs are, how much operational scaffolding you want to manage yourself, and whether platform features like JSON mode, function calling, or private endpoints are load-bearing parts of your architecture or nice-to-haves.
The practical decision criteria come down to a few things worth being honest about before you commit. If your workload is cache-heavy and you are optimizing for raw token economics, Xiaomi’s first-party pricing is the benchmark to beat. If you want OpenAI-compatible routing with low input cost and minimal setup friction, OpenRouter is a reasonable starting point. But if you need a managed endpoint with multimodal support, predictable API behavior, and a clear path from public access to private deployment — without switching model families mid-build — DeepInfra’s positioning for MiMo-V2.5 is harder to argue with. The output pricing is higher than the alternatives, so output-heavy workloads need to be sized carefully, but the platform features absorb real engineering cost that does not show up in a per-token comparison.
One thing worth flagging before you finalize your architecture: the MiMo-V2.5 model family extends beyond the base model covered in this guide. If you need a larger reasoning model, MiMo-V2.5-Pro runs 1.02T total parameters with 42B active, with full MiMo-V2.5-Pro API documentation available for integration planning. If your pipeline touches audio, DeepInfra also hosts the MiMo-V2.5-tts API for speech synthesis, along with a voice configuration interface for tuning output characteristics. The model family is broader than the base omnimodal endpoint alone.
If you are ready to test the model directly, visit the MiMo-V2.5 model page to run a real prompt against the endpoint — the fastest way to validate fit before writing a line of integration code.
DeepSeek’s $10.29B Financing Round Explained<p>DeepSeek has not taken outside money since it was founded in 2023. For two years it turned down every venture capital firm and major tech company that came calling, funding its research entirely from the returns of its parent hedge fund, Zhejiang High-Flyer Asset Management, which reportedly posted a 56.6% return in 2025. That era […]</p>
Inference Economics: True AI Costs at Scale<p>Most teams discover their inference economics the same way: a production bill arrives that looks nothing like the number they expected. The per-token price seemed small enough during testing. Then real traffic showed up, agents started chaining calls, RAG pipelines bloated the context window, and suddenly the math looked completely different. Token prices have fallen […]</p>
DeepInfra Launches Access to NVIDIA Cosmos 3 World Foundation Models for Physical AIDeepInfra is serving NVIDIA Cosmos 3, the first open world foundation model for physical AI that reasons before it generates, from day zero of its release. Available as two variants—Cosmos 3 Nano and Cosmos 3 Super—these models give developers a cost-efficient foundation for building robots, autonomous vehicles, simulation workflows, and synthetic data generation at scale.© 2026 DeepInfra. All rights reserved.