
The Text Generation Inference open source project by Hugging Face looked like a promising framework for serving large language models (LLMs). However, Hugging Face announced that it will change the license of the code starting with version v1.0.0. While the previous license, Apache 2.0, was permissive, the new one is restrictive for our use cases.
We decided to fork the project and continue to maintain it under the Apache 2.0 license. We will continue to contribute to the project and keep it up to date. We will accept pull requests from the community, and we will keep the project truly open source and free to use.
Here is a link to the code: https://github.com/deepinfra/text-generation-inference
We hope that in time a community of other developers and organizations that want to keep this project truly open source will form around it.
Sadly, it is becoming more and more common for popular open source projects to change their license after they gain some traction. This happened with MongoDB, Grafana, Elasticsearch, and many others. As a developer, when you decide to adopt a particular open source project, you start investing time and effort into using it. You build your application around it, and you start depending on it. Then, suddenly, the license changes, and you might be forced to find an alternative.
Imagine if Meta changed the license of PyTorch, or if tomorrow Hugging Face decided to change the license of Transformers in a similar way to prohibit commercial use.
We believe that changing the license of an open source project mid-flight is an unfriendly move towards the community.
If you need any help, just reach out to us on our Discord server.
© 2026 DeepInfra. All rights reserved.