Hugging Face's open source Text Generation Inference project looked like a promising framework for serving large language models (LLMs). However, Hugging Face announced that it would change the license of the code starting with version v1.0.0. The previous license, Apache 2.0, was permissive, but the new one is restrictive for our use cases.
We decided to fork the project and continue maintaining it under the Apache 2.0 license. We will keep contributing to it and keep it up to date, we will accept pull requests from the community, and we will keep the project truly open source and free to use.
Here is a link to the code: https://github.com/deepinfra/text-generation-inference
We hope that in time a community of other developers and organizations that want to keep this project truly open source will form around it.
Sadly, it is becoming more and more common for popular open source projects to change their license after they gain some traction. This happened with MongoDB, Grafana, Elasticsearch, and many others. As a developer, when you decide to adopt a particular open source project, you start investing time and effort into using it. You build your application around it, and you start depending on it. Then, suddenly, the license changes, and you might be forced to find an alternative.
Imagine if Meta changed the license of PyTorch, or if Hugging Face decided tomorrow to change the license of Transformers in a similar way to prohibit commercial use.
We believe that changing the license of an open source project mid-flight is an unfriendly move toward the community.
If you need any help, just reach out to us on our Discord server.
© 2026 Deep Infra. All rights reserved.