google/
Pricing: $0.25 in / $1.50 out per 1M tokens
Bring any idea to life with state-of-the-art reasoning to help you learn, build, and plan anything. Best for high-volume tasks that need efficiency and intelligence.

Introducing Gemini 3.1 Flash-Lite, a scalable thinking model for high-volume tasks at low cost and latency.
Handles tasks of varying complexity, such as coding, UI generation, and translation, with high quality.
Flexible reasoning levels: delivers improved reasoning and output quality, letting users select how much thinking the model applies.
Low latency: tackles high-volume tasks with faster response times.
Tool use: delivers high throughput without sacrificing quality, using search grounding and enhanced instruction following.
Cost-efficient: our most cost-efficient model yet in the 3 series.
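As a minimal sketch of how the flexible reasoning levels might be selected through DeepInfra's OpenAI-compatible chat completions endpoint. The full model slug and the `reasoning_effort` parameter are illustrative assumptions, not documented on this page:

```python
import json

# DeepInfra exposes an OpenAI-compatible chat completions endpoint.
API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"

def build_request(prompt: str, reasoning_effort: str = "low") -> dict:
    """Build a chat-completion payload that selects a thinking level.

    The "reasoning_effort" field is an assumed parameter name for
    choosing the model's thinking level; check the model docs for
    the actual knob before relying on it.
    """
    if reasoning_effort not in ("low", "medium", "high"):
        raise ValueError("reasoning_effort must be low, medium, or high")
    return {
        "model": "google/gemini-3.1-flash-lite",  # hypothetical slug
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": reasoning_effort,
    }

# Build a request body; send it with any HTTP client plus a Bearer token.
payload = build_request("Translate 'hello' to French.", reasoning_effort="low")
print(json.dumps(payload, indent=2))
```

For high-volume tasks like translation or UI generation, keeping `reasoning_effort` low trades some reasoning depth for lower latency and cost, which matches the positioning described above.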
© 2026 DeepInfra. All rights reserved.