

Gemini Model Family

Developed by Google DeepMind, Gemini is a family of state-of-the-art thinking models with native multimodal capabilities, designed for advanced reasoning, complex problem-solving, and comprehensive understanding across text, audio, video, and images. Built with revolutionary thinking architecture, Gemini models reason through problems step-by-step before responding, delivering enhanced accuracy and performance for sophisticated applications.

Gemini 2.5 Pro sets new standards for complex reasoning and coding excellence, while Gemini 2.5 Flash provides optimal price-performance for high-volume tasks. With massive context windows up to 1 million tokens, native multimodal processing that handles hours of video and audio, and transparent reasoning capabilities that show step-by-step thinking processes, Gemini excels at document analysis, code generation, scientific research, and agentic workflows.

Perfect for building intelligent applications that require deep reasoning, multimodal understanding, long-context processing, and transparent AI decision-making with Google's enterprise-grade reliability and performance.

Featured Model: google/gemini-2.5-pro

Gemini 2.5 Pro is Google's most advanced thinking model, leading in complex reasoning, advanced coding, and multimodal understanding—with transparent step-by-step reasoning and state-of-the-art performance across academic and real-world benchmarks.

Price per 1M input tokens

$0.875


Price per 1M output tokens

$7.00


Release Date

04/17/2025


Context Size

1,000,000



# Assume openai>=1.0.0
from openai import OpenAI

# Create an OpenAI client with your deepinfra token and endpoint
openai = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

chat_completion = openai.chat.completions.create(
    model="google/gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello"}],
)

print(chat_completion.choices[0].message.content)
print(chat_completion.usage.prompt_tokens, chat_completion.usage.completion_tokens)

# Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
# 11 25

Available Gemini Models

DeepInfra provides access to Google's latest Gemini models, featuring advanced thinking capabilities, native multimodal processing, and industry-leading performance for complex reasoning and development tasks.

Model                  Context   $ per 1M input tokens   $ per 1M output tokens
gemini-2.5-pro         976k      $0.875                  $7.00
gemini-2.5-flash       976k      $0.21                   $1.75
gemini-2.0-flash-001   976k      $0.10                   $0.40

FAQ

What is Gemini AI?

Gemini is a family of state-of-the-art thinking models developed by Google DeepMind, designed with native multimodal capabilities and advanced reasoning architecture. Built as thinking models, Gemini can reason through complex problems step-by-step before responding, resulting in enhanced accuracy and performance.

Available in multiple variants including Gemini 2.5 Pro for maximum reasoning capabilities, Gemini 2.5 Flash for optimal price-performance, and Gemini 2.0 Flash for next-generation features, Gemini models excel at complex coding, scientific reasoning, document analysis, and multimodal understanding across text, audio, video, and images with transparent reasoning processes and enterprise-grade reliability.

What tasks are Gemini models best suited for?

  • Advanced reasoning and problem-solving with transparent step-by-step thinking processes for complex logical tasks
  • Complex coding and software development with state-of-the-art performance on coding benchmarks and repository analysis
  • Multimodal content analysis processing text, audio, video, and images simultaneously with native understanding
  • Long-context document processing analyzing up to 1 million tokens including entire codebases, research papers, and datasets
  • Scientific research and mathematics with exceptional performance on STEM benchmarks and complex calculations
  • Agentic workflows and automation with sophisticated reasoning for multi-step task execution and decision-making
  • Enterprise document analysis extracting insights from legal contracts, medical records, and business reports
  • Video and audio understanding processing hours of multimedia content for comprehensive analysis and Q&A
  • Structured output generation with precise JSON formatting and function calling capabilities
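For the structured-output use case, a request body can ask the model for a single JSON object. A minimal sketch, assuming DeepInfra's OpenAI-compatible endpoint accepts the standard `response_format` parameter from the OpenAI Chat Completions API (check DeepInfra's API docs to confirm support for your model):

```python
import json

# Sketch of a structured-output request body for the OpenAI-compatible
# endpoint. The response_format field follows the standard OpenAI API;
# its support here is an assumption, not confirmed by this page.
request_body = {
    "model": "google/gemini-2.5-pro",
    "messages": [
        {
            "role": "user",
            "content": "Extract the model name and year from: "
                       "'Gemini 2.5 Pro shipped in 2025.' Reply as JSON.",
        }
    ],
    # Ask the model to emit one valid JSON object instead of free text.
    "response_format": {"type": "json_object"},
}

# Serialize the body as it would be sent over HTTP.
payload = json.dumps(request_body)
print(payload)
```

The same dictionary can be passed piecewise to `openai.chat.completions.create(...)` as keyword arguments, as in the sample above.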

Are the Gemini models on DeepInfra optimized for low latency?

Yes. DeepInfra's infrastructure delivers optimized performance for Gemini models with intelligent load balancing and efficient resource allocation. Gemini 2.5 Flash is specifically designed for low-latency, high-volume tasks while maintaining thinking capabilities. The models feature adjustable thinking budgets that automatically calibrate processing time based on query complexity—providing faster responses for simple requests and deeper reasoning for complex problems.

What makes Gemini's thinking capabilities unique?

Gemini's thinking capabilities represent a breakthrough in AI reasoning through several key innovations:

  • Transparent Step-by-Step Processing - Observe the model's reasoning process in real-time as it works through problems
  • Adaptive Thinking Budgets - Automatically adjust processing time based on query complexity or manually control for optimal cost-performance balance
  • Parallel Thinking Strategies - Explore multiple hypotheses simultaneously leading to more accurate outcomes
  • Native Multimodal Reasoning - Combine visual, audio, and text understanding in unified thinking processes
  • Deep Think Mode - Available in Gemini 2.5 Pro, uses cutting-edge reinforcement learning for the most complex problems
  • Enterprise Transparency - Provides traceable decision-making crucial for compliance and trust

This combination enables sophisticated problem-solving, strategic planning, and complex coding tasks while maintaining full visibility into the AI's reasoning process.

How do I integrate Gemini models into my application?

You can integrate Gemini models seamlessly using DeepInfra’s OpenAI-compatible API. Just replace your existing base URL with DeepInfra’s endpoint and use your DeepInfra API key—no infrastructure setup required. DeepInfra also supports integration through libraries like openai, litellm, and other SDKs, making it easy to switch or scale your workloads instantly.

What are the pricing details for using Gemini models on DeepInfra?

Pricing is usage-based:
  • Input Tokens: between $0.10 and $0.875 per million
  • Output Tokens: between $0.40 and $7.00 per million
Prices vary slightly by model. There are no upfront fees, and you only pay for what you use.
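The cost of a single request follows directly from the token counts the API returns. A quick sketch using the gemini-2.5-pro rates from the table above and the usage numbers from the "Hello" example (11 input tokens, 25 output tokens):

```python
# Per-million-token rates for gemini-2.5-pro, from the pricing table above.
INPUT_PRICE_PER_M = 0.875
OUTPUT_PRICE_PER_M = 7.00

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the USD cost of one request at the rates above."""
    return (
        prompt_tokens / 1_000_000 * INPUT_PRICE_PER_M
        + completion_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    )

# Usage from the earlier "Hello" example: 11 prompt, 25 completion tokens.
print(f"${request_cost(11, 25):.8f}")
```

Swap in the rates for whichever model you call; the formula is the same for all three.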

How do I get started using Gemini on DeepInfra?

  • Sign in with GitHub at deepinfra.com
  • Get your API key
  • Test models directly from the browser, cURL, or SDKs
  • Review pricing on your usage dashboard
Within minutes, you can deploy apps using Gemini models—without any infrastructure setup.