

Gemini Model Family

Developed by Google DeepMind, Gemini is a family of state-of-the-art thinking models with native multimodal capabilities, designed for advanced reasoning, complex problem-solving, and comprehensive understanding across text, audio, video, and images. Built with revolutionary thinking architecture, Gemini models reason through problems step-by-step before responding, delivering enhanced accuracy and performance for sophisticated applications.

Gemini 2.5 Pro sets new standards for complex reasoning and coding excellence, while Gemini 2.5 Flash provides optimal price-performance for high-volume tasks. With massive context windows up to 1 million tokens, native multimodal processing that handles hours of video and audio, and transparent reasoning capabilities that show step-by-step thinking processes, Gemini excels at document analysis, code generation, scientific research, and agentic workflows.

Perfect for building intelligent applications that require deep reasoning, multimodal understanding, long-context processing, and transparent AI decision-making with Google's enterprise-grade reliability and performance.

Featured Model: google/gemini-2.5-pro

Gemini 2.5 Pro is Google's most advanced thinking model, leading in complex reasoning, advanced coding, and multimodal understanding—with transparent step-by-step reasoning and state-of-the-art performance across academic and real-world benchmarks.

Price per 1M input tokens

$0.875


Price per 1M output tokens

$7.00


Release Date

04/17/2025


Context Size

1,000,000



# Assume openai>=1.0.0
from openai import OpenAI

# Create an OpenAI client with your deepinfra token and endpoint
openai = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

chat_completion = openai.chat.completions.create(
    model="google/gemini-2.5-pro",
    messages=[{"role": "user", "content": "Hello"}],
)

print(chat_completion.choices[0].message.content)
print(chat_completion.usage.prompt_tokens, chat_completion.usage.completion_tokens)

# Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
# 11 25

Available Gemini Models

DeepInfra provides access to Google's latest Gemini models, featuring advanced thinking capabilities, native multimodal processing, and industry-leading performance for complex reasoning and development tasks.

Model                  Context   $ per 1M input tokens   $ per 1M output tokens
gemini-2.5-pro         976k      $0.875                  $7.00
gemini-2.5-flash       976k      $0.21                   $1.75
gemini-2.0-flash-001   976k      $0.10                   $0.40

FAQ

What is Gemini AI?

Gemini is a family of state-of-the-art thinking models developed by Google DeepMind, designed with native multimodal capabilities and advanced reasoning architecture. Built as thinking models, Gemini can reason through complex problems step-by-step before responding, resulting in enhanced accuracy and performance.

Available in multiple variants including Gemini 2.5 Pro for maximum reasoning capabilities, Gemini 2.5 Flash for optimal price-performance, and Gemini 2.0 Flash for next-generation features, Gemini models excel at complex coding, scientific reasoning, document analysis, and multimodal understanding across text, audio, video, and images with transparent reasoning processes and enterprise-grade reliability.

What tasks are Gemini models best suited for?

  • Advanced reasoning and problem-solving with transparent step-by-step thinking processes for complex logical tasks
  • Complex coding and software development with state-of-the-art performance on coding benchmarks and repository analysis
  • Multimodal content analysis processing text, audio, video, and images simultaneously with native understanding
  • Long-context document processing analyzing up to 1 million tokens including entire codebases, research papers, and datasets
  • Scientific research and mathematics with exceptional performance on STEM benchmarks and complex calculations
  • Agentic workflows and automation with sophisticated reasoning for multi-step task execution and decision-making
  • Enterprise document analysis extracting insights from legal contracts, medical records, and business reports
  • Video and audio understanding processing hours of multimedia content for comprehensive analysis and Q&A
  • Structured output generation with precise JSON formatting and function calling capabilities
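For the structured-output use case, a request body can ask the model for a single JSON object. A minimal sketch, assuming DeepInfra's OpenAI-compatible endpoint accepts the standard `response_format` parameter from the OpenAI Chat Completions API (check DeepInfra's API docs to confirm support for your model):

```python
import json

# Sketch of a structured-output request body for the OpenAI-compatible
# endpoint. The response_format field follows the standard OpenAI API;
# its support here is an assumption, not confirmed by this page.
request_body = {
    "model": "google/gemini-2.5-pro",
    "messages": [
        {
            "role": "user",
            "content": "Extract the model name and year from: "
                       "'Gemini 2.5 Pro shipped in 2025.' Reply as JSON.",
        }
    ],
    # Ask the model to emit one valid JSON object instead of free text.
    "response_format": {"type": "json_object"},
}

# Serialize the body as it would be sent over HTTP.
payload = json.dumps(request_body)
print(payload)
```

The same dictionary can be passed piecewise to `openai.chat.completions.create(...)` as keyword arguments, as in the sample above.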

Are the Gemini models on DeepInfra optimized for low latency?

Yes. DeepInfra's infrastructure delivers optimized performance for Gemini models with intelligent load balancing and efficient resource allocation. Gemini 2.5 Flash is specifically designed for low-latency, high-volume tasks while maintaining thinking capabilities. The models feature adjustable thinking budgets that automatically calibrate processing time based on query complexity—providing faster responses for simple requests and deeper reasoning for complex problems.

What makes Gemini's thinking capabilities unique?

Gemini's thinking capabilities represent a breakthrough in AI reasoning through several key innovations:

  • Transparent Step-by-Step Processing - Observe the model's reasoning process in real-time as it works through problems
  • Adaptive Thinking Budgets - Automatically adjust processing time based on query complexity or manually control for optimal cost-performance balance
  • Parallel Thinking Strategies - Explore multiple hypotheses simultaneously leading to more accurate outcomes
  • Native Multimodal Reasoning - Combine visual, audio, and text understanding in unified thinking processes
  • Deep Think Mode - Available in Gemini 2.5 Pro, uses cutting-edge reinforcement learning for the most complex problems
  • Enterprise Transparency - Provides traceable decision-making crucial for compliance and trust

This combination enables sophisticated problem-solving, strategic planning, and complex coding tasks while maintaining full visibility into the AI's reasoning process.

How do I integrate Gemini models into my application?

You can integrate Gemini models seamlessly using DeepInfra’s OpenAI-compatible API. Just replace your existing base URL with DeepInfra’s endpoint and use your DeepInfra API key—no infrastructure setup required. DeepInfra also supports integration through libraries like openai, litellm, and other SDKs, making it easy to switch or scale your workloads instantly.

What are the pricing details for using Gemini models on DeepInfra?

Pricing is usage-based:
  • Input Tokens: between $0.10 and $0.875 per million
  • Output Tokens: between $0.40 and $7.00 per million
Prices vary slightly by model. There are no upfront fees, and you only pay for what you use.
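The cost of a single request follows directly from the token counts the API returns. A quick sketch using the gemini-2.5-pro rates from the table above and the usage numbers from the "Hello" example (11 input tokens, 25 output tokens):

```python
# Per-million-token rates for gemini-2.5-pro, from the pricing table above.
INPUT_PRICE_PER_M = 0.875
OUTPUT_PRICE_PER_M = 7.00

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the USD cost of one request at the rates above."""
    return (
        prompt_tokens / 1_000_000 * INPUT_PRICE_PER_M
        + completion_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    )

# Usage from the earlier "Hello" example: 11 prompt, 25 completion tokens.
print(f"${request_cost(11, 25):.8f}")
```

Swap in the rates for whichever model you call; the formula is the same for all three.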

How do I get started using Gemini on DeepInfra?

  • Sign in with GitHub at deepinfra.com
  • Get your API key
  • Test models directly from the browser, cURL, or SDKs
  • Review pricing on your usage dashboard
Within minutes, you can deploy apps using Gemini models—without any infrastructure setup.