
Introducing GPU Instances: On-Demand GPU Compute for AI Workloads
Published on 2025.06.09 by the DeepInfra Team

We're excited to announce GPU Instances, a new feature that provides on-demand access to high-performance GPU compute resources in the cloud. With GPU Instances, you can quickly spin up containers with dedicated GPU access for machine learning training, inference, data processing, and other compute-intensive workloads.

What are GPU Instances?

GPU Instances allow you to launch containers with dedicated GPU resources when you need them. Each instance provides full SSH access to your container, giving you complete control over your environment while benefiting from our optimized GPU infrastructure.

The feature addresses a common challenge in AI development: accessing powerful GPU hardware without the overhead of managing physical infrastructure. Whether you're training a new model, running inference workloads, or experimenting with different configurations, GPU Instances provide the flexibility to scale your compute resources on demand.

Key Features

GPU Instances offer flexible configurations to match your performance and budget requirements. You can choose from our latest NVIDIA B200 GPU configurations, with options for single or multi-GPU setups depending on your workload needs.

The setup process is streamlined to get you started quickly. Simply select your desired GPU configuration, provide a container name and SSH key, and accept the licensing agreements. Your container will be ready in minutes with GPU access fully configured.
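If you don't already have an SSH key pair, you can generate one locally before creating the container. Here is a minimal sketch that shells out to the standard ssh-keygen tool from Python; the key path and comment are just illustrative choices:

```python
import subprocess
from pathlib import Path

# Illustrative key location; use whatever path suits your setup.
key_path = Path.home() / ".ssh" / "deepinfra_gpu_instance"

# Generate an Ed25519 key pair with no passphrase (-N "").
subprocess.run(
    [
        "ssh-keygen",
        "-t", "ed25519",
        "-f", str(key_path),
        "-N", "",
        "-C", "gpu-instance-key",  # optional comment
    ],
    check=True,
)

# The public half (the .pub file) is what you paste into the container form.
print(key_path.with_suffix(".pub").read_text())
```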

Security and access control are built into the platform. Each container is isolated and accessible only through SSH using your provided public key. The containers run Ubuntu with the ubuntu user account pre-configured for immediate use.

Getting Started

Creating a new GPU Instance is straightforward through our web interface. Navigate to the GPU Instances section in your dashboard and click "New Container" to begin. The interface guides you through selecting your GPU configuration, entering container details, and accepting the necessary license agreements.

For developers who prefer programmatic access, we also provide a comprehensive HTTP API. You can create, manage, and monitor your containers using standard REST endpoints, making it easy to integrate GPU Instances into your existing workflows and automation scripts.
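As an illustration only, creating a container programmatically might look something like the sketch below using Python's requests library. The endpoint URL, payload field names, and token placeholder are hypothetical assumptions, not the documented API; consult the GPU Instances documentation for the actual schema:

```python
from pathlib import Path

import requests

API_TOKEN = "YOUR_API_TOKEN"  # placeholder; substitute your real DeepInfra token

pub_key = (Path.home() / ".ssh" / "deepinfra_gpu_instance.pub").read_text().strip()

# Hypothetical endpoint and payload shape, shown for illustration only.
resp = requests.post(
    "https://api.deepinfra.com/gpu-instances",  # hypothetical URL
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={
        "name": "my-training-box",   # container name
        "gpu": "B200",               # desired GPU model
        "gpu_count": 1,              # single- or multi-GPU setup
        "ssh_public_key": pub_key,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```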

Once your container is running, you'll receive an IP address for SSH access. Connect using your preferred SSH client and start working with your dedicated GPU resources immediately. The environment comes pre-configured with NVIDIA drivers and CUDA toolkit, so you can focus on your work rather than setup.
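For example, once you have the instance's IP address, a quick sanity check over SSH could look like the following sketch using the third-party paramiko library; the IP address and key path are placeholders:

```python
from pathlib import Path

import paramiko

host = "203.0.113.10"  # placeholder: the IP shown in your dashboard
key_file = str(Path.home() / ".ssh" / "deepinfra_gpu_instance")

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(host, username="ubuntu", key_filename=key_file)

# nvidia-smi works out of the box because the drivers come pre-installed.
stdin, stdout, stderr = client.exec_command("nvidia-smi")
print(stdout.read().decode())

client.close()
```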

Use Cases

GPU Instances excel in scenarios requiring intensive computation. Machine learning practitioners use them for training models that would be impractical on local hardware. The ability to scale up to multi-GPU configurations means you can tackle larger datasets and more complex models efficiently.
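Before launching a multi-GPU training run, it's worth confirming that your framework sees every attached device. A short check with PyTorch might look like this (assuming you've installed PyTorch in the container yourself; the post doesn't state that it comes pre-installed):

```python
import torch

# Verify that CUDA is available and enumerate the attached GPUs.
assert torch.cuda.is_available(), "No CUDA device visible"
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1e9:.0f} GB")

# From here, a typical multi-GPU job would use torchrun with
# DistributedDataParallel to spread work across the devices.
```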

Research teams benefit from the flexibility to experiment with different GPU configurations without long-term commitments. You can test how your workload performs on different hardware configurations and optimize your approach before committing to larger deployments.

Development teams use GPU Instances for prototyping AI applications and running inference workloads that require GPU acceleration. The pay-per-use model means you only pay for the compute time you actually need, making it cost-effective for both experimentation and production workloads.

Pricing and Availability

GPU Instances follow a simple pay-per-use pricing model. You're charged only for the time your containers are running, with no upfront costs or long-term commitments. Pricing varies by GPU configuration, allowing you to choose the option that best fits your performance requirements and budget.
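Because billing is purely time-based, estimating a job's cost is simple multiplication. As a toy example (the hourly rate below is made up; check the pricing page for real figures):

```python
# Hypothetical figures for illustration only.
hourly_rate = 4.00      # $/hour for some GPU configuration (made up)
runtime_hours = 6.5     # time the container was actually running

print(f"Estimated cost: ${hourly_rate * runtime_hours:.2f}")  # Estimated cost: $26.00
```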

Container management is designed to be intuitive. You can monitor your active instances, view connection details, and terminate containers when your work is complete. Note that container storage is ephemeral: data persists only for the container's lifetime, so you're responsible for backing up any important results before termination.
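Since nothing survives termination, it's worth scripting the backup step. Here is a minimal sketch using paramiko's SFTP support; the host, key path, and file paths are all placeholders:

```python
from pathlib import Path

import paramiko

host = "203.0.113.10"  # placeholder instance IP
key_file = str(Path.home() / ".ssh" / "deepinfra_gpu_instance")

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(host, username="ubuntu", key_filename=key_file)

# Copy results off the container before terminating it.
sftp = client.open_sftp()
sftp.get("/home/ubuntu/results/model.ckpt", "model.ckpt")  # placeholder paths
sftp.close()
client.close()
```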

GPU Instances represent our commitment to making powerful AI infrastructure accessible to developers and researchers. By removing the barriers to GPU access, we're enabling more teams to push the boundaries of what's possible with artificial intelligence.

Ready to get started? Visit your dashboard and create your first GPU Instance today. For detailed instructions and the full API reference, check out the GPU Instances documentation.
