
Introducing GPU Instances: On-Demand GPU Compute for AI Workloads
Published on 2025.06.09 by the DeepInfra Team

We're excited to announce GPU Instances, a new feature that provides on-demand access to high-performance GPU compute resources in the cloud. With GPU Instances, you can quickly spin up containers with dedicated GPU access for machine learning training, inference, data processing, and other compute-intensive workloads.

What are GPU Instances?

GPU Instances allow you to launch containers with dedicated GPU resources when you need them. Each instance provides full SSH access to your container, giving you complete control over your environment while benefiting from our optimized GPU infrastructure.

The feature addresses a common challenge in AI development: accessing powerful GPU hardware without the overhead of managing physical infrastructure. Whether you're training a new model, running inference workloads, or experimenting with different configurations, GPU Instances provide the flexibility to scale your compute resources on demand.

Key Features

GPU Instances offer flexible configurations to match your performance and budget requirements. You can choose from our latest NVIDIA B200 GPU configurations, with options for single or multi-GPU setups depending on your workload needs.

The setup process is streamlined to get you started quickly. Simply select your desired GPU configuration, provide a container name and SSH key, and accept the licensing agreements. Your container will be ready in minutes with GPU access fully configured.
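If you don't already have an SSH key pair, you can generate one locally before creating the container. Here is a minimal sketch that shells out to the standard ssh-keygen tool from Python; the key path and comment are just illustrative choices:

```python
import subprocess
from pathlib import Path

# Illustrative key location; use whatever path suits your setup.
key_path = Path.home() / ".ssh" / "deepinfra_gpu_instance"

# Generate an Ed25519 key pair with no passphrase (-N "").
subprocess.run(
    [
        "ssh-keygen",
        "-t", "ed25519",
        "-f", str(key_path),
        "-N", "",
        "-C", "gpu-instance-key",  # optional comment
    ],
    check=True,
)

# The public half (the .pub file) is what you paste into the container form.
print(key_path.with_suffix(".pub").read_text())
```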

Security and access control are built into the platform. Each container is isolated and accessible only through SSH using your provided public key. The containers run Ubuntu with the ubuntu user account pre-configured for immediate use.

Getting Started

Creating a new GPU Instance is straightforward through our web interface. Navigate to the GPU Instances section in your dashboard and click "New Container" to begin. The interface guides you through selecting your GPU configuration, entering container details, and accepting the necessary license agreements.

For developers who prefer programmatic access, we also provide a comprehensive HTTP API. You can create, manage, and monitor your containers using standard REST endpoints, making it easy to integrate GPU Instances into your existing workflows and automation scripts.
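As an illustration only, creating a container programmatically might look something like the sketch below using Python's requests library. The endpoint URL, payload field names, and token placeholder are hypothetical assumptions, not the documented API; consult the GPU Instances documentation for the actual schema:

```python
from pathlib import Path

import requests

API_TOKEN = "YOUR_API_TOKEN"  # placeholder; substitute your real DeepInfra token

pub_key = (Path.home() / ".ssh" / "deepinfra_gpu_instance.pub").read_text().strip()

# Hypothetical endpoint and payload shape, shown for illustration only.
resp = requests.post(
    "https://api.deepinfra.com/gpu-instances",  # hypothetical URL
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={
        "name": "my-training-box",   # container name
        "gpu": "B200",               # desired GPU model
        "gpu_count": 1,              # single- or multi-GPU setup
        "ssh_public_key": pub_key,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```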

Once your container is running, you'll receive an IP address for SSH access. Connect using your preferred SSH client and start working with your dedicated GPU resources immediately. The environment comes pre-configured with NVIDIA drivers and CUDA toolkit, so you can focus on your work rather than setup.
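For example, once you have the instance's IP address, a quick sanity check over SSH could look like the following sketch using the third-party paramiko library; the IP address and key path are placeholders:

```python
from pathlib import Path

import paramiko

host = "203.0.113.10"  # placeholder: the IP shown in your dashboard
key_file = str(Path.home() / ".ssh" / "deepinfra_gpu_instance")

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(host, username="ubuntu", key_filename=key_file)

# nvidia-smi works out of the box because the drivers come pre-installed.
stdin, stdout, stderr = client.exec_command("nvidia-smi")
print(stdout.read().decode())

client.close()
```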

Use Cases

GPU Instances excel in scenarios requiring intensive computation. Machine learning practitioners use them for training models that would be impractical on local hardware. The ability to scale up to multi-GPU configurations means you can tackle larger datasets and more complex models efficiently.
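Before launching a multi-GPU training run, it's worth confirming that your framework sees every attached device. A short check with PyTorch might look like this (assuming you've installed PyTorch in the container yourself; the post doesn't state that it comes pre-installed):

```python
import torch

# Verify that CUDA is available and enumerate the attached GPUs.
assert torch.cuda.is_available(), "No CUDA device visible"
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1e9:.0f} GB")

# From here, a typical multi-GPU job would use torchrun with
# DistributedDataParallel to spread work across the devices.
```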

Research teams benefit from the flexibility to experiment with different GPU configurations without long-term commitments. You can test how your workload performs on different hardware configurations and optimize your approach before committing to larger deployments.

Development teams use GPU Instances for prototyping AI applications and running inference workloads that require GPU acceleration. The pay-per-use model means you only pay for the compute time you actually need, making it cost-effective for both experimentation and production workloads.

Pricing and Availability

GPU Instances follow a simple pay-per-use pricing model. You're charged only for the time your containers are running, with no upfront costs or long-term commitments. Pricing varies by GPU configuration, allowing you to choose the option that best fits your performance requirements and budget.
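Because billing is purely time-based, estimating a job's cost is simple multiplication. As a toy example (the hourly rate below is made up; check the pricing page for real figures):

```python
# Hypothetical figures for illustration only.
hourly_rate = 4.00      # $/hour for some GPU configuration (made up)
runtime_hours = 6.5     # time the container was actually running

print(f"Estimated cost: ${hourly_rate * runtime_hours:.2f}")  # Estimated cost: $26.00
```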

Container management is designed to be intuitive. You can monitor your active instances, view connection details, and terminate containers when your work is complete. Note that container storage is ephemeral: data persists only for the container's lifetime, so you're responsible for backing up any important results before termination.
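Since nothing survives termination, it's worth scripting the backup step. Here is a minimal sketch using paramiko's SFTP support; the host, key path, and file paths are all placeholders:

```python
from pathlib import Path

import paramiko

host = "203.0.113.10"  # placeholder instance IP
key_file = str(Path.home() / ".ssh" / "deepinfra_gpu_instance")

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(host, username="ubuntu", key_filename=key_file)

# Copy results off the container before terminating it.
sftp = client.open_sftp()
sftp.get("/home/ubuntu/results/model.ckpt", "model.ckpt")  # placeholder paths
sftp.close()
client.close()
```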

GPU Instances represent our commitment to making powerful AI infrastructure accessible to developers and researchers. By removing the barriers to GPU access, we're enabling more teams to push the boundaries of what's possible with artificial intelligence.

Ready to get started? Visit your dashboard and create your first GPU Instance today. For detailed instructions and the full API reference, check out the GPU Instances documentation.
