
Introducing GPU Instances: On-Demand GPU Compute for AI Workloads
Published on June 9, 2025 by the DeepInfra Team

We're excited to announce GPU Instances, a new feature that provides on-demand access to high-performance GPU compute resources in the cloud. With GPU Instances, you can quickly spin up containers with dedicated GPU access for machine learning training, inference, data processing, and other compute-intensive workloads.

What are GPU Instances?

GPU Instances allow you to launch containers with dedicated GPU resources when you need them. Each instance provides full SSH access to your container, giving you complete control over your environment while benefiting from our optimized GPU infrastructure.

The feature addresses a common challenge in AI development: accessing powerful GPU hardware without the overhead of managing physical infrastructure. Whether you're training a new model, running inference workloads, or experimenting with different configurations, GPU Instances provide the flexibility to scale your compute resources on demand.

Key Features

GPU Instances offer flexible configurations to match your performance and budget requirements. You can choose from configurations built on our latest NVIDIA B200 GPUs, with single- or multi-GPU setups depending on your workload needs.

The setup process is streamlined to get you started quickly. Simply select your desired GPU configuration, provide a container name and SSH key, and accept the licensing agreements. Your container will be ready in minutes with GPU access fully configured.

Security and access control are built into the platform. Each container is isolated and accessible only through SSH using your provided public key. The containers run Ubuntu with the ubuntu user account pre-configured for immediate use.

Getting Started

Creating a new GPU Instance is straightforward through our web interface. Navigate to the GPU Instances section in your dashboard and click "New Container" to begin. The interface guides you through selecting your GPU configuration, entering container details, and accepting the necessary license agreements.

For developers who prefer programmatic access, we also provide a comprehensive HTTP API. You can create, manage, and monitor your containers using standard REST endpoints, making it easy to integrate GPU Instances into your existing workflows and automation scripts.
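As a rough illustration of what creating a container over the REST API might look like, here is a minimal Python sketch using only the standard library. The endpoint URL, field names, and token handling are assumptions for illustration only; consult the GPU Instances API documentation for the actual schema.

```python
import json
import urllib.request

# Hypothetical endpoint and token -- replace with values from the real API docs.
API_BASE = "https://api.deepinfra.com/v1/gpu-instances"
API_TOKEN = "YOUR_API_TOKEN"

def build_create_request(name: str, ssh_public_key: str,
                         gpu_count: int = 1) -> urllib.request.Request:
    """Build (but do not send) a container-creation request.

    Field names here are illustrative assumptions, not the documented schema.
    """
    payload = {
        "name": name,
        "ssh_public_key": ssh_public_key,
        "gpu": "B200",          # GPU model mentioned in the announcement
        "gpu_count": gpu_count,
    }
    return urllib.request.Request(
        API_BASE,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
    )

req = build_create_request("my-training-box", "ssh-ed25519 AAAA... user@host")
# urllib.request.urlopen(req) would submit the request.
```

Building the request separately from sending it makes the payload easy to inspect or log before it touches the network, which is handy when wiring this into automation scripts.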

Once your container is running, you'll receive an IP address for SSH access. Connect using your preferred SSH client and start working with your dedicated GPU resources immediately. The environment comes pre-configured with NVIDIA drivers and CUDA toolkit, so you can focus on your work rather than setup.

Use Cases

GPU Instances excel in scenarios requiring intensive computation. Machine learning practitioners use them for training models that would be impractical on local hardware. The ability to scale up to multi-GPU configurations means you can tackle larger datasets and more complex models efficiently.

Research teams benefit from the flexibility to experiment with different GPU configurations without long-term commitments. You can test how your workload performs on different hardware configurations and optimize your approach before committing to larger deployments.

Development teams use GPU Instances for prototyping AI applications and running inference workloads that require GPU acceleration. The pay-per-use model means you only pay for the compute time you actually need, making it cost-effective for both experimentation and production workloads.

Pricing and Availability

GPU Instances follow a simple pay-per-use pricing model. You're charged only for the time your containers are running, with no upfront costs or long-term commitments. Pricing varies by GPU configuration, allowing you to choose the option that best fits your performance requirements and budget.
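Under a pay-per-use model, the cost of a run is simply the per-GPU-hour rate times GPU count times running time. The rate below is a made-up placeholder; see the pricing page for actual per-configuration rates.

```python
def estimate_cost(hourly_rate_usd: float, hours: float,
                  gpu_count: int = 1) -> float:
    """Estimate pay-per-use cost: rate is per GPU-hour, billed only while
    the container is running. No upfront costs or commitments apply."""
    return round(hourly_rate_usd * gpu_count * hours, 2)

# Hypothetical rate for illustration only -- see the pricing page for real numbers.
print(estimate_cost(4.00, hours=6, gpu_count=2))  # 48.0
```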

Container management is designed to be intuitive. You can monitor your active instances, view connection details, and terminate containers when your work is complete. All data is stored within the container during its lifetime, and you're responsible for backing up any important results before termination.
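Because container storage lives only as long as the container, it is worth scripting the backup step so results are copied off before termination. A minimal sketch, assuming SSH/scp access with the pre-configured ubuntu user (the IP and paths below are placeholders):

```python
import shlex
import subprocess

def build_backup_cmd(instance_ip: str, remote_path: str, local_dir: str,
                     user: str = "ubuntu") -> list[str]:
    """Build an scp command to pull results off the instance before
    terminating it. Returned as an argument list for subprocess.run."""
    return ["scp", "-r", f"{user}@{instance_ip}:{remote_path}", local_dir]

# Placeholder IP and paths for illustration.
cmd = build_backup_cmd("203.0.113.7", "/home/ubuntu/results", "./backups")
print(shlex.join(cmd))
# subprocess.run(cmd, check=True) would perform the copy.
```

Running the copy as the last step of a job script, before the terminate call, keeps the workflow safe even for unattended runs.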

GPU Instances represent our commitment to making powerful AI infrastructure accessible to developers and researchers. By removing the barriers to GPU access, we're enabling more teams to push the boundaries of what's possible with artificial intelligence.

Ready to get started? Visit your dashboard and create your first GPU Instance today. For detailed instructions and API documentation, check out our comprehensive GPU Instances documentation.
