Getting Started
Published on 2023.03.02 by Nikola Borisov

Getting an API Key

To use DeepInfra's services, you'll need an API key. You can get one by signing up on our platform.

  1. Sign up or log in to your DeepInfra account at deepinfra.com
  2. Navigate to the Dashboard and select API Keys
  3. Create a new API key and save it securely

Your API key will be used to authenticate all your requests to the DeepInfra API.
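
One common way to keep the key out of scripts and shell history is to read it from an environment variable. The sketch below assumes a variable named DEEPINFRA_API_KEY; that name is a convention chosen for this example, not something DeepInfra requires. It builds the same Authorization header used in the request later in this guide.

```shell
# Hypothetical variable name; falls back to the placeholder if it is unset.
DEEPINFRA_API_KEY="${DEEPINFRA_API_KEY:-YOUR_API_KEY}"
export DEEPINFRA_API_KEY

# Build the bearer header once so every request can reuse it.
AUTH_HEADER="Authorization: Bearer ${DEEPINFRA_API_KEY}"
```

You can then pass the header to curl with -H "$AUTH_HEADER" instead of pasting the key into each command.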

Deployment

Now let's deploy some models to production and use them for inference. It's really easy.

You can deploy models through the web dashboard or by using our API. Models are automatically deployed when you first make an inference request.

Inference

Once a model is deployed on DeepInfra, you can call it through our REST API. For example, this curl command sends an audio file to the openai/whisper-small model:

curl -X POST \
  -F "audio=@/path/to/audio.mp3" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  'https://api.deepinfra.com/v1/inference/openai/whisper-small'
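
If you call the endpoint from a script, it can help to wrap the request in a small function that checks its inputs first. The following is a minimal sketch, not an official client: the function name transcribe and the DEEPINFRA_API_KEY environment variable are assumptions made for this example. The --fail, --silent, and --show-error flags are standard curl options that make HTTP errors produce a non-zero exit code.

```shell
# Sketch of a wrapper around the request above.
# "transcribe" and DEEPINFRA_API_KEY are hypothetical names for this example.
transcribe() {
  # Refuse to run without a key rather than sending an empty bearer token.
  if [ -z "${DEEPINFRA_API_KEY:-}" ]; then
    echo "DEEPINFRA_API_KEY is not set" >&2
    return 2
  fi
  # Check the audio file locally before making a network request.
  if [ ! -f "$1" ]; then
    echo "audio file not found: $1" >&2
    return 1
  fi
  # --fail turns HTTP error responses into a non-zero exit status.
  curl --fail --silent --show-error -X POST \
    -F "audio=@$1" \
    -H "Authorization: Bearer ${DEEPINFRA_API_KEY}" \
    'https://api.deepinfra.com/v1/inference/openai/whisper-small'
}
```

Calling transcribe /path/to/audio.mp3 then prints the API response to standard output, so it can be redirected to a file or piped into another tool.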