Deep Infra's main feature is a simple, scalable and cost-effective inference API. We package state-of-the-art models behind a simple REST API that you can use to build your applications.

Inference Endpoints

Each model has a dedicated inference endpoint: …/{model_name}

For example, for the model runwayml/stable-diffusion-v1-5, {model_name} is replaced by runwayml/stable-diffusion-v1-5.
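Building the endpoint URL is a matter of appending the model name to a fixed prefix. The minimal Python sketch below assumes a hypothetical base URL (https://api.example.com/v1/inference) as a placeholder; substitute the real endpoint prefix. The "/" between owner and model is kept as a path separator.

```python
from urllib.parse import quote

# Hypothetical placeholder prefix; not the real Deep Infra base URL.
BASE_URL = "https://api.example.com/v1/inference"


def endpoint_for(model_name: str, base_url: str = BASE_URL) -> str:
    """Append a model name such as 'owner/model' to the base URL."""
    # quote() leaves '/' unescaped by default, so 'owner/model' stays two path segments
    # while any unsafe characters in the name are percent-encoded.
    return f"{base_url.rstrip('/')}/{quote(model_name)}"


url = endpoint_for("runwayml/stable-diffusion-v1-5")
```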

POST Requests

Our inference API accepts POST requests with one of two content types: multipart/form-data or application/json.


Use multipart/form-data when you want to send binary data such as images, audio or video files. Files are sent as raw bytes rather than as base64-encoded text, so this content type uses less bandwidth and is more efficient for large files.
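To illustrate what a multipart/form-data body looks like, here is a minimal sketch that encodes it by hand with the Python standard library. Most HTTP client libraries can build this body for you; the field name "image" and the file bytes in the usage line are hypothetical placeholders, not a documented model input.

```python
import uuid


def build_multipart(fields, files):
    """Encode text fields and files as a multipart/form-data body.

    fields: {name: text value}
    files:  {name: (filename, raw bytes, MIME type)}
    Returns (body bytes, value for the Content-Type header).
    """
    # Random delimiter; a collision with the payload is vanishingly unlikely.
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    for name, (filename, data, mime_type) in files.items():
        chunks.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
             f"Content-Type: {mime_type}\r\n\r\n").encode("utf-8")
        )
        chunks.append(data)  # raw bytes go in unchanged, with no base64 expansion
        chunks.append(b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


# Hypothetical usage: "image" and the bytes below are placeholders.
body, content_type = build_multipart(
    {}, {"image": ("photo.jpg", b"\xff\xd8\xff\xe0", "image/jpeg")}
)
```

The returned content_type string goes in the request's Content-Type header so the server knows which boundary separates the parts.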


Use application/json when you want to send text data. You can also send binary data with this content type by encoding it as a data URL. For example:

  {
    "image": "..."
  }
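A data URL packs the MIME type and the base64-encoded bytes into a single string. The Python sketch below shows one way to build it and place it in a JSON body; the "image" key mirrors the example above, and the eight PNG signature bytes stand in for a real file.

```python
import base64
import json


def to_data_url(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URL (RFC 2397)."""
    return f"data:{mime_type};base64,{base64.b64encode(data).decode('ascii')}"


# Stand-in for the contents of a real image file (just the PNG signature).
image_bytes = b"\x89PNG\r\n\x1a\n"
payload = json.dumps({"image": to_data_url(image_bytes, "image/png")})
```

Base64 turns every 3 bytes into 4 characters, so the encoded file is about a third larger than the original. That overhead is why multipart/form-data is the better fit for large files.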


To authenticate your requests, pass your API token in the Authorization header using the bearer scheme, like this:

Authorization: bearer $AUTH_TOKEN

You can get your API token with the deepctl auth token command, or by creating a new API key in the dashboard.
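Putting the header and a JSON body together, here is a sketch of an authenticated request built with Python's urllib.request. It only constructs the request; passing it to urllib.request.urlopen would send it. The URL and the "prompt" input field are hypothetical placeholders, and the token is read from an AUTH_TOKEN environment variable to match the header example.

```python
import json
import os
from urllib.request import Request

# Hypothetical placeholder endpoint; use the real endpoint for your model.
URL = "https://api.example.com/v1/inference/some-org/some-model"


def build_request(url: str, payload: dict, token: str) -> Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            # Auth scheme names are case-insensitive, so "bearer" and "Bearer" are equivalent.
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
    )


# The token would come from `deepctl auth token` or a dashboard API key.
token = os.environ.get("AUTH_TOKEN", "")
req = build_request(URL, {"prompt": "Hello"}, token)  # "prompt" is a hypothetical input
```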


HTTP Status Codes

We use standard HTTP status codes to indicate the outcome of a request. The possible status codes are:

  • 200 - OK. The request was successful.
  • 4xx - Client error. The request was invalid or cannot be served (for example, 400 Bad Request).
  • 5xx - Server error. Something went wrong on our side.
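The three categories above suggest a simple client-side policy: a 4xx response means the request itself needs fixing, while a 5xx response is a server-side problem that a later attempt might not hit. The sketch below encodes that reading; retrying only on 5xx is an assumed client policy, not documented API behavior.

```python
def classify_status(code: int) -> str:
    """Map an HTTP status code to the categories listed above."""
    if code == 200:
        return "ok"
    if 400 <= code < 500:
        return "client_error"  # fix the request; resending it unchanged will not help
    if 500 <= code < 600:
        return "server_error"  # problem on the server side
    return "unexpected"


def should_retry(code: int) -> bool:
    """Assumed policy: retry only when the failure was on the server side."""
    return classify_status(code) == "server_error"
```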

Response Body

The response body is always a JSON object containing the model output. Alongside the model output, it contains metadata about the inference request: a request_id and an inference_status object with the cost, runtime_ms and status of the request.

Example response:

  {
    "request_id": "RfMWDr1NXCd7cnaegcm3A8q0",
    "inference_status": {
      "cost": 0.004639499820768833,
      "runtime_ms": 1285,
      "status": "succeeded"
    },
    "text": "Hello World"
  }
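A client typically wants to separate the model output from the request metadata. The Python sketch below does this for the example response. It assumes that every top-level key other than request_id and inference_status belongs to the model output; the InferenceMeta type is a hypothetical helper, not part of any Deep Infra client library.

```python
import json
from dataclasses import dataclass


@dataclass
class InferenceMeta:
    """Hypothetical container for the metadata fields shown in the example."""
    request_id: str
    cost: float
    runtime_ms: int
    status: str


def parse_response(body: str):
    """Split a response body into (model output dict, InferenceMeta)."""
    data = json.loads(body)
    status = data.pop("inference_status")
    meta = InferenceMeta(
        request_id=data.pop("request_id"),
        cost=status["cost"],
        runtime_ms=status["runtime_ms"],
        status=status["status"],
    )
    # Assumption: whatever remains is the model output, e.g. {"text": ...}.
    return data, meta


EXAMPLE = """{
  "request_id": "RfMWDr1NXCd7cnaegcm3A8q0",
  "inference_status": {"cost": 0.004639499820768833, "runtime_ms": 1285, "status": "succeeded"},
  "text": "Hello World"
}"""
output, meta = parse_response(EXAMPLE)
```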