Documentation
Deep Infra's main feature is a simple, scalable, and cost-effective inference API. We package state-of-the-art models behind a simple REST API that you can use to build your applications.
Each model has a dedicated inference endpoint:
https://api.deepinfra.com/v1/inference/{model_name}
For example, the endpoint for the model runwayml/stable-diffusion-v1-5 is:
https://api.deepinfra.com/v1/inference/runwayml/stable-diffusion-v1-5
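As a minimal sketch of the endpoint pattern above, the URL can be derived from the model name. The helper name inference_endpoint is hypothetical and not part of any Deep Infra client library; the sketch also assumes model names only need standard URL quoting with "/" left intact.

```python
from urllib.parse import quote

BASE_URL = "https://api.deepinfra.com/v1/inference"


def inference_endpoint(model_name: str) -> str:
    """Return the inference URL for a model such as 'runwayml/stable-diffusion-v1-5'."""
    name = model_name.strip("/")
    if not name:
        raise ValueError("model_name must not be empty")
    # Keep '/' unescaped, since model names have the form owner/model.
    return f"{BASE_URL}/{quote(name, safe='/')}"


print(inference_endpoint("runwayml/stable-diffusion-v1-5"))
```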
Our inference API supports POST requests with two possible content types: multipart/form-data and application/json.
Using multipart/form-data makes sense when you want to send binary data such as images, audio, or video files. It requires less bandwidth than base64-encoding the same data inside JSON and is more efficient for large files.
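To illustrate what a multipart/form-data body looks like on the wire, here is a rough sketch that encodes a single file using only the Python standard library. The function name encode_multipart, the field name "image", and the file name are assumptions for illustration; the field names a given model expects are not stated here.

```python
import uuid


def encode_multipart(field_name: str, filename: str, data: bytes, mime_type: str):
    """Encode one file as a multipart/form-data body.

    Returns a (body, content_type) pair; content_type carries the boundary
    and must be sent as the Content-Type header of the POST request.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {mime_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    # The file bytes are sent as-is, with no base64 expansion.
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


# Hypothetical field name and file; replace with what the model expects.
body, content_type = encode_multipart("image", "photo.jpg", b"\xff\xd8\xff\xe0", "image/jpeg")
```

In practice an HTTP client library would normally build this body for you; the sketch only shows why the raw bytes travel unmodified.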
Using application/json makes sense when you want to send text data. You can also use this content type for binary data by encoding it as a data URL.
For example:
{
  "image": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBD..."
}
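A data URL like the one above can be produced by base64-encoding the raw bytes and prefixing the MIME type. The sketch below assumes a hypothetical helper named to_data_url and uses four placeholder bytes instead of a real image.

```python
import base64
import json


def to_data_url(data: bytes, mime_type: str) -> str:
    """Wrap raw bytes in a base64 data URL, e.g. data:image/jpeg;base64,..."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes (the first four bytes of a JPEG header), not a real image.
payload = json.dumps({"image": to_data_url(b"\xff\xd8\xff\xe0", "image/jpeg")})
print(payload)
```

Base64 encoding makes the payload roughly a third larger than the raw bytes, which is the trade-off against multipart/form-data.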
To authenticate your requests, pass your API token in the Authorization header with type bearer, like this:
Authorization: bearer $AUTH_TOKEN
You can get your API token using the deepctl auth token command, or by creating a new API key in the dashboard.
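Putting the endpoint, content type, and Authorization header together, an authenticated JSON request could be assembled roughly as follows. This is a sketch using Python's standard urllib.request; the helper name build_request and the "prompt" input field are assumptions, since the inputs each model accepts are not described here. The token is read from an AUTH_TOKEN environment variable to mirror the header example.

```python
import json
import os
import urllib.request


def build_request(url: str, payload: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request(
    "https://api.deepinfra.com/v1/inference/runwayml/stable-diffusion-v1-5",
    {"prompt": "a lighthouse at dusk"},  # hypothetical input field
    os.environ.get("AUTH_TOKEN", "<your-token>"),
)
# To actually send it: urllib.request.urlopen(request)
```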
We use standard HTTP status codes to indicate the status of the request. Here is the list of possible status codes:
200 - OK. The request was successful.
4xx - Bad Request. The request was invalid or cannot be served.
5xx - Internal Server Error. Something went wrong on our side.
The response body is always a JSON object containing the model output.
In addition to the model output, the response body contains some metadata about the inference request, such as request_id, cost, and duration (the cost and runtime_ms fields under inference_status in the example below).
Example response:
{
  "request_id": "RfMWDr1NXCd7cnaegcm3A8q0",
  "inference_status": {
    "cost": 0.004639499820768833,
    "runtime_ms": 1285,
    "status": "succeeded"
  },
  "text": "Hello World"
}
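A client might handle the status code and pull the metadata out of a response along these lines. This is a sketch only: the function names are hypothetical, the status categories simply mirror the three cases listed above, and the field layout is taken from the example response rather than from a formal schema.

```python
import json


def describe_status(status_code: int) -> str:
    """Map an HTTP status code onto the three documented cases."""
    if status_code == 200:
        return "ok"
    if 400 <= status_code < 500:
        return "client_error"  # the request was invalid or cannot be served
    if 500 <= status_code < 600:
        return "server_error"  # something went wrong on the server side
    return "unexpected"


def read_metadata(body: str) -> dict:
    """Extract request metadata, assuming the layout of the example response."""
    doc = json.loads(body)
    status = doc.get("inference_status", {})
    return {
        "request_id": doc.get("request_id"),
        "cost": status.get("cost"),
        "runtime_ms": status.get("runtime_ms"),
        "status": status.get("status"),
    }


example = """{
  "request_id": "RfMWDr1NXCd7cnaegcm3A8q0",
  "inference_status": {"cost": 0.004639499820768833, "runtime_ms": 1285, "status": "succeeded"},
  "text": "Hello World"
}"""
print(describe_status(200), read_metadata(example))
```

Using .get() keeps the sketch tolerant of missing fields, which seems prudent because the model output keys (here "text") vary by model.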