
Rate Limits

The default limit is 200 concurrent requests.

If you need a higher limit, email feedback@deepinfra.com with your user ID and the reason for the increase, and we can raise it.

Rate limits are applied per model. If you query two or more models, each model has its own separate limit.
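Because the limit counts concurrent requests per model, one way to stay under it is to cap in-flight requests on the client side, with a separate counter for each model. The following is a minimal sketch of that idea using only the Python standard library. `PerModelLimiter`, `slot`, and the model names are hypothetical names chosen for illustration and are not part of any DeepInfra SDK. The value 200 is taken from the default stated above and should be adjusted if your limit has been raised.

```python
import threading
from contextlib import contextmanager

# Assumption: 200 is the default concurrent-request limit described above.
DEFAULT_LIMIT = 200


class PerModelLimiter:
    """Caps the number of in-flight requests separately for each model name."""

    def __init__(self, limit=DEFAULT_LIMIT):
        self._limit = limit
        self._lock = threading.Lock()
        self._semaphores = {}

    def _semaphore_for(self, model):
        # Create one semaphore per model on first use, guarded by a lock
        # so that concurrent threads never create two for the same model.
        with self._lock:
            if model not in self._semaphores:
                self._semaphores[model] = threading.BoundedSemaphore(self._limit)
            return self._semaphores[model]

    @contextmanager
    def slot(self, model, timeout=None):
        """Hold one request slot for `model` for the duration of the block."""
        sem = self._semaphore_for(model)
        if not sem.acquire(timeout=timeout):
            raise TimeoutError(f"no free request slot for {model!r}")
        try:
            yield
        finally:
            sem.release()
```

A caller would wrap each request in `with limiter.slot("some-model"):` so that requests to one model never consume the budget of another. This sketch only limits requests made from a single process; several processes sharing one account would need to coordinate their caps separately.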

Purpose of rate limits

Rate limits prevent abuse or misuse of the API. They ensure fair and consistent access for all users while maintaining reliable performance.

How do you check for rate limits?

When you exceed the limit, the API responds with the HTTP 429 status code and a "Rate limited" message.

Actions to take:

  • retry after a short wait
  • or slow down your requests
  • or request a higher limit by contacting us

Note: you may occasionally get 429 errors when a model is too busy. The auto-scaling logic typically kicks in, so retrying after a short wait should resolve the issue.
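The "retry" action can be automated with exponential backoff: wait a little after the first 429, then progressively longer after each further one. The following is a minimal sketch under stated assumptions, not an official client. `call_with_retry` is a hypothetical helper, and `send` stands for any zero-argument callable you supply that performs one request and returns a `(status_code, body)` pair. The delay values are illustrative defaults, not values recommended by the API.

```python
import random
import time


def call_with_retry(send, max_attempts=5, base_delay=0.5, max_delay=30.0,
                    sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send: hypothetical zero-argument callable returning (status_code, body).
    Returns the last (status_code, body) pair received.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # Out of attempts; do not sleep after the final try.
        # Exponential backoff with jitter: the upper bound doubles on each
        # attempt (capped at max_delay), and the actual wait is drawn
        # uniformly from [0, upper bound] to avoid synchronized retries.
        upper = min(max_delay, base_delay * 2 ** attempt)
        sleep(random.uniform(0, upper))

    return status, body
```

If the final attempt still returns 429, the helper hands that response back to the caller, at which point slowing down the overall request rate or asking for a higher limit is the more appropriate action.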