Run the top AI models using a simple API and pay per use. Low-cost, scalable, production-ready infrastructure.
Sign up for a Deep Infra account using GitHub, or log in with GitHub.
Choose among hundreds of the most popular ML models
Use a simple REST API to call your model.
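A call to the REST API can be sketched as below. The endpoint path, model identifier, and payload shape are assumptions for illustration; check your model's page for the exact schema and replace the placeholder token with your own.

```python
import json

# Hypothetical example of preparing an inference call over the REST API.
# The base URL and {"input": ...} body shape are assumptions, not the
# documented schema for every model.
API_BASE = "https://api.deepinfra.com/v1/inference"

def build_request(model: str, prompt: str, token: str):
    """Build the URL, headers, and JSON body for one inference request."""
    url = f"{API_BASE}/{model}"
    headers = {
        "Authorization": f"Bearer {token}",  # API token from your account
        "Content-Type": "application/json",
    }
    body = json.dumps({"input": prompt})
    return url, headers, body

url, headers, body = build_request(
    "meta-llama/Llama-2-70b-chat-hf", "Hello!", "<YOUR_API_TOKEN>")
# Send with the HTTP client of your choice, e.g.:
# requests.post(url, headers=headers, data=body)
```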
Model is deployed in multiple regions
Close to the user
Fast network
Autoscaling
Share resources
Pay per use
Simple pricing
No ML Ops needed
Better cost efficiency
Hassle-free ML infrastructure
Fast scaling infrastructure
Maintain low latency
Scale down when not needed
Run costs
$0.70 / 1M input tokens

Model | Context | $ per 1M input tokens | $ per 1M output tokens |
---|---|---|---|
Llama-2-70b-chat | - | - | - |
CodeLlama-34b-Instruct | - | - | - |
Llama-2-13b-chat | - | - | - |
Llama-2-7b-chat | - | - | - |
Mistral-7B | - | - | - |
Airoboros-70b | - | - | - |
Lzlv-70b | - | - | - |
$0.0005 / second

Billed per millisecond of inference execution time; you only pay for inference time, not idle time.
1 hour free
All models run on A100 GPUs, optimized for inference performance and low latency.
Our system will automatically scale the model to more hardware based on your load. To eliminate cold starts, you can also reserve GPU memory at $0.04 per GB per hour.
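The reserved-memory option above is a flat rate, so its cost is just gigabytes reserved times hours times $0.04. A quick sketch, assuming an 80 GB reservation (the memory of one A100 variant, used here purely as an example size):

```python
# Reserved-capacity cost: GB * hours * rate, at the $0.04 / GB / hour
# rate quoted above.
RESERVE_RATE = 0.04  # dollars per GB per hour

def reservation_cost(gb: float, hours: float) -> float:
    """Return the dollar cost of keeping GPU memory reserved."""
    return gb * hours * RESERVE_RATE

# e.g. keeping 80 GB warm for 24 hours: 80 * 24 * 0.04 = $76.80
cost = reservation_cost(80, 24)
```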
Each inference request's execution time is measured with millisecond precision and added to your account. Once per month, we charge you for the time you've used. You can find your current usage in your account.
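The metering described above can be sketched as follows: each request is billed as its duration in milliseconds, converted to seconds, times the $0.0005-per-second rate, and the monthly charge is the sum over all requests. The request count and average duration below are made-up example numbers.

```python
# Time-based billing: duration in ms, converted to seconds, times the
# $0.0005 / second rate quoted above.
RATE_PER_SECOND = 0.0005  # dollars per second of inference time

def inference_cost(duration_ms: float) -> float:
    """Return the dollar cost of a single request billed by duration."""
    return duration_ms / 1000 * RATE_PER_SECOND

# e.g. a month of 100,000 requests averaging 350 ms each
# (hypothetical workload): 100,000 * 0.350 s * $0.0005 = $17.50
monthly_total = 100_000 * inference_cost(350)
```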