$0.70 / 1M input tokens

Model | Context | $ per 1M input tokens | $ per 1M output tokens |
---|---|---|---|
Llama-2-70b-chat | - | - | - |
CodeLlama-34b-Instruct | - | - | - |
Llama-2-13b-chat | - | - | - |
Llama-2-7b-chat | - | - | - |
Mistral-7B | - | - | - |
Airoboros-70b | - | - | - |
Lzlv-70b | - | - | - |
$0.0005 / second

Billed per millisecond of inference execution time. You only pay for inference time, not idle time.

1 hour free
All models run on A100 GPUs, optimized for inference performance and low latency.
Our system automatically scales the model to more hardware based on your needs. To eliminate cold starts, you can also reserve GPU memory at $0.04 per GB per hour.
Each inference request's execution time is measured with millisecond precision and added to your account. Once per month, we charge you for the time you've used. You can find your current usage in your account.
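To make the per-millisecond billing concrete, here is a minimal sketch of how a monthly bill could be computed from the rates on this page ($0.0005 per second of inference, $0.04 per GB per hour for reserved memory). The request counts and durations are made-up example numbers, and the function names are hypothetical, not part of any API.

```python
# Rates taken from this page; everything else is an illustrative assumption.
MS_RATE = 0.0005 / 1000           # $0.0005 per second -> $ per millisecond
RESERVED_GB_HOUR_RATE = 0.04      # optional reserved GPU memory, $ per GB per hour


def inference_cost(duration_ms: float) -> float:
    """Cost of one request, billed per millisecond of execution time."""
    return duration_ms * MS_RATE


def monthly_bill(request_durations_ms, reserved_gb: float = 0.0,
                 hours_in_month: float = 730.0) -> float:
    """Sum of per-request compute costs plus any reserved-memory charge."""
    compute = sum(inference_cost(d) for d in request_durations_ms)
    reservation = reserved_gb * RESERVED_GB_HOUR_RATE * hours_in_month
    return compute + reservation


# Example: 100,000 requests averaging 350 ms each, no reserved memory.
# 100,000 * 0.350 s * $0.0005/s = $17.50
print(f"${monthly_bill([350] * 100_000):.2f}")
```

Note that idle time between requests never appears in the formula: only measured execution milliseconds (plus any explicit reservation) are billed.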