LoRA model name: model name used to reference the deployment
Hugging Face Model Name: name of the LoRA adapter model on Hugging Face
Hugging Face Token: (optional) Hugging Face token if the LoRA adapter model is private
Click on the 'Upload' button
Note: The supported base models are listed on the same page. If you need a base model that is not listed, please contact us at feedback@deepinfra.com.
Rate limits on LoRA adapter models
Rate limits apply to the combined traffic of all LoRA adapter models that share the same base model. For example, if you have two LoRA adapter models with the same base model and a rate limit of 200, the two models together are limited to 200.
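The pooling rule above can be sketched as a small bookkeeping example. This is only an illustration, not how the service enforces limits: the adapter names and base model names are hypothetical, the unit of the limit is left unspecified as in the text, and the sketch assumes each base model has its own separate budget of 200.

```python
from collections import defaultdict

RATE_LIMIT = 200  # example limit from the text; its unit is not specified there

# Hypothetical adapters, mapped to the base model each one was trained on.
ADAPTER_BASE = {
    "my-org/support-lora": "base-model-a",
    "my-org/summarize-lora": "base-model-a",
    "my-org/other-lora": "base-model-b",
}


def combined_usage(usage_by_adapter, adapter_base=ADAPTER_BASE):
    """Sum usage across all adapters that share a base model."""
    totals = defaultdict(int)
    for adapter, used in usage_by_adapter.items():
        totals[adapter_base[adapter]] += used
    return dict(totals)


def remaining(usage_by_adapter, limit=RATE_LIMIT):
    """Return the budget left per base model (assumes one limit per base model)."""
    return {
        base: limit - used
        for base, used in combined_usage(usage_by_adapter).items()
    }


usage = {
    "my-org/support-lora": 120,
    "my-org/summarize-lora": 50,
    "my-org/other-lora": 10,
}
print(remaining(usage))  # {'base-model-a': 30, 'base-model-b': 190}
```

The two adapters on `base-model-a` draw from the same budget, so 120 + 50 leaves 30 for both of them combined.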
Pricing on LoRA adapter models
Pricing is 50% higher than the base model's pricing.
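As a rough worked example of the 50% markup: the base price below and the per-1M-token pricing unit are assumptions made for illustration, so check the base model's actual price before relying on the numbers.

```python
from decimal import Decimal

LORA_MARKUP = Decimal("1.5")  # 50% higher than the base model price


def lora_price(base_price):
    """Price of a LoRA adapter model given the base model price (same unit)."""
    return Decimal(str(base_price)) * LORA_MARKUP


def request_cost(tokens, base_price_per_million):
    """Cost of `tokens` tokens, assuming prices are quoted per 1M tokens."""
    return lora_price(base_price_per_million) * Decimal(tokens) / Decimal(1_000_000)


# Hypothetical base price of 0.20 per 1M tokens.
print(lora_price("0.20"))
print(request_cost(2_000_000, "0.20"))
```

With a base price of 0.20, the adapter price works out to 0.30 in the same unit, and 2M tokens would cost 0.60.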
How does LoRA adapter model speed compare to base model speed?
A LoRA adapter model is slower than its base model, because applying the LoRA adapter adds compute and memory overhead. In our benchmarks, LoRA adapter models are about 50-60% slower than the base model.
How can I make a LoRA adapter model faster?
You can merge the LoRA adapter into the base model to remove the adapter overhead, then serve the merged model as a custom deployment. Its speed will be close to that of the base model.
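One common way to do the merge is with the Hugging Face `peft` library. The sketch below is not an official procedure: it assumes the base model is a causal language model loadable through `transformers`, that both `transformers` and `peft` are installed, and that the model IDs and output directory are placeholders you replace with your own.

```python
def merge_lora_adapter(base_model_id, adapter_id, output_dir):
    """Merge a LoRA adapter into its base model and save the merged weights.

    Assumes a causal LM; other architectures need a different Auto class.
    """
    # Imported here because these are heavy, optional dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base model, then attach the LoRA adapter on top of it.
    base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype="auto")
    model = PeftModel.from_pretrained(base, adapter_id)

    # Fold the LoRA weight deltas into the base weights and drop the adapter
    # layers, so inference no longer pays the adapter overhead.
    merged = model.merge_and_unload()

    # Save the merged model together with the base model's tokenizer.
    merged.save_pretrained(output_dir)
    AutoTokenizer.from_pretrained(base_model_id).save_pretrained(output_dir)
    return output_dir


# Example call with placeholder names (downloads both models when run):
# merge_lora_adapter("your-org/base-model", "your-org/lora-adapter", "./merged-model")
```

The resulting directory holds a standalone model with no adapter layers, which is what you would upload for a custom deployment.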