
meta-llama/Llama-3.3-70B-Instruct-Turbo

Llama 3.3-70B Turbo is a highly optimized version of the Llama 3.3-70B model. It uses FP8 quantization to deliver significantly faster inference, with a minor trade-off in accuracy. The model is designed to be helpful, safe, and flexible, with an emphasis on responsible deployment and on mitigating risks such as bias, toxicity, and misinformation. It achieves strong performance across a range of benchmarks spanning conversational tasks, language translation, and text generation.


Visibility: Public
Pricing: $0.13 per million input tokens / $0.40 per million output tokens
Quantization: FP8
Context length: 131,072 tokens
Supported features: JSON mode, function calling
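Given the listed per-million-token rates, the cost of a request can be estimated directly from its token counts. A minimal sketch (the rates come from the pricing line above; the token counts are hypothetical example values):

```python
# Illustrative cost estimate based on the listed rates.
INPUT_RATE = 0.13   # USD per 1,000,000 input tokens
OUTPUT_RATE = 0.40  # USD per 1,000,000 output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in USD."""
    return (input_tokens / 1_000_000) * INPUT_RATE + \
           (output_tokens / 1_000_000) * OUTPUT_RATE

# Example: a request with 4,000 prompt tokens and 1,000 completion tokens.
cost = estimate_cost(4_000, 1_000)
print(f"${cost:.6f}")  # 0.00052 + 0.00040 = $0.000920
```

Note that input and output tokens are billed at different rates, so long prompts and long completions affect cost differently.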

Revision: 38ff4e01a70559264c95945aa04b900a11e68422

Published: 2024-12-06T18:20:14+00:00