deepseek-ai/DeepSeek-V3
DeepSeek-V3 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture, both of which were thoroughly validated in DeepSeek-V2.
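To illustrate what "37B activated for each token" means in an MoE model, here is a minimal sketch of top-k expert routing: a router scores all experts per token, but only the k highest-scoring experts actually run, so only a fraction of the total parameters is used per token. This is a generic illustration, not DeepSeek's code; the expert count, hidden sizes, and k below are illustrative and do not match DeepSeek-V3's real configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic top-k mixture-of-experts layer (illustrative only)."""

    def __init__(self, d_model=64, d_ff=256, num_experts=8, k=2):
        super().__init__()
        self.k = k
        # Router scores every expert for every token.
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                               # x: (tokens, d_model)
        scores = self.router(x)                         # (tokens, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)      # keep only k experts per token
        weights = F.softmax(weights, dim=-1)            # normalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                      # only the selected experts compute
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

tokens = torch.randn(4, 64)
print(TopKMoE()(tokens).shape)                          # torch.Size([4, 64])
```

With 8 experts and k=2 in this toy layer, each token touches only a quarter of the expert parameters; the same principle is how DeepSeek-V3 keeps 37B of its 671B parameters active per token.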