Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It was trained on 680k hours of labeled data and demonstrates strong abilities to generalize to various datasets and domains without fine-tuning. The model is based on a Transformer encoder-decoder architecture.
Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It was trained on 680k hours of labeled data and demonstrates strong abilities to generalize to various datasets and domains without fine-tuning. The model is based on a Transformer encoder-decoder architecture.