Hive
About Hive’s Speech-to-Text Model
Hive's Speech-to-Text Model ingests an audio stream and returns each word that was spoken, along with a confidence score and timestamp for that word.
We also return a fully punctuated transcript of the entire audio. For multilingual content, we offer automatic language detection: pass in any audio clip and we'll identify the spoken language and transcribe it accordingly.
To learn about our moderation solutions, please see the Audio Moderation page.

Comprehensive coverage for diverse use cases
Our deep learning model accurately detects and transcribes speech in several widely spoken languages.
Input: Audio, Video (mp4, webm, avi, flv, mkv, wmv, mov)
Response: Language classification, punctuated transcript, and confidence scores and timestamps for each word
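As an illustration of how the per-word confidence scores and timestamps might be used, here is a minimal Python sketch that flags words the model was less sure about. The `Word` class, its field names, and the 0.8 threshold are hypothetical and chosen only for this example; they do not reflect the actual response schema, which is not described on this page.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical container for one transcribed word; the real
    # response schema may use different names and structure.
    text: str
    confidence: float  # assumed to lie in the range 0.0 to 1.0
    start: float       # assumed timestamp in seconds from the start of the audio


def low_confidence_words(words, threshold=0.8):
    """Return the words whose confidence falls below the threshold."""
    return [w for w in words if w.confidence < threshold]


# Made-up sample data, for illustration only.
sample = [
    Word("hello", 0.98, 0.00),
    Word("wurld", 0.41, 0.52),
    Word("today", 0.93, 1.10),
]

flagged = low_confidence_words(sample)
for w in flagged:
    print(f"{w.start:.2f}s  {w.text}  (confidence {w.confidence:.2f})")
```

A filter like this could feed a human review step, where only the flagged timestamps are replayed rather than the whole clip.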
Language Support
English
Spanish
Portuguese
French
Hindi
German
Arabic
Japanese
Simple usage-based pricing so you only pay for what you use
Speech-to-Text Model Pricing Details
Model: Speech to Text
Pricing: $0.02
Unit: Minute
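To make the per-minute rate concrete, the following Python sketch estimates the cost of transcribing a clip of a given length. It assumes usage is pro-rated by the second and rounded to the nearest cent; the actual billing granularity and rounding are not stated here, so treat the result as a rough estimate. The function name `estimate_cost` is made up for this example.

```python
from decimal import Decimal

# Rate taken from the pricing details above: $0.02 per minute.
RATE_PER_MINUTE = Decimal("0.02")


def estimate_cost(duration_seconds):
    """Estimate transcription cost in dollars for a clip of the given length.

    Assumption: usage is pro-rated by the second and the total is rounded
    to the nearest cent. Real billing may round differently.
    """
    minutes = Decimal(duration_seconds) / Decimal(60)
    return (minutes * RATE_PER_MINUTE).quantize(Decimal("0.01"))


# A 90-minute recording (5400 seconds) at $0.02 per minute.
print(estimate_cost(5400))
```

Under these assumptions, 90 minutes of audio works out to 90 × $0.02 = $1.80.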