STT API Tutorial
This notebook provides a step-by-step guide on how to use the STT API for speech-to-text tasks. It includes instructions for installation, setting up the API key, uploading audio files, and using the API for transcription and translation.
1. Installation
Before you begin, ensure you have the necessary Python libraries installed. Run the following commands to install the required packages:
1.1 Import Required Libraries
This section imports the necessary Python libraries for making HTTP requests, handling audio files, and managing data.
2. Set Up Your API Key
To use the Saaras API, you need an API subscription key. Follow these steps to set up your API key:
- Obtain your API key: If you don’t have an API key, sign up on the Sarvam AI Dashboard to get one.
- Replace the placeholder key: In the code below, replace “YOUR_SARVAM_AI_API_KEY” with your actual API key.
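You can also read the key from an environment variable instead of pasting it into the notebook. This keeps it out of shared notebooks and version control. The minimal sketch below assumes a variable named `SARVAM_API_KEY`. That name and the helper `load_api_key` are hypothetical, so adapt them to your setup.

```python
import os

PLACEHOLDER_KEY = "YOUR_SARVAM_AI_API_KEY"


def load_api_key(env_var: str = "SARVAM_API_KEY") -> str:
    """Return the API key stored in `env_var`.

    SARVAM_API_KEY is a hypothetical variable name chosen for this sketch.
    Raises RuntimeError if the variable is unset, empty, or still the placeholder.
    """
    key = os.environ.get(env_var, PLACEHOLDER_KEY).strip()
    if not key or key == PLACEHOLDER_KEY:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Sarvam AI API key."
        )
    return key
```

Call `API_KEY = load_api_key()` once near the top of the notebook. The later cells can then use `API_KEY` without ever containing the key itself.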
2.1 Setting Up the API Endpoint and Payload
This section defines the API endpoint and the payload for the translation request. Replace the placeholder values with your actual API key and desired parameters.
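Here is a sketch of the endpoint, headers, and payload, using only the Python standard library. Several details are assumptions to verify at docs.sarvam.ai before use:

- the endpoint URL
- the `api-subscription-key` header name
- the `model` form field and its `saaras:v1` value
- the `file` field name for the audio

```python
import json
import urllib.request
import uuid

# Assumptions for this sketch: confirm the endpoint, header name, and form
# fields against docs.sarvam.ai before relying on them.
API_URL = "https://api.sarvam.ai/speech-to-text-translate"
API_KEY = "YOUR_SARVAM_AI_API_KEY"   # replace with your actual API key
PAYLOAD = {"model": "saaras:v1"}     # assumed form field and model name


def encode_multipart(fields: dict, file_bytes: bytes,
                     filename: str = "chunk.wav") -> tuple[bytes, str]:
    """Encode text fields plus one WAV file as multipart/form-data.

    Returns (body, boundary); the boundary goes into the Content-Type header.
    """
    boundary = uuid.uuid4().hex
    body = bytearray()
    for name, value in fields.items():
        body += f"--{boundary}\r\n".encode()
        body += f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
        body += f"{value}\r\n".encode()
    # The audio part; "file" is the assumed field name.
    body += f"--{boundary}\r\n".encode()
    body += (f'Content-Disposition: form-data; name="file"; '
             f'filename="{filename}"\r\n').encode()
    body += b"Content-Type: audio/wav\r\n\r\n"
    body += file_bytes + b"\r\n"
    body += f"--{boundary}--\r\n".encode()
    return bytes(body), boundary


def post_audio(file_bytes: bytes, api_url: str = API_URL,
               api_key: str = API_KEY, fields: dict = PAYLOAD) -> dict:
    """POST one WAV payload and return the decoded JSON response."""
    body, boundary = encode_multipart(fields, file_bytes)
    request = urllib.request.Request(
        api_url,
        data=body,
        method="POST",
        headers={
            "api-subscription-key": api_key,  # assumed auth header name
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

If you already use `requests`, you can replace the hand-built body with `requests.post(API_URL, headers={"api-subscription-key": API_KEY}, data=PAYLOAD, files={"file": ...})`. That call constructs the multipart body and boundary for you.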
3. Uploading Audio Files
To translate audio, you need to upload a .wav file. Follow these steps:
- Prepare your audio file: Ensure your audio file is in .wav format. If your file is in a different format, you can use tools like pydub to convert it.
- Upload the file: If you’re using Google Colab, you can upload the file with Colab's file uploader.
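The sketch below covers both steps. The `is_valid_wav` helper checks a file with the standard-library `wave` module, which raises an error for anything that is not a RIFF/WAV file. The `upload_in_colab` helper wraps Colab's `files.upload()` uploader. Both helper names are hypothetical. Note that `wave` only reads uncompressed PCM WAV.

```python
import wave


def is_valid_wav(path: str) -> bool:
    """Return True if `path` opens as a WAV file containing at least one frame."""
    try:
        with wave.open(path, "rb") as wf:
            return wf.getnframes() > 0
    except (wave.Error, EOFError, OSError):
        # Not a WAV file, truncated header, or missing file.
        return False


def upload_in_colab() -> list[str]:
    """Open Colab's file picker and return the names of the uploaded files.

    Outside Google Colab the import fails and an empty list is returned.
    """
    try:
        from google.colab import files  # only available inside Colab
    except ImportError:
        return []
    uploaded = files.upload()  # saves files to the working directory
    return list(uploaded.keys())
```

To convert another format with pydub, use `AudioSegment.from_file("input.mp3").export("input.wav", format="wav")`. This requires ffmpeg to be installed.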
4. Define the split_audio Function
This function splits an audio file into smaller chunks of a specified duration. This is useful for processing long audio files that exceed the API’s input length limit.
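Here is a minimal sketch of such a function. It uses the standard-library `wave` module rather than pydub, so it only handles PCM WAV input. Each chunk is returned as a complete in-memory WAV file, ready to send to the API. The `chunk_duration_s` parameter is this sketch's choice. Set it below the API's input length limit, which is documented at docs.sarvam.ai and not assumed here.

```python
import io
import wave


def split_audio(audio_path: str, chunk_duration_s: float) -> list[bytes]:
    """Split a WAV file into consecutive chunks of at most `chunk_duration_s` seconds.

    Each returned item is a full WAV file (header + frames) held in memory.
    """
    if chunk_duration_s <= 0:
        raise ValueError("chunk_duration_s must be positive")
    chunks = []
    with wave.open(audio_path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = max(1, int(params.framerate * chunk_duration_s))
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:  # end of file
                break
            buffer = io.BytesIO()
            with wave.open(buffer, "wb") as dst:
                # Copy the source format so each chunk plays back identically.
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)
            chunks.append(buffer.getvalue())
    return chunks
```

The last chunk is usually shorter than the others. With pydub you would instead slice an `AudioSegment` by milliseconds (`audio[start_ms:end_ms]`), which also works for compressed formats.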
5. Define the transcribe_audio_chunks Function
This function transcribes audio chunks using the Saaras API. It handles the API request for each chunk and collates the results.
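In this sketch, the network call stays out of the loop: the function takes a `send` callable, so you can test the collation logic with a fake and wire in a real request function later. The `send` parameter and the `"transcript"` response key are assumptions of this sketch, not confirmed API details.

```python
from typing import Callable, Iterable


def transcribe_audio_chunks(chunks: Iterable[bytes],
                            send: Callable[[bytes], dict]) -> dict:
    """Send each WAV chunk through `send` and collate the transcripts.

    `send` posts one chunk to the API and returns the decoded JSON response;
    the "transcript" key is an assumed response field.
    """
    transcripts = []
    for index, chunk in enumerate(chunks):
        response = send(chunk)
        text = (response.get("transcript") or "").strip()
        if not text:
            print(f"Warning: chunk {index} returned no transcript")
            continue
        transcripts.append(text)
    # Keep chunk order and join into one string for the whole file.
    return {"transcript": " ".join(transcripts)}
```

In practice, `send` would wrap an HTTP POST to the Saaras endpoint with your API key and payload from section 2.1. Retry or rate-limit handling also belongs inside `send`.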
6. Transcribe the Audio
This section calls the transcribe_audio_chunks function to transcribe the audio file. Replace audio_file_path with the path to your audio file.
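Before transcribing, you can check that `audio_file_path` points at a readable WAV file and see how many chunks it will produce. The helper `describe_audio` and its 30-second default chunk length are hypothetical. Check docs.sarvam.ai for the API's actual input limit.

```python
import math
import os
import wave


def describe_audio(audio_file_path: str, chunk_duration_s: float = 30.0) -> dict:
    """Return the duration of a WAV file and how many chunks it splits into.

    The 30-second default is a hypothetical chunk length, not a documented limit.
    """
    if not os.path.isfile(audio_file_path):
        raise FileNotFoundError(f"No such audio file: {audio_file_path}")
    with wave.open(audio_file_path, "rb") as wf:
        duration_s = wf.getnframes() / wf.getframerate()
    # A final partial chunk still counts as one chunk.
    num_chunks = max(1, math.ceil(duration_s / chunk_duration_s))
    return {"duration_s": duration_s, "num_chunks": num_chunks}
```

For example, set `audio_file_path = "your_audio.wav"` and run `print(describe_audio(audio_file_path))` before starting a long transcription.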
6.1 Explanation of the Output
The output of the transcribe_audio_chunks function is a dictionary containing the collated transcript of the entire audio file. If the audio was split into multiple chunks, the transcripts from all chunks are combined into a single string.
Example output:
7. Conclusion
This tutorial demonstrated how to use the STT API for speech-to-text transcription. By following the steps, you can transcribe audio files, even long ones, by splitting them into smaller chunks. The process involves installing required libraries, setting up your API key, uploading audio, and transcribing it using the provided functions.
8. Additional Resources
For more details, refer to our official documentation. For support and help, join us on our Discord server:
- Documentation: docs.sarvam.ai
- Community: Join the Discord Community
9. Final Notes
- Keep your API key secure.
- Use clear audio for best results.
- Explore advanced features like diarization and translation.
Keep Building! 🚀