Speech-to-Text Translation API Using Saaras Tutorial
This notebook provides a step-by-step guide on how to use the STT-Translate API for translating audio files into text using Saaras. It includes instructions for installation, setting up the API key, uploading audio files, and translating audio using the API.
0. Installation
Before you begin, ensure the necessary Python libraries are installed: requests for API calls, pandas if you need data manipulation, and pydub if you need to convert audio to .wav.
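One way to confirm the environment is ready is to check which packages can be imported. The sketch below assumes the PyPI package names requests, pandas, and pydub; missing_packages is a hypothetical helper, not part of any Sarvam AI tooling.

```python
import importlib.util

# Package names are assumptions based on the libraries this tutorial mentions.
REQUIRED = ["requests"]
OPTIONAL = ["pandas", "pydub"]


def missing_packages(names):
    """Return the subset of `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        missing = missing_packages(names)
        if missing:
            print(f"Missing {label} packages: {', '.join(missing)}")
            print(f"  install with: pip install {' '.join(missing)}")
        else:
            print(f"All {label} packages are available.")
```

Anything reported as missing can be installed with pip from a terminal or a notebook cell.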
1. Import Required Libraries
This section imports the necessary Python libraries for making HTTP requests, handling audio files, and managing data.
- requests: For making HTTP requests to the API.
- pandas: For data manipulation (optional, depending on your use case).
2. Set Up the API Key
To use the Saaras API, you need an API subscription key. Follow these steps to set up your API key:
- Obtain your API key: If you don’t have an API key, sign up on the Sarvam AI Dashboard to get one.
- Replace the placeholder key: In the code below, replace “YOUR_SARVAM_AI_API_KEY” with your actual API key.
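As an alternative to pasting the key into the notebook, it can be read from the environment. This is a minimal sketch: the variable name SARVAM_API_KEY and the helper load_api_key are assumptions chosen for illustration.

```python
import os

PLACEHOLDER = "YOUR_SARVAM_AI_API_KEY"


def load_api_key(env=os.environ, var_name="SARVAM_API_KEY"):
    """Return the key stored in `var_name`, or raise if it is unset or still the placeholder."""
    key = env.get(var_name, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your Sarvam AI API key."
        )
    return key
```

Failing early on a missing or placeholder key avoids a less obvious authentication error later, and it keeps the real key out of a shared notebook.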
2.1 Setting Up the API Endpoint and Payload
This section defines the API endpoint and the payload for the translation request. Replace the placeholder values with your actual API key and desired parameters.
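A sketch of what the endpoint and payload might look like is shown here. The URL, the api-subscription-key header name, and the saaras:v2 model identifier are assumptions based on Sarvam AI's public documentation at the time of writing; confirm them at docs.sarvam.ai before use. build_request_parts is a hypothetical helper.

```python
SARVAM_API_KEY = "YOUR_SARVAM_AI_API_KEY"  # replace with your actual key

# Assumed endpoint for speech-to-text translation; verify against the official docs.
API_URL = "https://api.sarvam.ai/speech-to-text-translate"


def build_request_parts(api_key, model="saaras:v2", extra=None):
    """Return (headers, data) for a translation request.

    `model` is an assumed identifier; `extra` holds any additional form
    fields you want to send alongside it.
    """
    headers = {"api-subscription-key": api_key}
    data = {"model": model}
    if extra:
        data.update(extra)
    return headers, data


headers, data = build_request_parts(SARVAM_API_KEY)
```

Keeping the headers and form fields in one place makes it easy to change the model or add parameters without touching the request code.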
3. Uploading Audio Files
To translate audio, you need to upload a .wav file. Follow these steps:
- Prepare your audio file: Ensure your audio file is in .wav format. If your file is in a different format, you can use tools like pydub to convert it.
- Upload the file: If you’re using Google Colab, you can upload the file using the file uploader. If you’re working locally, place the file in the same directory as your notebook and specify its file name.
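Whichever way the file arrives, it can help to confirm that it is a readable WAV file before sending it. The sketch below uses only Python's standard wave module; describe_wav is a hypothetical helper and the file name in the usage note is a placeholder.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return basic properties of a .wav file, raising if it is missing or not WAV."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")
    if path.suffix.lower() != ".wav":
        raise ValueError(f"Expected a .wav file, got: {path.name}")
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }
```

For example, describe_wav("my_audio.wav") returns the channel count, sample rate, and duration, which is useful when deciding whether the file needs to be split.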
4. Speech-to-Text Translation API
This section demonstrates how to use the STT-Translate API for translating audio files into text using Saaras. The API automatically identifies the language of the audio. Long audio files are handled in this notebook by splitting them into chunks before they are sent.
4.1 Splitting Audio into Chunks
The split_audio function splits an audio file into smaller chunks of a specified duration. This is useful for processing long audio files that exceed the API’s input length limit.
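A chunking function along these lines could be written with the standard library alone. This is a stand-in sketch using the wave module, not necessarily how the notebook's own split_audio is implemented (it may rely on pydub). The chunk_duration_ms parameter name is an assumption, and each chunk is returned as the bytes of a complete WAV file.

```python
import io
import wave


def split_audio(audio_path, chunk_duration_ms):
    """Split a .wav file into in-memory WAV chunks of at most `chunk_duration_ms` each."""
    if chunk_duration_ms <= 0:
        raise ValueError("chunk_duration_ms must be positive")
    chunks = []
    with wave.open(str(audio_path), "rb") as src:
        params = src.getparams()
        # Number of audio frames that fit in one chunk at this sample rate.
        frames_per_chunk = max(1, src.getframerate() * chunk_duration_ms // 1000)
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            buffer = io.BytesIO()
            with wave.open(buffer, "wb") as dst:
                dst.setparams(params)  # the header's frame count is corrected on close
                dst.writeframes(frames)
            chunks.append(buffer.getvalue())
    return chunks
```

A file shorter than the chunk duration comes back as a single chunk, so callers can treat short and long inputs the same way.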
4.2 Translating Audio
The translate_audio function translates audio chunks using the Saaras API. It handles the API request for each chunk and collates the results.
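The request-and-collate loop might look like the sketch below. It makes several assumptions: this version of translate_audio takes already-split chunks (WAV bytes) rather than a file path, the audio is posted as a multipart field named file, and the response JSON carries the translated text under a transcript key. send_chunk_with_requests is a hypothetical helper; check the official documentation for the actual request and response shapes.

```python
def send_chunk_with_requests(chunk_bytes, api_url, headers, data):
    """Post one WAV chunk as multipart form data and return the translated text.

    Assumes the response JSON has a "transcript" field.
    """
    import requests  # imported lazily so the rest of the sketch needs no third-party package

    files = {"file": ("chunk.wav", chunk_bytes, "audio/wav")}
    response = requests.post(
        api_url, headers=headers, files=files, data=data, timeout=120
    )
    response.raise_for_status()
    return response.json().get("transcript", "")


def translate_audio(chunks, api_url, headers, data, send_chunk=send_chunk_with_requests):
    """Send each chunk in order and join the non-empty results with spaces."""
    texts = []
    for index, chunk in enumerate(chunks):
        try:
            text = send_chunk(chunk, api_url, headers, data)
        except Exception as exc:  # keep going so one failed chunk does not lose the rest
            print(f"Chunk {index} failed: {exc}")
            continue
        if text:
            texts.append(text.strip())
    return " ".join(texts)
```

Passing the sender as a parameter is a design choice for this sketch: it lets the collation logic be exercised with a stub function, without network access or an API key.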
4.3 Translating the Audio
This section calls the translate_audio function to translate the audio file. Replace audio_file_path with the path to your audio file.
Example output:
5. Conclusion
This tutorial demonstrated how to use the Saaras API for translating audio files into text. By following the steps, you can easily translate audio, even long files, by splitting them into smaller chunks. The process involves installing required libraries, setting up your API key, uploading audio, and translating it using the provided functions.
6. Additional Resources
For more details, refer to the official Saaras API documentation and join the community for support:
- Documentation: docs.sarvam.ai
- Community: Join the Discord Community
7. Final Notes
- Keep your API key secure.
- Use clear audio for best results.
- Explore advanced features like diarization and word-level timestamps.
Keep Building! 🚀