STT API Tutorial

This notebook provides a step-by-step guide on how to use the STT API for speech-to-text tasks. It covers installation, setting up the API key, uploading audio files, and using the API for transcription.

1. Installation

Before you begin, ensure you have the necessary Python libraries installed. Run the following command to install the required packages:

pip install requests pandas pydub
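
Note: pydub relies on ffmpeg (or libav) to decode most non-WAV formats such as MP3. It is usually preinstalled on Google Colab; if it is missing from your environment, you can install it with your system package manager, for example on a Debian-based system:

apt-get install -y ffmpeg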

2. Import Required Libraries

This section imports the necessary Python libraries for making HTTP requests, handling audio files, and managing data.

import requests
import pandas as pd
from pydub import AudioSegment
import io

3. Set Up the API Key

To use the Speech-to-Text API, you need an API subscription key. Follow these steps to set up your API key:

  1. Obtain your API key: If you don’t have an API key, sign up on the Sarvam AI Dashboard to get one.
  2. Replace the placeholder key: In the code below, replace “YOUR_SARVAM_AI_API_KEY” with your actual API key.
SARVAM_AI_API = "YOUR_SARVAM_AI_API_KEY"
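
Alternatively, to avoid hard-coding the key in the notebook (see the security note in the Final Notes section), you can read it from an environment variable. A minimal sketch, assuming you have exported a variable named SARVAM_AI_API_KEY in your environment:

import os

# Read the key from an environment variable (SARVAM_AI_API_KEY is an assumed name);
# returns None if the variable is not set
SARVAM_AI_API = os.environ.get("SARVAM_AI_API_KEY")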

3.1 Setting Up the API Endpoint and Payload

This section defines the API endpoint and the payload for the transcription request. Replace the placeholder values with your actual API key and desired parameters.

# API endpoint for speech-to-text
api_url = "https://api.sarvam.ai/speech-to-text"

# Headers containing the API subscription key
headers = {
    "api-subscription-key": SARVAM_AI_API  # Replace with your API key
}

# Data payload for the transcription request
data = {
    "language_code": "hi-IN",  # Specify the language of the audio (e.g., 'hi-IN' for Hindi)
    "model": "saarika:v2",     # Specify the model to be used for transcription
    "with_timestamps": False   # Set to True if you want word-level timestamps
}

4. Uploading Audio Files

To transcribe audio, you need to upload a .wav file. Follow these steps:

  1. Prepare your audio file: Ensure your audio file is in .wav format. If your file is in a different format, you can use tools like pydub to convert it (a short conversion sketch follows the upload snippet below).
  2. Upload the file: If you’re using Google Colab, you can upload the file using the file uploader:
from google.colab import files

uploaded = files.upload()
audio_file_path = list(uploaded.keys())[0]  # Get the name of the uploaded file
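
If your recording is not already a .wav file, a minimal conversion sketch with pydub looks like this (recording.mp3 is a hypothetical input file name):

from pydub import AudioSegment

# Load the source file (format is inferred from the extension) and re-export it as WAV
audio = AudioSegment.from_file("recording.mp3")  # hypothetical input file name
audio.export("recording.wav", format="wav")
audio_file_path = "recording.wav"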

5. Define the split_audio Function

This function splits an audio file into smaller chunks of a specified duration. This is useful for processing long audio files that exceed the API’s input length limit.

def split_audio(audio_path, chunk_duration_ms):
    """
    Splits an audio file into smaller chunks of specified duration.

    Args:
        audio_path (str): Path to the audio file to be split.
        chunk_duration_ms (int): Duration of each chunk in milliseconds.

    Returns:
        list: A list of AudioSegment objects representing the audio chunks.
    """
    audio = AudioSegment.from_file(audio_path)  # Load the audio file
    chunks = []
    if len(audio) > chunk_duration_ms:
        # Split the audio into chunks of the specified duration
        for i in range(0, len(audio), chunk_duration_ms):
            chunks.append(audio[i:i + chunk_duration_ms])
    else:
        # If the audio is shorter than the chunk duration, use the entire audio
        chunks.append(audio)
    return chunks
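
As a quick sanity check, you can run split_audio on your uploaded file and inspect the result. A small sketch, reusing audio_file_path from the upload step and a 5-minute chunk size to match the default used by the transcription function below:

# Split into 5-minute chunks and report how many were produced
chunks = split_audio(audio_file_path, chunk_duration_ms=5 * 60 * 1000)
print(f"Number of chunks: {len(chunks)}")
print(f"Duration of first chunk: {len(chunks[0]) / 1000:.1f} seconds")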

6. Define the transcribe_audio_chunks Function

This function transcribes audio chunks using the Speech-to-Text API. It handles the API request for each chunk and collates the results.

def transcribe_audio_chunks(audio_file_path, api_url, headers, data, chunk_duration_ms=5*60*1000):
    """
    Transcribes audio chunks using the Speech-to-Text API.

    Args:
        audio_file_path (str): Path to the audio file.
        api_url (str): The API endpoint URL for Speech-to-Text.
        headers (dict): Headers containing authentication information.
        data (dict): Data payload for the transcription API.
        chunk_duration_ms (int): Duration of each audio chunk in milliseconds.

    Returns:
        dict: Collated response containing the transcript.
    """
    # Split the audio into chunks
    chunks = split_audio(audio_file_path, chunk_duration_ms)
    responses = []  # List to store the transcription results

    # Process each chunk
    for idx, chunk in enumerate(chunks):
        # Export the chunk to a BytesIO object (in-memory binary stream)
        chunk_buffer = io.BytesIO()
        chunk.export(chunk_buffer, format="wav")
        chunk_buffer.seek(0)  # Reset the pointer to the start of the stream

        # Prepare the file for the API request
        files = {'file': ('audiofile.wav', chunk_buffer, 'audio/wav')}

        try:
            # Make the POST request to the API
            response = requests.post(api_url, headers=headers, files=files, data=data)
            if response.status_code in (200, 201):
                print(f"Chunk {idx} POST Request Successful!")
                response_data = response.json()
                transcript = response_data.get("transcript", "")
                responses.append({"transcript": transcript})
            else:
                # Handle failed requests
                print(f"Chunk {idx} POST Request failed with status code: {response.status_code}")
                print("Response:", response.text)
        except Exception as e:
            # Handle any exceptions during the request
            print(f"Error processing chunk {idx}: {e}")
        finally:
            # Ensure the buffer is closed after processing
            chunk_buffer.close()

    # Collate the transcriptions from all chunks
    collated_responses = {"collated_transcript": " ".join([i["transcript"] for i in responses])}
    return collated_responses

7. Transcribe the Audio

This section calls the transcribe_audio_chunks function to transcribe the audio file. Replace audio_file_path with the path to your audio file.

# Path to the audio file to be transcribed
# audio_file_path = "test.wav"  # Replace with your file path

# Transcribe the audio
transcriptions = transcribe_audio_chunks(audio_file_path, api_url, headers, data)

# Display the transcription results
transcriptions

8. Explanation of the Output

The output of the transcribe_audio_chunks function is a dictionary containing the collated transcript of the entire audio file. If the audio was split into multiple chunks, the transcripts from all chunks are combined into a single string.

Example output:

{
    "collated_transcript": "This is the transcribed text from the audio file."
}
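
Since the result is a plain dictionary, you can store it however you prefer. A small sketch, writing the collated transcript to a text file and also keeping it in a pandas DataFrame (transcript.txt is a hypothetical output file name):

# Save the collated transcript to a text file (hypothetical output file name)
with open("transcript.txt", "w", encoding="utf-8") as f:
    f.write(transcriptions["collated_transcript"])

# Or keep it in a small DataFrame for further processing
df = pd.DataFrame([transcriptions])
print(df.head())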

9. Conclusion

This tutorial demonstrated how to use the STT API for speech-to-text transcription. By following the steps, you can transcribe audio files, even long ones, by splitting them into smaller chunks. The process involves installing required libraries, setting up your API key, uploading audio, and transcribing it using the provided functions.


10. Additional Resources

For more details, refer to our official documentation. We are also happy to support and help you on our Discord server.


11. Final Notes

  • Keep your API key secure.
  • Use clear audio for best results.
  • Explore advanced features like diarization and translation.

Keep Building! 🚀