This cookbook demonstrates a robust, production-ready call analytics pipeline using Sarvam’s SDK. It leverages the Sarvam’s Speech-to-Text Translate Batch API with diarization, parses speaker-wise transcripts, and uses Sarvam’s LLM for deep analysis. All outputs are saved in structured files for further review.
You can find sample audio files in the GitHub cookbook.
Before you begin, ensure you have the necessary Python libraries installed. Run the following command in your terminal or notebook:
To use the API, you need an API subscription key. Follow these steps to set up your API key:
Set up your imports and create an output directory for all generated files (transcripts, analysis, summaries, etc.):
This class encapsulates the full workflow: splitting audio, batch transcription with diarization, parsing, analysis, Q&A, and summary generation.
split_audio
Splits a long audio file into smaller chunks if its duration exceeds 1 hour, since the Batch API can process up to 1 hour per file.
4.1 Class definition and initialization
Creates a transcription job using Sarvam’s STT Batch API, waits for job completion, downloads and parses transcription output, and calls analysis on the parsed conversation.
Paste each method below into the CallAnalytics class (same indentation as init). Snippets are dedented so the docs code runner does not raise IndentationError.
For longer audio files, make sure to set the timeout parameter in upload_files to a sufficiently high value to allow the upload to complete successfully.
Reads downloaded JSON transcription files, extracts speaker-wise lines, writes a .txt file with clean conversation format, and calculates total speaking time per speaker.
timing.json: Tracks the total speaking time per speaker in seconds. Beneficial in identifying the dominant speaker and helps in monitoring Agent Talk-Time vs. Listening Time.Reads the conversation file and sends it to Sarvam LLM with a detailed analysis prompt to extract structured insights.
Answers a user-defined question based on the parsed conversation transcript and saves the answer to a file.
Generates a concise summary for each call analysis and saves it to a summary file for easy review.
This code block runs the full call analytics workflow: transcribes the audio, analyzes the conversation, answers a specific user question, and generates a concise summary.
This is the sample output of the analysis you will get if you upload the file Sample_product_refund.mp3.
For more details, refer to our official documentation and join our community for support:
outputs/ directory for easy review.Keep Building!