Meta Prompt Guide
What is a Meta-Prompt?
A meta-prompt is a detailed instruction or template given to an AI, telling it how to think, act, or respond in specific scenarios. It sets the rules and context for the AI to follow consistently throughout the conversation.
Why Use a Meta-Prompt?
- Consistency: Ensures the AI understands your goals and behaves in a reliable way.
- Clarity: Provides clear instructions, making the AI’s responses more accurate and relevant.
- Efficiency: Saves time by reducing the need to explain the same context repeatedly.
- Customization: Adjusts the AI to fit specific tasks or workflows based on your needs.
Meta-prompts are especially helpful when working on complex tasks or integrating APIs, as they align the AI’s responses with your requirements.
Meta-Prompt Usage Guide
Follow these steps to effectively use the meta-prompt with your favorite AI assistant (e.g., ChatGPT, Gemini, or similar tools):
Step 1: Load the Meta-Prompt
Copy and paste the meta-prompt into the AI assistant’s input field. This will provide the AI with the necessary context to help you use Sarvam’s APIs effectively.
Step 2: Provide Context for Your Use Case
In the next message, let the AI know that the meta-prompt should be taken as context for assisting you in building projects with Sarvam’s APIs. You can use the following example:
“Hey, you have to take the above meta-prompt as your context and help me build things using Sarvam’s API. I will provide the details in further prompts.”
Step 3: Share Your Specific Requirement
In subsequent messages, provide the specific details of the project or task you want to build. For example, if you want to create a translator app, you can say:
“I want to build a translator app that can translate English to Kannada. Please help me implement this using Sarvam’s API.”
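For illustration, here is a minimal sketch of what the resulting translator app might look like. It assumes Sarvam's REST translate endpoint (`https://api.sarvam.ai/translate`), the `api-subscription-key` header, and the `input`/`source_language_code`/`target_language_code`/`translated_text` fields; the AI assistant may generate a different but equivalent implementation, so verify the details against Sarvam's API reference.

```python
# Sketch of a minimal English -> Kannada translator app.
# The endpoint, header, and field names below are assumptions to verify
# against Sarvam's API reference.
import requests

API_KEY = "your-api-key"  # replace with your Sarvam API subscription key

def translate_to_kannada(text: str) -> str:
    response = requests.post(
        "https://api.sarvam.ai/translate",
        headers={"api-subscription-key": API_KEY},
        json={
            "input": text,
            "source_language_code": "en-IN",
            "target_language_code": "kn-IN",
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["translated_text"]  # assumed response field

if __name__ == "__main__":
    print(translate_to_kannada("How are you today?"))
```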
With an implementation along these lines, you can input any English text, and the app will return the translated text in Kannada. Ensure you replace "your-api-key" with your actual Sarvam API subscription key.
By following this guide, you can seamlessly use the meta-prompt to leverage Sarvam’s APIs for building various projects.
Sarvam AI Meta Prompt
Note: This meta-prompt is designed for Large Language Models (LLMs) like ChatGPT or Gemini, not for human users. It provides context and instructions to guide the AI in assisting with tasks using Sarvam’s API effectively.
- Supported Languages:
  - English (en-IN), Hindi (hi-IN), Bengali (bn-IN), Kannada (kn-IN), Malayalam (ml-IN), Marathi (mr-IN), Odia (od-IN), Punjabi (pa-IN), Tamil (ta-IN), Telugu (te-IN), Gujarati (gu-IN).
- Speech to Text:
  - Module: `client.speech_to_text.transcribe()`
  - Purpose: Convert speech (an audio file) into text in the specified language.
  - Behavior: Converts spoken language from an audio file into written text in languages such as Hindi and others.
  - Method: SDK function call
  - Authorization: Uses the API key stored in the `SARVAM_API_KEY` environment variable via the SarvamAI SDK.
  - Request Body Schema:
    - `language_code`: Specifies the language of the speech input (e.g., `"hi-IN"` for Hindi).
    - `model`: Specifies the model version for speech-to-text conversion (e.g., `"saarika:v2.5"`).
    - `with_timestamps`: Boolean flag indicating whether timestamps should be included in the output (`True` or `False`).
    - `file`: The audio file to transcribe. Supported formats: `.wav`, `.mp3`. Works best at 16kHz; multiple channels will be merged.
  - Example Code: see the sketch after this section.
  - Supported File Formats:
    - `.wav` (recommended at 16kHz)
    - `.mp3` (recommended at 16kHz)
    - The API will merge multiple audio channels.
  - Example Response: The response will contain the converted text in JSON format, typically without timestamps unless specified.
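A minimal sketch of the Speech to Text call, using the module and parameters listed above. The client construction (reading the key from `SARVAM_API_KEY`) and the exact shape of the returned object are assumptions to verify against the SDK reference.

```python
# Sketch: transcribe a Hindi audio file with the SarvamAI SDK.
# The client constructor and response shape are assumptions; verify
# against the SDK reference.
import os
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key=os.environ["SARVAM_API_KEY"])

with open("call_recording.wav", "rb") as audio_file:
    response = client.speech_to_text.transcribe(
        file=audio_file,
        model="saarika:v2.5",
        language_code="hi-IN",
        with_timestamps=False,
    )

print(response)  # JSON-like object containing the transcript
```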
- Speech to Text Translate API:
  - Module: `client.speech_to_text.translate()`
  - Purpose: Combines speech recognition and translation: it detects the spoken language and returns the transcript along with the BCP-47 code of the most predominant language.
  - Best for: Detecting the language from spoken input and returning the transcript in English and the corresponding BCP-47 language code.
  - Method: SDK function call
  - Authorization: Uses the API key stored in the `SARVAM_API_KEY` environment variable via the SarvamAI SDK.
  - Request Body Schema:
    - `file`: The path to the speech input (audio file) in which the language needs to be detected.
    - `model`: Specifies the model version for speech-to-text translation (e.g., `"saaras:v2.5"`).
  - Example Code: see the sketch after this section.
  - Response:
    - The API returns the BCP-47 language code of the language spoken in the input (e.g., `hi-IN` for Hindi, `en-US` for English).
    - `transcript`: The transcribed and translated text in English.
    - If multiple languages are detected, it returns the code for the most predominant language.
    - If no language is detected, the response will be `null`.
  - Example Response:
    - Supported Language Codes: The language codes returned follow the BCP-47 standard for various Indic and English languages, such as `hi-IN`, `en-US`, `pa-IN`, `ta-IN`, etc.
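A minimal sketch of the Speech to Text Translate call. The method name follows the module listed above, but both it and the response fields are assumptions to check against the SDK reference.

```python
# Sketch: detect the spoken language and get an English transcript.
# The translate() method name and response fields are assumptions;
# verify them against the SarvamAI SDK reference.
import os
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key=os.environ["SARVAM_API_KEY"])

with open("speech_sample.wav", "rb") as audio_file:
    response = client.speech_to_text.translate(
        file=audio_file,
        model="saaras:v2.5",
    )

print(response)  # English transcript plus the detected BCP-47 language code
```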
- Text to Speech API:
  - Purpose: Convert written text into spoken words using a specified voice and various customization options.
  - Best for: Generating speech from text with configurable attributes like pitch, pace, loudness, and more; ideal for creating custom audio outputs in multiple languages.
  - Method: SDK-based function call
  - Authorization: API key required via the `SARVAM_API_KEY` environment variable.
  - Parameters:
    - `inputs`: List of strings to be converted to speech.
    - `target_language_code`: The language code for the output language (e.g., `"hi-IN"`).
    - `speaker`: Specifies the voice to use. Options: `"anushka"`, `"manisha"`, `"vidya"`, `"arya"`, `"abhilash"`, `"karun"`, `"hitesh"`.
    - `pitch`: Number controlling pitch (`-0.75` to `0.75`). Default: `0`.
    - `pace`: Number controlling speed (`0.3` to `3`). Default: `1.0`.
    - `loudness`: Number controlling volume (`0` to `3`). Default: `1.0`.
    - `speech_sample_rate`: Audio sample rate (`8000`, `16000`, or `22050`). Higher values offer better quality.
    - `enable_preprocessing`: Boolean to enable preprocessing of English and numeric entities. Default: `False`.
    - `model`: Model version to use. Default: `"bulbul:v2"`.
  - Example Code: see the sketch after this section.
  - Response: The function returns the synthesized audio content (e.g., in WAV format). You can write it directly to a file, as shown in the sketch after this section.
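A minimal sketch using the parameters above. The method name (`client.text_to_speech.convert`) and the base64-encoded `audios` field used to save the result are assumptions; check the SDK reference for the exact call and response shape.

```python
# Sketch: synthesize Kannada speech and save it as a WAV file.
# The convert() method name and the base64 "audios" response field are
# assumptions; verify them against the SarvamAI SDK reference.
import base64
import os
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key=os.environ["SARVAM_API_KEY"])

response = client.text_to_speech.convert(
    inputs=["Hello, how are you today?"],
    target_language_code="kn-IN",
    speaker="anushka",
    pitch=0,
    pace=1.0,
    loudness=1.0,
    speech_sample_rate=22050,
    enable_preprocessing=True,
    model="bulbul:v2",
)

# Write the first synthesized clip to disk (assuming a base64-encoded audio list).
with open("output.wav", "wb") as f:
    f.write(base64.b64decode(response.audios[0]))
```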
- Call Analytics API:
  - Endpoint: https://api.sarvam.ai/call-analytics
  - Purpose: Analyze audio call recordings to extract transcripts and automatically answer custom questions based on the conversation content.
  - Best for: Call transcription, question answering, and extracting hotwords from speech.
  - Method: POST
  - Authorization: API key required via the `SARVAM_API_KEY` environment variable.
  - Parameters:
    - `file_path` (required): Path to the audio file (`.wav`, `.mp3`). Max 10MB, max 10 minutes, optimal at 16kHz.
    - `questions` (required): List of structured questions in JSON format. Each question must include:
      - `id`: Unique identifier
      - `text`: The question text
      - `type`: One of `boolean`, `enum`, `short answer`, `long answer`, `number`
      - Optional: `description`, `properties` (e.g., options for `enum`)
    - `hotwords` (optional): List of comma-separated domain-specific keywords to highlight/retain in the transcript.
  - Example Code: see the sketch after this section.
  - Response: Returns a JSON object with:
    - `file_name` (optional): Unique ID for the audio file processed.
    - `transcript` (required): Full transcript of the call.
    - `answers` (optional): List of answers mapped to the `questions`. Can be `null` if no answers were found.
    - `duration_in_seconds`: Total length of the call audio.
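A minimal sketch of a Call Analytics request. The multipart field names (`file`, `questions`, `hotwords`) and the `api-subscription-key` header are assumptions; confirm them with the API reference before use.

```python
# Sketch: send a call recording plus structured questions to the Call Analytics API.
# Field names and the auth header are assumptions; verify with the API reference.
import json
import os
import requests

questions = [
    {"id": "q1", "text": "Was the customer's issue resolved?", "type": "boolean"},
    {"id": "q2", "text": "What product was discussed?", "type": "short answer"},
]

with open("support_call.wav", "rb") as audio_file:
    response = requests.post(
        "https://api.sarvam.ai/call-analytics",
        headers={"api-subscription-key": os.environ["SARVAM_API_KEY"]},
        files={"file": audio_file},
        data={
            "questions": json.dumps(questions),
            "hotwords": "refund, warranty, escalation",
        },
        timeout=120,
    )

response.raise_for_status()
result = response.json()
print(result["transcript"])
print(result.get("answers"))
```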
- Text Analytics API:
  - Endpoint: https://api.sarvam.ai/text-analytics
  - Purpose: Analyze a given piece of text and automatically answer structured questions based on its content.
  - Best for: Extracting insights or summarizing knowledge from documents, articles, reports, or transcripts using predefined questions.
  - Method: POST
  - Authorization: API key via the `SARVAM_API_KEY` environment variable.
  - Parameters:
    - `text` (required): A string of text to analyze. Must be a valid, non-empty paragraph or document.
    - `questions` (required): A list of structured questions, in the same format as for the Call Analytics API (`id`, `text`, `type`, and optional `description`/`properties`).
      - `type` options: `boolean`, `enum` (requires `options` inside `properties`), `short answer`, `long answer`, `number`
  - Example Code: see the sketch after this section.
  - Response: The response will be a JSON object containing:
    - `answers`: A list of extracted answers matched to the input question `id`. If no answers are found, it may return `null`.
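A minimal sketch of a Text Analytics request. The form field names (`text`, `questions`) and the `api-subscription-key` header are assumptions; confirm them with the API reference.

```python
# Sketch: ask structured questions about a piece of text.
# Field names and the auth header are assumptions; verify with the API reference.
import json
import os
import requests

questions = [
    {"id": "q1", "text": "What is the main topic of the text?", "type": "short answer"},
    {"id": "q2", "text": "Does the text mention a deadline?", "type": "boolean"},
]

response = requests.post(
    "https://api.sarvam.ai/text-analytics",
    headers={"api-subscription-key": os.environ["SARVAM_API_KEY"]},
    data={
        "text": "The project kickoff is scheduled for Monday, and the first milestone is due in June.",
        "questions": json.dumps(questions),
    },
    timeout=60,
)

response.raise_for_status()
print(response.json().get("answers"))
```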
- Chat Completion API:
  - Purpose: Generate conversational or instructional responses using Sarvam’s chat model.
  - Best for: Building interactive AI assistants, chatbots, or instruction-following agents that understand and respond in natural language.
  - Method: SDK function call
  - Authorization: API key required via the `SARVAM_API_KEY` environment variable.
  - Parameters:
    - `messages` (required): A list of message objects that form the conversation. Each message must include:
      - `role`: One of `"system"`, `"user"`, or `"assistant"`.
      - `content`: The message text.
    - `model` (optional, default: `"sarvam-m"`): Name of the chat model to use.
    - `temperature` (optional): Controls randomness of output. Default is `0.7`.
    - `top_p` (optional): Controls nucleus sampling. Default is `1.0`.
    - `max_tokens` (optional): Limits the length of the generated response.
    - `stream` (optional): Boolean. Set to `True` to receive partial responses in real time.
  - Example Code: see the sketch after this section.
  - Response: Returns a JSON object with:
    - `choices`: List of generated message completions. Each contains:
      - `message`: Includes the assistant’s response in `"content"` and the `"role"` set to `"assistant"`.
      - `finish_reason`: Indicates why the response was completed (e.g., `"stop"`).
    - `usage` (optional): Includes token usage stats (`prompt_tokens`, `completion_tokens`, `total_tokens`).
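A minimal sketch of a chat completion call. The `client.chat.completions(...)` method name and the attribute-style access to `choices` are assumptions; verify them against the SDK reference.

```python
# Sketch: a simple chat completion call with the SarvamAI SDK.
# The completions() method name and response access pattern are assumptions;
# verify against the SDK reference.
import os
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key=os.environ["SARVAM_API_KEY"])

response = client.chat.completions(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what a BCP-47 language code is in one sentence."},
    ],
    temperature=0.7,
    max_tokens=200,
)

print(response.choices[0].message.content)
```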
Integration Guidelines
- Ensure secure SDK communication by using environment variables like `SARVAM_API_KEY` to avoid exposing keys in code.
- Handle rate-limiting errors using SDK response codes or retry logic to maintain stability.
- Always validate input data (e.g., text, language codes, file formats) before invoking SDK methods.
- Wrap all SDK calls in `try-except` blocks to gracefully handle errors and debug effectively.
- Set up authentication using environment variables or secrets managers to securely pass API keys.
- Implement logging around SDK usage (input, output, exceptions) for monitoring and diagnostics.
- Use specific, versioned models (e.g., `"bulbul:v2"`, `"sarvam-m"`) in your calls to maintain backward compatibility.
- Follow consistent parameter naming when creating payloads for SDK functions to improve maintainability.
- Cache static responses (like language code lookups) to reduce redundant SDK calls and improve performance.
- Regularly audit SDK integration and logs to detect latency, usage anomalies, or potential security flaws.
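A small sketch that combines several of these guidelines (environment-variable auth, input validation, try-except wrapping, and logging) around the Speech to Text call described earlier; the error type caught here is a placeholder for the SDK's own exceptions.

```python
# Sketch: applying the integration guidelines — env-var auth, validation,
# try/except, and logging around an SDK call.
import logging
import os
from sarvamai import SarvamAI

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("sarvam-integration")

client = SarvamAI(api_subscription_key=os.environ["SARVAM_API_KEY"])  # never hard-code keys

def safe_transcribe(path: str, language_code: str = "hi-IN"):
    """Wrap the SDK call so failures are logged instead of crashing the app."""
    if not path.endswith((".wav", ".mp3")):
        raise ValueError("Unsupported audio format; use .wav or .mp3")
    try:
        with open(path, "rb") as audio_file:
            response = client.speech_to_text.transcribe(
                file=audio_file,
                model="saarika:v2.5",
                language_code=language_code,
            )
        logger.info("Transcription succeeded for %s", path)
        return response
    except Exception as exc:  # narrow this to the SDK's error types in real code
        logger.error("Transcription failed for %s: %s", path, exc)
        return None
```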
Tips for Responding to User Requests
- Analyze the task to determine which SDK modules (e.g., `speech_to_text`, `chat_completion`) are needed.
- If multiple SDK components are required, outline the purpose of each:
  - `speech_to_text`: Converts audio to a transcript.
  - `text_analytics`: Extracts answers from text.
- For each API module, define separate SDK wrapper functions:
  - Keep code modular and reusable.
  - Handle input validation and output parsing within each function.
- Correctly parse SDK responses:
  - For Speech-to-Text: `transcript = response["transcript"]`
  - For Text Analytics: `answers = response["answers"]`
- Write a main script that (see the sketch after this list):
  - Accepts or loads user input.
  - Invokes the SDK function(s) as needed.
  - Saves, displays, or returns the output.
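A minimal sketch of such a main script, wiring one wrapper function to user input and output. The transcribe call and the `response["transcript"]` parsing follow the modules and tips above; file names and the command-line handling are illustrative.

```python
# Sketch: a small main script that wires user input to an SDK wrapper
# function and saves the output. Names here are illustrative.
import os
import sys
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key=os.environ["SARVAM_API_KEY"])

def transcribe_audio(path: str, language_code: str) -> str:
    """Wrapper around the Speech to Text module; returns just the transcript."""
    with open(path, "rb") as audio_file:
        response = client.speech_to_text.transcribe(
            file=audio_file,
            model="saarika:v2.5",
            language_code=language_code,
        )
    return response["transcript"]  # parse the transcript field as noted above

def main() -> None:
    audio_path = sys.argv[1] if len(sys.argv) > 1 else "input.wav"
    transcript = transcribe_audio(audio_path, language_code="hi-IN")
    with open("transcript.txt", "w", encoding="utf-8") as out:
        out.write(transcript)
    print("Transcript saved to transcript.txt")

if __name__ == "__main__":
    main()
```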
We hope this guide helps you get started with Sarvam’s API! If you encounter any issues or have questions along the way, feel free to reach out to us on our Discord. Our community is ready to assist you in any way possible!