Build workflows with Sarvam AI in n8n
Video walkthrough
Watch this first for a full click-through: installing the n8n-nodes-sarvam community node, creating Sarvam AI API credentials in n8n, and wiring a TTS → STT → Chat sample chain.
Then continue with Overview and Quick start for prerequisites, install commands, and importable JSON.
Overview
This guide shows how to use n8n as your automation layer and Sarvam AI for speech-to-text, text-to-speech, and chat completions across Indian languages — without maintaining a custom backend. It follows the same flow as our LiveKit and Pipecat integration pages: what you will build, a quick path to a first success, then customization, reference tables, best practices, and troubleshooting.
If you already use n8n for CRM, support, or internal tools, you can add Sarvam nodes wherever you need audio or Indic-language text intelligence.
What you’ll build
You will be able to:
- Transcribe uploaded or downloaded audio (23 Indian languages plus English, with several transcript modes).
- Synthesize speech from text (11 languages, many speakers, codecs, and pace).
- Run Sarvam chat models on transcripts or any text, then chain TTS or downstream systems.
- Compose full flows such as form → transcript, webhook → STT → chat → TTS, or sheet → batch transcribe — all in the node editor.
Quick overview
- Watch the video walkthrough at the top of this page — community node install, credentials, and a TTS → STT → Chat demo.
- Create a Sarvam AI account and copy an API key.
- Install the community package
n8n-nodes-sarvamfrom Settings → Community Nodes (or npm on self-hosted). - Create a Sarvam AI API credential in n8n and attach it to each Sarvam node.
- Add a Sarvam AI node: pick Resource (Speech or Chat) and Operation, then map binary or text fields from upstream nodes.
- Execute workflow and inspect each node’s OUTPUT panel to verify
transcript, binarydata, orchoices[0].message.content. - Optionally import one of the JSON templates below to skip manual wiring.
- Iterate using the customization and best practices sections when you go to production.
Quick start
1. Prerequisites
- n8n — n8n Cloud or self-hosted v1.x / v2.x (with permission to install community nodes if you are not the admin).
- Sarvam AI API key — from dashboard.sarvam.ai (includes credits for new accounts).
- Basic familiarity with n8n: triggers, connections, and opening the JSON tab on a node’s output.
2. Install the community node
n8n Cloud / UI (recommended)
Self-hosted (npm)
- In n8n, open Settings (gear) → Community nodes.
- Under Install, enter the package name:
- Accept the prompt, wait for install to finish, then restart n8n if the UI asks you to.
- In the canvas, press + and search Sarvam AI — the node should appear.
Tip: Package and install docs also live on npm — n8n-nodes-sarvam and n8n community nodes.
3. Create your Sarvam credential
- Go to Credentials → New.
- Search for Sarvam AI API.
- Paste your API key from the dashboard and Save.
You reuse this credential on every Sarvam AI node in the workspace.
4. Run your first node (manual test)
- Create a new workflow and add Manual Trigger.
- Add Sarvam AI → connect Manual Trigger → Sarvam.
- Configure:
- Credential: your Sarvam AI API.
- Resource: Speech.
- Operation: Text to Speech.
- Text:
Hello from Sarvam in n8n. - Target Language: e.g.
en-INorhi-IN.
- Click Execute workflow on the Manual Trigger.
- Open the Sarvam node OUTPUT — you should see binary data (often property name
data) containing generated audio.
That confirms install, credentials, and outbound calls to Sarvam.
5. Read outputs in the next node
- After Speech to Text, the transcript is typically in
{{ $json.transcript }}. - After Chat → Complete, the assistant text is
{{ $json.choices[0].message.content }}. - After Text to Speech, audio is in binary; the default property name is usually
data— set Input Binary Field on the next STT node to match exactly.
6. Optional — import a starter workflow
If you prefer to start from JSON, skip to Import sample workflows below, import one template, then re-attach your credential on each Sarvam node (n8n may show a warning until you do).
Import sample workflows
Use the workflow menu ⋯ → Import from URL / File, or File → Import from File, then paste JSON. After import:
- Open each Sarvam AI node and select your Sarvam AI API credential (replace any placeholder).
- Execute once from the trigger to confirm binary and JSON fields look correct.
Transcribe an audio upload (form)
Flow: Form Trigger → Sarvam AI (Speech → Speech to Text) → transcript in JSON.
Sarvam node fields
Text to speech → speech to text → chat
Same pattern as the sample fern/n8n.json: proves binary handoff (data) and {{ $json.transcript }} into chat.
Understanding the workflow
Unlike a single Python pipeline, n8n runs discrete nodes. Think in terms of JSON (text fields) and binary (files / audio).
Typical voice Q&A path
- Trigger produces binary (e.g.
dataor a custom name likeaudio). - Speech to Text reads that binary property and writes
transcripton the JSON item. - Chat → Complete consumes text (expression from STT or static prompt).
- Text to Speech reads Text from chat output and writes audio back to binary (default often
data).
When multiple Sarvam nodes exist, disambiguate with $('Exact node name').item.json... if {{ $json }} is ambiguous.
Build a full voice Q&A (webhook sketch)
Use this when you want HTTP POST with raw audio → transcript → Sarvam model → spoken answer.
- Add Webhook — method POST, Binary data enabled so the body becomes a binary property (often
data). - Add Sarvam AI — Speech / Speech to Text — Input Binary Field:
data— Mode: Transcribe (or Translate to English if the model should only see English). - Add Sarvam AI — Chat / Complete — User Message:
={{ $json.transcript }}— optional System Message with tone and safety rules — Model:sarvam-105borsarvam-30bfor lower latency. - Add Sarvam AI — Speech / Text to Speech — Text:
={{ $('Sarvam AI').item.json.choices[0].message.content }}(adjust the node name to match your Chat node’s label in the canvas). - Add Respond to Webhook — respond with binary from the TTS node (binary field
dataunless you renamed it).
Naming tip: Give each Sarvam node a unique, descriptive name (for example Sarvam STT, Sarvam Chat, Sarvam TTS) so expressions and logs stay readable.
Customization examples
These mirror the “Hindi / Tamil / multilingual / translate” progression from our LiveKit and Pipecat guides, expressed as node fields instead of Python.
Example 1: Hindi STT and TTS
- Speech to Text → Options → Language:
hi-IN— Mode: Transcribe. - Text to Speech → Target Language:
hi-IN— Options → Speaker: e.g.simran,shubh, or any voice from the list below.
Example 2: Tamil STT and TTS
- STT Language:
ta-IN— Mode: Transcribe. - TTS Target Language:
ta-IN— pick a Speaker that fits your UX tests.
Example 3: Multilingual input (auto language)
- STT Options → Language: Auto Detect (value
unknownin the API) when callers may switch languages. - Mode: Transcribe for same-language script output, or Code Mixed when you want English words Latin and Indic words in native script.
Example 4: Indian-language speech → English text for downstream tools
Difference: use Mode: Translate to English on STT when the rest of the workflow (CRM, ticketing, English-only LLM) should only consume English text. Then wire User Message on Chat to ={{ $json.transcript }} as usual.
Operations reference
Speech to Text — modes
Chat — models (node UI)
Under Options you can tune temperature, max tokens, top p, penalties, reasoning effort (where applicable), seed, and wiki grounding — same concepts as in Chat completion docs.
Available options
Language codes (common)
STT supports 23 Indian languages (full list in the node’s language dropdown). TTS Target language supports 11 languages (also listed in the UI).
TTS — output codec and sample rate
Under Options on Text to Speech:
- Output Audio Codec:
wav(default),mp3,aac,flac,opus,mulaw,alaw,linear16. - Sample Rate:
8000–48000Hz (see dropdown). - Pace:
0.5–2.0(default1.0). - Speaker: many Indian-language voices (default
shubh). Try a few in the UI to match your brand.
Speaker voices (names as in node)
Male: aayan, aditya, advait, amit, anand, ashutosh, dev, gokul, kabir, manan, mani, mohit, rahul, rehan, rohan, ratan, shubh, soham, sumit, sunny, tarun, varun, vijay.
Female: amelia, ishita, kavitha, kavya, neha, pooja, priya, ritu, roopa, rupali, shreya, shruti, simran, sophia, suhani, tanya.
Expressions and multiple Sarvam nodes
{{ $json.transcript }}— transcript from the immediately previous node when that node outputs it on the same item.{{ $json.choices[0].message.content }}— assistant reply from Chat → Complete.$('Node display name').item.json...— use when several nodes ran before the current one or when you need a specific upstream node. The string must match the node name on the canvas exactly.
Always turn on “Execute previous nodes” (or run the whole workflow) so the expression editor can resolve previews.
Best practices
1. Match binary property names exactly
The Input Binary Field on STT must equal the binary field name from the upstream node (data, audio, etc.). Inspect the Binary tab on the previous node’s output — do not guess.
2. Prefer unique node names before writing expressions
Rename nodes from generic Sarvam AI to Sarvam STT – inbound, Sarvam Chat – support, and so on. It prevents broken $('...') references after copy-paste.
3. Pin sample data while building
Use Pin data on a trigger or HTTP node with a small sample file so you can iterate on STT options without re-uploading each run.
4. Keep execution order predictable
Imported samples use executionOrder: v1. For multi-branch workflows, understand n8n’s execution model so STT always runs after the node that fetches audio.
5. Treat credentials like environment secrets
Do not embed API keys in expressions or static JSON exports you commit to git. Use Credentials and n8n’s external secrets integrations for production.
6. Use Sarvam as an AI tool where supported
The community node is built usableAsTool: true — on n8n versions that support it, you can expose Sarvam operations to AI Agent-style workflows as tools.
Pro tips
- Auto Detect on STT is ideal for mixed or unknown caller languages.
- Sarvam handles code-mixed speech (for example Hinglish); pair that with Code Mixed mode when you care about script split.
sarvam-30bis a good default for fast replies in chat-heavy automations; move tosarvam-105bwhen quality matters more.- Chain Translate to English early if downstream services are English-only — simpler than translating later in the graph.
Troubleshooting
Node does not appear after install — Restart n8n. On self-hosted, confirm the package is installed in the same environment n8n loads and check server logs for install errors.
Community nodes disabled — Your admin may block installs. Ask for n8n-nodes-sarvam to be allow-listed or pre-baked into the image.
Binary data not found on STT — Wrong Input Binary Field name, or the previous node did not output binary on that execution (e.g. GET returned JSON instead of file). For HTTP Request, set response to File when downloading audio.
Empty or nonsense transcript — Check audio format (wav, mp3, ogg, flac), duration, and corruption. Try a fixed Language instead of auto-detect for debugging.
Chat returns errors — Verify User Message is non-empty (expressions must evaluate to text). Confirm Model is set and the Sarvam API is reachable from your host.
401 / 403 from Sarvam — Rotate the key in the dashboard and update the Sarvam AI API credential; ensure no stray spaces in the key field.
Expressions show “undefined” — Prior node did not run in this execution, or the JSON path is wrong. Open OUTPUT on each upstream node and copy paths from the JSON view.
Additional resources
- Sarvam AI documentation
- npm — n8n-nodes-sarvam
- n8n-nodes-sarvam on GitHub (source and issues)
- n8n community nodes — installation
- Speech to Text — REST
- Text to Speech — REST
- Chat completion — overview
- Build Voice Agent with LiveKit — same Sarvam capabilities in real-time code
- Build Voice Agent with Pipecat — pipeline-style voice agent
Need help?
- Sarvam Support: developer@sarvam.ai
- Community: Join the Discord Community
Happy building!