Build workflows with Sarvam AI in n8n

Video walkthrough

Watch this first for a full click-through: installing the n8n-nodes-sarvam community node, creating Sarvam AI API credentials in n8n, and wiring a TTS → STT → Chat sample chain.

Then continue with Overview and Quick start for prerequisites, install commands, and importable JSON.


Overview

This guide shows how to use n8n as your automation layer and Sarvam AI for speech-to-text, text-to-speech, and chat completions across Indian languages — without maintaining a custom backend. It follows the same flow as our LiveKit and Pipecat integration pages: what you will build, a quick path to a first success, then customization, reference tables, best practices, and troubleshooting.

If you already use n8n for CRM, support, or internal tools, you can add Sarvam nodes wherever you need audio or Indic-language text intelligence.

What you’ll build

You will be able to:

  • Transcribe uploaded or downloaded audio (23 Indian languages plus English, with several transcript modes).
  • Synthesize speech from text (11 languages, many speakers, codecs, and pace).
  • Run Sarvam chat models on transcripts or any text, then chain TTS or downstream systems.
  • Compose full flows such as form → transcript, webhook → STT → chat → TTS, or sheet → batch transcribe — all in the node editor.

Quick overview

  1. Watch the video walkthrough at the top of this page — community node install, credentials, and a TTS → STT → Chat demo.
  2. Create a Sarvam AI account and copy an API key.
  3. Install the community package n8n-nodes-sarvam from Settings → Community Nodes (or npm on self-hosted).
  4. Create a Sarvam AI API credential in n8n and attach it to each Sarvam node.
  5. Add a Sarvam AI node: pick Resource (Speech or Chat) and Operation, then map binary or text fields from upstream nodes.
  6. Execute workflow and inspect each node’s OUTPUT panel to verify transcript, binary data, or choices[0].message.content.
  7. Optionally import one of the JSON templates below to skip manual wiring.
  8. Iterate using the customization and best practices sections when you go to production.

Quick start

1. Prerequisites

  • n8n — n8n Cloud or self-hosted v1.x / v2.x (with permission to install community nodes if you are not the admin).
  • Sarvam AI API key — from dashboard.sarvam.ai (includes credits for new accounts).
  • Basic familiarity with n8n: triggers, connections, and opening the JSON tab on a node’s output.

2. Install the community node

  1. Open Settings → Community Nodes and choose Install.
  2. Enter n8n-nodes-sarvam, accept the community-node prompt, and install.
  3. Confirm the Sarvam AI node appears in the node picker (on self-hosted, restart n8n if it does not).

Tip: Package and install docs also live on npm — n8n-nodes-sarvam — and in the n8n community nodes documentation.

3. Create your Sarvam credential

  1. Go to Credentials → New.
  2. Search for Sarvam AI API.
  3. Paste your API key from the dashboard and Save.

You reuse this credential on every Sarvam AI node in the workspace.

Credential type : Sarvam AI API
API Key : <your-key-from-dashboard.sarvam.ai>

4. Run your first node (manual test)

  1. Create a new workflow and add Manual Trigger.
  2. Add Sarvam AI → connect Manual Trigger → Sarvam.
  3. Configure:
    • Credential: your Sarvam AI API.
    • Resource: Speech.
    • Operation: Text to Speech.
    • Text: Hello from Sarvam in n8n.
    • Target Language: e.g. en-IN or hi-IN.
  4. Click Execute workflow on the Manual Trigger.
  5. Open the Sarvam node OUTPUT — you should see binary data (often property name data) containing generated audio.

That confirms install, credentials, and outbound calls to Sarvam.

5. Read outputs in the next node

  • After Speech to Text, the transcript is typically in {{ $json.transcript }}.
  • After Chat → Complete, the assistant text is {{ $json.choices[0].message.content }}.
  • After Text to Speech, audio is in binary; the default property name is usually data — set Input Binary Field on the next STT node to match exactly.
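As a sanity check, the three expressions above are plain JSON-path lookups. A minimal Python sketch of the output shapes (field names are the ones used in this guide; the values are invented):

```python
# Illustrative only: stand-ins for the output shapes this guide references.
# Field names match the n8n expressions; the values are made up.
stt_output = {"transcript": "What is the capital of India?"}

chat_output = {
    "choices": [
        {"message": {"role": "assistant", "content": "New Delhi."}}
    ]
}

# {{ $json.transcript }} resolves like:
transcript = stt_output["transcript"]

# {{ $json.choices[0].message.content }} resolves like:
reply = chat_output["choices"][0]["message"]["content"]

print(transcript)
print(reply)
```

If one of these paths returns undefined in the expression editor, the upstream node's JSON simply does not contain that key on the current item.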

6. Optional — import a starter workflow

If you prefer to start from JSON, skip to Import sample workflows below, import one template, then re-attach your credential on each Sarvam node (n8n may show a warning until you do).


Import sample workflows

Use the workflow menu's Import from URL or Import from File, then select the file or paste the JSON. After import:

  1. Open each Sarvam AI node and select your Sarvam AI API credential (replace any placeholder).
  2. Execute once from the trigger to confirm binary and JSON fields look correct.

Transcribe an audio upload (form)

Flow: Form TriggerSarvam AI (Speech → Speech to Text) → transcript in JSON.

{
  "name": "Sarvam — Transcribe form upload",
  "nodes": [
    {
      "parameters": {
        "formTitle": "Sample audio",
        "formDescription": "Upload a short clip to transcribe",
        "formFields": {
          "values": [
            {
              "fieldLabel": "audio",
              "fieldType": "file",
              "acceptFileTypes": ".mp3,.wav"
            }
          ]
        },
        "options": {}
      },
      "type": "n8n-nodes-base.formTrigger",
      "typeVersion": 2.5,
      "position": [-200, 0],
      "id": "form-trigger-sarvam-stt",
      "name": "On form submission"
    },
    {
      "parameters": {
        "resource": "speech",
        "operation": "speechToText",
        "binaryPropertyName": "audio",
        "sttMode": "transcribe",
        "speechToTextOptions": {}
      },
      "type": "n8n-nodes-sarvam.sarvam",
      "typeVersion": 1,
      "position": [80, 0],
      "id": "sarvam-stt-form",
      "name": "Transcribe audio",
      "credentials": {
        "sarvamApi": {
          "id": "__REPLACE_WITH_YOUR_CREDENTIAL_ID__",
          "name": "Sarvam AI API"
        }
      }
    }
  ],
  "pinData": {},
  "connections": {
    "On form submission": {
      "main": [
        [
          {
            "node": "Transcribe audio",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  },
  "active": false,
  "settings": {
    "executionOrder": "v1",
    "binaryMode": "separate"
  },
  "tags": []
}

Sarvam node fields

UI field | Value
Resource | Speech
Operation | Speech to Text
Input Binary Field | audio (must match the form file field label)
Mode | Transcribe (or another mode from the table below)

Text to speech → speech to text → chat

This follows the same pattern as the sample fern/n8n.json: it proves the binary handoff (data) and feeds {{ $json.transcript }} into chat.

{
  "name": "Sarvam — TTS, STT, then chat",
  "nodes": [
    {
      "parameters": {},
      "type": "n8n-nodes-base.manualTrigger",
      "typeVersion": 1,
      "position": [0, 0],
      "id": "manual-trigger-demo",
      "name": "When clicking 'Execute workflow'"
    },
    {
      "parameters": {
        "resource": "speech",
        "operation": "textToSpeech",
        "text": "What is the capital of India?",
        "ttsTargetLanguage": "en-IN",
        "textToSpeechOptions": {}
      },
      "type": "n8n-nodes-sarvam.sarvam",
      "typeVersion": 1,
      "position": [220, 0],
      "id": "sarvam-tts-demo",
      "name": "Text to speech",
      "credentials": {
        "sarvamApi": {
          "id": "__REPLACE_WITH_YOUR_CREDENTIAL_ID__",
          "name": "Sarvam AI API"
        }
      }
    },
    {
      "parameters": {
        "resource": "speech",
        "operation": "speechToText",
        "binaryPropertyName": "data",
        "sttMode": "transcribe",
        "speechToTextOptions": {}
      },
      "type": "n8n-nodes-sarvam.sarvam",
      "typeVersion": 1,
      "position": [440, 0],
      "id": "sarvam-stt-demo",
      "name": "Speech to text",
      "credentials": {
        "sarvamApi": {
          "id": "__REPLACE_WITH_YOUR_CREDENTIAL_ID__",
          "name": "Sarvam AI API"
        }
      }
    },
    {
      "parameters": {
        "resource": "chat",
        "operation": "complete",
        "userMessage": "={{ $json.transcript }}",
        "chatOptions": {}
      },
      "type": "n8n-nodes-sarvam.sarvam",
      "typeVersion": 1,
      "position": [660, 0],
      "id": "sarvam-chat-demo",
      "name": "Complete chat",
      "credentials": {
        "sarvamApi": {
          "id": "__REPLACE_WITH_YOUR_CREDENTIAL_ID__",
          "name": "Sarvam AI API"
        }
      }
    }
  ],
  "pinData": {},
  "connections": {
    "When clicking 'Execute workflow'": {
      "main": [[{ "node": "Text to speech", "type": "main", "index": 0 }]]
    },
    "Text to speech": {
      "main": [[{ "node": "Speech to text", "type": "main", "index": 0 }]]
    },
    "Speech to text": {
      "main": [[{ "node": "Complete chat", "type": "main", "index": 0 }]]
    }
  },
  "active": false,
  "settings": {
    "executionOrder": "v1",
    "binaryMode": "separate"
  },
  "tags": []
}

Understanding the workflow

Unlike a single Python pipeline, n8n runs discrete nodes. Think in terms of JSON (text fields) and binary (files / audio).

Typical voice Q&A path

Webhook / Form / File → [binary audio] → Sarvam STT → {{ $json.transcript }}
  → Sarvam Chat (User message) → {{ $json.choices[0].message.content }}
  → Sarvam TTS → binary audio → Email / HTTP / Storage
  1. Trigger produces binary (e.g. data or a custom name like audio).
  2. Speech to Text reads that binary property and writes transcript on the JSON item.
  3. Chat → Complete consumes text (expression from STT or static prompt).
  4. Text to Speech reads Text from chat output and writes audio back to binary (default often data).
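The JSON/binary split can be sketched as a small conceptual model. This is not how n8n is implemented internally; it only mimics the fact that each item carries a JSON part plus named binary properties, and that the STT step reads one named binary field:

```python
# Conceptual model only, not n8n internals: each item a node emits carries
# a JSON part and an optional binary part keyed by property name.
def speech_to_text(item, binary_property="data"):
    """Stand-in for the STT node: read the named binary field, write a transcript."""
    audio = item["binary"][binary_property]  # KeyError here = "Binary data not found"
    item["json"]["transcript"] = f"<transcript of {len(audio)} bytes>"
    return item

item = {"json": {}, "binary": {"data": b"fake-audio-bytes"}}
out = speech_to_text(item)  # works: property names match
print(out["json"]["transcript"])

try:
    speech_to_text({"json": {}, "binary": {"audio": b"..."}})  # name mismatch
except KeyError:
    print("binary property name mismatch")
```

The failing call is exactly the mismatch described in Troubleshooting: the upstream node wrote binary under one name and the STT node was told to read another.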

When multiple Sarvam nodes exist and {{ $json }} is ambiguous, reference a specific upstream node with $('Exact node name').item.json....


Build a full voice Q&A (webhook sketch)

Use this when you want HTTP POST with raw audio → transcript → Sarvam model → spoken answer.

  1. Add Webhook — method POST, Binary data enabled so the body becomes a binary property (often data).
  2. Add Sarvam AI (Speech / Speech to Text). Set Input Binary Field: data and Mode: Transcribe (or Translate to English if the model should only see English).
  3. Add Sarvam AI (Chat / Complete). Set User Message: ={{ $json.transcript }}, an optional System Message with tone and safety rules, and Model: sarvam-105b (or sarvam-30b for lower latency).
  4. Add Sarvam AI (Speech / Text to Speech). Set Text: ={{ $('Sarvam AI').item.json.choices[0].message.content }} (adjust the node name to match your Chat node's label on the canvas).
  5. Add Respond to Webhook — respond with binary from the TTS node (binary field data unless you renamed it).

Naming tip: Give each Sarvam node a unique, descriptive name (for example Sarvam STT, Sarvam Chat, Sarvam TTS) so expressions and logs stay readable.
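The five steps above can be sketched end to end in a few lines. This is a conceptual sketch only: each function stands in for one n8n node, and the bodies are placeholders rather than real Sarvam API calls.

```python
# Conceptual sketch of the webhook voice Q&A chain; each function stands in
# for one n8n node, and the bodies are placeholders, not real API calls.
def sarvam_stt(audio: bytes) -> str:
    return "What is the capital of India?"  # placeholder transcript

def sarvam_chat(user_message: str, system: str = "Be concise.") -> str:
    return "New Delhi."                     # placeholder completion

def sarvam_tts(text: str) -> bytes:
    return b"fake-audio:" + text.encode()   # placeholder audio bytes

def handle_webhook(raw_audio: bytes) -> bytes:
    transcript = sarvam_stt(raw_audio)      # Speech to Text
    answer = sarvam_chat(transcript)        # Chat -> Complete
    return sarvam_tts(answer)               # Text to Speech -> respond with binary

print(handle_webhook(b"caller-audio"))
```

Note that only text flows between the middle steps; binary appears at the edges, which is why the Webhook and Respond to Webhook nodes are the ones that need binary settings.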


Customization examples

These mirror the “Hindi / Tamil / multilingual / translate” progression from our LiveKit and Pipecat guides, expressed as node fields instead of Python.

Example 1: Hindi STT and TTS

  • Speech to Text: Options → Language: hi-IN, Mode: Transcribe.
  • Text to Speech: Target Language: hi-IN, Options → Speaker: e.g. simran, shubh, or any voice from the list below.

Example 2: Tamil STT and TTS

  • STT: Language: ta-IN, Mode: Transcribe.
  • TTS: Target Language: ta-IN; pick a Speaker that fits your UX tests.

Example 3: Multilingual input (auto language)

  • STT Options → Language: Auto Detect (sent to the API as unknown) when callers may switch languages.
  • Mode: Transcribe for same-language script output, or Code Mixed when you want English words in Latin script and Indic words in native script.

Example 4: Indian-language speech → English text for downstream tools

Difference: use Mode: Translate to English on STT when the rest of the workflow (CRM, ticketing, English-only LLM) should only consume English text. Then wire User Message on Chat to ={{ $json.transcript }} as usual.


Operations reference

Resource | Operation | Purpose
Speech | Speech to Text | Binary audio in → transcript (plus mode / language options)
Speech | Text to Speech | Text in → binary audio out (codec, pace, speaker, sample rate)
Chat | Complete | User (+ optional system) message → Sarvam chat completion JSON

Speech to Text — modes

Mode | When to use
Transcribe | Normal transcript in the spoken language and script
Translate to English | Spoken Indic (or mix) → English text for English-only steps
Transliterate (Roman) | Latin letters representing pronunciation
Code Mixed | English stays Latin; Indic words in native script
Verbatim | Minimal normalization, closest to raw recognition
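To make the mode differences concrete, here is how one spoken Hinglish sentence could come back under each mode. The strings below are invented for illustration; actual model output will differ:

```python
# Invented examples for one spoken Hinglish sentence,
# roughly "mera order cancel kar do" ("cancel my order").
modes = {
    "Transcribe": "मेरा ऑर्डर कैंसिल कर दो",     # spoken language, native script
    "Translate to English": "Cancel my order.",
    "Transliterate (Roman)": "mera order cancel kar do",  # all Latin letters
    "Code Mixed": "मेरा order cancel कर दो",      # English Latin, Hindi native script
}
for mode, text in modes.items():
    print(f"{mode}: {text}")
```

Pick the mode by asking what the next node consumes: an English-only LLM wants Translate to English, a Hindi UI wants Transcribe, and search or analytics over mixed speech often prefers Code Mixed.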

Chat — models (node UI)

Model value | When to use
sarvam-105b (default in node) | Higher quality, richer reasoning options
sarvam-30b | Lower latency and cost for simpler prompts

Under Options you can tune temperature, max tokens, top p, penalties, reasoning effort (where applicable), seed, and wiki grounding — same concepts as in Chat completion docs.


Available options

Language codes (common)

Language | Code
English (India) | en-IN
Hindi | hi-IN
Bengali | bn-IN
Tamil | ta-IN
Telugu | te-IN
Gujarati | gu-IN
Kannada | kn-IN
Malayalam | ml-IN
Marathi | mr-IN
Punjabi | pa-IN
Odia | od-IN
Auto-detect (STT) | Use Auto Detect in the node (sent to the API as unknown)

STT supports 23 Indian languages (full list in the node’s language dropdown). TTS Target language supports 11 languages (also listed in the UI).

TTS — output codec and sample rate

Under Options on Text to Speech:

  • Output Audio Codec: wav (default), mp3, aac, flac, opus, mulaw, alaw, linear16.
  • Sample Rate: 8000–48000 Hz (see dropdown).
  • Pace: 0.5–2.0 (default 1.0).
  • Speaker: many Indian-language voices (default shubh). Try a few in the UI to match your brand.
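A small pre-flight check mirroring the ranges above can catch bad option values before a workflow execution fails. This is a hypothetical helper; the allowed values are copied from this guide's tables, not read from the Sarvam API:

```python
# Hypothetical validator; allowed values copied from the tables in this guide.
ALLOWED_CODECS = {"wav", "mp3", "aac", "flac", "opus", "mulaw", "alaw", "linear16"}

def check_tts_options(codec="wav", sample_rate=22050, pace=1.0):
    """Raise ValueError for option values outside the documented ranges."""
    if codec not in ALLOWED_CODECS:
        raise ValueError(f"unsupported codec: {codec}")
    if not 8000 <= sample_rate <= 48000:
        raise ValueError("sample rate must be between 8000 and 48000 Hz")
    if not 0.5 <= pace <= 2.0:
        raise ValueError("pace must be between 0.5 and 2.0")
    return {"codec": codec, "sample_rate": sample_rate, "pace": pace}

print(check_tts_options(codec="mp3", pace=1.2))
```

In n8n itself the dropdowns enforce most of this, but a check like this is useful when options arrive dynamically via expressions.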

Speaker voices (names as in node)

Male: aayan, aditya, advait, amit, anand, ashutosh, dev, gokul, kabir, manan, mani, mohit, rahul, rehan, rohan, ratan, shubh, soham, sumit, sunny, tarun, varun, vijay.

Female: amelia, ishita, kavitha, kavya, neha, pooja, priya, ritu, roopa, rupali, shreya, shruti, simran, sophia, suhani, tanya.


Expressions and multiple Sarvam nodes

  • {{ $json.transcript }} — transcript from the immediately previous node when that node outputs it on the same item.
  • {{ $json.choices[0].message.content }} — assistant reply from Chat → Complete.
  • $('Node display name').item.json... — use when several nodes ran before the current one or when you need a specific upstream node. The string must match the node name on the canvas exactly.

Always turn on “Execute previous nodes” (or run the whole workflow) so the expression editor can resolve previews.


Best practices

1. Match binary property names exactly

The Input Binary Field on STT must equal the binary field name from the upstream node (data, audio, etc.). Inspect the Binary tab on the previous node’s output — do not guess.

2. Prefer unique node names before writing expressions

Rename nodes from generic Sarvam AI to Sarvam STT – inbound, Sarvam Chat – support, and so on. It prevents broken $('...') references after copy-paste.

3. Pin sample data while building

Use Pin data on a trigger or HTTP node with a small sample file so you can iterate on STT options without re-uploading each run.

4. Keep execution order predictable

Imported samples use executionOrder: v1. For multi-branch workflows, understand n8n’s execution model so STT always runs after the node that fetches audio.

5. Treat credentials like environment secrets

Do not embed API keys in expressions or static JSON exports you commit to git. Use Credentials and n8n’s external secrets integrations for production.

6. Use Sarvam as an AI tool where supported

The community node is built with usableAsTool: true; on n8n versions that support it, you can expose Sarvam operations as tools in AI Agent-style workflows.


Pro tips

  • Auto Detect on STT is ideal for mixed or unknown caller languages.
  • Sarvam handles code-mixed speech (for example Hinglish); pair that with Code Mixed mode when you care about script split.
  • sarvam-30b is a good default for fast replies in chat-heavy automations; move to sarvam-105b when quality matters more.
  • Chain Translate to English early if downstream services are English-only — simpler than translating later in the graph.

Troubleshooting

Node does not appear after install — Restart n8n. On self-hosted, confirm the package is installed in the same environment n8n loads and check server logs for install errors.

Community nodes disabled — Your admin may block installs. Ask for n8n-nodes-sarvam to be allow-listed or pre-baked into the image.

Binary data not found on STT — Wrong Input Binary Field name, or the previous node did not output binary on that execution (e.g. a GET returned JSON instead of a file). For HTTP Request, set the response format to File when downloading audio.

Empty or nonsense transcript — Check audio format (wav, mp3, ogg, flac), duration, and corruption. Try a fixed Language instead of auto-detect for debugging.

Chat returns errors — Verify User Message is non-empty (expressions must evaluate to text). Confirm Model is set and the Sarvam API is reachable from your host.

401 / 403 from Sarvam — Rotate the key in the dashboard and update the Sarvam AI API credential; ensure no stray spaces in the key field.

Expressions show “undefined” — Prior node did not run in this execution, or the JSON path is wrong. Open OUTPUT on each upstream node and copy paths from the JSON view.


Additional resources


Need help?


Happy building!