October 24, 2024


Sarvam 1: The first Indian language LLM

Large language models have demonstrated remarkable capabilities across diverse tasks, yet their development has predominantly focused on English and other high-resource languages. This English-centric approach has created a significant technological gap for the billions of speakers of Indian languages. While there have been efforts to introduce Indic languages to popular LLMs through continued pretraining, ground-up multilingual efforts like BLOOM are rare. The effectiveness of these models is also limited by poor token efficiency for Indic scripts and insufficient high-quality training data for these languages.

We introduce Sarvam-1, a 2-billion parameter language model specifically optimized for Indian languages. Built from the ground up to support 10 major Indian languages alongside English, Sarvam-1 demonstrates that careful curation of training data can yield superior performance even with a relatively modest parameter count. Our work addresses two critical challenges in Indic language modeling:

  • Token Efficiency: Existing multilingual models exhibit high token fertility (tokens needed per word) for Indic scripts, often requiring 4 to 8 tokens per word compared to 1.4 for English. Sarvam-1's tokenizer achieves significantly better efficiency, with fertility rates of 1.4-2.1 across all supported languages.
  • Data Quality: While web-crawled Indic language data exists, it often lacks depth and quality. Through advanced synthetic-data-generation techniques, we have developed a high-quality training corpus of 2 trillion tokens, specifically for 10 Indic languages (Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu).

Despite its compact size, Sarvam-1 demonstrates exceptional performance across standard benchmarks. It achieves high accuracy on both knowledge and reasoning tasks, especially in Indic languages, delivering state-of-the-art performance in its class. It also punches above its weight, remaining competitive with much larger models on most tasks. Concretely, it easily outperforms Gemma-2-2B and Llama-3.2-3B on a variety of standard benchmarks, including MMLU, ARC-Challenge, and IndicGenBench, while achieving numbers similar to Llama-3.1-8B.

These results are particularly notable given Sarvam-1's size, which enables 4-6x faster inference compared to larger models while matching or exceeding their performance on Indic language tasks. This combination of high performance and computational efficiency makes Sarvam-1 particularly well-suited for practical applications, including deployment on edge devices. The model can be downloaded from 🤗 Hub.

Sarvam-2T: our Indic pretraining corpus

A key challenge in developing effective language models for Indian languages has been the scarcity of high-quality training data. While datasets like Sangraha exist, they often lack the depth, diversity, and quality necessary for training world-class models. Therefore, the bulk of our efforts has been focused on developing high-quality, diverse data that addresses these limitations.

Our training corpus, which we call Sarvam-2T, encompasses ~2 trillion Indic tokens in total. The data is almost evenly split between the 10 supported languages, with the exception of Hindi, which comprises about 20% of the data. For training Sarvam-1, we augmented Sarvam-2T with approximately equal amounts of English tokens and a substantial collection of code covering most major programming languages. This balanced distribution ensures robust performance across both monolingual and multilingual tasks while maintaining decent coding capabilities.
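
As an illustrative back-of-envelope (assuming exactly 2 trillion Indic tokens, the ~20% Hindi share, and an even split of the remainder; the exact per-language counts are not published), the per-language token budgets work out roughly as follows:

```python
# Illustrative arithmetic only; not official per-language counts.
TOTAL_INDIC = 2_000_000_000_000           # ~2T Indic tokens in Sarvam-2T
HINDI_SHARE = 0.20                        # Hindi comprises about 20% of the data

hindi_tokens = TOTAL_INDIC * HINDI_SHARE                     # ~400B
other_langs = ["bn", "gu", "kn", "ml", "mr", "or", "pa", "ta", "te"]
per_other = (TOTAL_INDIC - hindi_tokens) / len(other_langs)  # ~178B each

budget = {"hi": hindi_tokens, **{lang: per_other for lang in other_langs}}
assert abs(sum(budget.values()) - TOTAL_INDIC) < 1  # shares sum back to 2T
```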

Data Quality

Sarvam-2T demonstrates substantial improvements over existing Indic language datasets across multiple key metrics. Here is a comparison with Sangraha, the best open-source Indic pretraining corpus, which mostly contains documents crawled from the web:

Comparison of document length and quality: Sarvam-2T vs. Sangraha
Document Quality:
  • Average document length is 2x that of web data
  • Quality assessment metrics show 3x more high-quality samples
  • Significantly lower repetition rates and improved coherence scores
Content Distribution:
  • 8x higher concentration of scientific and technical content
  • 6x more programming and technical documentation
  • Balanced representation across domains including academic, technical, and general knowledge
  • Reduced coverage (0.5x) of potentially sensitive topics

This improved content distribution, particularly the increased representation of scientific and technical material, enhances the model's capabilities in tasks requiring complex reasoning and domain-specific knowledge. The longer document lengths support better understanding of context and discourse structure, while the higher quality metrics ensure reliable training signals for the model. The conscious decision to limit sensitive content demonstrates our balanced approach to creating a comprehensive yet responsible training corpus for Indic language AI development.

Examples

A few example snippets from Sarvam-2T are shown below:

Basic algebra in Hindi: यह सामग्री मानती है कि आप कुछ बुनियादी बीजगणित जानते हैं और बहुपदों के साथ संचालन कैसे करते हैं। विभिन्न असंबंधित विषयों पर चर्चा करने के बाद, अब हम बहुपदों पर ध्यान केंद्रित कर रहे हैं जिसमें कई चर होते हैं, जैसे कि $P(x, y, z)$ जब तीन चर ($n = 3$) होते हैं। इन बहुपदों को "समतुल्य" कहा जाता है यदि वे तब भी समान रहते हैं जब हम किसी भी दो चरों को बदलते हैं। उदाहरण के लिए, $P(x, y) = xy + 3$ सममित है क्योंकि $P(x, y) = P(y, x)$। दूसरी ओर, $xy^2 - 4x^2y$ सममित नहीं है।

Astronomy in Oriya: ଆମ ମୀଲକି ୱେ ଗ୍ୟାଲେକ୍ସର ଧାରରେ ଏକ ସ୍ୱତନ୍ତ୍ର ତାର ରହିଛି ଯାହା ତାରଗୁଡ଼ିକ କିପରି ତିଆରି ହୁଅନ୍ତି ସେ ବିଷୟରେ ଆମର ବୁଝାମଣାକୁ ଆହ୍ଵାନ କରେ । ଏହି ତାରା, ଯାହାକୁ ଏସ.ଡି.ଏସ.ଏସ. ଜେ୧୦୨୯୧୫+୧୭୨୯୨୭ କୁହାଯାଏ, ଆମ ବର୍ତ୍ତମାନର ସିଦ୍ଧାନ୍ତରେ ଠିଆ ହେବା ପରି ମନେ ହେଉନାହିଁ । ଏହା ଅତ୍ୟନ୍ତ ଛୋଟ ଏବଂ ପ୍ରାୟ ୧୩ ଶହ କୋଟି ବର୍ଷ ପୁରୁଣା - ଏବଂ ଏହାର ଆକାର ଥିବା ତାରା ସୃଷ୍ଟି କରିବା ପାଇଁ ଏଥିରେ ଯଥେଷ୍ଟ ପରିମାଣର ପଦାର୍ଥ ନାହିଁ।

Web design in Gujarati: એડિટર વિ. બ્રાઉઝર ડિસ્પ્લેને સમજવું
વેબ પેજ ડિઝાઇન કરતી વખતે, તમે મૂળભૂત લેઆઉટથી શરૂઆત કરો છો જે બીજી બધી વસ્તુઓ માટે મંચ સુયોજિત કરે છે. અહીં તે કેવું દેખાય છે:
```
<html>
<head>
  <meta charset="UTF-8" />
  <title>Page Title Goes Here</title>
</head>
<body>
  Page Content Goes Here
</body>
</html>
```
આ દરેક વેબ પેજ માટે મૂળભૂત ટેમ્પલેટ છે. `<title>` ટેગમાં તમારા પૃષ્ઠનું શીર્ષક હોવું જરૂરી છે. જ્યારે કોઈ વ્યક્તિ તમારું પેજ ખોલે છે, ત્યારે તેઓ તેમના બ્રાઉઝરની ટોચની પટ્ટીમાં આ શીર્ષક જોશે. તમે જે કોઈ પણ વાસ્તવિક સામગ્રી ઇચ્છો છો તેને લોકોને `<body>` ટેગની અંદર જવું જોઈએ. `<body>` ટેગની અંદરનું બધું બ્રાઉઝર વિન્ડોમાં દેખાય છે. તમે `<body>` ટેગની અંદર ટેક્સ્ટ અને ઈમેજ મૂકો છો, અને પછી તેમને યોગ્ય રીતે ગોઠવવા માટે HTML ટેગમાં લપેટી.

Model

Tokenizer

We developed a custom tokenizer optimized specifically for Indic languages, featuring a vocabulary size of 68,096 tokens, with 4,096 tokens reserved for future expansion. A distinguishing characteristic of our tokenizer is its remarkably low fertility across all supported languages—a metric that measures the average number of tokens required to encode each word of text.

The efficiency of our tokenizer design plays a crucial role in maximizing the effective training signal from the Sarvam-2T corpus. While the raw token count stands at 2 trillion, the actual information density is substantially higher due to its low fertility. This enables each token to encapsulate more semantic information compared to conventional tokenizers used in other multilingual models. This increased information density has significant implications: when normalized for information content per token, we estimate that our 2 trillion tokens provide a training signal equivalent to 6-8 trillion tokens processed through other popular tokenizers.

As shown in the comparison chart below, our tokenizer achieves significantly lower fertility scores across Indic languages, directly translating to more efficient training and inference processes.
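
Fertility can be measured directly: split the text into whitespace-separated words, tokenize each, and take the ratio. A minimal sketch (with a stand-in character-level tokenizer for illustration; any real tokenizer's encode function would slot in the same way):

```python
def fertility(tokenize, text: str) -> float:
    """Average number of tokens produced per whitespace-separated word."""
    words = text.split()
    return sum(len(tokenize(w)) for w in words) / len(words)

# Stand-in tokenizer: one token per character (illustration only).
char_tokenizer = list
print(fertility(char_tokenizer, "ab abc"))  # (2 + 3) / 2 = 2.5
```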

Comparison of tokenizer fertility between Sarvam-1 and other popular LLMs

Architecture

Our model architecture follows established best practices with a few exceptions. Notably, we opted for a deeper and thinner configuration compared to similar-sized models, a design choice supported by recent research demonstrating improved effectiveness.

Some key hyperparameters include:

  • Hidden size: 2048
  • Intermediate size: 11,008
  • Number of attention heads: 16
  • Number of hidden layers: 28
  • Number of key-value heads: 8
  • Maximum position embeddings: 8,192

The model uses SwiGLU as its hidden activation function and employs rotary positional embeddings (RoPE) with a theta value of 10,000. We train the model with grouped-query attention and bfloat16 mixed-precision for enhanced inference efficiency.
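
Treating the listed hyperparameters as a standard Llama-style decoder (grouped-query attention, three-matrix SwiGLU MLP, no bias terms, tied embeddings—our assumptions here, which the actual checkpoint may not share in every detail), a back-of-envelope parameter count lands in the 2B class:

```python
# Back-of-envelope parameter count from the listed hyperparameters.
# Assumes a Llama-style decoder without biases and with tied embeddings.
V, H, I, L = 68_096, 2048, 11_008, 28
n_heads, n_kv_heads = 16, 8
head_dim = H // n_heads          # 128
kv_dim = n_kv_heads * head_dim   # 1024 (grouped-query attention)

embed = V * H                              # token embeddings
attn = H * H * 2 + H * kv_dim * 2          # q/o projections + smaller k/v
mlp = 3 * H * I                            # SwiGLU: gate, up, down matrices
per_layer = attn + mlp

total = embed + L * per_layer
print(f"{total / 1e9:.2f}B parameters")    # roughly 2.4B, before norms etc.
```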

Training Infrastructure

The model was trained on Yotta's Shakti cluster, utilizing 1,024 GPUs over a 5-day period. We leveraged NVIDIA's NeMo framework for the training process, benefiting from its kernel fusion and other optimizations for large-scale language model training.

Evaluation

The evaluation of large language models for Indic languages presents unique challenges due to the scarcity of standardized benchmarks. To address this, we have structured our evaluation into two components: (1) performance on existing benchmarks adapted for Indic languages, and (2) downstream evaluation on Indic-relevant tasks. We compare Sarvam-1 against Gemma 2 2B, Llama 3.2 3B, and Llama 3.1 8B, noting that despite the larger size of Llama 3.1 8B, Sarvam-1 demonstrates competitive performance.

Academic Benchmarks

Evaluations translated from English

We have translated four widely-used benchmarks into 10 Indic languages to create a comprehensive evaluation suite:

  • MMLU (Massive Multitask Language Understanding): A diverse set of multiple-choice questions spanning various domains, considered a key benchmark for assessing an LLM's broad knowledge.
  • ARC-Challenge (AI2 Reasoning Challenge): A grade-school level question-answering dataset designed to evaluate the reasoning capabilities of LLMs.
  • BoolQ: A binary (yes/no) question-answering dataset that tests both world knowledge and basic reasoning skills.
  • TriviaQA: Originally a generation task for assessing factual retrieval, adapted to a multiple-choice format for this evaluation by randomly sampling three incorrect answers.
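
The TriviaQA adaptation described above can be sketched as follows (field names and the fixed seed are hypothetical; the released dataset may use a different schema):

```python
import random

def to_multiple_choice(item, answer_pool, k=3, rng=random.Random(0)):
    """Turn a generation-style QA item into a 4-way multiple-choice item
    by sampling k incorrect answers drawn from other items' answers."""
    distractors = rng.sample([a for a in answer_pool if a != item["answer"]], k)
    options = distractors + [item["answer"]]
    rng.shuffle(options)
    return {"question": item["question"],
            "options": options,
            "label": options.index(item["answer"])}

item = {"question": "Which planet is known as the red planet?", "answer": "Mars"}
pool = ["Venus", "Jupiter", "Saturn", "Mercury", "Mars"]
mc = to_multiple_choice(item, pool)
assert len(mc["options"]) == 4 and mc["options"][mc["label"]] == "Mars"
```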

These translated datasets are open-sourced and available here. We report zero-shot performance for all models on these tasks, following standard practices in the field.

0-shot, top-1 accuracy of Sarvam-1 and other popular LLMs

Across the four standard benchmarks, Sarvam-1 demonstrates strong performance across languages despite its smaller size compared to Llama-3.1-8B. While it trails slightly behind the larger model on English tasks, it consistently outperforms both Gemma-2-2B and Llama-3.2-3B in all evaluations. Most notably, Sarvam-1 achieves exceptional results on TriviaQA, with an average score of 90.62 across Indic languages, significantly surpassing even the larger Llama-3.1-8B model (61.47). On MMLU, ARC-Challenge, and BoolQ, it sets a new state-of-the-art with Indic averages of 44.44, 58.50, and 80.68, respectively. For a language-wise breakdown, see the Appendix.
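
The reported Indic averages are plain macro-averages over the 10 languages; for instance, Sarvam-1's MMLU figure of 44.44 can be reproduced from the per-language scores in the Appendix:

```python
# Sarvam-1 translated-MMLU accuracy per Indic language (Appendix table).
scores = {"bn": 44.24, "gu": 44.58, "hi": 45.58, "kn": 44.50, "ml": 44.25,
          "mr": 44.83, "or": 43.17, "pa": 45.05, "ta": 43.79, "te": 44.43}
indic_avg = sum(scores.values()) / len(scores)
print(round(indic_avg, 2))  # 44.44
```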

IndicGenBench

Additionally, we evaluate on IndicGenBench, a benchmark suite from Google, comprising four datasets:

  • CrossSum: Cross-lingual summarization, going from English documents to summaries in target Indic languages.
  • Flores: Focused on English to Indic language translation.
  • XORQA: A question-answering dataset with English context and questions, requiring answers in the target Indic language.
  • XQUAD: A question-answering dataset with both context and questions in Indic languages.

We observe that, while other models show significant performance degradation in zero-shot settings for these tasks, Sarvam-1 maintains consistent performance, resulting in a substantial performance gap. However, to be consistent with the literature, we report one-shot performance as recommended by the original paper.

1-shot performance of Sarvam-1 and other popular LLMs on IndicGenBench

On the IndicGenBench suite, Sarvam-1 shows particularly impressive results in translation tasks, achieving a remarkable average chrF++ score of 39.83 on Flores English-to-Indic translation, substantially outperforming all baseline models including Llama-3.1-8B (34.23). The model maintains competitive performance on cross-lingual summarization (CrossSum) with an average chrF++ of 20.48, and demonstrates strong cross-lingual question-answering capabilities on XORQA with an average word-level F1 of 25.27. While XQUAD results (41.58) are slightly below Llama-3.1-8B (44.04), Sarvam-1 still outperforms both Gemma-2-2B and Llama-3.2-3B, showing its effectiveness in handling complex multilingual question-answering tasks. For a language-wise breakdown, see the Appendix.
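
chrF++ is a character n-gram F-score (with word n-grams mixed in). A simplified single-order character F1—an illustration of the idea, not the official sacrebleu implementation—looks like this:

```python
from collections import Counter

def char_ngram_f1(hyp: str, ref: str, n: int = 2) -> float:
    """Simplified chrF-style score: F1 over character n-grams of one order.
    Real chrF++ averages character orders n=1..6 plus word 1-2 grams,
    and weights recall higher (beta=2)."""
    hyp_ngrams = Counter(hyp[i:i + n] for i in range(len(hyp) - n + 1))
    ref_ngrams = Counter(ref[i:i + n] for i in range(len(ref) - n + 1))
    if not hyp_ngrams or not ref_ngrams:
        return 0.0
    overlap = sum((hyp_ngrams & ref_ngrams).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp_ngrams.values())
    recall = overlap / sum(ref_ngrams.values())
    return 2 * precision * recall / (precision + recall)

print(char_ngram_f1("abcd", "abce"))  # 2 of 3 bigrams match: P = R = F1 = 2/3
```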

Example Use Case: Translation

To assess the practical utility of Sarvam-1, we conducted extensive evaluations on downstream tasks after fine-tuning. Translation performance serves as a particularly illustrative example of the model's capabilities and efficiency. We fine-tuned Sarvam-1 for English-to-Indic translation on the BPCC dataset and evaluated its performance on IN22-Gen. Results demonstrate that Sarvam-1:

  • Outperforms comparably sized models in its class
  • Achieves BLEU scores (~20) comparable to significantly larger models like Gemma-2-9B and Llama-3.1-8B

A key advantage of Sarvam-1 is its computational efficiency: it delivers 4-6x faster inference than these larger models while maintaining competitive performance. The smaller parameter count enables cost-effective deployment in production environments.

Translation accuracy vs. Inference time

This combination of strong performance and superior inference efficiency makes Sarvam-1 particularly well-suited for practical applications, including on edge devices. We can’t wait to see what the community builds with Sarvam-1!

Acknowledgements

We extend our sincere gratitude to several organizations and partners whose support was instrumental in the development and training of Sarvam-1:

NVIDIA: We thank NVIDIA for their valuable assistance with the NeMo codebase. Their expertise in large-scale model training frameworks significantly streamlined our development process and enabled efficient utilization of computational resources.

Yotta: Our appreciation goes to Yotta for providing access to their state-of-the-art GPU cluster, Shakti. This high-performance computing infrastructure was crucial for training Sarvam-1 at scale, allowing us to push the boundaries of Indic language model capabilities.

AI4Bharat: We are grateful for our academic partnership with AI4Bharat. Their expertise in Indian language technologies and their contributions to open-source language resources have been invaluable to our research and development efforts.

Appendix

Update (08 Nov, 2024): The results have been updated after annealing and model-merging showed significant improvements in performance (see the Llama technical report, Section 3.1.3, for more details on annealing).

Language Gemma-2-2B Llama-3.2-3B Llama-3.1-8B Sarvam-1
mmlu_en 45.06 52.69 61.19 47.64
mmlu_bn 29.95 34.13 39.90 44.24
mmlu_gu 29.39 32.50 35.26 44.58
mmlu_hi 32.35 37.44 44.58 45.58
mmlu_kn 29.29 32.90 37.22 44.50
mmlu_ml 30.71 33.04 38.60 44.25
mmlu_mr 30.32 33.88 39.79 44.83
mmlu_or 27.23 31.83 35.06 43.17
mmlu_pa 29.32 32.89 37.67 45.05
mmlu_ta 30.82 32.14 37.50 43.79
mmlu_te 29.20 33.15 37.43 44.43
Average 31.24 35.14 40.38 44.73
Indic Average 29.86 33.39 38.30 44.44
Language-wise accuracy of translated MMLU for Gemma-2-2B, Llama-3.2-3B, Llama-3.1-8B, and Sarvam-1.


Model gemma-2-2b llama-3.2-3b llama-3.1-8b sarvam-1
arcc_en 59.2 67.48 78.99 65.04
arcc_bn 32.43 39.83 48.17 59.39
arcc_gu 29.3 33.65 40.61 59.22
arcc_hi 37.57 49.13 56.17 60.00
arcc_kn 29.22 36.43 44.7 57.04
arcc_ml 29.91 33.22 46.78 58.96
arcc_mr 28.87 37.39 44.96 58.09
arcc_or 25.04 33.3 40.35 55.13
arcc_pa 29.39 33.22 40.43 60.70
arcc_ta 32.78 34.7 44.78 57.04
arcc_te 30 34.09 43.04 59.39
Average 33.06 39.31 48.09 59.09
Indic Average 30.45 36.50 45.00 58.50
Language-wise accuracy of translated Arc-Challenge for Gemma-2-2B, Llama-3.2-3B, Llama-3.1-8B, and Sarvam-1.


Model gemma-2-2b llama-3.2-3b llama-3.1-8b sarvam-1
triviaqa_en 78.13 87.28 92.57 93.38
triviaqa_bn 43.94 49.63 65.4 91.16
triviaqa_gu 37.8 42.44 58.25 91.35
triviaqa_hi 48.31 56.33 71.86 91.66
triviaqa_kn 36.75 45.45 60.34 91.00
triviaqa_ml 42.77 44.93 62.05 90.45
triviaqa_mr 43.8 52.55 68.15 91.47
triviaqa_or 28.63 35.22 46.86 87.76
triviaqa_pa 41.62 45.73 61.41 91.00
triviaqa_ta 45.22 44.59 61.29 89.48
triviaqa_te 37 43.11 59.12 90.89
Average 44.00 49.75 64.30 90.87
Indic Average 40.58 46.00 61.47 90.62
Language-wise accuracy of translated TriviaQA for Gemma-2-2B, Llama-3.2-3B, Llama-3.1-8B, and Sarvam-1.


Model gemma-2-2b llama-3.2-3b llama-3.1-8b sarvam-1
boolq_en 62.23 72.35 74.19 77.08
boolq_bn 62.45 64.53 69.57 81.07
boolq_gu 55.26 64.53 67.13 81.28
boolq_hi 50.98 68.01 73.03 81.62
boolq_kn 55.38 63.61 66.79 80.52
boolq_ml 60.15 63.88 66.51 80.7
boolq_mr 53.27 64.89 69.48 81.28
boolq_or 62.2 62.2 64.59 79.24
boolq_pa 61.93 63.3 69.63 81.38
boolq_ta 60.52 64.31 67.52 79.51
boolq_te 56.54 64.65 67.74 80.21
Average 58.26 65.11 68.74 80.42
Indic Average 57.87 64.39 68.20 80.68
Language-wise accuracy of translated BoolQ for Gemma-2-2B, Llama-3.2-3B, Llama-3.1-8B, and Sarvam-1.


Model gemma-2-2b llama-3.2-3b llama-3.1-8b sarvam-1
crosssum_bn 6.09 22.25 24.94 21.06
crosssum_gu 2.3 18.56 19.6 18.31
crosssum_hi 10.51 22.02 24.88 18.68
crosssum_kn 6.56 18.72 20.81 19.75
crosssum_ml 3.5 18.96 20.57 22.17
crosssum_mr 5.03 23.45 24.22 20.01
crosssum_or 1.6 14.14 15.04 21.69
crosssum_pa 6.12 17.84 18.37 17.54
crosssum_ta 13.53 24.05 26.69 27.41
crosssum_te 10.43 18.18 20.77 18.19
Average 6.57 19.82 21.59 20.48
Language-wise chrF++ scores of CrossSum for Gemma-2-2B, Llama-3.2-3B, Llama-3.1-8B, and Sarvam-1.


Language gemma-2-2b llama-3.2-3b llama-3.1-8b sarvam-1
flores_en-bn 29.91 30.6 37.24 41.0
flores_en-gu 22.35 24.15 33.07 42.84
flores_en-hi 44.81 38.48 44.85 37.52
flores_en-kn 23.21 26.81 33.8 41.54
flores_en-ml 24.13 27.93 35.39 36.02
flores_en-mr 30.45 29.95 35.18 38.18
flores_en-or 11.33 18.74 25.51 31.86
flores_en-pa 21.24 22.78 29.81 39.53
flores_en-ta 32.74 27.4 35.3 44.02
flores_en-te 26.05 24.58 32.1 45.76
Average 26.62 27.14 34.23 39.83
Language-wise chrF++ scores of Flores (en->xx) for Gemma-2-2B, Llama-3.2-3B, Llama-3.1-8B, and Sarvam-1.


Language gemma-2-2b llama-3.2-3b llama-3.1-8b sarvam-1
xorqa_bn 9.1 11.63 15.59 15.09
xorqa_gu 28.19 18.06 23.78 25.91
xorqa_hi 9.83 33.7 39.08 38.61
xorqa_kn 10.48 19.49 24.41 25.98
xorqa_ml 17.94 22.53 31.64 36.91
xorqa_mr 15.52 22.69 21.81 23.23
xorqa_or 1.9 11.15 17.81 18.12
xorqa_pa 8.09 17.6 23.04 26.35
xorqa_ta 11.7 15.55 25.24 24.57
xorqa_te 10.04 11.34 17.58 17.95
Average 12.28 18.37 24.00 25.27
Language-wise, word-level F1 scores of XORQA for Gemma-2-2B, Llama-3.2-3B, Llama-3.1-8B, and Sarvam-1.


Language gemma-2-2b llama-3.2-3b llama-3.1-8b sarvam-1
xquad_bn 33.5 39.08 51.91 41.13
xquad_gu 35.76 36.36 42.05 43.89
xquad_hi 46.27 57.31 62.88 48.93
xquad_kn 36.47 44.09 40.29 44.29
xquad_ml 34.96 41.85 38.22 36.45
xquad_mr 24.65 48.21 55.36 44.2
xquad_or 8.95 30.58 22.26 18.83
xquad_pa 37.45 42.72 42.53 50.03
xquad_ta 36.88 42.12 46.75 41.89
xquad_te 30.05 34.55 38.17 46.17
Average 32.49 41.69 44.04 41.58
Language-wise, word-level F1 scores of Xquad for Gemma-2-2B, Llama-3.2-3B, Llama-3.1-8B, and Sarvam-1.

