October 25, 2024

Sarvam Nvidia

Sarvam AI launches first LLM for Indian languages

Press Release

Sarvam AI launches first LLM developed in India for local languages, built with NVIDIA AI

Bangalore, India: Friday, October 25, 2024: Created with NVIDIA NeMo software and trained on NVIDIA Hopper GPUs, Sarvam 1 model delivers efficient support for 11 languages to advance generative AI development across the nation.

Sarvam AI has developed Sarvam 1, India’s first home-grown multilingual large language model (LLM), built entirely on NVIDIA technology. Sarvam 1 is a 2-billion-parameter model, trained on 4 trillion tokens curated by Sarvam on NVIDIA H100 Tensor Core GPUs. Its custom tokenizer is up to four times more efficient than those of leading English-trained models on Indian language text. Sarvam 1 supports 11 languages: Bengali, Gujarati, Hindi, Marathi, Malayalam, Kannada, Oriya, Tamil, Telugu, Punjabi, and English.
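
Tokenizer efficiency of this kind is commonly expressed as "fertility", the average number of tokens produced per word. The release does not say which metric was used, so the following is only a minimal sketch of how such a comparison could be set up. The two tokenizers are simple stand-ins (UTF-8 bytes and Unicode code points), not Sarvam's tokenizer or any English-trained model's, and the sample sentence is illustrative.

```python
from typing import Callable, List


def fertility(tokenize: Callable[[str], List[str]], text: str) -> float:
    """Average number of tokens produced per whitespace-separated word."""
    words = text.split()
    if not words:
        raise ValueError("text must contain at least one word")
    return len(tokenize(text)) / len(words)


def byte_tokenize(text: str) -> List[str]:
    # Stand-in baseline: one token per UTF-8 byte (spaces excluded).
    return [format(b, "02x") for word in text.split() for b in word.encode("utf-8")]


def char_tokenize(text: str) -> List[str]:
    # Stand-in comparison: one token per Unicode code point (spaces excluded).
    return [ch for word in text.split() for ch in word]


sample = "नमस्ते दुनिया"  # illustrative Hindi text ("hello world")
byte_f = fertility(byte_tokenize, sample)
char_f = fertility(char_tokenize, sample)
print(f"byte-level fertility: {byte_f:.1f} tokens/word")
print(f"char-level fertility: {char_f:.1f} tokens/word")
print(f"ratio: {byte_f / char_f:.1f}x")
```

A lower fertility means fewer tokens for the same text, which translates into more content per context window and fewer decoding steps per sentence.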

Sarvam 1 is already powering generative AI agents and other applications from Sarvam AI. Developers can use the base model — available on Hugging Face — to build their own generative AI applications for Indic language speakers.
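
For developers who want to try the base model, a minimal sketch using the Hugging Face `transformers` library might look like the following. The repository name `sarvamai/sarvam-1` is an assumption and should be checked against the model card, and the helper `complete` is a hypothetical name introduced here for illustration.

```python
MODEL_ID = "sarvamai/sarvam-1"  # assumed repository name; verify on Hugging Face


def complete(prompt: str, max_new_tokens: int = 32) -> str:
    """Return the base model's continuation of `prompt` (hypothetical helper)."""
    # Imported lazily so this file can be loaded without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete("भारत की राजधानी")` would download the weights on first use. Because this is a base model, it continues text rather than following instructions, so prompts work best when phrased as the beginning of the desired output.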

“The Sarvam 1 model is the first example of an LLM trained from scratch with data, research, and compute being fully in India,” said Dr. Pratyush Kumar, Co-Founder, Sarvam. He added: “We expect it to power a range of use cases including voice and messaging agents. This is the beginning of our mission to build full stack sovereign AI. We are deeply excited to be working together with NVIDIA towards this mission.”


Sarvam leveraged NVIDIA NeMo Curator to accelerate data processing pipelines and curate a high-quality pretraining corpus. NeMo Curator domain and quality classifier models were crucial in improving training data quality and enhancing the model's final accuracy.

Having been trained for multiple applications, Sarvam 1 serves as an effective base for fine-tuning on various specialized tasks. These include formal and code-mixed translation, transliteration, preprocessing for text-to-speech systems, and vectorization for Indic content retrieval, as well as quality assessment and domain classification of pre-training data.

"Enterprises are seeking to leverage generative AI to accelerate innovation and tackle complex challenges at scale," said Kari Briski, vice president of AI software, models and services atNVIDIA. "Sarvam AI's multilingual model, developed using NVIDIA's full-stack AI platform including NeMo and Hopper GPUs, showcases how tailored AI solutions can address linguistic diversity and drive inclusive technological growth in regions like India."

NVIDIA TensorRT-LLM supports low-precision FP8 inference of the Sarvam 1 model on H100 GPUs, and the model can be efficiently served and scaled using the NVIDIA Triton Inference Server TensorRT-LLM backend. Sarvam AI leverages its model within its voice-to-voice platform, recognized as an industry-leading solution for enterprises developing voice bots in Indian languages. Built on NVIDIA Riva speech and translation AI microservices, included with NVIDIA AI Enterprise, this platform effectively addresses use cases in legal, public, finance, and other sectors that are particularly relevant to the Indian market.

Sarvam AI can run on NVIDIA-accelerated infrastructure on premises and on instances from NVIDIA’s global and Indian cloud partners to help advance AI adoption in India. This initiative marks a milestone in the country’s AI journey, helping position India as a leader in AI innovation and making advanced capabilities accessible to millions.

About Sarvam AI

Sarvam AI is a startup in the generative AI space focusing on efficient Indian language voice bots and productivity tools for knowledge workers. Sarvam AI is innovating across layers: building unique datasets, models for Indian-language speech and LLMs, and low-code authoring experiences for customer and professional agents. Sarvam AI is domiciled in India and aims to offer a sovereign stack for population-scale AI usage.

-- Draft Elements --
BULBUL

Meera

Professional and articulate

Arvind

Conversational and articulate

Maitryee

Engaging and informational

Amol

Narrational and mature  

Pavithra

Dramatic and engaging

Amartya

Expressive and distinct
E-commerce support
E-commerce requires clear communication of order details, prices, and delivery timelines, often mixing English terms with regional languages. Pick a voice for your brand and keep it consistent across all your communications and languages.
TTS Input: "Your order for 2 pairs of Allen Solly jeans and 1 Nike T-shirt has been confirmed. Total price: ₹3,999. Your order will be delivered in 2 days."
Hindi
Kannada
Odia
Telugu
Fintech Applications:
Financial services demand precise pronunciation of monetary values and financial terms, often involving large numbers and specialized vocabulary.
TTS Input: "Your account balance is ₹10,435.26. Kya aap ek FD open karna chahenge?"
Hindi
Punjabi
Tamil
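Precise pronunciation of monetary values usually depends on normalizing the text before synthesis. As a minimal sketch of that idea, the snippet below spells out rupee amounts in English using the Indian numbering system (thousand, lakh, crore). It is an illustration only, not the preprocessing used by any Sarvam model; the function names `indian_words` and `expand_rupees` are hypothetical, and a Hindi or other Indic voice would need number words in that language.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
UNITS = ((10_000_000, "crore"), (100_000, "lakh"), (1_000, "thousand"), (100, "hundred"))


def indian_words(n: int) -> str:
    """Spell out a non-negative integer using Indian-system groupings."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + (f"-{ONES[ones]}" if ones else "")
    parts = []
    for value, name in UNITS:
        count, n = divmod(n, value)
        if count:
            parts.append(f"{indian_words(count)} {name}")
    if n:
        parts.append(indian_words(n))
    return " ".join(parts)


# Matches amounts such as "₹3,999" or "₹10,435.26" (paise are optional).
RUPEE = re.compile(r"₹\s?(\d+(?:,\d+)*)(?:\.(\d{2}))?")


def expand_rupees(text: str) -> str:
    """Replace each rupee amount in `text` with its spoken English form."""
    def spoken(match: re.Match) -> str:
        rupees = int(match.group(1).replace(",", ""))
        result = f"{indian_words(rupees)} rupees"
        paise = int(match.group(2)) if match.group(2) else 0
        if paise:
            result += f" and {indian_words(paise)} paise"
        return result
    return RUPEE.sub(spoken, text)


print(expand_rupees("Your account balance is ₹10,435.26."))
```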
Healthcare Communication:
Healthcare communication requires accurate pronunciation of medical terms, dosages, and instructions, often involving complex terminology and precise numerical information.
TTS Input: "Namaste Sharma ji, Dr. Gupta ne aapko Metformin 500mg prescribe kiya hai. Ise daily two times, subah aur shaam ko khana ke baad lena hai. Kya aapko koi side-effects ka anubhav ho raha hai?"
Hindi
Multilingual Audiobooks:
Audiobooks require consistent voice quality across languages, natural code-mixing, and expressive narration to bring stories to life. Give a unique voice to your characters in the same language.
TTS Input: "भगवान कृष्ण कहते हैं, सुखी जीवन जीने और स्वर्ग प्राप्त करने के लिए तपस्या और दान जैसे कुछ कार्य करने चाहिए। पुण्य कर्म करने से अनजाने में किए गए पाप भी नष्ट हो जाते हैं। इस प्रकार मनुष्य को नरक में नहीं जाना पड़ता।"
Hindi
Bengali
E-Learning Platform
Educational content often involves technical terms, mathematical expressions, and the need to maintain student engagement through varied intonation.
TTS Input:  "आज हम Einstein की Theory of Relativity के बारे में पढ़ेंगे। Theory कहती है कि समय और space एक दूसरे से जुड़े हुए हैं और इन्हें एक साथ space-time कहा जाता है। यह theory बताती है कि जब कोई object बहुत high speed से move करता है, तो उसके लिए time slow हो जाता है। इसे mathematically इस equation से express किया जा सकता है:

E = mc^2

जहाँ E energy है, m object का mass है, और c speed of light in vacuum है, जो लगभग 3 times 10^8 meters per second होती है। यह equation दिखाती है कि mass और energy interchangeable हैं और एक दूसरे में convert हो सकते हैं।"
Hindi
Multilingual news broadcasting
TTS Input with many abbreviations: "The ISRO (Indian Space Research Organisation) has successfully launched its latest satellite, GSAT-30, from the Satish Dhawan Space Centre. The satellite will enhance communication services across India. This achievement marks another milestone for ISRO following their earlier successful missions this year."
English
Tamil
Astrology Bot
Astrology applications need to convey mystical and predictive content with an appropriate tone and handling of astrological terminology.
TTS Input: "Namaste! Aaj aapka din shubh hai. Venus ki position se aapko aaj ek good news mil sakti hai. Office mein kisi senior se important task assign ho sakta hai. Stay confident!"
Hindi
Gujarati
Giving a Desi Touch to Google Maps:
Navigation services need to provide clear, timely instructions with accurate pronunciation of street names and landmarks.
TTS Input: "Head south on Netaji Subhash Marg toward Dayanand Road. In 12 meters, turn left onto Dayanand Road. Continue straight for 350 meters, passing the United Bank of India ATM on your left."
Hindi
Speak to your users via IoT
Smart home devices need to convey information clearly and handle queries in natural, conversational language.
TTS Input: "Good morning! It's 7:00 AM. The temperature today is 28 degrees Celsius, and the weather is very pleasant. You have a busy day ahead. Your first meeting is scheduled for 9:30 AM with the marketing team to discuss the upcoming campaign strategies."
Marathi
Legal Documents
The powers of judicial review in the matters involving financial implications are also very limited. The wisdom and advisability of the Courts in the matters concerning the finance, are ordinarily not amenable to judicial review unless a gross case of arbitrariness or unfairness is established by the aggrieved party.​
Key Feature: With Formal Mode, you can create legal documents in different Indic languages while maintaining the formal tone.

Colloquial Mode helps millions of Indians access these complex documents by translating them into colloquial Indic languages.
Other Translation Models
‍वित्तीय निहितार्थ से जुड़े मामलों में न्यायिक समीक्षा की शक्तियाँ भी बहुत सीमित हैं। वित्त से संबंधित मामलों में न्यायालयों का ज्ञान और सलाह आम तौर पर न्यायिक समीक्षा के लिए अनुकूल नहीं होते हैं जब तक कि पीड़ित पक्ष द्वारा मनमाने या अन्यायपूर्ण का एक गंभीर मामला स्थापित नहीं किया जाता है।​

Mayura (Formal + Preprocessing)
वित्त-संबंधी मामलों की समीक्षा करने के लिए न्यायपालिका की शक्ति काफी सीमित है। आम तौर पर, अदालतें वित्तीय मामलों में हस्तक्षेप नहीं करती हैं जब तक कि अन्याय या मनमाने ढंग से काम करने का स्पष्ट मामला न हो। यह आम तौर पर केवल तभी होता है जब निर्णय से प्रभावित व्यक्ति इसे साबित कर सकता है।​

Mayura (Colloquial + Preprocessing)
Judiciary की financial-related cases को review करने की power बहुत restricted है। आमतौर पर, courts financial matters में interfere नहीं करते हैं जब तक कि unfairness या arbitrariness का clear case ना हो। ये आमतौर पर तभी होता है जब decision से प्रभावित व्यक्ति उसे prove कर सके।​
Unlock colloquial translation
I can help you sign up for our courses in just a few steps. Can you please provide your name and email address to get started?​


She's the GOAT when it comes to baking.
Formal
मैं कुछ ही चरणों में हमारे पाठ्यक्रमों के लिए साइन अप करने में आपकी मदद कर सकता हूँ। क्या आप कृपया अपना नाम और ईमेल पता प्रदान कर सकते हैं?

Colloquial
मैं आपको बस कुछ ही steps में हमारे courses के लिए sign up करने में मदद कर सकता हूँ। क्या आप अपना नाम और email address बता सकते हैं ताकि हम शुरू कर सकें?​

Other Models
जब बेकिंग की बात आती है तो वह बकरी है।

Colloquial Mode:
वे बेकिंग में महारत रखती हैं, उनके केक शानदार होते हैं।​

Visual
E-commerce requires clear communication of order details, prices, and delivery timelines, often mixing English terms with regional languages.
TTS Input: "Your order for 2 pairs of Allen Solly jeans and 1 Nike T-shirt has been confirmed. Total price: ₹3,999. Your order will be delivered in 2 days"
Hindi
Kannada
Healthcare Communication:
Healthcare communication requires accurate pronunciation of medical terms, dosages, and instructions, often involving complex terminology and precise numerical information.
TTS Input: "Namaste Sharma ji, Dr. Gupta ne aapko Metformin 500mg prescribe kiya hai. Ise daily two times, subah aur shaam ko khana ke baad lena hai. Kya aapko koi side-effects ka anubhav ho raha hai?"
Hindi
Gujarati
Multilingual Audiobooks:
Audiobooks require consistent voice quality across languages, natural code-mixing, and expressive narration to bring stories to life. Give a unique voice to your characters in the same language.
TTS Input:
कृष्ण: "अर्जुन, धर्म का मार्ग अक्सर चुनौतियों से भरा होता है, लेकिन विश्वास और संकल्प के साथ, सबसे अंधेरी रातें भी सुबह में बदल जाती हैं।"

अर्जुन: "कृष्ण, आपका ज्ञान हमारा मार्गदर्शक तारा है। मैं धर्म की रक्षा करने और अपने लोगों की रक्षा करने का प्रयास करूंगा।"

द्रौपदी: "कृष्ण, मेरा हृदय अन्याय के बोझ से भारी है, लेकिन आपकी उपस्थिति मुझे आशा से भर देती है। मुझे विश्वास है कि न्याय की जीत होगी।"
Krishna
Arjun
Draupadi
Male Professional newscaster voice in English:
TTS Input:  "The ISRO (Indian Space Research Organisation) has successfully launched its latest satellite, GSAT-30, from the Satish Dhawan Space Centre. The satellite will enhance communication services across India. This achievement marks another milestone for ISRO following their earlier successful missions this year."
TTS Output
Hindi (Female voice):
TTS Input:  "इसरो, Indian Space Research Organisation ने अपना latest satellite, GSAT-30, Satish Dhawan Space Centre से, successfully launch कर दिया है। , ये satellite पूरे India में, communication services को improve करेगा। , ये इस साल ISRO के successful missions के बाद , एक और बड़ी achievement है।"
TTS Output
Tamil (Male voice):
Input
Output
Without Context: We are using it in the mushroom.

Given Context of Previous Turn i.e the Question asked by the Voice Agent was: Is it the bathroom, bedroom, or somewhere else?

Saaras Output with this Context: We are using it in the washroom.
Input
Output
Would you like to know the last four digits of my Aadhaar number? Please wait, I will tell you after giving you the number. Note that the last four digits of my Aadhaar number are 9088. Please tell us your birth date. Yes, my birth date is 15th May, 1998. Please tell me your phone number. Yes, my phone number is 3190-32320. We would like to know your address.
Input
Output
Hello, thank you for contacting WC Bank. I am Geetika, how can I assist you? Hello, I want to complete a mortgage loan application.
Input
Output
Friends, in the world of chemistry, the one who remembers the entire periodic table is the one who is immortal. You will also become carefree, you will also become better if you remember the entire periodic table. So let's start this session today. Friends, if you want to make your future journey easier and gain expertise in periodic properties, then for this, you will need to know screening and shielding effect along with Z-effective calculation. If you want to understand, then let's start this session from here today.
Input
Output
Fifty years of life have passed learning and speaking pure Hindi, now society says that Hindi won't work, we want fluent English
Input
Output
How to make onion samosa crispy without adding too much oil using only wheat flour, today I am going to show you the full recipe.
| Phase | Phase 1 | Phase 2 | Phase 3 |
| --- | --- | --- | --- |
| Input | English audio (sentences) | English + Hindi audio (sentences) | English + Hindi audio (questions) |
| Output | Transcriptions | English: transcriptions. Hindi: transcriptions translated to English | Answers in English |
| Hours of audio | 35 | 100 | 30 |
| LR schedule | Constant with warmup | Cosine decay | Cosine decay with warmup |
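
The three learning-rate schedules named for the phases can be sketched as plain functions of the training step. This is a generic illustration under stated assumptions: linear warmup, a configurable floor `min_lr`, and hypothetical values for `base_lr`, `warmup_steps`, and `total_steps`. None of these hyperparameters come from the source.

```python
import math


def constant_with_warmup(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linear warmup to base_lr, then hold base_lr constant."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr


def cosine_decay(step: int, base_lr: float, total_steps: int,
                 min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr at step 0 to min_lr at total_steps."""
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def cosine_decay_with_warmup(step: int, base_lr: float, warmup_steps: int,
                             total_steps: int, min_lr: float = 0.0) -> float:
    """Linear warmup to base_lr, then cosine decay over the remaining steps."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    decay_steps = max(total_steps - warmup_steps, 1)
    return cosine_decay(step - warmup_steps, base_lr, decay_steps, min_lr)


# Hypothetical settings, for illustration only.
for step in (0, 50, 100, 550, 1000):
    lr = cosine_decay_with_warmup(step, base_lr=1e-4, warmup_steps=100, total_steps=1000)
    print(f"step {step:4d}: lr = {lr:.2e}")
```

Warmup avoids large updates while optimizer statistics are still noisy, and cosine decay lowers the rate smoothly toward the end of a phase.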