September 6, 2024

Blog: Mayura

Making complex information accessible with Mayura




In the diverse linguistic landscape of India, English-to-Indic translation plays a crucial role in how knowledge is shared and consumed. Translation technologies have existed for years, but a significant gap has persisted between formal translations and the way Indians actually communicate in daily life, so much of the meaning is lost in translation. Existing translation models have long struggled with the nuances of colloquial language, regional expressions, and the code-mixing that characterizes Indian multilingualism.

In India, where most people are bilingual, spoken language often mixes words from English and regional languages. Colloquial language differs vastly from formal language in the Indian context and varies across dialects. Existing translation models, trained primarily on formal sources like newspapers and books, don't represent how Indians actually communicate. Sarvam's translation model is trained on real-world formal and colloquial communication data, making translations closer to everyday speech.

Conventional models fall short in several aspects:

  • Everyday Relevance: They focus on standardized language, while most real-world interactions occur in casual, colloquial settings.
  • Linguistic Flexibility: Rigid adherence to grammatical rules fails to capture the fluid structures of natural speech. For complex sentences, these models cannot simplify the structure, which hurts the accuracy of translations into Indian languages.
  • Vocabulary Richness: Limited "proper" vocabulary excludes slang, dialects, and code-mixing that color genuine conversations.
  • Cultural Nuance: Stripping away cultural context (e.g., the difference between formal and spoken Tamil) results in technically correct but culturally tone-deaf translations.
  • Authenticity: Pursuit of "universal" pronunciation often comes at the cost of regional accents and variations that give language its character.
  • Gender: Most Indic languages are gendered, unlike English. Translations need to represent these gender differences.

These shortcomings have real-world consequences. Social media conversations, local news, and person-to-person exchanges – the fabric of daily digital life – remain largely inaccessible across language barriers. This hinders personal communication and limits the reach and effectiveness of critical information dissemination, e-commerce, and digital services.

Sarvam AI's new translation model aims to address these challenges by embracing colloquial language and code-mixing. It represents a significant step towards making translation technology truly reflective of how people communicate in their daily lives. This approach has the potential to expand the reach of digital services, enhance cross-cultural understanding, and make the internet's vast resources more accessible to a broader audience.

How we built Sarvam Translate

Sarvam Translate was developed with a practical, application-first approach. Conventional translation models often fail to close the information asymmetry that arises when translating from English to Indian languages, particularly for specialized or context-rich content. We extended the model's capability further by training on code-mixed, colloquial Indic data, so that people can consume knowledge in the language they actually speak.

Our approach to building this model was multi-faceted:

  1. Diverse Data Collection: We gathered a wide variety of data, including conversational text, domain-specific and technical documents, and narrative content, from high-quality sources to ensure a diversity of linguistic contexts and use cases.
  2. Real-World Language Patterns: We focused on understanding how people actually communicate in the 10 supported languages in formal and colloquial settings. This meant recognizing and incorporating code-mixing patterns, where English words (particularly difficult verbs, technical nouns, and domain-specific terms) are naturally integrated into Indian language speech.
  3. Contextual Awareness: We used Indian context data to accurately handle the complexities of second and third person respectful forms in various languages. This ensures that translations maintain appropriate levels of formality and respect based on the context of the sentence.
  4. Gender Sensitivity: Recognizing that English is often gender-neutral in the first person, we developed our model to support appropriate gendering in Indian languages where it's grammatically necessary.

This approach allows Sarvam Translate to produce translations that not only convey the meaning of the original English text but do so in a way that sounds natural and familiar to Indian language speakers.

Real life applications of Sarvam Translate

1. Conversational Translation

Create LLM-powered voice chatbots that sound colloquial and keep a consistent voice:

Translate LLM outputs into Indic languages to create bots that communicate in the language of your customers.

Challenge: First, LLMs generate responses suited to written rather than spoken English, for example bullet points and long, multi-clause sentences. Second, English does not mark the speaker's gender, which makes it challenging to adjust salutations and keep gender consistent over a real-time conversation.

Sarvam Approach: First, the simplification approach in Sarvam Translate converts this LLM text into a spoken Indic language format that sounds natural when read aloud. This includes appropriate translation of filler words, slang, proverbs, etc. Second, it provides a gender toggle for first-person translations so that the gender of the translated output stays consistent throughout the conversation.

Message:

I can help you sign up for our courses in just a few steps. Can you please provide your name and email address to get started?

Formal Translation:

मैं कुछ ही चरणों में हमारे पाठ्यक्रमों के लिए साइन अप करने में आपकी मदद कर सकता हूँ। क्या आप कृपया अपना नाम और ईमेल पता प्रदान कर सकते हैं?

Colloquial Translation:

मैं आपको बस कुछ ही steps में हमारे courses के लिए sign up करने में मदद कर सकता हूँ। क्या आप अपना नाम और email address बता सकते हैं ताकि हम शुरू कर सकें?

The translation model is not only for AI agents; it can also help human agents communicate in real time with clients who speak a different language.

Human Agent with filler words and numbers:

Agent:

Hello! Uh, I'm sorry to hear you're having trouble with your smartwatch order #DEL9876. Let's, like, troubleshoot first. Have you tried resetting the watch and reinstalling the app? If that doesn't work, I can help you discuss replacement options.

Translated Agent Response for Customer:

Agent: नमस्ते! मुझे सुनकर बहुत दुख हुआ कि आपको अपने smartwatch order #DEL9876 के साथ परेशानी हो रही है। चलो, पहले troubleshoot करते हैं। क्या आपने watch reset करके app reinstall करने की कोशिश की है? अगर वो काम नहीं करता है, तो मैं replacement options discuss करने में आपकी मदद कर सकता हूँ।


2. Domain-specific, technically complex document translation

Most real-world, domain-specific documents are complex and dense with technical terminology, which makes translating them while preserving meaning a tough challenge. The fundamental requirement is often to translate specialized content, written in long, multi-clause sentences, while preserving accuracy, context, and the technical jargon. This involves not just linguistic translation, but also cultural adaptation and domain-specific expertise.

Legal Content:

Translating intricate legal terminology into clear, accurate Indic language versions.

Challenge: While 30-word sentences with multiple clauses and subclauses are common in legal documents, Indic languages tend to favor shorter, simpler sentences. Conventional translation models translate such complex paragraphs literally, word by word, which makes the output quite abstruse.

Approach: Sarvam Translate breaks down these complex English syntactic structures before translating them into Indic languages, resulting in sentences that are easier and more natural to read.

Legal text in English:

The powers of judicial review in the matters involving financial implications are also very limited. The wisdom and advisability of the Courts in the matters concerning the finance, are ordinarily not amenable to judicial review unless a gross case of arbitrariness or unfairness is established by the aggrieved party.

Output from conventional translation models:

वित्तीय निहितार्थ से जुड़े मामलों में न्यायिक समीक्षा की शक्तियाँ भी बहुत सीमित हैं। वित्त से संबंधित मामलों में न्यायालयों का ज्ञान और सलाह आम तौर पर न्यायिक समीक्षा के लिए अनुकूल नहीं होते हैं जब तक कि पीड़ित पक्ष द्वारा मनमाने या अन्यायपूर्ण का एक गंभीर मामला स्थापित नहीं किया जाता है।

Output with our preprocessing approach to formal translation:

वित्त-संबंधी मामलों की समीक्षा करने के लिए न्यायपालिका की शक्ति काफी सीमित है। आम तौर पर, अदालतें वित्तीय मामलों में हस्तक्षेप नहीं करती हैं जब तक कि अन्याय या मनमाने ढंग से काम करने का स्पष्ट मामला न हो। यह आम तौर पर केवल तभी होता है जब निर्णय से प्रभावित व्यक्ति इसे साबित कर सकता है।

Further, with our colloquial translation models, we can also help improve access and easily communicate laws to a larger audience.

Output with our preprocessing approach to colloquial translation:

Judiciary की financial-related cases को review करने की power बहुत restricted है। आमतौर पर, courts financial matters में interfere नहीं करते हैं जब तक कि unfairness या arbitrariness का clear case ना हो। ये आमतौर पर तभी होता है जब decision से प्रभावित व्यक्ति उसे prove कर सके।

Scientific and Technical Content:

Creating educational and scientific content in all major Indian languages by converting complex mathematical, scientific, or technical material into accessible Indic language formats.

Challenge: Most scientific and technical documentation contains many technical terms, inline code, and equations or formulas that need to be retained as is after translation.

Approach: Sarvam Translate employs a dual-stream architecture where one stream processes the textual content while the other analyzes and preserves formatting elements (HTML tags, Markdown syntax, code blocks etc.). These streams are merged post-translation, ensuring that the structural and stylistic elements of the original text are maintained in the output.

Technical Text with inline tags in English:

To create a CV in LaTeX, you can use a template like the moderncv package. Start by including the package with \usepackage{moderncv} and set the style with \moderncvstyle{style}, where style can be casual, classic, or banking. Then, use commands like \name{Name}, \address{Address}, \phone{Phone}, \email{Email}, and \section{Section} to populate your CV.

Conventional Translation Models:

LaTeX में CV बनाने के लिए, आप मॉडर्नसीवी पैकेज जैसे टेम्पलेट का उपयोग कर सकते हैं। पैकेज को \usepackage{moderncv} के साथ शामिल करके प्रारंभ करें और स्टाइल को \moderncvstyle{style} के साथ सेट करें, जहां स्टाइल कैज़ुअल, क्लासिक या बैंकिंग हो सकता है। फिर, अपना सीवी भरने के लिए \नाम{नाम}, \पता{पता}, \फोन{फोन}, \ईमेल{ईमेल}, और \सेक्शन{सेक्शन} जैसे कमांड का उपयोग करें।

Sarvam Output with Special Entity Retention and Formatting Fidelity:

लाटेक्स में एक सीवी बनाने के लिए, आप moderncv पैकेज जैसे टेम्पलेट का उपयोग कर सकते हैं। पैकेज को \usepackage{moderncv} के साथ शामिल करके शुरू करें और शैली को \moderncvstyle{style} के साथ सेट करें, जहाँ शैली अनौपचारिक, क्लासिक या बैंकिंग हो सकती है। फिर, अपने सीवी को भरने के लिए \name{Name}, \address{Address}, \phone{Phone}, \email{Email} और \section{Section} जैसे कमांड का उपयोग करें।
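
To make the idea of keeping commands and tags intact more concrete, here is a minimal sketch of a much simpler technique: placeholder masking. It is not the dual-stream architecture described above, only an illustration of the same goal. Protected spans are swapped for opaque slots before translation and put back afterwards. The names `mask`, `restore`, `translate_preserving_markup`, and the `__SLOT0__` slot format are hypothetical, and `translate` stands in for any translation callable.

```python
import re
from typing import Callable, Dict, Tuple

# Spans that should pass through translation untouched (illustrative, not exhaustive).
PROTECTED = re.compile(
    r"\\[A-Za-z]+(?:\{[^{}]*\})*"  # LaTeX commands such as \usepackage{moderncv}
    r"|`[^`]*`"                    # inline code in backticks
    r"|</?[A-Za-z][^>]*>"          # HTML tags
)


def mask(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each protected span with a numbered slot and remember the original."""
    slots: Dict[str, str] = {}

    def _swap(match: "re.Match[str]") -> str:
        key = f"__SLOT{len(slots)}__"
        slots[key] = match.group(0)
        return key

    return PROTECTED.sub(_swap, text), slots


def restore(text: str, slots: Dict[str, str]) -> str:
    """Put the original spans back in place of their slots."""
    for key, original in slots.items():
        text = text.replace(key, original)
    return text


def translate_preserving_markup(text: str, translate: Callable[[str], str]) -> str:
    """Translate only the prose; protected spans are masked before and restored after."""
    masked, slots = mask(text)
    return restore(translate(masked), slots)
```

Calling `translate_preserving_markup(text, translate)` with any function that maps a string to its translation leaves the LaTeX commands byte-for-byte identical, provided the translator itself does not alter the slot tokens, which a real system would need to verify.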


Government Communication

Translating government notices and bills into multiple Indian languages, enhancing access to crucial information.

Challenge: Government communication must be translated into formal language while staying accurate and consistent across all languages. While good formal translation models might exist for languages like Hindi, their performance degrades rapidly for languages like Odia.

Approach: We made sure to train on data for low-resource languages like Odia. This helps keep the performance of our models consistently good across all languages and bridges the gap for speakers of those languages.

Government Announcement/Bill:

The Government of India has announced the launch of the National Health Protection Scheme (NHPS), aimed at providing comprehensive health coverage to all citizens, particularly the underprivileged. The NHPS offers health coverage of up to Rs. 5 lakh per family per year for secondary and tertiary care hospitalization, benefiting over 50 crore people and making it the world's largest government-funded health insurance initiative. The scheme will empanel more than 10,000 government and private hospitals nationwide, ensuring wide accessibility and choice for beneficiaries, who can avail cashless treatment at these hospitals. Supported by a robust digital platform for seamless service delivery and efficient claim management, the NHPS marks a significant step towards achieving universal health coverage in India. The government remains committed to improving healthcare infrastructure and services, enhancing the well-being of all citizens. For more information, visit the Ministry of Health and Family Welfare's official website or contact their helpline.

Hindi Version:

भारत सरकार ने राष्ट्रीय स्वास्थ्य सुरक्षा योजना (एन.एच.पी.एस.) शुरू करने की घोषणा की है, जिसका उद्देश्य सभी नागरिकों, विशेष रूप से वंचित लोगों को व्यापक स्वास्थ्य सुरक्षा प्रदान करना है। एन.एच.पी.एस. माध्यमिक और तृतीयक देखभाल के लिए प्रति परिवार प्रति वर्ष 5 लाख रुपये तक के स्वास्थ्य बीमा की पेशकश करता है, जिससे 50 करोड़ से अधिक लोग लाभान्वित होते हैं और यह दुनिया की सबसे बड़ी सरकारी वित्त पोषित स्वास्थ्य बीमा पहल बन जाती है। यह योजना देश भर में 10,000 से अधिक सरकारी और निजी अस्पतालों को सूचीबद्ध करेगी, जो लाभार्थियों के लिए व्यापक पहुंच और विकल्प सुनिश्चित करेगी, जो इन अस्पतालों में नकद रहित उपचार का लाभ उठा सकते हैं। निर्बाध सेवा वितरण और कुशल दावा प्रबंधन के लिए एक मजबूत डिजिटल प्लेटफॉर्म के समर्थन से, एन.एच.पी.एस. भारत में सार्वभौमिक स्वास्थ्य कवरेज प्राप्त करने की दिशा में एक महत्वपूर्ण कदम है। सरकार स्वास्थ्य सेवा के बुनियादी ढांचे और सेवाओं में सुधार, सभी नागरिकों के कल्याण में वृद्धि के लिए प्रतिबद्ध है। अधिक जानकारी के लिए स्वास्थ्य और परिवार कल्याण मंत्रालय की आधिकारिक वेबसाइट पर जाएं या उनकी हेल्पलाइन से संपर्क करें।

Telugu Version:

భారత ప్రభుత్వం అన్ని పౌరులకు, ముఖ్యంగా వెనుకబడిన వారికి సమగ్ర ఆరోగ్య రక్షణ కల్పించడం లక్ష్యంగా నేషనల్ హెల్త్ ప్రొటెక్షన్ స్కీమ్ (ఎన్‌హెచ్‌పిఎస్) ను ప్రారంభించినట్లు ప్రకటించింది. సెకండరీ మరియు తృతీయ సంరక్షణ ఆసుపత్రిలో చేరడానికి ఎన్‌హెచ్‌పిఎస్ సంవత్సరానికి ఒక కుటుంబానికి రూ. 5 లక్షల వరకు ఆరోగ్య కవరేజీని అందిస్తుంది, ఇది 50 కోట్ల మందికి ప్రయోజనం చేస్తుంది మరియు దీనిని ప్రపంచంలోనే అతిపెద్ద ప్రభుత్వ-నిధులతో కూడిన ఆరోగ్య బీమా చొరవ చేస్తుంది. ఈ పథకం దేశవ్యాప్తంగా 10,000 కంటే ఎక్కువ ప్రభుత్వ మరియు ప్రైవేట్ ఆసుపత్రులను ఎంప్యానెల్ చేయనుంది, లబ్ధిదారులకు విస్తృత ప్రాప్యత మరియు ఎంపికను భరోసా ఇస్తుంది, వారు ఈ ఆసుపత్రులలో నగదు రహిత చికిత్స పొందవచ్చు. అమరమైన సేవా పంపిణి మరియు సమర్థవంతమైన క్లెయిమ్ నిర్వహణ కోసం బలమైన డిజిటల్ ప్లాట్‌ఫారమ్‌ మద్దతుతో, ఎన్‌హెచ్‌పిఎస్ భారతదేశంలో సార్వత్రిక ఆరోగ్య కవరేజీని సాధించడానికి గణనీయమైన అడుగు వేసింది. ఆరోగ్య సంరక్షణ మౌలిక సదుపాయాలు, సేవలను మెరుగుపరచడానికి, పౌరులందరి శ్రేయస్సును పెంచడానికి ప్రభుత్వం కట్టుబడి ఉంది. మరింత సమాచారం కోసం, ఆరోగ్య మరియు కుటుంబ సంక్షేమ మంత్రిత్వ శాఖ యొక్క అధికారిక వెబ్సైట్ను సందర్శించండి లేదా వారి హెల్ప్లైన్ను సంప్రదించండి.

Odia Version:

ସମସ୍ତ ନାଗରିକ, ବିଶେଷ କରି ଅବହେଳିତ ବର୍ଗଙ୍କୁ ବ୍ୟାପକ ସ୍ୱାସ୍ଥ୍ୟ ସୁରକ୍ଷା ପ୍ରଦାନ କରିବା ଲାଗି ଭାରତ ସରକାର ଜାତୀୟ ସ୍ୱାସ୍ଥ୍ୟ ସୁରକ୍ଷା ଯୋଜନା (ଏନ୍‌.ଏଚ୍‌.ପି.ଏସ୍‌.) ଆରମ୍ଭ ଘୋଷଣା କରିଛନ୍ତି। ଏନ୍‌.ଏଚ୍‌.ପି.ଏସ୍‌. ମାଧ୍ୟମିକ ଏବଂ ତୃତୀୟକ ଯତ୍ନ ପାଇଁ ପ୍ରତିବର୍ଷ ପ୍ରତ୍ୟେକ ପରିବାରପିଛା ୫ ଲକ୍ଷ ଟଙ୍କା ପର୍ଯ୍ୟନ୍ତ ସ୍ୱାସ୍ଥ୍ୟ କଭରେଜ ପ୍ରଦାନ କରିଥାଏ, ଯାହା ୫୦ କୋଟିରୁ ଅଧିକ ଲୋକଙ୍କୁ ଉପକୃତ କରିଥାଏ ଏବଂ ଏହାକୁ ବିଶ୍ୱର ସରକାରୀ ଅନୁଦାନପ୍ରାପ୍ତ ସବୁଠାରୁ ବଡ଼ ସ୍ୱାସ୍ଥ୍ୟ ବୀମା ପଦକ୍ଷେପ କରିଥାଏ। ଏହି ଯୋଜନାରେ ଦେଶବ୍ୟାପୀ ୧୦,୦୦୦ରୁ ଅଧିକ ସରକାରୀ ଏବଂ ଘରୋଇ ଡାକ୍ତରଖାନାକୁ ଅନ୍ତର୍ଭୁକ୍ତ କରାଯିବ, ଯାହାଦ୍ୱାରା ହିତାଧିକାରୀମାନଙ୍କ ପାଇଁ ବ୍ୟାପକ ସୁଲଭ ଉପଲବ୍ଧି ଏବଂ ପସନ୍ଦର ନିଶ୍ଚିତ ହେବ, ଯେଉଁମାନେ ଏହି ଡାକ୍ତରଖାନାରେ ନଗଦ ଚିକିତ୍ସା ପାଇପାରିବେ। ନିରବଚ୍ଛିନ୍ନ ସେବା ପ୍ରଦାନପାଇଁ ଏକ ଦୃଢ଼ ଡିଜିଟାଲ ପ୍ଲାଟଫର୍ମ ଏବଂ ଦକ୍ଷ ଦାବି ପରିଚାଳନା ଦ୍ୱାରା ସମର୍ଥିତ, ଏନ୍‌.ଏଚ୍‌.ପି.ଏସ୍‌. ଭାରତରେ ସାର୍ବଜନୀନ ସ୍ୱାସ୍ଥ୍ୟ କଭରେଜ୍‌ ହାସଲ ଦିଗରେ ଏକ ଗୁରୁତ୍ୱପୂର୍ଣ୍ଣ ପଦକ୍ଷେପ ଅଟେ। ସ୍ୱାସ୍ଥ୍ୟସେବା ଭିତ୍ତିଭୂମି ଏବଂ ସେବାରେ ଉନ୍ନତି ଆଣିବା, ସମସ୍ତ ନାଗରିକଙ୍କ କଲ୍ୟାଣ ବୃଦ୍ଧି କରାଇବା ଦିଗରେ ସରକାର ପ୍ରତିଶ୍ରୁତିବଦ୍ଧ ରହିଛନ୍ତି। ଅଧିକ ସୂଚନା ପାଇଁ, ସ୍ୱାସ୍ଥ୍ୟ ଏବଂ ପରିବାର କଲ୍ୟାଣ ମନ୍ତ୍ରଣାଳୟର ଅଫିସିଆଲ୍ ୱେବସାଇଟ୍ ଦେଖନ୍ତୁ କିମ୍ବା ସେମାନଙ୍କ ହେଲ୍ପଲାଇନ୍ ସହିତ ଯୋଗାଯୋଗ କରନ୍ତୁ।


Medical Communication

Translating medical instructions, prescriptions, and healthcare information for patients who are more comfortable in their native language is critical for public health.

Challenge: Translating medical content involves handling domain-specific terms and proper nouns, such as medicine names, which require code-mixing even in formal translation. It is crucial to retain these commonly used terms and names, and to keep dosages and units consistent.

Sarvam's Approach: In the critical field of healthcare, Sarvam Translate's ability to retain technical terms, proper nouns, and ensure consistency in dosages and units is not just a feature – it's a lifesaving necessity. It transforms complex medical instructions into clear, accurate translations that patients can easily understand in their native language.

Medical Instruction in English:

Take two tablets of Paracetamol 500mg every 6 hours for fever. If symptoms persist, consult a healthcare professional. Additionally, for cough relief, take one spoon of Cough Syrup three times a day after meals. Ensure you stay hydrated and monitor your temperature regularly. Avoid cold beverages and rest adequately. If you experience any side effects like dizziness or rash, stop the medication and seek immediate medical attention.

Code-mixed Translation:

बुखार के लिए हर 6 घंटे में Paracetamol 500mg की दो tablets लें। अगर symptoms बने रहते हैं, तो किसी healthcare professional से consult करें। इसके अलावा, खांसी से राहत पाने के लिए, दिन में तीन बार खाने के बाद एक चम्मच Cough Syrup लें। ज़रूर, ध्यान रखें कि आप hydrated रहें और अपने temperature को regularly monitor करें। ठंडे पेय पदार्थों से बचें और आराम से रहें। अगर आपको कोई side effect जैसे dizziness या rash हो, तो दवा लेना बंद कर दें और तुरंत medical attention लें।
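
As an illustration of what "consistency in dosages and units" can mean in practice, the sketch below compares the numeric quantities found in a source sentence with those found in its translation. It is an assumed post-hoc check, not a description of how Sarvam Translate works internally; the function names and the short unit list are hypothetical. Quantities written as words ("two tablets") are outside its scope.

```python
import re
from collections import Counter

# A number, optionally followed by a unit from a small illustrative list.
QUANTITY = re.compile(r"(\d+(?:\.\d+)?)(?:\s?(mg|mcg|ml|g|kg))?\b", re.IGNORECASE)


def quantities(text: str) -> Counter:
    """Count (number, unit) pairs in the text; the unit is '' when absent."""
    return Counter((num, unit.lower()) for num, unit in QUANTITY.findall(text))


def missing_quantities(source: str, target: str) -> Counter:
    """Quantities present in the source but absent (or fewer) in the target."""
    return quantities(source) - quantities(target)
```

An empty result from `missing_quantities(english, translated)` means every number and unit in the English instruction also appears in the translation; anything left over points at a dosage or interval worth reviewing by hand.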


3. Content localization for promotional and ads messaging

Brands operating in India often struggle to reach beyond English speakers and readers, and to connect with the rest of their consumer base as effectively and consistently as they do in English. Translating promotional content while maintaining brand voice and local relevance becomes critical.

Challenge: Promotional messages often have a colloquial, casual tone with many emojis and specific spacing. They also include URLs, discount codes, and numbers. It is crucial for brands to retain this formatting and tone to preserve their unique voice as well as the call to action.

Approach: Sarvam Translate retains the formatting, spacing, special entities like URLs and codes, and emojis to ensure the translated content maintains the original style and effectiveness.

English:

🎓 Back-to-School Bonanza!
Get ready to ace your learning journey with unbeatable offers!

📚 Top-rated courses starting at just ₹499
🖥️ Interactive learning tools from ₹299 onwards
📖 Comprehensive study materials at flat 40% discount

Use code LEARN2023 for an extra 10% off on your first purchase! ⏰

Start learning today at www.edustart.com

Other Translation Models:

बैक-टू-स्कूल बोनान्ज़ा!
अद्वितीय ऑफ़र के साथ अपनी सीखने की यात्रा में आगे बढ़ने के लिए तैयार हो जाइए!
टॉप-रेटेड पाठ्यक्रम मात्र ₹499 से शुरू
इंटरैक्टिव शिक्षण उपकरण ₹299 से शुरू
व्यापक अध्ययन सामग्री फ्लैट 40% छूट पर
अपनी पहली खरीदारी पर अतिरिक्त 10% छूट के लिए कोड सीखना2023 का उपयोग करें!
आज ही www.edustart.com पर सीखना शुरू करें


Sarvam Translation Model with Code-mixing and Formatting Fidelity

🎓 बैक-टू-स्कूल बोनान्ज़ा!

अपनी पढ़ाई में excel करने के लिए तैयार हो जाइए, fantastic deals के साथ!

📚 Top-rated courses सिर्फ़ ₹499 से शुरू होते हैं
🖥️ Interactive learning tools ₹299 से शुरू होते हैं
📖 Study materials flat 40% discount पे available हैं।

अपनी पहली purchase पर extra 10% off पाने के लिए LEARN2023 code use करें! ⏰

अभी www.edustart.com पर सीखना शुरू करें
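
The difference between the two outputs above can be checked mechanically. The following is a small sketch, under the assumption that "special entities" means URLs, discount codes, and emojis; the patterns and function names are illustrative only and are not part of Sarvam Translate. Emojis are approximated by the Unicode category "So" (Symbol, other), which is a heuristic.

```python
import re
import unicodedata
from collections import Counter

URL = re.compile(r"(?:https?://|www\.)\S+")
CODE = re.compile(r"\b[A-Z]{2,}\d+\b")  # discount codes shaped like LEARN2023


def protected_entities(text: str) -> Counter:
    """Collect URLs, discount codes, and emoji-like symbols found in the text."""
    found = Counter(URL.findall(text))
    found.update(CODE.findall(text))
    found.update(ch for ch in text if unicodedata.category(ch) == "So")
    return found


def dropped_entities(source: str, target: str) -> Counter:
    """Entities that appear in the source but did not survive into the target."""
    return protected_entities(source) - protected_entities(target)
```

Run against the examples in this section, a check like this would flag the translated code `सीखना2023` and the missing emojis in the first output, while the code-mixed output keeps `LEARN2023` and the URL unchanged.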


News

Translating global news articles into Indian languages while keeping the expression, urgency, and facts intact is key. With a growing base of young readers, it also becomes imperative to match the language of news to their evolving linguistic needs.

Challenge: Ensuring formal translations while incorporating code-mixing and retaining proper nouns is complex. Formal translations must also be closer to real-life formal communication and not use archaic words that are no longer understood.

Sarvam's Approach: Sarvam Translate integrates code-mixed patterns and proper noun preservation so that news delivery matches the country's evolving communication and comprehension patterns.

English:

The COVID-19 pandemic, as reported by the Centers for Disease Control and Prevention (CDC), has had a profound impact on people's lives across the globe. The virus, known scientifically as SARS-CoV-2, has caused widespread disruption, affecting nearly every aspect of daily life. From public health challenges to economic hardships, the pandemic has prompted governments and health organizations worldwide to implement various measures aimed at curbing its spread. The CDC has been at the forefront of these efforts, providing critical guidelines and information to help mitigate the virus's impact and protect public health.

Other Models

रोग नियंत्रण और रोकथाम केंद्र (सीडीसी) की रिपोर्ट के अनुसार, सीओवीआईडी-19 महामारी ने दुनिया भर में लोगों के जीवन पर गहरा प्रभाव डाला है। वैज्ञानिक रूप से SARS-CoV-2 के रूप में ज्ञात वायरस ने व्यापक व्यवधान पैदा किया है, जिससे दैनिक जीवन का लगभग हर पहलू प्रभावित हुआ है। सार्वजनिक स्वास्थ्य चुनौतियों से लेकर आर्थिक कठिनाइयों तक, महामारी ने दुनिया भर की सरकारों और स्वास्थ्य संगठनों को इसके प्रसार को रोकने के उद्देश्य से विभिन्न उपायों को लागू करने के लिए प्रेरित किया है। सीडीसी इन प्रयासों में सबसे आगे रहा है, जो वायरस के प्रभाव को कम करने और सार्वजनिक स्वास्थ्य की रक्षा करने में मदद करने के लिए महत्वपूर्ण दिशानिर्देश और जानकारी प्रदान कर रहा है।

Sarvam Model

जैसा कि रोग नियंत्रण और रोकथाम केंद्र (सी.डी.सी.) द्वारा बताया गया है, कोविड-19 महामारी ने दुनिया भर में लोगों के जीवन पर गहरा प्रभाव डाला है। वैज्ञानिक रूप से सार्स-कोव-2 के रूप में जाना जाने वाला वायरस व्यापक व्यवधान पैदा कर रहा है, जो रोजमर्रा के जीवन के लगभग हर पहलू को प्रभावित कर रहा है। सार्वजनिक स्वास्थ्य चुनौतियों से लेकर आर्थिक कठिनाइयों तक, महामारी ने दुनिया भर में सरकारों और स्वास्थ्य संगठनों को इसके प्रसार को रोकने के उद्देश्य से विभिन्न उपायों को लागू करने के लिए प्रेरित किया है। सी.डी.सी. इन प्रयासों में सबसे आगे रहा है, जो विषाणु के प्रभाव को कम करने और सार्वजनिक स्वास्थ्य की रक्षा करने में मदद करने के लिए महत्वपूर्ण दिशानिर्देश और जानकारी प्रदान करता है।

The Path Forward

Sarvam Translate represents a significant step towards making India an equal participant in the global knowledge economy. By breaking down language barriers, we're not just translating content – we're opening doors to opportunities, education, and global connectivity for millions of Indians.

As we continue to refine and expand Sarvam Translate, we invite developers, businesses, and content creators to join us in this mission. Together, we can ensure that language is no longer a barrier to learning, growth, and success in the digital age.

Welcome to a future where every Indian, regardless of their linguistic background, can access, understand, and contribute to the world's knowledge.

-- Draft Elements --
BULBUL

Meera: Professional and articulate
Arvind: Conversational and articulate
Maitryee: Engaging and informational
Amol: Narrational and mature
Pavithra: Dramatic and engaging
Amartya: Expressive and distinct

E-commerce support
E-commerce requires clear communication of order details, prices, and delivery timelines, often mixing English terms with regional languages. Pick a voice for your brand and keep it consistent across all your communications and languages.
TTS Input: "Your order for 2 pairs of Allen Solly jeans and 1 Nike T-shirt has been confirmed. Total price: ₹3,999. Your order will be delivered in 2 days."
Hindi
Kannada
Odia
Telugu
Fintech Applications:
Financial services demand precise pronunciation of monetary values and financial terms, often involving large numbers and specialized vocabulary.
TTS Input: "Your account balance is ₹10,435.26. Kya aap ek FD open karna chahenge?"
Hindi
Punjabi
Tamil
Healthcare Communication:
Healthcare communication requires accurate pronunciation of medical terms, dosages, and instructions, often involving complex terminology and precise numerical information.
TTS Input: "Namaste Sharma ji, Dr. Gupta ne aapko Metformin 500mg prescribe kiya hai. Ise daily two times, subah aur shaam ko khana ke baad lena hai. Kya aapko koi side-effects ka anubhav ho raha hai?"
Hindi
Multilingual Audiobooks:
Audiobooks require consistent voice quality across languages, natural code-mixing, and expressive narration to bring stories to life. Give a unique voice to your characters in the same language.
TTS Input: "भगवान कृष्ण कहते हैं, सुखी जीवन जीने और स्वर्ग प्राप्त करने के लिए तपस्या और दान जैसे कुछ कार्य करने चाहिए। पुण्य कर्म करने से अनजाने में किए गए पाप भी नष्ट हो जाते हैं। इस प्रकार मनुष्य को नरक में नहीं जाना पड़ता।"
Hindi
Bengali
E-Learning Platform
Educational content often involves technical terms, mathematical expressions, and the need to maintain student engagement through varied intonation.
TTS Input:  "आज हम Einstein की Theory of Relativity के बारे में पढ़ेंगे। Theory कहती है कि समय और space एक दूसरे से जुड़े हुए हैं और इन्हें एक साथ space-time कहा जाता है। यह theory बताती है कि जब कोई object बहुत high speed से move करता है, तो उसके लिए time slow हो जाता है। इसे mathematically इस equation से express किया जा सकता है:

E = mc^2

जहाँ E energy है, m object का mass है, और c speed of light in vacuum है, जो लगभग 3 times 10^8 meters per second होती है। यह equation दिखाती है कि mass और energy interchangeable हैं और एक दूसरे में convert हो सकते हैं।"
Hindi
Multilingual news broadcasting
TTS Input with lots of abbreviation: "The ISRO (Indian Space Research Organisation) has successfully launched its latest satellite, GSAT-30, from the Satish Dhawan Space Centre. The satellite will enhance communication services across India. This achievement marks another milestone for ISRO following their earlier successful missions this year."
English
Tamil
Astrology Bot
Astrology applications need to convey mystical and predictive content with an appropriate tone and handling of astrological terminology.
TTS Input: "Namaste! Aaj aapka din shubh hai. Venus ki position se aapko aaj ek good news mil sakti hai. Office mein kisi senior se important task assign ho sakta hai. Stay confident!"
Hindi
Gujarati
Giving a Desi Touch to Google Maps:
Navigation services need to provide clear, timely instructions with accurate pronunciation of street names and landmarks.
TTS Input: "Head south on Netaji Subhash Marg toward Dayanand Road. In 12 meters, turn left onto Dayanand Road. Continue straight for 350 meters, passing the United Bank of India ATM on your left."
Hindi
Speak to your users via IoT
Smart home devices need to convey information clearly and handle queries in natural, conversational language.
TTS Input: "Good morning! It's 7:00 AM. The temperature today is 28 degrees Celsius, and the weather is very pleasant. You have a busy day ahead. Your first meeting is scheduled for 9:30 AM with the marketing team to discuss the upcoming campaign strategies."
Marathi
Legal Documents
The powers of judicial review in the matters involving financial implications are also very limited. The wisdom and advisability of the Courts in the matters concerning the finance, are ordinarily not amenable to judicial review unless a gross case of arbitrariness or unfairness is established by the aggrieved party.​
Key Feature: With Formal Mode, you can create legal documents in different Indic languages while maintaining the formal tone.

Colloquial mode now empowers millions of Indians to access these complex documents by translating them into colloquial Indic language.
Other Translation Models
‍वित्तीय निहितार्थ से जुड़े मामलों में न्यायिक समीक्षा की शक्तियाँ भी बहुत सीमित हैं। वित्त से संबंधित मामलों में न्यायालयों का ज्ञान और सलाह आम तौर पर न्यायिक समीक्षा के लिए अनुकूल नहीं होते हैं जब तक कि पीड़ित पक्ष द्वारा मनमाने या अन्यायपूर्ण का एक गंभीर मामला स्थापित नहीं किया जाता है।​

Mayura (Formal + Preprocessing)
वित्त-संबंधी मामलों की समीक्षा करने के लिए न्यायपालिका की शक्ति काफी सीमित है। आम तौर पर, अदालतें वित्तीय मामलों में हस्तक्षेप नहीं करती हैं जब तक कि अन्याय या मनमाने ढंग से काम करने का स्पष्ट मामला न हो। यह आम तौर पर केवल तभी होता है जब निर्णय से प्रभावित व्यक्ति इसे साबित कर सकता है।​

Mayura (Colloquial + Preprocessing)
Judiciary की financial-related cases को review करने की power बहुत restricted है। आमतौर पर, courts financial matters में interfere नहीं करते हैं जब तक कि unfairness या arbitrariness का clear case ना हो। ये आमतौर पर तभी होता है जब decision से प्रभावित व्यक्ति उसे prove कर सके।​
Unlock colloquial translation

Message
I can help you sign up for our courses in just a few steps. Can you please provide your name and email address to get started?

Formal
मैं कुछ ही चरणों में हमारे पाठ्यक्रमों के लिए साइन अप करने में आपकी मदद कर सकता हूँ। क्या आप कृपया अपना नाम और ईमेल पता प्रदान कर सकते हैं?

Colloquial
मैं आपको बस कुछ ही steps में हमारे courses के लिए sign up करने में मदद कर सकता हूँ। क्या आप अपना नाम और email address बता सकते हैं ताकि हम शुरू कर सकें?

Message
She's the GOAT when it comes to baking.

Other Models
जब बेकिंग की बात आती है तो वह बकरी है।

Colloquial Mode
वे बेकिंग में महारत रखती हैं, उनके केक शानदार होते हैं।

Multilingual Audiobooks:
Audiobooks require consistent voice quality across languages, natural code-mixing, and expressive narration to bring stories to life. Give a unique voice to your characters in the same language.
TTS Input:
कृष्ण: "अर्जुन, धर्म का मार्ग अक्सर चुनौतियों से भरा होता है, लेकिन विश्वास और संकल्प के साथ, सबसे अंधेरी रातें भी सुबह में बदल जाती हैं।"

अर्जुन: "कृष्ण, आपका ज्ञान हमारा मार्गदर्शक तारा है। मैं धर्म की रक्षा करने और अपने लोगों की रक्षा करने का प्रयास करूंगा।"

द्रौपदी: "कृष्ण, मेरा हृदय अन्याय के बोझ से भारी है, लेकिन आपकी उपस्थिति मुझे आशा से भर देती है। मुझे विश्वास है कि न्याय की जीत होगी।"
Krishna
Arjun
Draupadi
Male Professional newscaster voice in English:
TTS Input:  "The ISRO (Indian Space Research Organisation) has successfully launched its latest satellite, GSAT-30, from the Satish Dhawan Space Centre. The satellite will enhance communication services across India. This achievement marks another milestone for ISRO following their earlier successful missions this year."
TTS Output
Hindi (Female voice):
TTS Input:  "इसरो, Indian Space Research Organisation ने अपना latest satellite, GSAT-30, Satish Dhawan Space Centre से, successfully launch कर दिया है। , ये satellite पूरे India में, communication services को improve करेगा। , ये इस साल ISRO के successful missions के बाद , एक और बड़ी achievement है।"
TTS Output
Tamil (Male voice):
Phase | Input | Output | Hours of audio | LR schedule
Phase 1 | English audio (sentences) | Transcriptions | 35 | Constant with warmup
Phase 2 | English + Hindi audio (sentences) | English -> Transcriptions. Hindi -> Transcriptions translated to English | 100 | Cosine decay
Phase 3 | English + Hindi audio (questions) | Answers in English | 30 | Cosine decay with warmup