The Government of India and a clutch of startups have set their sights on creating an indigenous foundational Artificial Intelligence large language model (LLM), along the lines of OpenAI’s ChatGPT, Google’s Gemini, and Meta’s Llama. Foundational AI models, or LLMs, are systems trained on vast volumes of text that can churn out responses to queries. Training them requires large amounts of data and enormous computing power: the former is abundant on the internet, while the latter is concentrated in the data centres of Western countries.

In India, the crucial advance of creating a homegrown LLM is likely to be an uphill climb, albeit one that the government and startups are keen on achieving. Hopes have been heightened especially after the success of DeepSeek. The Chinese firm, at a far lower cost than Western tech companies, was able to train a so-called ‘reasoning’ model, one that arrives at a response through a series of logical steps displayed to users in an abstracted form, and that generally gives much better responses. Policymakers have cited India’s low-cost advances in space exploration and telecommunications as evidence that the country could pull off a similar breakthrough, and soon.
LLMs and small language models (SLMs) are generally built by ‘training’ a neural network on massive volumes of text data, typically scraped from the web. A neural network is a machine learning model that loosely imitates the way a human brain works: inputs are passed through ‘layers’ of interconnected nodes, and the multiple interactions in the hidden layers ultimately produce an output that can be refined into an acceptable response.
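The layered pass described above can be sketched in a few lines of code. This is an illustrative toy only: the weights, biases, and inputs below are invented for the example, and real networks learn millions or billions of such values during training.

```python
def relu(x):
    # A common non-linearity: negative signals are zeroed out.
    return max(0.0, x)

def layer(inputs, weights, biases):
    # Each node sums its weighted inputs, adds a bias, and applies
    # the non-linearity, producing one activation per node.
    return [relu(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Two inputs flow through one hidden layer of two nodes
# (all numbers here are made up for illustration).
inputs = [1.0, 2.0]
hidden = layer(inputs, [[0.5, -0.25], [0.1, 0.4]], [0.0, 0.1])
output = sum(hidden)  # trivially combine the hidden activations
```

Stacking many such layers, and adjusting the weights so that outputs improve on training data, is what ‘training’ means in practice.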
Neural networks have been a tremendous breakthrough in machine learning and have for years been the backbone of services such as automated social media moderation, machine translation, recommendation systems on services such as YouTube and Netflix, and a host of business intelligence tools.
The AI rush
While deep learning and machine learning developments surged in the 2010s, the underlying research produced several landmark advances, such as the ‘attention mechanism’, a natural language processing technique that effectively gave developers a way to break a sentence into components and weigh their relationships, allowing computer systems to inch ever closer to ‘understanding’ an input that was not a piece of code. Even though this technology was not based on any sort of actual intelligence, it was still a massive leap in machine learning capabilities.
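The core idea of attention can be sketched with toy numbers. In this minimal illustration (the vectors are invented, and real models use learned, high-dimensional representations), each word’s ‘query’ is scored against every word’s ‘key’, and the normalised scores weight how much it draws from each word’s ‘value’:

```python
import math

def softmax(scores):
    # Turn raw scores into weights that sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Score the query against each key (dot product), normalise,
    # then return the weighted sum of the value vectors.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(dim)]

# A query most similar to the first key draws mostly from the
# first value vector (all vectors here are made-up examples).
out = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
             [[1.0, 0.0], [0.0, 1.0]])
```

The transformer architecture applies this scoring across every pair of words in an input at once, which is what made training on GPUs so effective.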
The transformer, which built on these advances, was the key breakthrough that paved the way for LLMs such as ChatGPT. A 2017 paper by researchers at Google set out the transformer architecture, describing for the first time how LLMs could practically be trained on graphics processing units (GPUs), the chips that have since emerged as critical to the entire tech industry’s AI pivot.
It was quite some time before OpenAI started practically implementing these findings in a way that the public could witness. ChatGPT’s first model was released more than five years after the Google researchers’ paper, for a reason that has emerged as a headache both for firms looking to leverage AI commercially and for countries looking to build their capabilities: cost.
Simply training GPT-3.5, the model behind the first public version of ChatGPT, cost millions of dollars, not accounting for the data centre infrastructure. With no immediate prospect of commercialisation, such an expense was a long shot, the kind that only a large tech company, or well-endowed venture capitalists, could finance in the medium term.
The result, however, was extraordinary. The generative AI boom began in earnest with ChatGPT’s first public model, which showcased the technical advancements machine learning had accumulated until its release. The Turing test, a benchmark passed by a machine that responds to queries in a way indistinguishable from a human, was no longer a useful yardstick for new AI models.
A head-spinning rush followed, as other companies already working on the technology scrambled to ship similar foundational models. Google, for instance, was already running models such as LaMDA in 2022. That model made the news when a prominent engineer at the company made public (and unsubstantiated) claims that the chatbot was essentially sentient. The company had held back from releasing the model while it worked on safety and quality.
The generative AI rush changed things, however, putting every company positioned to work on such models under tremendous investor and public pressure to compete. From keeping LaMDA restricted to internal testing, Google quickly moved to deploy a public version, named Bard and later renamed Gemini, and swapped out the Google Assistant product on many Android users’ handsets with the AI model instead. Today, Gemini offers half a dozen models for different needs, and Google has deployed the AI model in its search engine and productivity suite.
Microsoft was no different: the Windows maker deployed its own Copilot chatbot, leveraging integrations with its Office products and dedicating a button to summon the chatbot on new PCs. Amazon and a host of smaller startups, such as France’s Mistral and Perplexity AI, the latter seeking to bring generative AI capabilities to search, also began putting their products out for public use. Image generation breakthroughs based on similar technology also mushroomed against this backdrop, with services like DALL·E paving the way to creating realistic-looking pictures.
Indian industry players, like global firms, showed early enthusiasm for leveraging AI to see how the technology could boost productivity and generate savings. As in the rest of the world, text-generation tools have sped up employees’ routine tasks, and much of the corporate adoption of AI has revolved around such gains in daily work. However, questions have arisen about critical thinking as more and more tasks get automated, and many firms are yet to see substantial value from this adoption.
Still, the fascination with AI models has not died down: hundreds of billions of dollars are set to be invested in the computing infrastructure needed to train and run them. In India, Microsoft is hiring real estate lawyers in every State and Union Territory to negotiate and obtain land parcels for building data centres. The scale of the planned investments is a massive bet on the financial viability of AI models.
This is partly why the potential of advances such as DeepSeek has drawn attention. The Hangzhou-based firm was able to train cutting-edge models — capable of ‘deep research’ and reasoning — at a fraction of the investments being made by Western giants.
An Indian model
The cost reduction has led to immense interest in whether India can replicate this success or, at least, build on it. Last year, before DeepSeek’s achievements gained global repute, the Union government dedicated ₹10,372 crore to the IndiaAI Mission in an attempt to drive more work by startups in the field. The mission is structured as a public-private partnership and aims to provide computing capacity, foster AI skills among the youth, and help researchers work on AI-related projects.
After DeepSeek’s cost savings came into focus, the government rolled out the computing capacity component of the mission and invited proposals for creating a foundational AI model in India. Applications have been invited on a rolling basis each month, and Union IT Minister Ashwini Vaishnaw said he hoped India would have its foundational model by the end of the year.
There is an “element of pride” involved in the discourse around building a domestic foundational model, Tanuj Bhojwani, until recently the head of People + AI, said in a recent Parley podcast with The Hindu. “We are ambitious people, and want our own model,” Mr. Bhojwani said, pointing to India’s achievements in space exploration and telecommunications, shining examples of technical feats achieved at low cost.
There are, of course, monetary costs attached to training even a post-DeepSeek foundational model: Mr. Bhojwani referred to estimates that DeepSeek’s hardware purchases and prior training runs exceeded $1.3 billion, a sum greater than the IndiaAI Mission’s whole allocation. “The Big Tech firms are investing $80 billion a year on infrastructure,” Mr. Bhojwani pointed out, putting the scale of the Indian investment corpus into perspective. “The government is not taking that concentrated bet. We are taking very sparse resources that we have and we are further thinning it out.”
Pranesh Prakash, the founder of the Centre for Internet and Society, India, insisted that building a foundational AI model was important. “It is important to have people who are able to build foundation models and also to have people who can build on top of foundation models to deploy and build applications,” Mr. Prakash said. “We need to have people in India who are able to apply themselves to every part of building AI.”
There is also an argument that a domestic AI model would enhance Indian cyber sovereignty. Mr. Prakash was dismissive of this notion: many of the most cutting-edge LLMs, including the one published by DeepSeek, are open source, allowing researchers around the world to iterate on an existing model and build on the latest progress without having to duplicate breakthroughs themselves.
Beyond the investment hurdle, there is also the payoff ceiling: “Spending $200 a month to replace a human worker may be possible in the U.S., but in India, that is what the human worker is being paid in the first place,” Mr. Bhojwani pointed out. It is unclear as yet if the automation breakthroughs that are possible will ever be worthwhile enough to replace a significant number of human workers.
Even for Indian firms seeking to make and sell AI models, India’s experience in the software era of previous decades reveals a key dynamic that could limit such aspirations: “If we believe we will make an Indian model with local language content, you are capping yourself on the knee because the overall Indian enterprise market that will purchase AI is much smaller,” Mr. Bhojwani said, pointing out that even Indian software giants sell much of their services in the United States, which remains the main market for much of the technology industry.
Financial imperatives are not everything, though. The Indian government’s focus on initiatives like Bhashini — which uses neural networks to power Indian language translation — reveals an appetite to leverage AI models at scale, as it did with Aadhaar and UPI. It is unclear how much political will and investment will end up feeding those ambitions. However, as Microsoft CEO Satya Nadella pointed out in a recent interview, if AI’s potential across the board “is really as powerful as people make it out to be, the state is not going to sit around and wait for private companies.”
While India has a large pool of talent, it suffers from a perennial outflow of its top research minds across all fields, a dynamic that could slow breakthroughs in AI. Academic ecosystems have also been underfunded, severely limiting resources even for those who stay in the country to work on these problems.
The data divide
The most imposing barrier may not be investment, or even the potential for commercialising it. The barrier could be data.
Most LLMs and SLMs rely on a massive amount of data; where the data is not massive, it has to at least be high-quality data, curated and labelled until it is usable for training a foundational model. For the well-funded tech giants, the data publicly available on the web is a rich source. This is why most models have skewed towards English: it is among the most widely spoken languages in the world, and it is enormously over-represented in public web content.
Even countries like China, South Korea, and Japan can get by with the data they can obtain, as these are largely monolingual societies whose internet users browse the web — and participate in discussions online — in their own languages. This gives LLM makers a rich foundation for customising models for local sensibilities, styles, and ultimately needs.
India does not have enough of this data. Vivekanand Pani, a co-founder of Reverie Language Technologies, has worked with tech companies for decades to nudge users to use the web in their own languages. Most Indian users, even those who speak very little English, navigate their phones and the internet in English, adapting to the digital ecosystem. While machine translation can serve as a bridge between English and Indian languages, this is a “transformative” technology, Mr. Pani said, and not a generative one, like LLMs. “We haven’t solved that problem, and we are still not willing to solve it,” Mr. Pani told The Hindu in a recent interview, referring to getting more Indians to use the web in Indian languages.
Yet, some firms are still trying. Sarvam, a Bengaluru-based firm, announced last October that it had developed a 2-billion-parameter LLM with support for 10 languages in addition to English: Bengali, Gujarati, Hindi, Marathi, Malayalam, Kannada, Odia, Tamil, Telugu, and Punjabi. The firm said it was “already powering generative AI agents and other applications.” Sarvam did this on NVIDIA chips that are in high demand among big tech firms building massive AI data centres across the world.
Then there’s Karya, the Bengaluru-based firm that has been paying users to contribute voice samples in their mother tongue, gradually providing data for future AI models that hope to work well with local languages. The firm has gained global attention — including a cover from TIME magazine — for its efforts to fill the data deficit.
“India has 22 scheduled languages and countless dialects,” the IndiaAI Mission said in a post last July. “An India-specific LLM could better capture the nuances of Indian languages, culture, and context compared to globally focused models, which tend to capture more western sentiments and contexts.”
Krutrim AI, backed by the ridesharing platform Ola, is attempting a similar effort, leveraging drivers on the Ola platform as “data workers”. The IndiaAI Mission itself plans to publish a datasets platform, though details of where the data will come from and how it will be cleaned and labelled are not yet forthcoming.
“I think that we need to think much more about data not just as a resource and an input into AI, but as an ecosystem,” Astha Kapoor, co-founder of the Aapti Institute, told The Hindu in an interview. “There are social infrastructures around data, like the people who collect it, label it, and so on.” Ms. Kapoor was one of the very few Indian speakers at the AI Action Summit in Paris in February. “Our work reveals a key question: why do you need all this data, and what do I get in return? Therefore, people who the data is about, and the people who are impacted by the data, must be involved in the process of governance.”
Is the effort worth it?
And then there are the sticky questions that arose during the mass scraping of English-language content that fed the very first models: even if job displacement can be ruled out (and it is far from clear that it can), questions about data ownership, compensation, the rights of people whose data is being used, and the power of the firms amassing it will all have to be contended with. This process is far from settled even for the pioneering models.
Ultimately, one of the defining opinions on foundational models came from Nandan Nilekani last December, when the Infosys co-founder dismissed the idea altogether on grounds of cost alone. “Foundation models are not the best use of your money,” Mr. Nilekani had said at an interaction with journalists. “If India has $50 billion to spend, it should use that to build compute, infrastructure, and AI cloud. These are the raw materials and engines of this game.”
After DeepSeek dramatically cut those costs, Mr. Nilekani conceded that a foundational LLM breakthrough was indeed achievable: “so many” firms could now spend $50 million on the effort, he said.
But he has continued to emphasise in subsequent public remarks that AI has to ultimately be inexpensive across the board, and useful to Indians everywhere. That is a standard that is still not on the horizon, unless costs come down much more dramatically, and India also sees a scale-up of domestic infrastructure and ecosystems that support this work.
“I think the real question to ask is not whether we should undertake the Herculean effort of building one foundational model,” Mr. Bhojwani said, “but to ask: what are the investments we should be making such that the research environment, the innovation, private market investors, etc., all come together and orchestrate in a way to produce — somewhere out of a lab or out of a private player — a foundational large language model?”
Published – May 05, 2025 07:28 pm IST