India Risks Becoming Data Rich But Cognition Poor In The AI Era

India must move beyond data sovereignty to build and own LLMs as critical infrastructure, ensuring true AI autonomy and cultural integrity.


By Krishnadevan V

Krishnadevan is Consulting Editor at BasisPoint Insight. He has worked in the equity markets, and been a journalist at ET, AFX News, Reuters TV and Cogencis.

June 20, 2025 at 1:54 AM IST

India’s digital policy conversation has revolved around data sovereignty, with a focus on where data resides and who controls it. But as artificial intelligence, and particularly Large Language Models (LLMs), become the engines that interpret, reason, and generate meaning from that data, the debate must move beyond storage to cognition. In today’s AI-driven world, sovereignty is no longer just about owning the bytes; it’s about owning the brains that make sense of them.

India could soon become a data-rich but cognition-poor nation, collecting petabytes of information but outsourcing its interpretation to foreign models. These global LLMs, trained primarily on Western data and assumptions, often miss the nuances of Indian life. For instance, “mugging” refers to assault in the West, but in Indian parlance it means students cramming for an examination; “tuition” means university fees in the West, but “private coaching” in India.

This isn’t just a translation error but a systemic blind spot that can shape everything from government policy to business strategy.

If data localisation was step one in reclaiming digital autonomy, step two must be the creation and control of sovereign LLMs. This requires a bold policy shift, with India recognising LLMs as critical national infrastructure. Infrastructure status is a policy signal, a financial unlock, and a strategic imperative all rolled into one.

LLMs are no longer just software; they are the digital highways and power grids of the AI era. Building and running these models demands massive compute power, robust data centres, high-speed connectivity, gigawatts of power and secure storage. Yet, most government funding still treats LLMs as research projects rather than the foundational infrastructure they truly are. This misclassification limits both scale and speed, keeping India reliant on foreign cognition layers and exposing us to risks around data privacy, regulatory compliance, and digital sovereignty.

Granting infrastructure status to LLMs would unlock long-term capital, priority lending, tax incentives, and faster regulatory clearances. It would encourage public-private partnerships to co-develop foundational models that are open-source, multilingual, and tuned to Indian realities. It would allow budgetary allocation for LLMs under infrastructure heads, not just as innovation pilots. The ₹5 billion IndiaAI Mission in the Union Budget for 2025-2026 is a commendable start, but it pales in comparison to the tens of billions of dollars being invested by the US, China, and the EU.

The stakes go well beyond economics. Sovereign LLMs are about cultural authenticity and digital egalitarianism. India’s linguistic and cultural diversity is unmatched, with 22 official languages and thousands of dialects. A homegrown LLM, trained on Indian languages, idioms, and lived experience, is not just about technical superiority, but about cultural, social and contextual integrity. It can reflect constitutional values, accommodate social realities like caste and religious diversity, and avoid the algorithmic biases that foreign models can encode.

Categorising LLMs as infrastructure is also about access and inclusion. India-centric LLMs with voice-first and regional-language interfaces can bridge the digital divide for millions who are not fluent in English. When AI speaks the language of the people, it democratises access to information, government services, healthcare, and opportunity. It transforms digital growth from a privilege for the few into a right for the many.

Of course, building sovereign LLMs is a capital-intensive and technologically daunting task. But so was electrifying rural India, building telecom towers, or rolling out Aadhaar. Infrastructure is never cheap, but it pays off when it underpins the next wave of national competitiveness. Consulting firm McKinsey estimates that AI could add up to $400 billion to India’s GDP by 2030. But most of that value will accrue to those who build, not just use, these foundational models.

Relying on API access to foreign LLMs may seem expedient, but it is strategically brittle. In times of geopolitical stress or trade restrictions, that dependency can quickly become a vulnerability. AI sovereignty means more than just owning data centres. It means controlling the models, the weights, the training data, and the context in which meaning is derived. Anything less is akin to outsourcing judgment.

India’s budgetary and policy apparatus must move with urgency and clarity to reclassify LLMs as infrastructure. Allocate capital under infrastructure heads, not just innovation pilots. Encourage public-private partnerships to co-develop foundational models that are open-source, multilingual, and tuned to Indian sensibilities. Make computing power accessible to universities, startups, and non-profits that want to build responsibly and inclusively. And above all, treat building LLMs as a nation-building activity, with the same status accorded to roads, ports, and power grids.

A country cannot claim to be an AI superpower without LLM sovereignty. And there is no LLM sovereignty without state-level commitment to funding, scale, and public infrastructure frameworks. The models that define how a country thinks, how it reasons, learns, and communicates, must not be outsourced. If data is the new oil, then LLMs are the new refineries. It’s time India built its own.