Can Soket AI Power India’s Foundational LLM Dream?

SUMMARY

Founded in 2019, Soket AI spent its early years exploring AI opportunities and collecting datasets before zeroing in on building India’s very own LLM with a multilingual focus

After Sam Altman’s 2023 remark doubting India’s ability to build ChatGPT-level AI, founder Abhishek Upperwal was spurred into action, accelerating Soket AI’s efforts under Project EKA to develop foundational models with academic partners

Now selected under India’s INR 10,037 Cr IndiaAI Mission, Soket AI is building an Indic-focussed LLM, while generating revenue through TensorStudio, its real-time speech API product

The month was June. The year: 2023. It had been a little over six months since the world had woken up to the new normal in the realm of AI and ChatGPT. However, June 7 was different. The man behind ChatGPT, OpenAI’s CEO Sam Altman, had visited India. We welcomed Altman with open arms, only to be told that it was “hopeless” to compete with OpenAI.

Something ticked off Abhishek Upperwal that day, and this ruffling of his feathers resulted in what India would come to know as Soket AI, a startup that will, in the not-so-distant future, help India make its very own LLM, India’s answer to the ChatGPTs of the world.

Cut to January 2025. Union minister Ashwini Vaishnaw announced that the country would endeavour to build a domestic large language model (LLM) as part of the INR 10,037 Cr IndiaAI Mission. 

Proposals were invited, and the stakeholders of the AI industry, including startups, made a beeline to take part in the opportunity to build an AI-ready nation. 

Among the hundreds of Indian AI players seeking to make history was Bengaluru-based Soket AI, a relatively young player that specialises in text-based LLMs.

Stars aligned, and the 2019-founded Soket AI is now one of the four startups, alongside Sarvam AI, Gnani.ai, and Gan.ai, to have been selected by the Ministry of Electronics and Information Technology (MeitY) to develop Indian LLMs. 

Two months before its selection, Inc42 stumbled upon the chance to speak with founder and CEO Upperwal, who shared that the startup was building an open-source foundational model under Project EKA. 

Back then, Soket AI was already collaborating with academic institutions, like IIT Gandhinagar and IIT Roorkee, under Project EKA to develop a 120 Bn-parameter foundational model. 

However, the only missing link in its endeavour to build a foundational AI model was the government’s support. R&D was a major cost head for Upperwal, and the Centre’s support has made it easier to navigate one of the most difficult stretches of the AI building journey.

Notably, the government is set to provide computational support for building the LLM.

With this, the Soket AI team is now ready to fast-track development. It is committed to delivering the first iteration of its Indic languages-focussed LLM within 12 months.

If not for Altman’s controversial statement, Upperwal would have remained stuck in the paradox of searching for a product or perhaps never finding one at all.

The Product Paradox

Destiny had something else planned for Upperwal. The founder, who positions Soket AI as one of the first Indian startups to build a globally competitive text-based LLM, had no idea what he was going to build until a few years ago. But one thing was certain — his drive to work in the area of AI.

Upperwal, a data scientist and a 2018 IISc graduate, incorporated Soket AI in February 2019, but the venture did not see any activity for at least the next three years.

This is simply because problem-solving in the realm of AI requires access to large-scale datasets, and for Upperwal, a fresh graduate with zilch financial backing, it was next to impossible to gather resources to build something concrete.

Therefore, he focussed on getting access to datasets. As luck would have it, the Centre was building 100 smart cities back then. The aim was to integrate command and control centres, smart waste management, installation of more CCTV cameras in public places, and more.

Upperwal saw this as a great opportunity, given that smart signals, smart roads, and more technology enablement across cities generated huge amounts of unused data. 

That conviction paved the way for him to work for the Smart Cities Mission, under the Ministry of Housing and Urban Affairs. It was during this time that an idea struck him: connect all the smart cities and create a decentralised data exchange using peer-to-peer technology.

However, this remained just an idea, and Upperwal made little headway with it.

Months passed, and the world woke up to a technological marvel in the form of OpenAI’s ChatGPT in November 2022.

Months passed again, but to no avail. He knew it was time for him to think in sync with the changing equations around the world, courtesy AI.

Finally, the day of Sam Altman’s reality check hit Upperwal hard. He found his true calling.

“A bunch of us from the IISc became determined to take a bet on building a base model from India. We started figuring out data sources, we wanted to do something different and built Indic models because English was already solved for,” said Upperwal.

[Figure: Soket AI factsheet]

Upperwal Sets The Indic LLM Wheel In Motion

Soon, Soket AI started building its first Indic language AI datasets for the development of AI models. Its ‘Bhasha’ series was launched in April 2024 with two datasets: ‘bhasha-wiki’ and ‘bhasha-wiki-indic’. Bhasha-wiki, an open source dataset, is built with 44.1 Mn Wikipedia articles translated into six major Indian languages from 6.3 Mn English articles.
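The two article counts quoted for bhasha-wiki can be reconciled with simple arithmetic. The sketch below assumes, and this is not stated by Soket AI, that the 44.1 Mn total counts the English originals alongside one translated copy per Indian language. The variable names are illustrative.

```python
# Figures quoted above for the bhasha-wiki dataset.
ENGLISH_ARTICLES = 6_300_000
INDIAN_LANGUAGES = 6

# Assumption: every English article has one translated copy per language,
# and the headline total also counts the English originals.
translated_only = ENGLISH_ARTICLES * INDIAN_LANGUAGES
with_english = ENGLISH_ARTICLES * (INDIAN_LANGUAGES + 1)

print(f"Translations only: {translated_only / 1e6:.1f} Mn")
print(f"Including English: {with_english / 1e6:.1f} Mn")
```

Under this assumption, only the count that includes the English originals matches the 44.1 Mn figure; six translated copies alone would come to 37.8 Mn.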

While building these datasets, it was able to crack a few computational partnerships and started working on building its first language model, ‘Pragna’. 

Soket AI billed Pragna-1B as India’s first open-source multilingual model. Also launched in April last year, it is available in four languages: Hindi, Gujarati, Bangla and English. 

Pragna was a proof of concept (POC) model for the company and did not see much traction. However, it set the stage for the business and everything it would build going forward.

Then, Upperwal began collecting conversational data related to sales, services, and marketing, and started fine-tuning Pragna using this data. Soon, he was able to build a real-time speech API with a fine-tuned Pragna model at the centre.

Today, Soket AI’s main source of revenue is this API. Marketed as TensorStudio, the tech is a voice agent. The startup is now doubling down on its Project EKA for building a foundational LLM.

Having raised $150K in an angel round last year, Soket AI is now set to raise $15 Mn in a few months.

Soket AI is building two products — TensorStudio and the work-in-progress foundational AI model.

The Cost Conundrum

“Building foundational models is an expensive affair. The computational cost of the graphics processing units (GPUs) is often almost 90% of the total cost of building them,” Upperwal said. 

According to a 2025 Stanford Institute report, training GPT-4, reportedly a model with about 1.8 Tn parameters, cost OpenAI over $100 Mn, with the compute cost estimated at around $79 Mn. As per various reports, building the 175 Bn-parameter GPT-3 cost the company more than $4 Mn.

This equation was challenged by DeepSeek earlier this year, when it built its foundational model (DeepSeek-R1) in just two months at a cost of less than $6 Mn. However, the veracity of these claims has been questioned on multiple fronts.

In fact, as per Epoch AI, the cost of training frontier AI models has grown by 2-3X per year since 2016. Meanwhile, the compute used to train recent models grew 4-5X every year between 2010 and 2024.
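To get a feel for what 2-3X annual growth compounds to, here is a minimal sketch. The eight-year window (2016 to 2024) and the function name are assumptions chosen for illustration. They are not taken from Epoch AI's methodology.

```python
def compound_growth(rate_per_year: float, years: int) -> float:
    """Total multiplier after growing by rate_per_year for a number of years."""
    return rate_per_year ** years


YEARS = 2024 - 2016  # assumed window, chosen only for illustration

low = compound_growth(2, YEARS)   # lower bound: cost doubles each year
high = compound_growth(3, YEARS)  # upper bound: cost triples each year

print(f"Over {YEARS} years: {low:,.0f}x to {high:,.0f}x")
```

Even the lower bound multiplies training cost a few hundred times over the window, which is why compute dominates the budget of any team attempting a frontier-scale model.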

An Inc42 report, too, finds that training frontier models might cost more than $1 Bn by 2027.

With cost being the biggest challenge for a small startup like Soket AI, building a foundational text model from the ground up is nothing if not pure grit. 

As a first step towards building its 120 Bn-parameter foundational model, Soket AI is building a smaller model with 7 Bn parameters — expected to be rolled out in six months. 

Following the success of this project, the company will start building the first iteration of its bigger LLM. 

“So, this part of the business will only start generating revenue in a few years,” Upperwal said.

To keep the cost conundrum at bay, Soket AI relies on TensorStudio as its primary revenue stream. The startup sells its API to system integrators, like contact centres, to automate audio conversations using AI agents. 

This real-time speech API, with Pragna’s capabilities at its centre, caters to conversational needs in banking. The company is now working to open up the API to more system integrators for usage across insurance, healthcare, telecom, and edtech.

While the founder did not share its revenue numbers, the charges for Soket AI’s speech API start from $0.012 per minute.
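For a rough sense of what the quoted starting price means at scale, the sketch below multiplies it by a few call volumes. It assumes a flat per-minute rate with no volume tiers, which the article does not confirm, and the volumes are hypothetical rather than Soket AI customer data.

```python
from decimal import Decimal

RATE_PER_MINUTE = Decimal("0.012")  # starting price quoted above, in USD


def monthly_bill(minutes: int, rate: Decimal = RATE_PER_MINUTE) -> Decimal:
    """Cost in USD of a given number of call minutes at a flat per-minute rate."""
    return rate * minutes


# Hypothetical call volumes, not Soket AI customer data.
for minutes in (1_000, 100_000, 1_000_000):
    print(f"{minutes:>9,} min -> ${monthly_bill(minutes):,.2f}")
```

`Decimal` is used instead of floats so that the per-minute rate is represented exactly.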

As far as the foundational model is concerned, there is no scope for an immediate revenue stream. However, computational support from the government will help it cut the building cost significantly. 

“The computing will be utilised to build the model. But even then, we will need some private backing where, once you build out the model with government support, we can ultimately create a commercial product out of it,” said Upperwal.

What’s Next In Line For Soket AI? 

With the government’s approval for its proposal, the company has started doubling down on the LLM building process. It is also increasing its team size, which currently stands at 10 employees.

The company is also in the process of onboarding two people, thoroughly involved with Project EKA since its inception, as its cofounders.

Besides, many students at various universities, like IIT Gandhinagar and IIT Roorkee, are also working under the project to build the country’s first frontier model.

Soket AI’s LLM will be heavily focussed on catering to requirements in defence, agriculture, legal, and education. It will be trained on six scripts, which together cover 20-22 languages.

“I want Soket AI to be a frontier lab from India. We’ll be driving innovation across various aspects of our existing and upcoming models. Over time, many of these innovations will evolve into revenue-generating products,” Upperwal said, adding that the revenue will keep the R&D wheel spinning.

Meanwhile, there is a dedicated team that is working on TensorStudio. Until now, it had relied on external services to synthesise the audio component of its real-time speech API, while the text was powered by Pragna. However, the founder noted that the quality of the synthesised audio often suffered when conveying different emotions.

To address the challenge, Soket AI is now training an internal text-to-speech model, using about 25,000 hours of audio data. As of now, this is available only in Hindi and English.

With the seeds of India’s foundational model now sown, Soket AI’s journey will be interesting to track from here on. Can it put India on the global AI map for good? 

[Edited by Shishir Parasher]
