
DeepSeek: Geopolitical, Technological & a Layman’s view of big AI onset

By HT Team
Feb 21, 2025 11:17 AM IST

DeepSeek’s models rely on a process called distillation, i.e. using foundational models like Llama to train a smaller, more lightweight model.

Before we deeply seek (pun intended) to understand all the buzz around DeepSeek, let us first get the basics right and answer the first question that popped up in your head: what exactly is DeepSeek? It is a Chinese AI company that has recently created ripples in the AI industry by developing a large language model to rival ChatGPT, called DeepSeek-R1. As of a few days ago, DeepSeek was the most downloaded iOS app in the US. It has also overtaken the likes of ChatGPT and Google Gemini to become the most downloaded iOS app in India.


What exactly is a Large Language Model?

The most popular LLM today is ChatGPT; others include Meta’s Llama and Google Gemini. An LLM, short for Large Language Model, is a type of artificial intelligence that understands and generates human-like text. It is mostly used for answering questions in a conversational format, writing and editing, summarizing, coding, and generating creative content such as ideas and poems.

How is DeepSeek's technology different?

Some of DeepSeek's technological innovations are outlined below:

Mixture of Experts model:

DeepSeek is built on a Mixture of Experts (MoE) architecture. Instead of running one giant monolithic model (like ChatGPT), this architecture splits the AI into smaller expert models and activates only the experts necessary for the task at hand. Possible drawbacks include scaling issues and hallucinations, but these are minor when set against the cost advantages.
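The routing idea can be sketched in a few lines. This is a toy illustration, not DeepSeek's actual implementation: the "experts" are made-up functions and the gate scores are invented, but it shows how only the top-k experts are evaluated per input while the rest stay idle.

```python
import math

def softmax(scores):
    """Turn raw gate scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical "experts": each is just a simple function of the input.
EXPERTS = [
    lambda x: x * 2.0,   # expert 0
    lambda x: x + 10.0,  # expert 1
    lambda x: x ** 2,    # expert 2
    lambda x: -x,        # expert 3
]

def moe_forward(x, gate_scores, k=2):
    """Run only the top-k experts and mix their outputs by gate weight."""
    weights = softmax(gate_scores)
    # pick the indices of the k largest gate weights
    top_k = sorted(range(len(weights)), key=lambda i: -weights[i])[:k]
    # renormalize weights over just the selected experts
    total = sum(weights[i] for i in top_k)
    return sum(weights[i] / total * EXPERTS[i](x) for i in top_k)

# Only 2 of the 4 experts are evaluated for this input:
print(moe_forward(3.0, gate_scores=[2.0, 0.1, 1.5, -1.0], k=2))
```

Because the unused experts are never evaluated, the compute per input grows with k, not with the total number of experts, which is where the cost savings come from.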

8-bit floating point training:

Computers generally store numbers using 16 or 32 bits. DeepSeek uses an 8-bit floating point training technique, which has let it speed up training and lower costs without compromising on accuracy.
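A rough sense of what "8-bit" means in practice: an 8-bit float format such as E4M3 keeps only 3 mantissa bits, versus float32's 23, so stored values are coarser. The sketch below simulates just the mantissa rounding; it is a simplification that ignores the exponent range limits and special values of real FP8 formats.

```python
import math

def quantize_mantissa(x, mantissa_bits=3):
    """Keep only a few mantissa bits of x, like a low-precision float would."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)              # x = m * 2**e, with 0.5 <= |m| < 1
    scale = 2 ** (mantissa_bits + 1)  # 3 explicit bits + the implicit leading bit
    return math.ldexp(round(m * scale) / scale, e)

print(quantize_mantissa(3.14159))  # stored as 3.25 with 3 mantissa bits
```

Each stored number takes a quarter of the memory of a 32-bit float, so more of the model fits in fast memory and arithmetic runs faster; the training recipe has to be designed so this coarseness does not hurt accuracy.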

Reinforcement learning:

DeepSeek relies on reinforcement learning, which means it learns through trial and error, rewards and punishments, or in other words, experience.
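Trial-and-error learning can be shown with a toy example. This is not DeepSeek's actual training method, just the core idea: an agent repeatedly picks an action, observes a reward, and updates its estimate of how good each action is. The reward probabilities here are invented for the demo.

```python
import random

random.seed(0)
REWARD_PROB = {"A": 0.8, "B": 0.3}  # true reward rates, hidden from the agent
values = {"A": 0.0, "B": 0.0}       # the agent's running estimates
counts = {"A": 0, "B": 0}

for step in range(2000):
    # epsilon-greedy: mostly exploit the best-looking action, sometimes explore
    if random.random() < 0.1:
        action = random.choice(["A", "B"])
    else:
        action = max(values, key=values.get)
    reward = 1.0 if random.random() < REWARD_PROB[action] else 0.0
    counts[action] += 1
    # incremental running mean of the rewards seen for this action
    values[action] += (reward - values[action]) / counts[action]

print(values)  # estimates drift toward the true reward rates
```

No one ever tells the agent the right answer; its behaviour improves purely because rewarded actions get chosen more often, which is the essence of reinforcement learning.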

Distillation:

Distillation means using foundational models like Llama to train smaller models. While this implies that DeepSeek inherits the limitations of its foundational models, it also means that building on top of existing technology is far more efficient than reinventing the wheel.
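A single distillation step can be illustrated numerically. All the numbers below are invented: a small "student" nudges its outputs toward the soft probabilities of a larger "teacher" by gradient descent, which is the basic mechanism (real distillation repeats this over enormous text corpora).

```python
import math

def softmax(logits):
    exps = [math.exp(l) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

teacher_probs = [0.7, 0.2, 0.1]   # teacher's soft labels for one input
student_logits = [0.0, 0.0, 0.0]  # student starts out with no preference
lr = 0.5                          # learning rate

for _ in range(200):
    p = softmax(student_logits)
    # gradient of cross-entropy(teacher, student) w.r.t. logits is p - teacher
    student_logits = [l - lr * (pi - ti)
                      for l, pi, ti in zip(student_logits, p, teacher_probs)]

print([round(p, 2) for p in softmax(student_logits)])  # close to teacher_probs
```

The student never sees the original training data, only the teacher's outputs, which is why a distilled model can be far smaller and cheaper to train while keeping much of the teacher's behaviour.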

What seismic shifts does DeepSeek bring to AI development?

Firstly, DeepSeek is open source, whereas OpenAI is in fact closed source. This makes the technology accessible to developers across the world, who can use and build upon it without licensing restrictions.

Efficient resource use – DeepSeek’s models rely on a process called distillation, i.e. using foundational models like Llama to train a smaller, more lightweight model. This has enabled DeepSeek to run on consumer-grade CPUs and laptops, not the massive datacenters the foundational models require. In a sense, DeepSeek uses the best of the existing LLMs to train a cheaper, more efficient yet resourceful model. According to estimates, the company spent less than $6 million to train the DeepSeek-V3 model, compared to over $100 million spent by competitors training similar models.

Performance – DeepSeek has developed AI models, such as DeepSeek-V3 and DeepSeek-R1, that perform on par with or even surpass existing models from leading players like OpenAI and Meta.

Together, these factors may quickly encourage adoption by small and medium enterprises.

What are the geopolitical tensions around this development?

Democratization of AI is the most significant change that DeepSeek brings with it. It significantly lowers the entry barrier, possibly enabling small and medium enterprises, researchers and hobbyists to join the AI race as well. It is worth noting that DeepSeek is itself a passion project of Liang Wenfeng, who had earlier co-founded High-Flyer, a quantitative hedge fund that leverages AI for trading and investment strategies.

Challenge to United States’ AI dominance:

While just a few days earlier the dominance of the United States in the AI domain seemed unchallenged, DeepSeek has certainly put a dent in that perception. If nothing else, it effectively neutralizes the United States’ lead in the AI race. Remarkably, this was achieved despite severe export controls on Nvidia chips to China; in a sense, the ban on the latest Nvidia chips forced the Chinese to find a resource-efficient solution.

Influence of the Chinese Government:

Since the model was developed in China and might have direct or indirect government involvement, it has raised many questions about censorship. Some European countries have even begun banning the app. India’s IT minister, Ashwini Vaishnaw, has said India will welcome DeepSeek but will host the model on its own servers located in India.

Is India developing an LLM too?

IT minister Ashwini Vaishnaw says the Indian government will also fund companies developing LLMs as part of the IndiaAI mission, and expects one to be built within the next year.

Authored by: Varun Krishnan
