Tech Tonic | OpenAI suddenly realises using someone else’s work is not cool
OpenAI, Sam Altman and Microsoft have no right to stand on any sort of pedestal and educate the world about the ethics of AI
I’ll start with a little something that is off-topic, but relevant nonetheless (in the larger scheme of things). Mark Zuckerberg, in the fourth quarter earnings call this week, insisted he wants to “get back to how Facebook was used back in the day.” Basically, the fun and cool social media platform Facebook used to be (at least compared with Hi5 and Orkut; yes, I’m that old), focused on friends and their conversations, instead of the algorithmic, unrecognisable mess it has become of late. The bigger takeaway: Meta isn’t as freaked out about DeepSeek’s rise as, perhaps, some other parts of the Silicon Valley (https://test.everynews.info/technology/deepseek-has-us-ai-firms-talking-about-jevons-paradox-and-invigoration-101738152628392.html) established order. “There’s a number of novel things they did we’re still digesting,” is Zuckerberg’s assessment of where things stand.

Nevertheless, things aren’t as cool, calm and collected over at the OpenAI headquarters. Or at Microsoft’s either. The two companies, which have a lot at stake given their close partnership, are insisting that some Chinese firms used OpenAI’s models to train their own. The timing is interesting, considering a very impressive DeepSeek R1 (https://test.everynews.info/business/chinese-start-up-deepseek-shakes-up-artificial-intelligence-universe-101738038386365.html) open source model had put the cat among the pigeons just days earlier. Apparently, Microsoft security researchers alerted OpenAI that its API, or application programming interface, was being used this way. Distillation is a technique developers use to train smaller AI models to, in theory, replicate the capabilities of a larger model (larger in terms of parameters, and therefore in its ability to understand and contextualise).
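To make the accusation concrete, here is what distillation looks like at its simplest. This is a minimal, hypothetical sketch in PyTorch (the toy networks, sizes and temperature are my own illustration, not OpenAI’s or DeepSeek’s actual setup): a small “student” network is trained to imitate the softened output distribution of a larger, frozen “teacher”.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy "teacher": a larger, frozen network whose outputs are imitated.
# (Purely illustrative; real teachers are billion-parameter language models.)
teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
teacher.eval()

# Toy "student": a much smaller network that is actually trained.
student = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

T = 2.0  # temperature: softens both distributions so the student sees richer signal

for step in range(100):
    x = torch.randn(64, 32)  # stand-in for real training inputs
    with torch.no_grad():
        teacher_logits = teacher(x)  # the "Output" the terms of use talk about
    student_logits = student(x)

    # Train the student to match the teacher's softened output distribution
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the scenario OpenAI alleges, the “teacher” sits behind an API, so a distiller would only see generated text rather than raw model internals; the principle, though, is the same: a cheaper model learns from a costlier model’s outputs.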
OpenAI points to its Terms of Use document, which I read through too. There is a point in the list of things under “what you cannot do”, and it reads something like this: “Use Output to develop models that compete with OpenAI”. There is little to pin this on DeepSeek, except that it is the biggest threat OpenAI has faced since its own attempted implosion a few months ago. All the security researchers found were attempts to distill the model by entities emerging out of China. It could have been DeepSeek. With equal possibility, it may not have been DeepSeek.
I took this contextual scenic route to get to the core point — OpenAI is suddenly realising it is not cool to use someone else’s work to build your own? That’s extremely rich, isn’t it?
In the last year and a half, OpenAI has faced a number of lawsuits from authors and book publishers, content creators, news organisations, comedians, musicians and music publishers, visual artists and pretty much anyone with a relevant enough footprint on the World Wide Web, accusing the AI start-up of using their copyrighted work to train its models. Basically, that data and content was used to train OpenAI’s GPT models, and the company made money off them. Premium subscriptions (these go up to $200 per month) are one method. The partnership with Microsoft gave it a massive, priceless Windows PC user base, packaged in the Copilot wrapper. What did OpenAI give back to the copyright holders? Nothing.
The brazenness and flagrance were on full display when OpenAI, in a written submission to the UK’s House of Lords Communications and Digital Select Committee, explained it would be “impossible to train today’s leading AI models without using copyrighted materials.” Scarlett Johansson threatened legal action against OpenAI, and here’s where things get interesting: Johansson had turned down OpenAI’s offer to make her the female voice of its ChatGPT tool, but weeks later heard that the “Sky” voice sounded “eerily similar” to her own.
Then, in response to The New York Times lawsuit, the start-up seemed to resort to the argument that two wrongs may actually make one right (with a hint of patriotism). It wrote in a blog post, “Training AI models using publicly available internet materials is fair use, as supported by long-standing and widely accepted precedents. We view this principle as fair to creators, necessary for innovators, and critical for US competitiveness.”
Since then, OpenAI has made efforts to reach content sharing agreements with some media houses and publishers, but those are minor compared with the gravity of the situation. Token measures for optics? At some point last year, OpenAI promised a Media Manager suite that would allow creators and content publishers to specify which of their creations can be used for training AI models, and which cannot. That is yet to see the light of day.
The latest chapter of this contradiction: just yesterday, Microsoft said it will integrate DeepSeek’s model into Copilot. That is, alongside OpenAI’s models. Let that sink in.
OpenAI, Sam Altman and Microsoft have no right to stand on a pedestal and educate the world about the ethics of AI. That is, if there is indeed a case of model distillation, as OpenAI and Microsoft seem to be suggesting. The US tech order may or may not have an answer to the brilliance of DeepSeek’s frugal model-building methodology, and to its decision to give the result away for free. But they really shouldn’t need an AI chatbot to tell them that what goes around comes around.
Vishal Mathur is the technology editor for HT. Tech Tonic is a weekly column that looks at the impact of personal technology on the way we live, and vice-versa. The views expressed are personal.