Data for training stored overseas, copyright law doesn’t apply: OpenAI
Sibal argued that using ANI content to train its software did not constitute infringement under the Indian Copyright Act.
American company OpenAI on Friday denied allegations that its ChatGPT software reproduces verbatim content from news agency ANI, asserting before the Delhi High Court that its model is designed specifically to avoid such reproduction.

“Verbatim reproduction is not there anywhere. Today the model has received such levels of sophistication that verbatim reproduction is totally avoided, it would defeat the object of OpenAI if it is regurgitating the content,” said senior advocate Amit Sibal, representing OpenAI before Justice Amit Bansal.
Sibal argued that using ANI content to train its software did not constitute infringement under the Indian Copyright Act. Among his grounds, the lawyer stated that the Copyright Act applies only in India, while the data storage and software training for ChatGPT occurred outside India, where such activities are lawful.
“Training data used in the pre-training process is also not stored in India and is stored on servers outside India. No part of training or alleged storage is taking place in India and where it is being done is not unlawful. The copyright acts extend to the whole of India, but it does not extend outside India,” Sibal submitted.
He added that even if the act were applicable, the storage would not amount to copyright infringement since the company was only extracting “non-expressive elements” of the data for “non-expressive use,” which he argued is permissible under the act.
These submissions were made in response to ANI’s copyright infringement suit against OpenAI, which alleges the company trained its language model using ANI’s content without proper licensing, exploiting its work for commercial gain.
The case has attracted significant attention, with various industry groups, including the Indian Music Industry, the Federation of Indian Publishers and the Digital News Publishers Association, supporting ANI’s position. The outcome is expected to shape how copyright laws apply to AI-generated content and the protection of news agencies’ original work in the digital age.
In November last year, the High Court issued summons to OpenAI but refrained from immediately restraining it from using ANI’s content after the company informed the court it had already blacklisted ANI’s domain in October. The court also appointed amici curiae, acknowledging that the case raised complex legal questions in light of new technological advancements.
During Friday’s hearing, Sibal further argued that even using data to generate responses for users did not constitute infringement, as the act does not prohibit data use for various purposes and the news agency cannot claim a “special right” over the “discovery of a fact.”
“Use in general is not prohibited. There is no substantial reproduction and there is no right that emerges from discovery of a fact. Just because the facts are similar, the same will not amount to ‘substantial similarity’,” Sibal submitted.
The next hearing is scheduled for April 2, when Sibal will continue his submissions on OpenAI’s behalf.
In its January reply, OpenAI urged the court to dismiss the case, arguing that courts in California had exclusive jurisdiction. It also reiterated that it uses data in a “non-expressive” manner. However, on January 28, the High Court declined to rule on jurisdiction separately and decided to hear arguments on both jurisdiction and merits together.
Previously, ANI had urged the High Court to rule favourably in its copyright infringement suit, asserting that despite OpenAI’s undertaking, the company was scraping content shared on ANI’s subscribers’ websites to train ChatGPT and generate responses. ANI’s counsel argued this constitutes copyright infringement, as distribution of its content divested ANI of neither its control over the content nor its copyright.