
OpenAI used over 1 million hours of YouTube data to train GPT-4 AI: Report

Apr 08, 2024 12:13 PM IST

Over one million hours of video content was transcribed, raising concerns about compliance with YouTube's policies, the report said.

OpenAI used over a million hours of YouTube videos to train its large language model GPT-4, a report has revealed, as major tech companies race to acquire ever more data to train their artificial intelligence (AI) models. According to the New York Times, OpenAI used its speech recognition tool Whisper to transcribe YouTube videos for training GPT-4. More than one million hours of video content was transcribed this way, which raised concerns about compliance with YouTube's policies, since Google-owned YouTube restricts the use of its videos for independent applications.

OpenAI logo is seen. As per this process, over one million hours of video content was transcribed which raised concerns about compliance with YouTube's policies (Reuters)


This comes days after YouTube CEO Neal Mohan was asked, in an interview with the Wall Street Journal, whether OpenAI's Sora video generator was trained on data from YouTube. He said he was not aware of whether OpenAI had used any YouTube data to train its new video tool, but said it would be a problem if it had.


The report also claimed that Google transcribed YouTube videos for AI training, which could potentially have breached copyright laws. Mark Zuckerberg's Meta even discussed acquiring the publisher Simon & Schuster to gain access to its vast library of books.

Why are AI companies obsessed with getting more and more data?

The effectiveness of AI models improves with the volume of data they are trained on. Earlier reports suggested that demand for high-quality data is so high that some tech companies could exhaust the available internet data by 2026.

What have companies said on this so far?


OpenAI said each of its AI models is trained on a unique dataset, while Google acknowledged training its AI models on some YouTube content under agreements with creators.
