AI models don’t need publishers’ data

Sam Altman, CEO of OpenAI, at the Hope Global Forums annual meeting in Atlanta on Dec. 11, 2023.

Dustin Chambers | Bloomberg | Getty Images

DAVOS, Switzerland — Sam Altman said he was "surprised" by the New York Times' lawsuit against his company OpenAI, saying its artificial intelligence models didn't need to train on the news publisher's data.

Describing the legal action as a “strange thing,” Altman said that OpenAI had been in “productive negotiations” with the NYT before news of the lawsuit came out. According to Altman, OpenAI wanted to pay the outlet “a lot of money to display their content” in ChatGPT, the firm’s popular AI chatbot.

“We were as surprised as anybody else to read that they were suing us in the New York Times. That was sort of a strange thing,” the OpenAI leader said on stage at the World Economic Forum in Davos, Switzerland, Thursday.

He added that he isn't especially worried about the NYT lawsuit and that reaching a resolution with the publisher isn't a top priority for OpenAI.

“We are open to training [AI] on the New York Times, but it’s not our priority,” Altman said in front of a packed Davos crowd.

“We actually don’t need to train on their data,” he added. “I think this is something that people don’t understand. Any one particular training source, it doesn’t move the needle for us that much.”

The New York Times sued both Microsoft and OpenAI late last year, accusing the companies of copyright infringement over the use of its articles as training data for their AI models.

The NYT seeks to hold Microsoft and OpenAI accountable for “billions of dollars in statutory and actual damages” related to the “unlawful copying and use of The Times’s uniquely valuable works.”

In the suit, the NYT showed examples in which ChatGPT spewed out near-identical versions of NYT stories. OpenAI has disputed the NYT’s allegations.

The legal action has raised concerns that more media publishers could pursue OpenAI with similar claims. Other outlets are instead looking to partner with the firm and license their content rather than battle it out in court. Axel Springer, for instance, has a deal under which it licenses its content to OpenAI.

OpenAI responded to the NYT lawsuit earlier this year, saying in a statement that instances of “regurgitation,” or spitting out entire “memorized” parts of specific pieces of content or articles, “is a rare bug that we are working to drive to zero.”

“We collaborate with news organizations and are creating new opportunities. Training is fair use, but we provide an opt-out because it’s the right thing to do,” OpenAI wrote in a statement last week.

Altman’s comments echo remarks that the AI leader made at an event organized by Bloomberg in Davos earlier this week. Then, Altman said that he wasn’t that worried about the NYT lawsuit, disputed the publisher’s allegations and said there would be plenty of ways to monetize news content in the future.

“There’s all the negatives of these people being like, oh, you know, don’t don’t do this, but the positives are, I think there’s going to be great new ways to consume and monetize news and other published content,” Altman said.

“And for every one New York Times situation, we have many more super productive things about people that are excited to build the future and not do the theatrics.”

Altman added that there were ways OpenAI could tweak its GPT models so that they don't regurgitate stories or features posted online word-for-word.

“We don’t want to regurgitate someone else’s content,” he said. “But the problem is not as easy as it sounds in a vacuum. I think we can get that number down and down and down, quite low. And that seems like a super reasonable thing to evaluate us on.”