Generative AI might be good for traditional publishers
The prevailing view is that Generative AI is yet another threat to traditional publishers like newspapers, but unless you believe the extremely bullish case for AI, it could actually be a benefit.
The New York Times recently sued OpenAI (and Microsoft) for copyright infringement. GPT-4 - OpenAI's flagship Large Language Model - like all LLMs, makes use of enormous bodies of training text sourced from all across the internet, without asking permission from their creators or rights-holders. AI models are nothing without their training data (especially high-quality data). Since then, OpenAI have struck deals with several publishers (there’s a list here), and other moves in this area are related (the Reddit IPO, X/Twitter changing access to its API).
This is a sore point for traditional publishers, most of whom have suffered from the digital revolution1, with newspaper revenue and circulation falling by over 50% in the UK and US, and many local newspapers going out of business altogether2. There is a general feeling that big tech is to blame for this in some way3 - either through misbehaviour in the ad market or capricious algorithm design - so as they pile into AI models trained on questionably-obtained data, it feels like another attempt by them to benefit at the publishers’ expense. Google’s implementation of AI-generated snippets at the top of search results is a particular bone of contention, as Google Search referrals are a valuable source of traffic for publishers.
Publisher traffic generally falls into direct traffic - where someone opens a homepage or app - and third-party traffic, where someone clicks a link from Google, Facebook, Reddit, X/Twitter etc. Many newer digital publishers such as BuzzFeed were reliant on third-party traffic from Facebook or other platforms (but mostly Facebook), a strategy that proved unsustainable when Facebook changed its algorithm to suppress news. X/Twitter has also suppressed links.
Google refers a lot of traffic from news and topic searches, and this is the kind of referral that is under threat from its AI product: if it can provide the answer there and then, there is no need for the user to click through to a website and generate ad revenue for the publisher (or any chance of the user developing further interest in their content).
However, this possible future for Search - and AI assistants more broadly - depends on LLMs solving, or massively reducing, their hallucination problem, where they confidently fabricate wrong answers to queries. As Google discovered with its AI Search product, it runs the risk of taking total nonsense as its source, and this is where the real problem for AI search - and the possible gain for traditional publishers - comes to the fore. AI makes it easy to mass-produce text of incredibly low quality. Google Search was already perceived to be degrading due to SEO hacking, and an impending tidal wave of slop makes it plausible that it could become unusable4.
One of the problems that publishers have had to deal with is the proliferation of user-generated content and much greater competition from smaller websites. One of their roles - for better or for worse - was gatekeeper: to have a big audience, you had to work for a big publisher. Unfortunately for publishers, the internet was the death of the gatekeepers. The user-generated internet - blogs, forums, social media - gave a voice to lots of the people who had been excluded from the higher echelons of the media. But AI is a serious threat to the user-generated internet: both Facebook and X/Twitter have serious problems with slop and bots. As Google and the social media sites have their experiences progressively degraded by the flood of nonsense5, this may push people back to traditional publishers: their writers are verifiably not bots6, and their pictures are not generated by AI. You don't need to worry about the provenance of an image on the Guardian, BBC or NYT like you do with something on Twitter or Facebook7.
So as things stand, in the short to medium term, publishers will get paid by AI companies to license their content, those companies will pollute the user-generated internet to such an extent that it becomes unusable, and users will be driven back to traditional publishers. That seems like a win for traditional publishers!
OK, that is the best-case scenario for traditional publishers and AI. What would make it not happen? What events would suggest they are right to worry about AI?
The bull case for AI is right
LLMs continue to develop rapidly, don’t hit a wall on progress, and substantially reduce the hallucination problem. If that happens, Google’s vision for AI and Search - a user types a query and gets their answer, with no need to click through to a website - will be realised, and publishers will lose out.
Publishers pile into AI too
The above scenario depends on publishers being perceived to be - and actually being - untainted by AI. Sports Illustrated is the most prominent example of a publisher caving to financial pressures and attempting to use AI, but AI-generated football match descriptions and simple business updates have been experimented with much more widely in the past. Publishers should either draw a red line and not do this at all, or be very clear about how and when they use AI8.
Good human-verification systems
At the moment, there is no widespread and reliable human-verification system that can guarantee that content is being produced by a person. If users of Facebook and X/Twitter had to (or were able to) verify that they were real people, this would reduce their bot problem. Similarly, if Google only indexed, or somehow flagged, content that was verifiably produced by humans9, it would limit the impact of slop on results. Some way of detecting whether text or images were generated with AI could also allow them to be downweighted in results, as in the sketch below.
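Purely as an illustration of that last point - and not a claim about how Google or anyone else actually ranks content - here is a minimal sketch of what "downweighting" could look like if a classifier existed that estimated the probability a page is AI-generated. The `ai_probability` field, the penalty weight and the URLs are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SearchResult:
    url: str
    relevance: float       # conventional relevance score, higher is better
    ai_probability: float  # hypothetical classifier output in [0, 1]

def rank_with_slop_penalty(results, penalty_weight=0.5):
    """Sort results, penalising those that look AI-generated.

    Hypothetical scoring: shrink the relevance score in proportion
    to how likely the content is to be machine-generated.
    """
    def score(r: SearchResult) -> float:
        return r.relevance * (1 - penalty_weight * r.ai_probability)

    return sorted(results, key=score, reverse=True)

# Toy usage: a human-written piece outranks a slightly more "relevant" slop page.
results = [
    SearchResult("https://example.com/human-report", relevance=0.80, ai_probability=0.05),
    SearchResult("https://example.com/slop-farm", relevance=0.85, ai_probability=0.95),
]
for r in rank_with_slop_penalty(results):
    print(r.url)
```

Of course, reliable detection is the hard part (see the footnote above); the sketch only shows how such a signal could slot into ranking if it existed.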
The extremely bear case for AI is right and we all get paperclipped
Still bad for publishers10.
People just… don’t really care?
We drift Wall-E-style into a world of slop optimised for engagement, and continue to scroll through feeds chock-full of nonsense (this might be more likely if the amount of slop isn’t completely overwhelming, or if people are happy to follow AI-generated influencers).
Conclusion
After two decades of internet-powered decline for publishers, as they’ve watched the tech giants - their rivals in the advertising industry - grow and grow, I think it’s fair to say that trust and optimism are in short supply. People in the media don’t seem to assume any good faith in the development of AI11, and historically publishers have suffered both from secular trends that the tech industry has benefited from and from changes the tech companies have made themselves, but that doesn’t mean it will be the same this time. Publishers may well find themselves as little islands of trustworthy sanity in a sea of slop, and might benefit from a trend for once.
If you really, really liked it, you can also buy me a coffee.
Disclaimer: I work for the Guardian but not on AI or anything involved in policy or strategy. Views very much my own and not the company's.
This applies to local newspapers even more than national newspapers, but nationals have generally felt the pain too. Ironically it doesn’t apply so much to the NYT, which has done probably the best of any paper in the Anglosphere, although its revenue is still down on its peaks in the early 2000s (its stock price is higher).
The remaining papers are full of low-quality rehashes of national news stories and SEO-bait.
I don’t think this is really the case - big tech obviously are notable beneficiaries of the digital revolution but “the internet generally” is what has caused the problems for papers, not the actions of big tech specifically. I have more thoughts on this I might elaborate on in the future.
Unless Google sort themselves out. This is not impossible but nobody seems to have a high opinion of their current leadership or strategy.
Yes, Sports Illustrated ruined my point, I will get to that
Ironically, lots of the images of Google's AI snippets making mistakes were fakes, although of the traditional rather than AI-generated kind.
Not claiming this would be easy / simple / actually feasible.
https://en.wikipedia.org/wiki/Instrumental_convergence#Paperclip_maximizer
Leftists seem to be convincing themselves that there is no substance in AI and the investment boom is not just over-hyped but completely without foundation. I’m not convinced this is the case, but in the event that it is, this will be another reason that traditional publishers will be fine.