Americans have grown used to prices rising over the past year, but thanks to AI, things could get even worse.
A decades-old arrangement between websites and bot crawlers is on thin ice with the rise of AI. The knock-on effect might be consumers paying for previously free content, write Kali Hays and Alistair Barr.
For years, crawlers scraped data from across the web to index information so search engines could better direct users. In the simplest terms, think of them as librarians ensuring books on dolphins, the Industrial Revolution, or physics are in the right place.
Both sides were mostly happy with the arrangement. Crawlers helped search engines direct visitors to the appropriate websites. Websites saw additional traffic from being properly indexed.
And if a website didn't want to be indexed, a simple text file called robots.txt asked crawlers to stay away, and most honored the request.
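For a sense of what that looks like in practice, here is a minimal robots.txt sketch. The file sits at a site's root (for example, example.com/robots.txt); the /drafts/ path is made up for illustration, and the GPTBot entry assumes OpenAI's published crawler token:

    User-agent: *            # rules for every crawler
    Disallow: /drafts/       # please stay out of this path

    User-agent: GPTBot       # OpenAI's AI-training crawler (illustrative)
    Disallow: /              # don't crawl anything here

Note the wording: robots.txt is a request, not a lock. Compliance has always been voluntary, which matters for what comes next.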
But now, thanks to AI's endless thirst for more training data, all bets are off.
Crawlers are now scraping data for tech giants' AI models. The new approach is a double whammy for creators: the models don't send users to the original work, and they use the data they ingest to produce competing content.
Imagine libraries selling their own versions of the books they house without paying the original authors anything.
Researchers are working to understand how much value specific pieces of data contribute to a large AI model, Alistair writes. The US Copyright Office is opening a public comment period on generative AI and its impact on authors and creators.
But in the meantime, the one tool meant to stop these crawlers — robots.txt — has been made all but obsolete thanks to loopholes and a lack of legal precedent.
With nowhere left to turn, content creators might take drastic measures, Kali and Alistair write.
Some have chosen to stop engaging online entirely over fears of unintentionally training a bot that could eventually replace them.
Others might put their content behind a paywall, requiring users to subscribe. Doing so would let creators protect their data and generate revenue at the same time.
But the impact could go even further depending on how websites try to remedy the situation.
As bad as some crawlers are for creators, others still provide value. Googlebot, for example, collects web information so Google can rank it and include it in search results, sending users to the original content.
However, now that robots.txt's flaws as a blocking mechanism have been exposed, websites might take a more aggressive stance. That could include a blanket block on crawlers, regardless of their intent.
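The blanket version is just two lines; any crawler that respects the standard, Googlebot included, would stay out entirely:

    User-agent: *    # every crawler, well-behaved or not
    Disallow: /      # the whole site is off-limits

Of course, crawlers that already ignore robots.txt would keep crawling; only the rule-followers get shut out, and search engines are among them.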
As a result, the web could become incredibly difficult to navigate due to the lack of indexing.