After announcing its Open AI Ecosystem Framework earlier this year, the nonprofit Creative Commons has thrown its support behind “pay-to-crawl” technology: systems that automate payment for website content when it is accessed by machines such as AI web crawlers.
Creative Commons (CC) is best known for spearheading the licensing movement that allows creators to share their work while retaining copyright. In July, the organization announced plans to provide a legal and technical framework for sharing datasets between companies that control the data and AI providers who want to train on it.
The nonprofit is now tentatively backing pay-to-crawl systems, describing itself as “cautiously supportive.”
“When implemented responsibly, paid crawls can be a way for websites to maintain the creation and sharing of content, manage alternative uses, and keep publicly accessible content that would otherwise go unshared or disappear behind even more restrictive paywalls,” CC’s blog post said.
The idea behind pay-to-crawl, an effort led by companies like Cloudflare, is to charge AI bots a fee each time they scrape a site and collect content to train and update their models.
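Cloudflare’s approach, for instance, is built around the long-dormant HTTP 402 “Payment Required” status code: a crawler that hasn’t arranged payment gets a 402 response with a price attached, and can retry once it signals that it accepts the charge. Below is a minimal sketch of what such a handshake might look like from a crawler’s side; the header names and prices are illustrative assumptions, not any vendor’s exact API.

```python
import requests

# Sketch of a pay-to-crawl handshake built on HTTP 402 ("Payment
# Required"). The header names below are illustrative assumptions,
# not any vendor's exact API.

MAX_PRICE_USD = 0.01  # the most this crawler will pay per request

def fetch_with_payment(url: str) -> str | None:
    # First attempt: an ordinary request, identifying the bot honestly.
    resp = requests.get(url, headers={"User-Agent": "ExampleAIBot/1.0"})

    if resp.status_code == 402:
        # The site wants payment and advertises its price in a header.
        price = float(resp.headers.get("crawler-price", "inf"))
        if price > MAX_PRICE_USD:
            return None  # too expensive; walk away without the content

        # Retry, signaling the maximum charge this crawler accepts.
        resp = requests.get(url, headers={
            "User-Agent": "ExampleAIBot/1.0",
            "crawler-max-price": str(MAX_PRICE_USD),
        })

    resp.raise_for_status()
    return resp.text

body = fetch_with_payment("https://example-publisher.com/article")
print("fetched" if body else "declined: price above budget")
```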
Previously, websites were happy to let web crawlers index their content for search engines such as Google. The arrangement benefited them: appearing in search results drove visitors and clicks. With the advent of AI, however, things have changed. Once consumers get an answer from an AI chatbot, they are unlikely to click through to the source.
This shift has already devastated publishers’ search traffic, and it shows no signs of slowing down.
Pay-to-crawl systems, on the other hand, could help publishers recover from the AI-induced revenue hit. They may also work better for smaller web publishers, who lack the leverage to negotiate the kind of one-off content deals larger outlets have struck: OpenAI with Condé Nast and Axel Springer, Perplexity with Gannett, Amazon with The New York Times, and Meta with various media publishers.
CC offered several caveats alongside its support for pay-to-crawl, noting that such systems could concentrate power on the web. Access to content could also be blocked for “researchers, non-profit organizations, cultural heritage institutions, educators, and other parties working in the public interest.”
It proposed a set of principles for responsible pay-to-crawl, including not making pay-to-crawl the default setting for all websites and avoiding blanket rules for the web. The organization also said pay-to-crawl systems must allow for throttling, not just outright blocking, to maintain access in the public interest, and that they must be open, interoperable, and built from standardized components.
Cloudflare isn’t the only company investing in the pay-to-crawl space.
Microsoft is also building an AI marketplace for publishers, and smaller startups like ProRata.ai and TollBit are starting to do the same. Another group, the RSL Collective, has published its own specification for a new standard called Really Simple Licensing (RSL). The standard lets a website declare how crawlers may use its content, but stops short of actually blocking them. RSL has since been adopted by Cloudflare, Akamai, and Fastly, and backed by Yahoo, Ziff Davis, O’Reilly Media, and more.
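To make that concrete, the sketch below shows how a well-behaved crawler might fetch and honor a site’s machine-readable license terms before ingesting a page. The manifest location, XML element names, and term values are hypothetical stand-ins for illustration, not the official RSL schema.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical sketch of a crawler honoring machine-readable license
# terms in the spirit of RSL. The manifest path and element names are
# illustrative assumptions, not the official RSL schema.

SITE = "https://example-publisher.com"

def load_license_terms(site: str) -> dict[str, str]:
    """Fetch and parse the site's (hypothetical) license manifest."""
    with urllib.request.urlopen(f"{site}/license.xml") as resp:
        root = ET.fromstring(resp.read())

    terms = {}
    for content in root.iter("content"):
        path = content.get("url", "/")
        usage = content.findtext("usage", default="unspecified")
        terms[path] = usage  # e.g. "free", "pay-per-crawl", "no-ai-training"
    return terms

def may_train_on(terms: dict[str, str], path: str) -> bool:
    # The terms describe permitted use; nothing here physically blocks
    # a non-compliant bot. Compliance is up to the crawler.
    return terms.get(path, "unspecified") == "free"

terms = load_license_terms(SITE)
print(may_train_on(terms, "/article"))
```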
CC is one of the organizations that has announced support for RSL, alongside its own CC Signals, a wide-ranging project to develop technologies and tools for the AI era.
