Wikipedia Proactively Provides Data to AI Developers to Deter Web Scraping

Wikipedia is trying to discourage developers from scraping its content by releasing a dataset optimized for training artificial intelligence models. The Wikimedia Foundation announced on Wednesday that it has partnered with Kaggle to release a beta dataset containing "structured Wikipedia content in English and French." According to Wikimedia, the Kaggle-hosted dataset "was designed with machine learning workflows in mind," making it easier for AI developers to access machine-readable article data for modeling, fine-tuning, benchmarking, alignment, and analysis. The dataset's content is openly licensed and, as of April 15, includes research abstracts, short descriptions, image links, infobox data, and article sections.

Source: The Verge

via Windvane Reference Express - Telegram Channel