We are currently testing this tool and will let you know when it is ready for full use!
This web scraper runs entirely in your browser and is designed for creating training data for AI models. It discovers pages by reading the website’s sitemap.xml file, which makes it particularly well suited to platforms like Squarespace and Shopify that generate sitemaps automatically.
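To illustrate the sitemap step, here is a minimal Python sketch of how page URLs can be pulled out of a sitemap.xml document. It is not the tool's own code (the scraper itself runs in the browser); the function name `extract_sitemap_urls` and the sample URLs are hypothetical, and the sketch assumes the sitemap uses the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

# Standard namespace used by sitemap.xml files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value found in a sitemap document.

    Works for both <urlset> (page URLs) and <sitemapindex>
    (URLs of further sitemaps), since both store addresses in <loc>.
    """
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sample input, for illustration only.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

urls = extract_sitemap_urls(sample_sitemap)
```

If the result comes from a sitemap index rather than a URL set, each returned address points to another sitemap that would need to be fetched and parsed the same way.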
The scraper preserves the structure of your content, including headings, paragraphs, lists, and tables, while removing unnecessary elements like navigation menus and footers. It also captures metadata, images, and PDF documents.
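As a rough illustration of structure-preserving extraction, the Python sketch below keeps headings, paragraphs, and list items in document order while skipping anything inside navigation or footer elements. It is an assumption about how such filtering could work, not the scraper's actual implementation; the names `ContentExtractor` and `extract_blocks` are hypothetical, skipping `script` and `style` is an added assumption, and tables, metadata, images, and PDFs are left out for brevity.

```python
from html.parser import HTMLParser

# Elements whose contents are dropped entirely (script/style are an assumption).
SKIP_TAGS = {"nav", "footer", "script", "style"}
# Elements whose text is kept as one structured block each.
KEEP_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "li"}


class ContentExtractor(HTMLParser):
    """Collect (tag, text) pairs for content blocks outside skipped regions."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # (tag, text) pairs in document order
        self._skip_depth = 0    # > 0 while inside a skipped element
        self._current = None    # tag of the block being collected
        self._buffer = []       # text fragments of the current block

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif self._skip_depth == 0 and tag in KEEP_TAGS:
            self._current = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == self._current:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.blocks.append((tag, text))
            self._current = None

    def handle_data(self, data):
        if self._skip_depth == 0 and self._current is not None:
            self._buffer.append(data)


def extract_blocks(html: str) -> list[tuple[str, str]]:
    """Parse an HTML string and return its content blocks."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Called on a page whose markup contains a `<nav>` menu, an `<h1>`, a paragraph, and a `<footer>`, `extract_blocks` would return only the heading and paragraph blocks, each tagged with its element name so the original structure can be rebuilt.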
More technical details
This scraper uses a CORS proxy to access websites. Before using it:
The scraper will:
Ready to start