A powerful and flexible web scraping tool that automatically crawls websites, extracts structured data, and discovers new links.
The CoreClaw Web Scraper is an automated website crawling utility designed to scrape pages in bulk, extract structured data (titles, descriptions, keywords, headings, images, links, etc.), and automatically discover relevant links. With CoreClaw, you can obtain structured web data with zero code, supporting data collection, SEO analysis, competitive research, and content monitoring.
- 📄 Page URL
- 📝 Page Title
- 📝 Page Description
- 🔑 Keywords
- 📊 H1 Main Heading
- 📋 H2 Subheading List
- 📏 Text Length
- 🖼️ Image Count
- 🔗 Link Count
- 📏 Crawling Depth
CoreClaw Web Scraper Tool handles proxy rotation, task scheduling, data standardization, and final delivery for you in the background. In just a few minutes you can get your data: provide your starting URLs, adjust the crawl parameters as needed, run the task, and download the results.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | array | ✅ Yes | - | List of starting URLs |
| maxCrawlingDepth | integer | ✅ Yes | 1 | Maximum crawling depth (0 means starting page only) |
| maxPagesPerCrawl | integer | No | 10 | Maximum number of pages per crawl (0 means no limit) |
| pageLoadTimeoutSecs | integer | No | 60 | Page load timeout (seconds) |
| waitUntil | string | No | networkidle2 | Page navigation completion condition |
| injectJQuery | boolean | No | false | Whether to inject the jQuery library |
| ignoreSslErrors | boolean | No | true | Whether to ignore SSL certificate errors |
| downloadMedia | boolean | No | false | Whether to download images/videos |
| downloadCss | boolean | No | true | Whether to download CSS stylesheets |
| debugLog | boolean | No | false | Whether to enable detailed debug logs |
Example 1: Basic Web Crawling
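A minimal input for a basic crawl might look like the sketch below. The parameter names come from the table above; the URL and values are illustrative:

```json
{
  "url": ["https://example.com"],
  "maxCrawlingDepth": 1,
  "maxPagesPerCrawl": 10
}
```

This crawls the starting page plus pages linked directly from it, stopping after 10 pages.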
Example 2: Deep Crawling
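A deep crawl sketch, assuming the same JSON input format (values illustrative). Setting maxPagesPerCrawl to 0 lifts the page limit, as described in the parameter table:

```json
{
  "url": ["https://example.com"],
  "maxCrawlingDepth": 3,
  "maxPagesPerCrawl": 0,
  "pageLoadTimeoutSecs": 60
}
```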
Example 3: Fast Crawling (Ignore Resources)
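A fast-crawl sketch (values illustrative), combining the performance options from the parameter table: skip media and CSS downloads, shorten the timeout, and use the quicker domcontentloaded wait condition:

```json
{
  "url": ["https://example.com"],
  "maxCrawlingDepth": 1,
  "downloadMedia": false,
  "downloadCss": false,
  "pageLoadTimeoutSecs": 30,
  "waitUntil": "domcontentloaded"
}
```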
For your convenience, output results are displayed in tables and tabs. You can choose to download the results in CSV/JSON format.
Each crawled page will output the following data:
Basic Fields
Structured Fields
JSON Example:
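The exact output key names are not documented here, so the keys below are illustrative; the fields themselves match the output list above:

```json
{
  "url": "https://example.com",
  "title": "Example Domain",
  "description": "An illustrative page description",
  "keywords": ["example", "demo"],
  "h1": "Example Domain",
  "h2": ["First subheading", "Second subheading"],
  "textLength": 1024,
  "imageCount": 3,
  "linkCount": 12,
  "crawlingDepth": 0
}
```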
Crawling depth is calculated starting from the initial page: the starting page is depth 0, pages linked directly from it are depth 1, pages linked from those are depth 2, and so on.
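This depth rule can be sketched as a breadth-first traversal. The code below is a minimal illustration, not CoreClaw's actual implementation; the link graph and helper function are hypothetical stand-ins for real link extraction:

```python
# Sketch of crawl-depth assignment: breadth-first traversal where the
# starting page is depth 0 and each followed link adds 1.
from collections import deque

def assign_depths(start_url, links, max_depth):
    """Return {url: depth} for pages reachable within max_depth.

    `links` maps each URL to the URLs it links to (a stand-in for
    real link extraction from fetched pages).
    """
    depths = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if depths[url] >= max_depth:
            continue  # don't follow links past the depth limit
        for target in links.get(url, []):
            if target not in depths:  # visit each page only once
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths

# Example: a tiny hypothetical site graph
site = {
    "https://example.com": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/deep"],
}
print(assign_depths("https://example.com", site, max_depth=1))
```

With max_depth=1, only the starting page and its direct links are visited; the /deep page (depth 2) is never reached.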
You can control the crawl volume using two parameters: maxCrawlingDepth (how far to follow links from the starting page) and maxPagesPerCrawl (how many pages to fetch in total).
The jQuery injection feature (injectJQuery: true) makes the jQuery library available inside each crawled page, so custom extraction logic can use jQuery selectors and utilities.
By default, the tool ignores SSL certificate errors (ignoreSslErrors: true).
What does the waitUntil parameter do? The following navigation completion conditions are supported (the standard Puppeteer values):

- load - Wait for the page's load event to fire.
- domcontentloaded - Wait for the DOMContentLoaded event to fire.
- networkidle0 - Wait until there are no network connections for at least 500 ms.
- networkidle2 - Wait until there are no more than 2 network connections for at least 500 ms.
Use the following methods to boost performance:
- downloadMedia: false - Skip images and videos.
- downloadCss: false - Skip CSS stylesheets.
- pageLoadTimeoutSecs - Shorten the timeout duration.
- waitUntil: domcontentloaded - Use a faster wait condition.

If you need to extract specific page data (such as prices, authors, dates, etc.), you can do so through custom data extraction logic. CoreClaw offers flexible configuration options, supporting customized extraction fields and rules based on your requirements.
Explore more popular scrapers from our marketplace
by CoreClaw
It queries the Google search engine by keyword and returns a structured SERP summary, including the final search parameters, organic results, related queries, and people-also-ask data.
by Odin Kael
Dedup Datasets Worker is a powerful tool for merging and deduplicating datasets from multiple JSON/JSONL files. Fully optimized for the CafeScraper platform with enhanced features and robust error handling.
by Odin Kael
A powerful Google Sheets import/export tool designed for data synchronization, backup, and integration between Google Sheets and external systems. It supports three operation modes, two authentication methods, batch processing, data deduplication, and automatic backup.
by Odin Kael
A high-speed static page scraper based on Cheerio, designed specifically for static HTML pages. Uses Cheerio for HTML parsing, delivering speeds 10-50 times faster than full browser rendering.