A powerful and flexible web scraping tool that automatically crawls websites, extracts structured data, and discovers new links.
The CoreClaw Web Scraper is an automated website crawling utility designed to scrape pages in bulk, extract structured data (titles, descriptions, keywords, headings, images, links, etc.), and automatically discover relevant links. With CoreClaw, you can obtain structured web data with zero code, supporting data collection, SEO analysis, competitive research, and content monitoring.
- 📄 Page URL
- 📝 Page Title
- 📝 Page Description
- 🔑 Keywords
- 📊 H1 Main Heading
- 📋 H2 Subheading List
- 📏 Text Length
- 🖼️ Image Count
- 🔗 Link Count
- 📏 Crawling Depth
CoreClaw Web Scraper Tool handles proxy rotation, task scheduling, data standardization, and final delivery for you in the background. In just a few minutes you can get your data: provide your starting URLs, adjust the crawl parameters as needed, run the task, and download the results.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | array | ✅ Yes | - | List of starting URLs |
| maxCrawlingDepth | integer | ✅ Yes | 1 | Maximum crawling depth (0 means starting page only) |
| maxPagesPerCrawl | integer | No | 10 | Maximum number of pages per crawl (0 means no limit) |
| pageLoadTimeoutSecs | integer | No | 60 | Page load timeout (seconds) |
| waitUntil | string | No | networkidle2 | Page navigation completion condition |
| injectJQuery | boolean | No | false | Whether to inject the jQuery library |
| ignoreSslErrors | boolean | No | true | Whether to ignore SSL certificate errors |
| downloadMedia | boolean | No | false | Whether to download images/videos |
| downloadCss | boolean | No | true | Whether to download CSS stylesheets |
| debugLog | boolean | No | false | Whether to enable detailed debug logs |
Example 1: Basic Web Crawling
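A minimal input for a basic crawl might look like the sketch below. The parameter names come from the table above; the URL and values are illustrative:

```json
{
  "url": ["https://example.com"],
  "maxCrawlingDepth": 1,
  "maxPagesPerCrawl": 10
}
```

This crawls the starting page plus pages linked directly from it, stopping after 10 pages.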
Example 2: Deep Crawling
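A deep crawl sketch, assuming the same JSON input format (values illustrative). Setting maxPagesPerCrawl to 0 lifts the page limit, as described in the parameter table:

```json
{
  "url": ["https://example.com"],
  "maxCrawlingDepth": 3,
  "maxPagesPerCrawl": 0,
  "pageLoadTimeoutSecs": 60
}
```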
Example 3: Fast Crawling (Ignore Resources)
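A fast-crawl sketch (values illustrative), combining the performance options from the parameter table: skip media and CSS downloads, shorten the timeout, and use the quicker domcontentloaded wait condition:

```json
{
  "url": ["https://example.com"],
  "maxCrawlingDepth": 1,
  "downloadMedia": false,
  "downloadCss": false,
  "pageLoadTimeoutSecs": 30,
  "waitUntil": "domcontentloaded"
}
```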
For your convenience, output results are displayed in tables and tabs. You can choose to download the results in CSV/JSON format.
Each crawled page will output the following data:
Basic Fields
Structured Fields
JSON Example:
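The exact output key names are not documented here, so the keys below are illustrative; the fields themselves match the output list above:

```json
{
  "url": "https://example.com",
  "title": "Example Domain",
  "description": "An illustrative page description",
  "keywords": ["example", "demo"],
  "h1": "Example Domain",
  "h2": ["First subheading", "Second subheading"],
  "textLength": 1024,
  "imageCount": 3,
  "linkCount": 12,
  "crawlingDepth": 0
}
```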
Crawling depth is calculated starting from the initial page: the starting page is depth 0, pages linked directly from it are depth 1, pages linked from those are depth 2, and so on.
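This depth rule can be sketched as a breadth-first traversal. The code below is a minimal illustration, not CoreClaw's actual implementation; the link graph and helper function are hypothetical stand-ins for real link extraction:

```python
# Sketch of crawl-depth assignment: breadth-first traversal where the
# starting page is depth 0 and each followed link adds 1.
from collections import deque

def assign_depths(start_url, links, max_depth):
    """Return {url: depth} for pages reachable within max_depth.

    `links` maps each URL to the URLs it links to (a stand-in for
    real link extraction from fetched pages).
    """
    depths = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if depths[url] >= max_depth:
            continue  # don't follow links past the depth limit
        for target in links.get(url, []):
            if target not in depths:  # visit each page only once
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths

# Example: a tiny hypothetical site graph
site = {
    "https://example.com": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/deep"],
}
print(assign_depths("https://example.com", site, max_depth=1))
```

With max_depth=1, only the starting page and its direct links are visited; the /deep page (depth 2) is never reached.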
You can control the crawl volume using two parameters: maxCrawlingDepth (how far to follow links from the starting page) and maxPagesPerCrawl (how many pages to fetch in total).
The jQuery injection feature (injectJQuery: true) makes the jQuery library available inside each crawled page, so custom extraction logic can use jQuery selectors and utilities.
By default, the tool ignores SSL certificate errors (ignoreSslErrors: true).
What does the waitUntil parameter do? The following navigation completion conditions are supported (the standard Puppeteer values):

- load - Wait for the page's load event to fire.
- domcontentloaded - Wait for the DOMContentLoaded event to fire.
- networkidle0 - Wait until there are no network connections for at least 500 ms.
- networkidle2 - Wait until there are no more than 2 network connections for at least 500 ms.
Use the following methods to boost performance:
- downloadMedia: false - Skip images and videos.
- downloadCss: false - Skip CSS stylesheets.
- pageLoadTimeoutSecs - Shorten the timeout duration.
- waitUntil: domcontentloaded - Use a faster wait condition.

If you need to extract specific page data (such as prices, authors, dates, etc.), you can do so through custom data extraction logic. CoreClaw offers flexible configuration options, supporting customized extraction fields and rules based on your requirements.
Explore more popular scrapers from our marketplace
by CoreClaw
It queries the Google search engine by keyword and returns a structured SERP summary, including the final search parameters, organic results, related queries, and people-also-ask data.
by Odin Kael
Dedup Datasets Worker is a powerful tool for merging and deduplicating datasets from multiple JSON/JSONL files. Fully optimized for the CafeScraper platform with enhanced features and robust error handling.
by Odin Kael
A powerful Google Sheets import/export tool designed for data synchronization, backup, and integration between Google Sheets and external systems. It supports three operation modes, two authentication methods, batch processing, data deduplication, and automatic backup.
by Odin Kael
A high-speed static page scraper based on Cheerio, designed specifically for static HTML pages. Uses Cheerio for HTML parsing, delivering speeds 10-50 times faster than full browser rendering.