A powerful web scraping tool that uses Puppeteer for full browser rendering. Supports automatic cookie-banner dismissal, URL filtering, and more.
Puppeteer Web Scraping is an automated web data extraction tool built on Puppeteer, designed for complex websites that require full browser rendering and JavaScript execution. With CoreClaw, you can scrape dynamic pages, single-page applications (SPAs), and infinite scroll pages without writing code, enabling scenarios such as dynamic content collection, SPA data extraction, and interactive page scraping.
- 🔗 Page URL
- 📄 Page Title
- 📏 Crawling Depth
- 🔢 HTTP Status Code
- 🔗 Number of Links Found
- 📝 Page Content
- 🌐 Dynamically Generated Content
- 🎯 Custom Extracted Data
- 📊 Page Structure Information
- ⚠️ Error Information
CoreClaw Puppeteer Web Scraping handles browser startup, page loading, JavaScript execution, link discovery, and data extraction in the background. In just a few minutes, you can extract data through these steps:
| Parameter | Type | Default | Description |
|---|---|---|---|
| startUrls | array | - | Required. List of start URLs |
| linkSelector | string | "a[href]" | CSS selector for discovering links |
| maxDepth | integer | 1 | Maximum crawling depth (0 means only crawl start pages) |
| maxPages | integer | 100 | Maximum number of pages to crawl |
| pageFunction | string | - | Custom page function (JavaScript code) |
| infiniteScroll | boolean | false | Enable infinite scroll |
| scrollMaxPages | integer | 5 | Maximum scroll times for infinite scroll |
| scrollDelay | integer | 2000 | Scroll delay in milliseconds |
| closeCookieModals | boolean | true | Automatically close Cookie banners |
| urlPattern | string | - | Glob pattern for URL filtering (e.g., `**/article/**`) |
| regexPattern | string | - | Regular expression for URL filtering |
| waitForSelector | string | - | Wait for specific element to appear before extracting data |
| pageTimeout | integer | 30000 | Page load timeout in milliseconds |
| navigationTimeout | integer | 60000 | Page navigation timeout in milliseconds |
Example 1: Basic Scraping
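A minimal input sketch for this example, assuming the JSON input format implied by the parameter table above (the start URL is a placeholder):

```json
{
  "startUrls": ["https://example.com"],
  "maxDepth": 1,
  "maxPages": 20
}
```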
Example 2: Scraping SPA Applications
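A sketch of an SPA input, assuming the JSON input format from the parameter table. For an SPA, set `waitForSelector` so extraction waits until the client-side render has produced the content (the `.product-list` selector and URL are illustrative); `maxDepth: 0` keeps the run on the start page only:

```json
{
  "startUrls": ["https://spa.example.com"],
  "waitForSelector": ".product-list",
  "maxDepth": 0
}
```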
Example 3: Infinite Scroll Pages
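A sketch of an infinite-scroll input, assuming the JSON input format from the parameter table (the feed URL is a placeholder). `scrollMaxPages` caps the number of scroll rounds and `scrollDelay` gives lazy-loaded content time to arrive:

```json
{
  "startUrls": ["https://example.com/feed"],
  "infiniteScroll": true,
  "scrollMaxPages": 5,
  "scrollDelay": 2000
}
```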
Example 4: Custom Data Extraction
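A sketch of a custom-extraction input, assuming the JSON input format from the parameter table. `pageFunction` is passed as a string of JavaScript; the `a.article-link` selector and the extracted fields are illustrative:

```json
{
  "startUrls": ["https://example.com/articles"],
  "linkSelector": "a.article-link",
  "pageFunction": "() => ({ title: document.querySelector('h1')?.textContent?.trim() })"
}
```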
Example 5: URL Filtering
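A sketch of a URL-filtering input, assuming the JSON input format from the parameter table. The glob pattern keeps only article URLs; the regular expression narrows them further to numeric article IDs (note the doubled backslashes required inside a JSON string):

```json
{
  "startUrls": ["https://example.com"],
  "urlPattern": "**/article/**",
  "regexPattern": "^https://example\\.com/article/\\d+$"
}
```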
For your convenience, output results are displayed in tables and tabs. You can download results in JSON format.
Each scraped page will output the following data:
Basic Fields
Link Information
Custom Data
Other Information
Example Data:
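A sketch of one result record, assuming field names that mirror the output list above (actual key names may differ):

```json
{
  "url": "https://example.com/article/123",
  "title": "Sample Article",
  "depth": 1,
  "statusCode": 200,
  "linksFound": 42,
  "content": "…",
  "customData": { "title": "Sample Article" },
  "error": null
}
```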
Use regular scrapers for static websites; use full-browser scrapers for dynamic websites and single-page applications.
Page functions are custom JavaScript functions for extracting specific data from pages. They run in the browser context via page.evaluate(), which performs better than transferring the full DOM out of the page.
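A minimal page-function sketch, assuming the tool evaluates it in the page context (as with page.evaluate), so `document` refers to the rendered DOM; the `h1` and `.price` selectors are illustrative:

```javascript
// Runs inside the browser page, not in Node: `document` is the rendered DOM.
const pageFunction = () => {
  // Grab the main heading, if present.
  const title = document.querySelector('h1')?.textContent?.trim() ?? null;
  // Collect all price labels (hypothetical `.price` selector).
  const prices = Array.from(document.querySelectorAll('.price'))
    .map((el) => el.textContent.trim());
  return { title, prices };
};
```

Keep page functions free of references to outer-scope variables: code evaluated in the page context cannot see your Node-side closures.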
Start with a smaller scrollMaxPages for testing, then increase after confirming the effect.
The tool has built-in automatic Cookie banner closing functionality.
For special banners, use waitForSelector to wait for main content to load.
Set maxDepth and maxPages to prevent scraping too many pages. Recommendation: in most scenarios, a depth of 1-2 is sufficient.
Automatic Handling:
Recommendation: Set scraping parameters reasonably and comply with website terms of use.
Our full browser web scraper accesses only publicly available pages and extracts visible content. Users should comply with the terms of service and usage agreements of target websites when using scraped data. We recommend using it only for legitimate business analysis and research purposes.