A powerful web scraping tool that uses Puppeteer for full browser rendering. Supports automatic cookie-banner dismissal, URL filtering, and more.
Puppeteer Web Scraping is an automated web data extraction tool built on Puppeteer, designed for complex websites that require full browser rendering and JavaScript execution. With CoreClaw, you can scrape dynamic pages, single-page applications (SPAs), and infinite scroll pages without writing code, enabling scenarios such as dynamic content collection, SPA data extraction, and interactive page scraping.
- 🔗 Page URL
- 📄 Page Title
- 📏 Crawling Depth
- 🔢 HTTP Status Code
- 🔗 Number of Links Found
- 📝 Page Content
- 🌐 Dynamically Generated Content
- 🎯 Custom Extracted Data
- 📊 Page Structure Information
- ⚠️ Error Information
CoreClaw Puppeteer Web Scraping handles browser startup, page loading, JavaScript execution, link discovery, and data extraction in the background. In just a few minutes, you can extract data through these steps:
| Parameter | Type | Default | Description |
|---|---|---|---|
| startUrls | array | - | Required. List of start URLs |
| linkSelector | string | "a[href]" | CSS selector for discovering links |
| maxDepth | integer | 1 | Maximum crawling depth (0 means only crawl start pages) |
| maxPages | integer | 100 | Maximum number of pages to crawl |
| pageFunction | string | - | Custom page function (JavaScript code) |
| infiniteScroll | boolean | false | Enable infinite scroll |
| scrollMaxPages | integer | 5 | Maximum scroll times for infinite scroll |
| scrollDelay | integer | 2000 | Scroll delay in milliseconds |
| closeCookieModals | boolean | true | Automatically close Cookie banners |
| urlPattern | string | - | Glob pattern for URL filtering (e.g., `**/article/**`) |
| regexPattern | string | - | Regular expression for URL filtering |
| waitForSelector | string | - | Wait for specific element to appear before extracting data |
| pageTimeout | integer | 30000 | Page load timeout in milliseconds |
| navigationTimeout | integer | 60000 | Page navigation timeout in milliseconds |
Example 1: Basic Scraping
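A minimal input sketch for this example, assuming the JSON input format implied by the parameter table above (the start URL is a placeholder):

```json
{
  "startUrls": ["https://example.com"],
  "maxDepth": 1,
  "maxPages": 20
}
```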
Example 2: Scraping SPA Applications
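A sketch of an SPA input, assuming the JSON input format from the parameter table. For an SPA, set `waitForSelector` so extraction waits until the client-side render has produced the content (the `.product-list` selector and URL are illustrative); `maxDepth: 0` keeps the run on the start page only:

```json
{
  "startUrls": ["https://spa.example.com"],
  "waitForSelector": ".product-list",
  "maxDepth": 0
}
```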
Example 3: Infinite Scroll Pages
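A sketch of an infinite-scroll input, assuming the JSON input format from the parameter table (the feed URL is a placeholder). `scrollMaxPages` caps the number of scroll rounds and `scrollDelay` gives lazy-loaded content time to arrive:

```json
{
  "startUrls": ["https://example.com/feed"],
  "infiniteScroll": true,
  "scrollMaxPages": 5,
  "scrollDelay": 2000
}
```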
Example 4: Custom Data Extraction
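A sketch of a custom-extraction input, assuming the JSON input format from the parameter table. `pageFunction` is passed as a string of JavaScript; the `a.article-link` selector and the extracted fields are illustrative:

```json
{
  "startUrls": ["https://example.com/articles"],
  "linkSelector": "a.article-link",
  "pageFunction": "() => ({ title: document.querySelector('h1')?.textContent?.trim() })"
}
```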
Example 5: URL Filtering
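A sketch of a URL-filtering input, assuming the JSON input format from the parameter table. The glob pattern keeps only article URLs; the regular expression narrows them further to numeric article IDs (note the doubled backslashes required inside a JSON string):

```json
{
  "startUrls": ["https://example.com"],
  "urlPattern": "**/article/**",
  "regexPattern": "^https://example\\.com/article/\\d+$"
}
```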
For your convenience, output results are displayed in tables and tabs. You can download results in JSON format.
Each scraped page will output the following data:
Basic Fields
Link Information
Custom Data
Other Information
Example Data:
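A sketch of one result record, assuming field names that mirror the output list above (actual key names may differ):

```json
{
  "url": "https://example.com/article/123",
  "title": "Sample Article",
  "depth": 1,
  "statusCode": 200,
  "linksFound": 42,
  "content": "…",
  "customData": { "title": "Sample Article" },
  "error": null
}
```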
Use regular scrapers for static websites; use full-browser scrapers for dynamic websites and single-page applications.
Page functions are custom JavaScript functions for extracting specific data from pages. They run in the browser context via page.evaluate(), which performs better than transferring the full DOM out of the page.
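A minimal page-function sketch, assuming the tool evaluates it in the page context (as with page.evaluate), so `document` refers to the rendered DOM; the `h1` and `.price` selectors are illustrative:

```javascript
// Runs inside the browser page, not in Node: `document` is the rendered DOM.
const pageFunction = () => {
  // Grab the main heading, if present.
  const title = document.querySelector('h1')?.textContent?.trim() ?? null;
  // Collect all price labels (hypothetical `.price` selector).
  const prices = Array.from(document.querySelectorAll('.price'))
    .map((el) => el.textContent.trim());
  return { title, prices };
};
```

Keep page functions free of references to outer-scope variables: code evaluated in the page context cannot see your Node-side closures.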
Start with a smaller scrollMaxPages for testing, then increase after confirming the effect.
The tool has built-in automatic Cookie banner closing functionality.
For special banners, use waitForSelector to wait for main content to load.
Set maxDepth and maxPages to prevent scraping too many pages. Recommendation: in most scenarios, a depth of 1-2 is sufficient.
Automatic Handling:
Recommendation: Set scraping parameters reasonably and comply with website terms of use.
Our full browser web scraper accesses only publicly available pages and extracts visible content. Users should comply with the terms of service and usage agreements of target websites when using scraped data. We recommend using it only for legitimate business analysis and research purposes.