A high-speed scraper for static HTML pages. It parses HTML with Cheerio instead of rendering pages in a browser, which makes it 10-50 times faster than full browser rendering.
URLs to start crawling from.
CSS selector used to find links to follow.
Only crawl URLs matching these glob patterns (e.g., https://example.com/blog/*).
URL patterns to skip (e.g., /login, /admin, *.pdf).
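
The include and exclude settings above could combine roughly as in the sketch below. This is not the scraper's actual matching code: `globToSource` and `shouldCrawl` are hypothetical helpers, and the sketch assumes that `*` matches any run of characters, that include globs must match the whole URL, and that a skip pattern matches anywhere in the URL.

```javascript
// Hypothetical helper: turn a glob into a RegExp source string.
// Regex metacharacters are escaped, then `*` becomes "any run of characters".
function globToSource(glob) {
  return glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&').replace(/\*/g, '.*');
}

// Hypothetical helper: decide whether a URL should be crawled.
// Assumption: an empty include list means "allow everything".
function shouldCrawl(url, includeGlobs, excludePatterns) {
  const included =
    includeGlobs.length === 0 ||
    includeGlobs.some((g) => new RegExp(`^${globToSource(g)}$`).test(url));
  // Assumption: skip patterns are unanchored, so "/login" matches anywhere.
  const excluded = excludePatterns.some((p) => new RegExp(globToSource(p)).test(url));
  return included && !excluded;
}
```

With the include pattern https://example.com/blog/* and the skip patterns /login and *.pdf, a blog post URL passes the filter, while a PDF under /blog/ is skipped.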
Maximum crawl depth (0 = start page only).
Maximum number of pages to crawl (0 = unlimited; 50 is recommended for speed).
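
To make the two limits concrete, here is a minimal breadth-first sketch of how a depth limit and a page limit might interact. It is an illustration only, not the scraper's implementation: `planCrawl` and `getLinks` are hypothetical names, and the links come from an in-memory function rather than real HTTP requests.

```javascript
// Hypothetical helper: return the URLs visited under the given limits.
// getLinks(url) stands in for "fetch the page and collect its links".
function planCrawl(startUrls, getLinks, maxDepth, maxPages) {
  const seen = new Set(startUrls);
  const queue = [...seen].map((url) => ({ url, depth: 0 }));
  const visited = [];

  while (queue.length > 0) {
    // A page limit of 0 means unlimited.
    if (maxPages > 0 && visited.length >= maxPages) break;

    const { url, depth } = queue.shift();
    visited.push(url);

    // At the depth limit the page is still visited, but its links are not followed.
    if (depth >= maxDepth) continue;

    for (const link of getLinks(url)) {
      if (!seen.has(link)) {
        seen.add(link);
        queue.push({ url: link, depth: depth + 1 });
      }
    }
  }
  return visited;
}
```

In this sketch a depth of 0 visits only the start URLs, matching the description above, and the page limit stops the crawl even if the queue still holds unvisited links.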
Maximum number of concurrent requests (3-5 is recommended for a CDP browser).
Page load timeout in seconds (lower values make failures surface faster).
Maximum number of retries for a failed request (0 = no retry).
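
The timeout and retry settings can be pictured with the sketch below. It assumes Node.js 18 or later (global `fetch` and `AbortSignal.timeout`); `fetchWithRetry` and its option names are hypothetical and are not the scraper's own code. Retries here happen immediately, without backoff, which the real scraper may handle differently.

```javascript
// Hypothetical helper: fetch a page with a per-attempt timeout and a retry budget.
// maxRetries = 0 means a single attempt and no retry.
async function fetchWithRetry(url, { timeoutSecs = 30, maxRetries = 3, fetchImpl = fetch } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      // Abort the request if it takes longer than the timeout.
      const response = await fetchImpl(url, {
        signal: AbortSignal.timeout(timeoutSecs * 1000),
      });
      if (!response.ok) throw new Error(`HTTP ${response.status}`);
      return await response.text();
    } catch (err) {
      lastError = err; // remember the failure and try again if attempts remain
    }
  }
  throw lastError;
}
```

A lower timeout means a hanging request is abandoned sooner, so the retry budget is spent faster and a dead URL fails quickly.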
Custom JavaScript function used to extract data. Use $ as the Cheerio selector.
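
A custom extraction function might look like the sketch below. How the scraper actually calls the function is not stated here, so the `(($, url))` signature, the `extractData` name, and the returned field names are assumptions; only the Cheerio calls (`$(selector)`, `.first()`, `.text()`, `.attr()`) are standard.

```javascript
// Hypothetical extraction function; the real calling convention may differ.
// `$` is a Cheerio instance loaded with the page's HTML.
function extractData($, url) {
  return {
    url,
    // Text of the <title> element, with surrounding whitespace trimmed.
    title: $('title').first().text().trim(),
    // Text of the first <h1>, or an empty string if the page has none.
    heading: $('h1').first().text().trim(),
    // Content of the meta description, or null when the tag is missing.
    description: $('meta[name="description"]').attr('content') ?? null,
  };
}

// Usage outside the scraper, assuming the cheerio package is installed:
//   const cheerio = require('cheerio');
//   const $ = cheerio.load(html);
//   const record = extractData($, 'https://example.com/');
```

Because the function only reads from `$`, it can be tried locally against saved HTML before being pasted into the input field.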
Keep URL fragments (the part after #) in links.
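
The effect of this option can be illustrated with the standard `URL` class. The sketch below is an assumption about the behavior, not the scraper's code, and `normalizeLink` is a hypothetical helper. Dropping the fragment means links that differ only after the # resolve to the same page and are crawled once.

```javascript
// Hypothetical helper: resolve a link against the page URL and
// optionally drop the fragment (the part after #).
function normalizeLink(href, baseUrl, keepFragment = false) {
  const resolved = new URL(href, baseUrl);
  if (!keepFragment) resolved.hash = '';
  return resolved.href;
}
```

For example, with the fragment dropped, /docs#install and /docs#usage on https://example.com/ both resolve to https://example.com/docs.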
Ignore SSL certificate errors.
Enable detailed logging.