Playwright Web Scraper

odin-kael/cross-browser-web-playwright-scraper

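A powerful cross-browser web crawler that uses Playwright for full browser rendering. It supports all three major browser engines: Chromium, Firefox, and WebKit. It is well suited to dynamic pages, single-page applications (SPAs), infinite-scroll pages, and cross-browser testing.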

Start URLs (required)

URLs to start crawling from. Multiple URLs are supported.

Type: array

Link Selector (optional)

CSS selector used to find links to follow.

Type: select

Default: a[href]

Options:

All Links (a[href]); Navigation Links (nav a); Article Links (article a); Custom...
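Here is a minimal sketch of how the link selector could drive link discovery. It uses Playwright's async Python API (`Page.locator` and `Locator.evaluate_all`). The preset table mirrors the options listed above. The names `PRESET_SELECTORS`, `resolve_selector` and `extract_links`, and the fallback to `a[href]`, are hypothetical and are not the tool's actual code.

```python
from typing import Optional

# Hypothetical mapping of the preset options to CSS selectors.
PRESET_SELECTORS = {
    "All Links": "a[href]",
    "Navigation Links": "nav a",
    "Article Links": "article a",
}


def resolve_selector(option: str, custom: Optional[str] = None) -> str:
    """Return the selector for a preset, else the user's custom selector."""
    if option in PRESET_SELECTORS:
        return PRESET_SELECTORS[option]
    if custom:
        return custom
    return "a[href]"  # fall back to the documented default


async def extract_links(page, selector: str) -> list[str]:
    """Collect hrefs of all elements matching `selector` on a Playwright async Page."""
    hrefs = await page.locator(selector).evaluate_all(
        # Elements without an href (or non-anchors) yield ""/undefined and are dropped.
        "els => els.map(e => e.href).filter(Boolean)"
    )
    # Remove duplicates while keeping first-seen order.
    return list(dict.fromkeys(hrefs))
```

The browser resolves `element.href` against the page URL. The returned links are therefore already absolute and can be passed straight to the URL filters.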

Glob Patterns (optional)

Only crawl URLs that match these patterns (e.g., https://example.com/blog/*).

Type: array
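Glob matching such as `https://example.com/blog/*` can be approximated with Python's `fnmatch` module. This sketch makes two assumptions: patterns are matched against the full URL, and an empty pattern list means "allow everything". The tool's exact matching rules are not documented here, and `matches_globs` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def matches_globs(url: str, patterns: list[str]) -> bool:
    """True if `url` matches any glob pattern; an empty list allows all URLs."""
    if not patterns:
        return True
    # fnmatchcase is case-sensitive on every OS; note that '*' also matches '/'.
    return any(fnmatchcase(url, p) for p in patterns)
```

In `fnmatch`, `*` also matches `/`. The blog pattern therefore accepts nested paths such as `/blog/2024/post` as well as direct children.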

Exclude Patterns (optional)

URL patterns to skip (e.g., /login, /admin, *.pdf).

Type: array
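The example patterns mix plain paths (`/login`) with globs (`*.pdf`). Below is one plausible interpretation, assumed rather than confirmed by the tool's documentation:

- Patterns containing glob characters are matched against the URL path.
- Plain patterns match that path or anything beneath it.

`is_excluded` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

GLOB_CHARS = set("*?[")


def is_excluded(url: str, patterns: list[str]) -> bool:
    """True if the URL should be skipped (assumed semantics, see lead-in)."""
    path = urlsplit(url).path or "/"
    for pattern in patterns:
        if GLOB_CHARS & set(pattern):
            # Glob pattern such as "*.pdf": match against the whole path.
            if fnmatchcase(path, pattern):
                return True
        else:
            # Plain path such as "/admin": match it and its sub-paths only,
            # so "/login" does not accidentally exclude "/login-help".
            prefix = pattern.rstrip("/")
            if path == prefix or path.startswith(prefix + "/"):
                return True
    return False
```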

Max Depth (optional)

Maximum crawl depth (0 = start pages only, 1 = also follow links found on the start pages).

Type: integer

Default: 1

Max Pages (optional)

Maximum number of pages to crawl (0 = unlimited; 50 or fewer is recommended for speed).

Type: integer

Default: 50
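Max Depth and Max Pages together bound what is essentially a breadth-first traversal. This sketch assumes breadth-first order and that each URL is visited at most once. `crawl_order` is hypothetical, and so is the `get_links` callback, which stands in for loading a page and extracting its links.

```python
from collections import deque
from typing import Callable, Iterable


def crawl_order(
    start_urls: Iterable[str],
    get_links: Callable[[str], list[str]],
    max_depth: int = 1,
    max_pages: int = 50,
) -> list[tuple[str, int]]:
    """Return the breadth-first visit order as (url, depth) pairs."""
    starts = list(dict.fromkeys(start_urls))  # de-duplicate, keep order
    queue = deque((url, 0) for url in starts)
    seen = set(starts)
    visited: list[tuple[str, int]] = []
    while queue:
        if max_pages and len(visited) >= max_pages:
            break  # page budget exhausted (0 means no budget)
        url, depth = queue.popleft()
        visited.append((url, depth))
        if depth >= max_depth:
            continue  # at the depth limit: record the page, don't follow its links
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

Consider a site where A links to B and C, and B links to D. With `max_depth=1` the crawl visits A, B and C. Raising it to 2 also reaches D.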

Max Results (optional)

Maximum number of results to output (0 = unlimited).

Type: integer

Default: 0

Concurrency (optional)

Number of browser tabs to run concurrently (3-5 recommended for best performance).

Type: integer

Default: 3
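A concurrency limit like this is commonly implemented with a semaphore. Below is a sketch using `asyncio.Semaphore`. The `fetch` callback is hypothetical and stands in for opening a tab, loading the URL and extracting data. How the tool itself schedules tabs is not documented here.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def run_with_concurrency(
    urls: list[str],
    fetch: Callable[[str], Awaitable[T]],
    concurrency: int = 3,
) -> list[T]:
    """Process `urls` with at most `concurrency` fetches in flight; keeps input order."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(url: str) -> T:
        async with semaphore:  # waits while `concurrency` fetches are busy
            return await fetch(url)

    return await asyncio.gather(*(guarded(u) for u in urls))
```

The semaphore caps open tabs at `concurrency`. The next URL still starts as soon as any slot frees up, rather than waiting for a whole batch to finish.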

Page Timeout in Seconds (optional)

Page load timeout in seconds; lower values detect failed loads sooner.

Type: integer

Default: 30

Function Timeout in Seconds (optional)

Timeout, in seconds, for running the page function.

Type: integer

Default: 60
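The two timeouts guard different phases:

- **Page load:** typically bounded by the `timeout` argument of Playwright's `page.goto`, which is in milliseconds, so the 30-second default becomes 30000.
- **Page function:** can be enforced around user code with `asyncio.wait_for`, as in this sketch.

`run_page_function` is a hypothetical helper, not part of the tool.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def run_page_function(
    page_function: Callable[[], Awaitable[Any]],
    function_timeout_secs: float = 60,
) -> Any:
    """Run a page function; raise asyncio.TimeoutError if it exceeds the limit.

    wait_for cancels the page function on timeout, so the caller can count
    the request as failed (and possibly retry it).
    """
    return await asyncio.wait_for(page_function(), timeout=function_timeout_secs)
```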

Retries (optional)

Number of retries for failed requests (0 = no retries).

Type: integer

Default: 2
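With Retries set to N, a request gets at most N + 1 attempts. Here is a sketch of that loop. `with_retries` is hypothetical, and the linear backoff between attempts is an illustrative choice, not documented tool behavior.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def with_retries(
    attempt: Callable[[], Awaitable[T]],
    retries: int = 2,
    backoff_secs: float = 1.0,
) -> T:
    """Call `attempt`, retrying up to `retries` extra times (0 = single attempt)."""
    last_error: Optional[Exception] = None
    for n in range(retries + 1):
        try:
            return await attempt()
        except Exception as exc:  # a real crawler would catch narrower errors
            last_error = exc
            if n < retries:
                await asyncio.sleep(backoff_secs * (n + 1))  # linear backoff
    assert last_error is not None
    raise last_error
```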

Wait Until (optional)

Condition for considering page navigation complete. 'domcontentloaded' is the fastest.

Type: select

Default: domcontentloaded

Options:

Load Event; DOM Content Loaded; Network Idle
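Playwright's `page.goto` accepts `wait_until` values including `"load"`, `"domcontentloaded"` and `"networkidle"`, plus a `timeout` in milliseconds. This sketch assumes the three options above map onto those values and builds the keyword arguments for `goto`. `WAIT_UNTIL` and `goto_options` are hypothetical names.

```python
# Assumed mapping from this tool's option labels to Playwright wait_until values.
WAIT_UNTIL = {
    "Load Event": "load",
    "DOM Content Loaded": "domcontentloaded",
    "Network Idle": "networkidle",
}


def goto_options(wait_until: str = "domcontentloaded", page_timeout_secs: int = 30) -> dict:
    """Keyword arguments for Playwright's `page.goto(url, **options)`.

    Accepts an option label or a raw Playwright value; converts seconds to ms.
    """
    value = WAIT_UNTIL.get(wait_until, wait_until)
    if value not in WAIT_UNTIL.values():
        raise ValueError(f"unsupported wait_until: {wait_until!r}")
    return {"wait_until": value, "timeout": page_timeout_secs * 1000}
```

Usage with a Playwright async page: `await page.goto("https://example.com", **goto_options("Network Idle"))`.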

Download Media (optional)

Download images and other media files (slows crawling).

Type: boolean

Default: false

Download CSS (optional)

Download CSS stylesheets.

Type: boolean

Default: false
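Skipping media and CSS is usually done by intercepting requests. This sketch uses Playwright's `page.route`, `route.abort()` and `route.continue_()`, and decides based on `request.resource_type`. Treating fonts as "media" is an assumption, and `should_block` and `install_blocking` are hypothetical helpers.

```python
# Playwright resource types controlled by each option (fonts as media is assumed).
MEDIA_TYPES = {"image", "media", "font"}
CSS_TYPES = {"stylesheet"}


def should_block(resource_type: str, download_media: bool = False, download_css: bool = False) -> bool:
    """Decide whether a request should be aborted to save bandwidth and time."""
    if resource_type in MEDIA_TYPES:
        return not download_media
    if resource_type in CSS_TYPES:
        return not download_css
    return False  # documents, scripts, XHR etc. always load


async def install_blocking(page, download_media: bool = False, download_css: bool = False) -> None:
    """Route every request on a Playwright async Page through the block check."""
    async def handle(route):
        if should_block(route.request.resource_type, download_media, download_css):
            await route.abort()
        else:
            await route.continue_()

    await page.route("**/*", handle)
```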

Ignore CORS/CSP (optional)

Bypass CORS and Content Security Policy (CSP) restrictions.

Type: boolean

Default: false

Close Cookie Modals (optional)

Automatically close cookie consent pop-ups.

Type: boolean

Default: false

Max Scroll Height (optional)

Auto-scroll height in pixels (0 = disabled). Useful for infinite-scroll pages.

Type: integer

Default: 0
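Auto-scrolling for infinite-scroll pages can be sketched with Playwright's `page.evaluate`. The loop scrolls in steps until it reaches the configured height or the page stops growing. The step size, the pause, and the `auto_scroll` helper are assumptions for illustration, not the tool's actual algorithm.

```python
import asyncio


async def auto_scroll(page, max_scroll_height: int, step: int = 800, pause_secs: float = 0.5) -> int:
    """Scroll a Playwright async Page down to `max_scroll_height` pixels.

    Stops early if the bottom is reached and no new content loads.
    Returns the final scroll offset; 0 disables scrolling, matching the option.
    """
    if max_scroll_height <= 0:
        return 0
    position = 0
    while position < max_scroll_height:
        height = await page.evaluate("document.body.scrollHeight")
        viewport = await page.evaluate("window.innerHeight")
        if position + viewport >= height:
            break  # at the bottom and the page did not grow after the last pause
        position = min(position + step, max_scroll_height)
        await page.evaluate(f"window.scrollTo(0, {position})")
        await asyncio.sleep(pause_secs)  # let lazy-loaded content arrive
    return position
```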

Keep URL Fragments (optional)

Keep the URL hash fragment (the part after #) in crawled links.

Type: boolean

Default: false
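With fragments dropped, `/docs#install` and `/docs#usage` collapse into one URL, so the page is crawled once. Keeping them matters mainly for SPAs that use hash-based routing, where each `#/route` is a distinct view. Here is a sketch using the standard library's `urldefrag`; `normalize_link` is a hypothetical helper.

```python
from urllib.parse import urldefrag


def normalize_link(url: str, keep_fragments: bool = False) -> str:
    """Strip the '#...' part unless fragments are kept."""
    return url if keep_fragments else urldefrag(url).url
```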

Ignore SSL Errors (optional)

Ignore SSL certificate errors.

Type: boolean

Default: true
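Two Playwright browser-context options correspond closely to Ignore SSL Errors and to the CSP half of Ignore CORS/CSP:

- `ignore_https_errors` accepts invalid or self-signed certificates.
- `bypass_csp` skips Content Security Policy enforcement.

CORS has no context option. Launching Chromium with `--disable-web-security` is one common workaround, but it applies to Chromium only. How this tool actually handles CORS, especially on Firefox and WebKit, is an assumption. Both helpers below are hypothetical.

```python
def context_options(ignore_cors_csp: bool = False, ignore_ssl_errors: bool = True) -> dict:
    """Keyword arguments for Playwright's `browser.new_context(**options)`."""
    return {
        "ignore_https_errors": ignore_ssl_errors,  # accept invalid/self-signed certs
        "bypass_csp": ignore_cors_csp,  # do not enforce Content-Security-Policy
    }


def chromium_launch_args(ignore_cors_csp: bool = False) -> list[str]:
    """Extra flags for `chromium.launch(args=...)`; bypass_csp does not cover CORS.

    --disable-web-security turns off same-origin checks in Chromium only
    (assumed approach; Firefox and WebKit would need something else).
    """
    return ["--disable-web-security"] if ignore_cors_csp else []
```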

Debug Log (optional)

Enable detailed debug logging.

Type: boolean

Default: false

Browser Log (optional)

Log browser console messages.

Type: boolean

Default: false
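Debug Log and Browser Log map naturally onto Python's `logging` module and Playwright's `console` page event. That event delivers a `ConsoleMessage` with `type` and `text` properties. This is a sketch: the logger name `"crawler"`, the level mapping and both helpers are hypothetical.

```python
import logging

# Assumed mapping from console message types to log levels.
CONSOLE_LEVELS = {"error": logging.ERROR, "warning": logging.WARNING, "debug": logging.DEBUG}


def configure_logging(debug_log: bool = False) -> logging.Logger:
    """Return the crawler logger at DEBUG (Debug Log on) or INFO level."""
    logger = logging.getLogger("crawler")
    logger.setLevel(logging.DEBUG if debug_log else logging.INFO)
    return logger


def attach_browser_log(page, logger: logging.Logger) -> None:
    """Forward browser console messages from a Playwright Page to `logger`."""
    def on_console(msg):
        level = CONSOLE_LEVELS.get(msg.type, logging.INFO)
        logger.log(level, "[browser:%s] %s", msg.type, msg.text)

    page.on("console", on_console)
```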

Pricing

Failed results are not billed.

User rating

5.0

Developer

Kael Odin

Worker stats

4 total runs

Success rate: 100.00%

Last updated: 2026.04.15

Category

Google

You may also like

Explore more popular scrapers in the store

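Google Search Results (SERP) Scraper API

by CoreClaw

Send a keyword request and get back structured search-result summaries, including the final search parameters, organic results, related searches, and People Also Ask data.

4.8

604 runs

As low as $1.2 per 1,000 results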

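Dataset Merge and Deduplication Tool

by Kael Odin

A powerful tool for merging datasets from multiple JSON/JSONL files and removing duplicate records. It is fully optimized for the CafeScraper platform, adds enhanced features, and includes thorough error handling.

5.0

15 runs

As low as $1.2 per 1,000 results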

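Google Sheets Import/Export Tool

by Kael Odin

A powerful Google Sheets import/export tool for syncing, backing up, and integrating data between Google Sheets and external systems. It supports three operation modes, two authentication methods, batch processing, data deduplication, automatic backups, and more.

5.0

2 runs

As low as $1.2 per 1,000 results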

Cheerio Web Scraper

by Kael Odin

A high-speed crawler built on Cheerio and designed for static HTML pages. It parses HTML with Cheerio, which is 10-50x faster than full browser rendering.