Webpage Content Extractor


odin-kael/website-content-extractor

Intelligently extract website content using Crawl4AI, retrieving page content in various formats (Markdown, HTML, or plain text). Supports configurable depth, wait conditions, CSS selectors, and comprehensive link discovery. Zero-code operation, one-click export in CSV or JSON format.


Start URLs (Required)

Starting URLs to crawl (e.g. https://example.com). One or more URLs.

Type: array

Max Pages (Optional)

Maximum pages to process in total (1–10000).

Type: integer

Default: 50

Max Depth (Optional)

Maximum link depth from each start URL (0–10).

Type: integer

Default: 2
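
To make the interaction between Max Pages and Max Depth concrete, here is a minimal sketch of a depth-limited, breadth-first traversal. It is an illustration only: the listing does not say which traversal order the extractor uses, and `crawl_order` and the in-memory `links` graph are hypothetical stand-ins for real page fetching.

```python
from collections import deque

def crawl_order(start_urls, links, max_depth=2, max_pages=50):
    """Breadth-first traversal over an in-memory link graph.

    `links` maps a URL to the URLs it links to (a stand-in for fetching pages).
    Returns the visited (url, depth) pairs in visit order.
    """
    unique_starts = list(dict.fromkeys(start_urls))  # drop duplicate start URLs
    seen = set(unique_starts)
    queue = deque((url, 0) for url in unique_starts)
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append((url, depth))
        if depth >= max_depth:
            continue  # links on this page would exceed the depth limit
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return visited

links = {"a": ["b", "c"], "b": ["d"], "d": ["e"]}
print(crawl_order(["a"], links, max_depth=2))
```

Under these assumptions a depth of 0 visits only the start URLs, and page "e" (three links away from "a") is never reached with the default depth of 2.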

Concurrency (Optional)

Number of concurrent page tasks (1–50).

Type: integer

Default: 5

Request Timeout (secs) (Optional)

Timeout per page request in seconds (5–600).

Type: integer

Default: 60

Headless (Optional)

Run the browser in headless mode.

Type: boolean

Default: true

Extract Mode (Optional)

Output content format: markdown, html, or text.

Type: select

Default: markdown

Options:

Markdown, HTML, Text

Max Results (Optional)

Maximum output items to push (1–200000).

Type: integer

Default: 1000

Same Domain Only (Optional)

Only follow links within the start URL domains.

Type: boolean

Default: true
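
One plausible reading of Same Domain Only is a hostname comparison against the start URLs. The sketch below assumes an exact hostname match; the listing does not say whether subdomains such as blog.example.com count as the same domain, so treat that behavior as an assumption. `same_domain` is a hypothetical helper, not part of the tool.

```python
from urllib.parse import urlparse

def same_domain(link, start_urls):
    """Return True if `link` has the same hostname as one of the start URLs.

    Assumption: exact hostname match, so subdomains are treated as different.
    """
    allowed = {urlparse(url).hostname for url in start_urls}
    return urlparse(link).hostname in allowed

starts = ["https://example.com"]
print(same_domain("https://example.com/docs", starts))
print(same_domain("https://other.org/", starts))
```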

Include URL Patterns (Optional)

Only include URLs matching these regex patterns (optional).

Type: array

Exclude URL Patterns (Optional)

Exclude URLs matching these regex patterns.

Type: array
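
The include and exclude patterns can be pictured as a simple regex filter. The sketch below is an assumption about the semantics, not the tool's confirmed behavior: it matches patterns anywhere in the URL (`re.search`), lets an exclude match override an include match, and treats an empty include list as "allow everything". `url_allowed` is a hypothetical name.

```python
import re

def url_allowed(url, include=(), exclude=()):
    """Decide whether a URL passes the include/exclude regex filters."""
    if any(re.search(pattern, url) for pattern in exclude):
        return False  # assumption: exclude patterns take priority
    if include and not any(re.search(pattern, url) for pattern in include):
        return False  # include list given, but nothing matched
    return True

include = [r"/blog/"]
exclude = [r"\.pdf$"]
print(url_allowed("https://example.com/blog/post-1", include, exclude))
print(url_allowed("https://example.com/blog/file.pdf", include, exclude))
```

If the tool anchors patterns to the start of the URL instead, a pattern such as `/blog/` would need a leading `.*`, so it is worth testing patterns on a small crawl first.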

Max Retries (Optional)

Retry failed pages up to this count (0–10).

Type: integer

Default: 2
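
As a rough sketch of what a retry count means in practice, the snippet below makes one initial attempt plus up to `max_retries` further attempts. Whether the tool counts attempts this way, or waits between them, is not stated in the listing; `fetch_with_retries` and the `fetch` callable are hypothetical.

```python
def fetch_with_retries(fetch, url, max_retries=2):
    """Call `fetch(url)`, retrying on any exception up to `max_retries` times.

    Assumption: total attempts = 1 initial try + max_retries retries.
    """
    last_error = None
    for _attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # a real crawler would catch narrower errors
            last_error = exc
    raise last_error

calls = []

def flaky_fetch(url):
    """Fail twice, then succeed, to exercise the retry loop."""
    calls.append(url)
    if len(calls) < 3:
        raise TimeoutError("simulated timeout")
    return "<html>ok</html>"

print(fetch_with_retries(flaky_fetch, "https://example.com", max_retries=2))
```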

Clean Content (Optional)

Remove navigation-heavy lines and normalize whitespace.

Type: boolean

Default: true

Include Raw Content (Optional)

Include unmodified content in a separate field.

Type: boolean

Default: false

Max Content Chars (Optional)

Truncate content to this length (0 = unlimited, max 500000).

Type: integer

Default: 0

Content Excerpt Chars (Optional)

Length of content excerpt for previews (0–5000).

Type: integer

Default: 300
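
The two length settings can be illustrated with plain string slicing. This is a sketch under assumptions: the output field names `content` and `excerpt` are hypothetical, and it is not stated whether the excerpt is cut from the truncated or the original content (here it is cut from the truncated text).

```python
def limit_content(content, max_content_chars=0, excerpt_chars=300):
    """Apply the content length limit and build a preview excerpt.

    max_content_chars == 0 means "unlimited", as described above.
    """
    if max_content_chars > 0:
        content = content[:max_content_chars]
    return {"content": content, "excerpt": content[:excerpt_chars]}

item = limit_content("abcdefghij", max_content_chars=5, excerpt_chars=3)
print(item)
```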

Wait Until (Optional)

Page load strategy: domcontentloaded (fast), load, or networkidle (SPA/slow sites).

Type: select

Default: domcontentloaded

Options:

DOM Content Loaded, Load, Network Idle

Wait For Selector (Optional)

CSS selector to wait for before extraction (e.g. .article-body). Leave empty to skip.

Type: string

CSS Selector (Extract Region) (Optional)

Extract only content inside this CSS selector (e.g. main, .content). Leave empty for full page.

Type: string

Crawl Mode (Optional)

full = extract content; discover_only = only URLs and links (no content).

Type: select

Default: full

Options:

Full (extract content), Discover only (links)

Include Link URLs (Optional)

Include links_internal and links_external arrays in each item (full mode).

Type: boolean

Default: false
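
The numeric ranges and defaults listed above can be checked before starting a run. The sketch below is illustrative only: the listing shows display labels, not the underlying input field names, so every key in `LIMITS` (for example `max_pages`) is a hypothetical name chosen for readability.

```python
# Hypothetical field names; ranges and defaults are taken from the listing above.
LIMITS = {
    "max_pages": (1, 10000, 50),
    "max_depth": (0, 10, 2),
    "concurrency": (1, 50, 5),
    "request_timeout_secs": (5, 600, 60),
    "max_results": (1, 200000, 1000),
    "max_retries": (0, 10, 2),
    "max_content_chars": (0, 500000, 0),
    "content_excerpt_chars": (0, 5000, 300),
}

def check_input(user_input):
    """Fill in documented defaults and return (config, errors)."""
    config, errors = {}, []
    for name, (low, high, default) in LIMITS.items():
        value = user_input.get(name, default)
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not (low <= value <= high):
            errors.append(f"{name} must be an integer in {low}-{high}")
        config[name] = value
    return config, errors

config, errors = check_input({"max_depth": 3, "concurrency": 100})
print(config["max_depth"], config["max_pages"], errors)
```

Collecting all errors in one pass, rather than raising on the first, makes it easier to fix a whole input form at once.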

Pricing

Failed results don't count

Rating

5.0

Developer

Kael Odin

Worker Stats

4 Total runs

Success rate: 100.00%

Last updated: Apr 14, 2026

Google Search Results (SERP) Scraper API

by CoreClaw

It queries the Google search engine by keyword and returns a structured SERP summary, including the final search parameters, organic results, related queries, and people-also-ask data.

4.8

604 runs

From $1.2/1,000 results

Dataset Deduplication & Merge Tool

by Kael Odin

Dedup Datasets Worker is a powerful tool for merging and deduplicating datasets from multiple JSON/JSONL files. Fully optimized for the CafeScraper platform with enhanced features and robust error handling.

5.0

15 runs

From $1.2/1,000 results

Google Sheets Import Export Tool

by Kael Odin

A powerful Google Sheets data import export tool designed for data synchronization, backup, and integration between Google Sheets and external systems. Supports three operation modes, two authentication methods, batch processing, data deduplication, and automatic backup.

5.0

2 runs

From $1.2/1,000 results

Cheerio Web Scraping

by Kael Odin

A high-speed static page scraper based on Cheerio, designed specifically for static HTML pages. Uses Cheerio for HTML parsing, delivering speeds 10-50 times faster than full browser rendering.

5.0

3 runs

From $1.2/1,000 results
