Product Hunt Scraper

odin-kael/product-hunt-scraper

A CoreClaw Worker that scrapes trending Product Hunt products by keyword for market research, competitor tracking, lead generation, and AI startup trend monitoring. It supports API, feed, and browser strategies, routes traffic through the CoreClaw proxy, and offers an automatic (`auto`) strategy.

What is Product Hunt Scraper?

Product Hunt Scraper is a CoreClaw Worker for discovering trending products by keyword. It is designed for market research, competitor tracking, lead generation, product newsletter workflows, and AI/startup trend monitoring.

The worker is adapted from the open-source ph_ai_tracker / ProductHunt-Scraper project, but removes long-running services and local persistence so it fits CoreClaw's one-shot Worker model.

Key Features

- Searches Product Hunt by multiple commercial keywords, such as AI agents, developer tools, sales automation, and data analytics.
- Uses `search_terms` as the CoreClaw task-splitting field.
- Recommended `auto` strategy:
  - Uses the Product Hunt API first when a token is provided.
  - Falls back to the bounded Product Hunt Atom feed and product-page site search; browser and HTTP page scraping are used only when explicitly selected.
- Automatically routes traffic through CoreClaw's SOCKS5 proxy using `PROXY_DOMAIN` and `PROXY_AUTH`.
- Streams one clean, structured row per product.
- Emits a structured failure row if all providers fail, instead of crashing silently.
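The `auto` fallback order and the failure-row behavior described above can be sketched as a simple provider chain. This is a minimal illustration, not the Worker's actual code: `run_auto`, the provider callables, and the error-message format are hypothetical, and each provider is assumed to return a list of product dicts (empty when it finds nothing).

```python
from typing import Callable, Dict, List

# Hypothetical provider signature: (search_term, limit) -> list of product dicts.
Provider = Callable[[str, int], List[dict]]


def run_auto(term: str, limit: int, providers: Dict[str, Provider],
             api_token: str = "") -> List[dict]:
    """Try providers in `auto` order; return product rows or one failure row."""
    # API first only when a token exists, then the bounded feed and site-search fallbacks.
    order = (["api"] if api_token else []) + ["feed", "search"]
    errors: List[str] = []
    for name in order:
        provider = providers.get(name)
        if provider is None:
            continue  # provider not wired up in this sketch
        try:
            products = provider(term, limit)
        except Exception as exc:  # a broken provider must not crash the run
            errors.append(f"{name}: {exc}")
            continue
        if products:
            return [
                {**product, "status": "success", "source": name, "search_term": term}
                for product in products[:limit]
            ]
        errors.append(f"{name}: no results")
    # Every provider failed or came back empty: emit a structured failure row.
    return [{
        "status": "failed",
        "source": None,
        "search_term": term,
        "error": "; ".join(errors) or "no providers available",
    }]
```

Page scraping (`browser`, `scraper`) is deliberately absent from this chain, matching the note that `auto` does not enter page-scraping paths by default.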

Project Structure

```text
producthunt-scraper/
├── main.py              # CoreClaw Worker entry
├── scraper.py           # Product Hunt providers: browser, HTTP scraper, API, feed, search fallback
├── input_schema.json    # CoreClaw input form
├── output_schema.json   # CoreClaw output table schema
├── requirements.txt     # Python dependencies
├── README.md            # English documentation
├── README_CN.md         # Chinese documentation
├── sdk.py               # CoreClaw SDK
├── sdk_pb2.py
└── sdk_pb2_grpc.py
```

Input Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| `search_terms` | array / stringList | `AI agents`, `developer tools`, `sales automation` | One keyword per line. CoreClaw splits tasks by this field. |
| `limit` | integer | `20` | Maximum number of products returned per search term. |
| `strategy` | select | `auto` | One of `auto`, `browser`, `scraper`, `feed`, `search`, or `api`. Use `auto` unless you have a specific reason not to. |
| `api_token` | string | empty | Optional Product Hunt API token. Required only for the `api` strategy; `auto` also uses it when provided. |
| `recent_days` | integer | `30` | Keep only products posted within the last N days when timestamps are available. Use `0` to disable. |
| `max_enrich` | integer | `0` | Number of detail pages to visit to fill in missing fields. `0` is fastest and recommended. |
| `timeout_seconds` | integer | `45` | Timeout for HTTP, API, and browser navigation. In `auto` mode, the feed and search fallbacks use fixed, shorter internal time limits. |
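A minimal Python sketch of reading these parameters with the documented defaults follows. The `WorkerConfig` class, the `parse_config` function, and the choice to raise `ValueError` on bad input are illustrative assumptions; the document does not say how the real Worker validates its input.

```python
from dataclasses import dataclass
from typing import Any, Dict

STRATEGIES = {"auto", "browser", "scraper", "feed", "search", "api"}


@dataclass
class WorkerConfig:
    # Defaults mirror the parameter table above.
    limit: int = 20
    strategy: str = "auto"
    api_token: str = ""
    recent_days: int = 30
    max_enrich: int = 0
    timeout_seconds: int = 45


def _int(raw: Dict[str, Any], key: str, default: int) -> int:
    # Missing, None, or blank values fall back to the default; an explicit 0 stays 0.
    value = raw.get(key)
    if value is None or value == "":
        return default
    return int(value)


def parse_config(raw: Dict[str, Any]) -> WorkerConfig:
    cfg = WorkerConfig(
        limit=_int(raw, "limit", 20),
        strategy=str(raw.get("strategy") or "auto").strip().lower(),
        api_token=str(raw.get("api_token") or "").strip(),
        recent_days=max(_int(raw, "recent_days", 30), 0),  # 0 disables the date filter
        max_enrich=max(_int(raw, "max_enrich", 0), 0),
        timeout_seconds=_int(raw, "timeout_seconds", 45),
    )
    if cfg.strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {cfg.strategy!r}")
    if cfg.strategy == "api" and not cfg.api_token:
        raise ValueError("the 'api' strategy requires api_token")
    if cfg.limit < 1 or cfg.timeout_seconds < 1:
        raise ValueError("limit and timeout_seconds must be positive")
    return cfg
```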

Recommended Input

```json
{
  "search_terms": [
    { "string": "AI agents" },
    { "string": "developer tools" },
    { "string": "sales automation" }
  ],
  "limit": 20,
  "strategy": "auto",
  "api_token": "",
  "recent_days": 30,
  "max_enrich": 0,
  "timeout_seconds": 45
}
```

When CoreClaw splits `search_terms`, it may pass a single subtask as `{"string": "AI agents"}` instead of a list. This Worker explicitly supports that flattened input shape.
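Because `search_terms` can arrive either as the full list or as one flattened item, the Worker needs a small normalization step. The Python sketch below shows one way to handle both shapes. `normalize_search_terms` is a hypothetical helper, and accepting a plain newline-separated string is an extra assumption beyond the two documented shapes.

```python
from typing import Any, List


def normalize_search_terms(raw: Any) -> List[str]:
    """Turn any supported `search_terms` shape into a de-duplicated keyword list."""
    if raw is None:
        return []
    if isinstance(raw, dict):  # flattened subtask: {"string": "AI agents"}
        raw = [raw]
    elif isinstance(raw, str):  # assumption: one keyword per line
        raw = raw.splitlines()
    terms: List[str] = []
    for item in raw:
        if isinstance(item, dict):
            item = item.get("string", "")
        term = str(item).strip()
        if term and term not in terms:  # skip blanks and duplicates, keep order
            terms.append(term)
    return terms


print(normalize_search_terms({"string": "AI agents"}))  # ['AI agents']
```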

Output Fields

| Field | Description |
| --- | --- |
| `status` | `success` or `failed` |
| `source` | Actual provider used: `api`, `browser`, `scraper`, `feed`, or `search` |
| `search_term` | Search keyword that produced the row |
| `rank` | Rank within the current search term, sorted by votes when available |
| `name` | Product name |
| `tagline` | Short Product Hunt tagline |
| `description` | Product description when available |
| `votes_count` | Product Hunt votes |
| `url` | Product Hunt product URL |
| `topics` | Product topics/categories |
| `posted_at` | Product posted time when available |
| `error` | Failure reason for failed rows |
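The `rank` column is described as "sorted by votes when available". One plausible way to build rows in that order, padded to the full output schema, is sketched below in Python. `to_output_rows` is hypothetical, and placing products without a vote count after all voted products is an assumption.

```python
from typing import Iterable, List

OUTPUT_FIELDS = (
    "status", "source", "search_term", "rank", "name", "tagline",
    "description", "votes_count", "url", "topics", "posted_at", "error",
)


def to_output_rows(products: Iterable[dict], term: str, source: str) -> List[dict]:
    def votes(product: dict) -> int:
        value = product.get("votes_count")
        return value if isinstance(value, int) else -1  # unknown votes sort last

    # sorted() is stable even with reverse=True, so ties keep the provider's order.
    ordered = sorted(products, key=votes, reverse=True)
    rows = []
    for rank, product in enumerate(ordered, start=1):
        row = dict.fromkeys(OUTPUT_FIELDS)  # every column present, default None
        row.update({k: v for k, v in product.items() if k in OUTPUT_FIELDS})
        row.update(status="success", source=source, search_term=term,
                   rank=rank, error=None)
        rows.append(row)
    return rows
```

Emitting every column on every row, even when it is `None`, keeps the output consistent with a fixed output table schema such as `output_schema.json`.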

Strategy Guide

| Strategy | Best for | Notes |
| --- | --- | --- |
| `auto` | Production CoreClaw runs | Recommended. Uses the API when a token is provided, then the Product Hunt Atom feed and a bounded site-search fallback. Does not enter page-scraping paths by default. |
| `browser` | Product Hunt blocks HTTP requests with 403 | Uses CoreClaw's remote fingerprint browser through `ChromeWs` and `PROXY_AUTH`. |
| `scraper` | Fast local parsing or simple cloud runs | Tries HTTP and browser paths, then falls back to non-page providers when Product Hunt blocks pages. |
| `feed` | Product Hunt page access is blocked | Uses the public Product Hunt Atom feed. Best for recent launches. |
| `search` | Keyword-specific fallback | Uses product-page site-search results when Product Hunt pages, the API, and the feed are insufficient. |
| `api` | Stable official API data | Requires a Product Hunt API token. |
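The `feed` strategy and the `recent_days` filter can be illustrated with Python's standard-library XML parser. This sketch works on Atom XML that has already been downloaded, so no feed URL is assumed. `filter_feed_entries` is hypothetical, and the whole-word keyword match is an assumption; the document does not say how the real Worker matches keywords. Escaped HTML inside `<content>` is left as-is here.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from typing import List, Optional

ATOM = "{http://www.w3.org/2005/Atom}"


def _parse_time(text: Optional[str]) -> Optional[datetime]:
    if not text:
        return None
    try:
        value = datetime.fromisoformat(text.strip().replace("Z", "+00:00"))
    except ValueError:
        return None
    # Treat naive timestamps as UTC so they compare with the aware cutoff.
    return value if value.tzinfo else value.replace(tzinfo=timezone.utc)


def filter_feed_entries(xml_text: str, term: str, recent_days: int = 30,
                        now: Optional[datetime] = None) -> List[dict]:
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=recent_days) if recent_days > 0 else None
    wanted = re.findall(r"\w+", term.lower())
    products = []
    for entry in ET.fromstring(xml_text).iter(f"{ATOM}entry"):
        title = (entry.findtext(f"{ATOM}title") or "").strip()
        body = (entry.findtext(f"{ATOM}content")
                or entry.findtext(f"{ATOM}summary") or "").strip()
        words = set(re.findall(r"\w+", f"{title} {body}".lower()))
        if not all(word in words for word in wanted):
            continue  # every keyword word must appear as a whole word
        posted_at = _parse_time(entry.findtext(f"{ATOM}published")
                                or entry.findtext(f"{ATOM}updated"))
        if cutoff is not None and posted_at is not None and posted_at < cutoff:
            continue  # outside the recent_days window; undated entries are kept
        link = entry.find(f"{ATOM}link")
        products.append({
            "name": title,
            "description": body or None,
            "url": link.get("href") if link is not None else None,
            "posted_at": posted_at.isoformat() if posted_at else None,
        })
    return products
```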

CoreClaw Network Requirement

CoreClaw cloud does not reliably allow direct outbound access. The Worker is designed to use platform-provided network access:

```text
PROXY_DOMAIN=<platform proxy endpoint>
PROXY_AUTH=<username:password>
ChromeWs=<remote fingerprint browser endpoint>
```

The Worker automatically builds:

```text
socks5://<PROXY_AUTH>@<PROXY_DOMAIN>
ws://<PROXY_AUTH>@<ChromeWs>
```

Do not hard-code proxy credentials.
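A minimal Python sketch of building those URLs from the environment, so credentials never appear in source code. `build_network_urls` is hypothetical. Percent-encoding the username and password is an assumption added here so that characters such as `@` or `:` in a password cannot break the URL; the document does not say whether the Worker does this.

```python
import os
from typing import Mapping, Optional, Tuple
from urllib.parse import quote


def build_network_urls(env: Mapping[str, str] = os.environ
                       ) -> Tuple[Optional[str], Optional[str]]:
    """Return (SOCKS5 proxy URL, remote browser WebSocket URL); None when unavailable."""
    domain = env.get("PROXY_DOMAIN", "").strip()
    auth = env.get("PROXY_AUTH", "").strip()
    chrome_ws = env.get("ChromeWs", "").strip()
    if not auth:
        return None, None  # both URLs embed PROXY_AUTH
    user, _, password = auth.partition(":")
    # Assumption: percent-encode credentials so reserved characters stay literal.
    creds = f"{quote(user, safe='')}:{quote(password, safe='')}"
    proxy_url = f"socks5://{creds}@{domain}" if domain else None
    ws_url = f"ws://{creds}@{chrome_ws}" if chrome_ws else None
    return proxy_url, ws_url
```

With `requests` and its SOCKS extra installed (`pip install requests[socks]`), the proxy URL can be passed as `proxies={"http": proxy_url, "https": proxy_url}`.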

Local Verification

Syntax check:

```bash
python -m py_compile main.py scraper.py
```

The `main.py` entry point requires CoreClaw's SDK gRPC service at runtime, so run full end-to-end tests on CoreClaw.

Implementation Notes

Removed from the original project:

- SQLite persistence
- HTTP API server
- Scheduler / cron mode
- AI tagging
- Docker service mode

Kept and adapted:

- Product Hunt structured product model
- `__NEXT_DATA__` parser
- DOM fallback parser
- Product Hunt GraphQL API, public Atom feed, and site-search fallback paths
- CoreClaw proxy and browser runtime support
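Next.js pages embed their initial state as JSON in a `<script id="__NEXT_DATA__">` tag, which is what a `__NEXT_DATA__` parser reads. The Python sketch below only extracts and decodes that JSON. `extract_next_data` is hypothetical, and where Product Hunt keeps product records inside the decoded object is not described here and may change between site releases. A regex is adequate for this single, well-delimited tag, though an HTML parser is more robust.

```python
import json
import re
from typing import Any, Optional

# Matches the Next.js bootstrap script regardless of attribute order.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid=["\']__NEXT_DATA__["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def extract_next_data(html: str) -> Optional[Any]:
    """Return the decoded __NEXT_DATA__ JSON, or None so a DOM fallback can run."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
```

Returning `None` instead of raising lets the caller move on to the DOM fallback parser when the page has no usable embedded state.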

FAQ

What is this Product Hunt Scraper for?

It scrapes trending products from Product Hunt by keywords for market research, competitor tracking, lead generation, and startup trend monitoring.

Why am I getting 403 / access blocked errors?

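Switch to the `browser` strategy, which uses CoreClaw's remote fingerprint browser together with the platform proxy. If page access is still blocked, the `feed` strategy avoids Product Hunt pages entirely.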

What does search_terms do?

It is your keyword list, one keyword per line. CoreClaw automatically splits tasks by this field.

Pricing

Failed results don't count

Rating

5.0

Developer

Kael Odin

Worker Stats

12 Total runs

Success rate: 100.00%

Last updated: May 07, 2026

Made-in-China Supplier Intelligence Scraper | Extract Company Profiles, Contacts & Trade Data

by mmi0cuhn

Scrape Made-in-China supplier pages and collect structured company profiles, main products, audit report numbers, trade details, certificates, shipment images, and contact information for B2B sourcing workflows.

5.0

25 runs

From $0.6/1,000 results

Quince.com Product Scraper - Prices, Discounts, Reviews & More

by Techforce Global

Search products and walk away with selling prices, retail prices, discounts, hero images, and the latest customer reviews for every product, ready to drop into your spreadsheet, dashboard, or BI tool. The Quince.com Product Scraper turns a catalog into clean, structured product data in minutes.

5.0

18 runs

From $0.6/1,000 results

SHEIN Single Product Extractor (URL/ID)

by yankun guo

A dedicated tool to extract structured detailed data for individual SHEIN products via product URL or product ID. It connects to a remote Chromium instance, automatically bypasses SHEIN's risk verification, loads the target product page, parses complete product attributes, and returns normalized data. Supports 10+ regional SHEIN sites and configurable workflow retries, ideal for product information monitoring, price tracking, competitor research, and trend analysis.

5.0

169 runs

From $0.6/1,000 results

Goodreads Book Info Extractor

by Adil Ayub

Instantly extract Goodreads book data including title, description, ISBN, ASIN, publisher, format, page count, language, genres, awards, characters, ratings, and rating counts. Receive structured JSON data for seamless integration into your applications and workflows.

5.0

1 run

From $0.6/1,000 results