

Extract titles, prices, SKUs, variants, images, inventory status & metadata from any Shopify store. Just enter the URL—we handle sitemap discovery, WAF bypass & structured output automatically.
Shopify Product Scraper crawls product data from any Shopify store via its JSON API, sitemap discovery, and a browser-based anti-bot fallback. It extracts title, price, description, SKU, variants, images, inventory status, and metadata.
- Reads robots.txt to find product sitemaps automatically
- Tries the /products.json bulk API first (fast), then falls back to individual product endpoints via browser (robust)
- Still works when robots.txt doesn't declare a sitemap
- Handles localized URL prefixes (e.g. /es-US/products/) for correct JSON API access

| Field | Type | Description |
|---|---|---|
| url | string | Product page URL |
| title | string | Product title |
| id | string | Shopify product ID (GUID stripped) |
| sku | string | Variant SKU |
| description | string | Product description (HTML stripped) |
| price | number | Variant price |
| currency | string | Currency (defaults to "USD") |
| availability | string | "in stock" or "out of stock" |
| color | string | Option value for color |
| size | string | Option value for size |
| material | string | Option value for material |
| display_name | string | Variant display name |
| product_type | string | Shopify product type |
| images_urls | string[] | Product + variant image URLs (deduped, query strings stripped) |
| brand | string | Product vendor |
| video_urls | string[] | Video URLs (reserved) |
| created_at | string | ISO 8601 creation timestamp |
| updated_at | string | ISO 8601 update timestamp |
| published_at | string | ISO 8601 publish timestamp |
| additional | object | Extra metadata: variant_attributes, variant_title, scraped_at, barcode, taxcode, stock_count, tags, weight, requires_shipping, plus any custom option keys |
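The field table suggests that each output row describes one variant, combining product-level and variant-level values. The sketch below illustrates that mapping for a product object shaped like the entries in a Shopify products.json response. It is only an illustration, not the scraper's actual code: `productToRows`, `stripHtml`, `stripQuery`, and the `storeUrl` argument are hypothetical names, and only a subset of the fields is covered.

```javascript
const STANDARD_OPTIONS = ["color", "size", "material"];

// Crude HTML stripping for illustration: drop tags, collapse whitespace.
function stripHtml(html) {
  return String(html || "").replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

// Remove the query string from an image URL.
function stripQuery(url) {
  return String(url).split("?")[0];
}

// Build one row per variant (hypothetical helper, assumed input shape).
function productToRows(product, storeUrl) {
  const optionNames = (product.options || []).map((o) => String(o.name).toLowerCase());
  // A Set removes duplicate image URLs once query strings are stripped.
  const images = [...new Set((product.images || []).map((img) => stripQuery(img.src)))];

  return (product.variants || []).map((variant) => {
    const row = {
      url: `${storeUrl}/products/${product.handle}`,
      title: product.title,
      id: String(product.id),
      sku: variant.sku,
      description: stripHtml(product.body_html),
      price: Number(variant.price), // the JSON carries prices as strings
      currency: "USD", // documented default
      availability: variant.available ? "in stock" : "out of stock",
      display_name: variant.title,
      product_type: product.product_type,
      images_urls: images,
      brand: product.vendor,
      additional: {},
    };
    // option1..option3 line up with the product's option names by position.
    optionNames.forEach((name, i) => {
      const value = variant[`option${i + 1}`];
      if (value == null) return;
      if (STANDARD_OPTIONS.includes(name)) row[name] = value;
      else row.additional[name] = value; // custom option keys
    });
    return row;
  });
}
```

A product with two variants yields two rows that share the title, description, and image list but differ in SKU, price, availability, and option values.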
The startUrl array controls how the platform splits work across concurrent subtasks (via the b field).
Use extendOutputFunction to transform or reject each row. Return null to skip.
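As a minimal sketch of the transform-or-reject behavior, the function below drops out-of-stock rows and adds a derived field to the rest. The argument shape (`{ data }`) is an assumption, since the exact signature is not shown here; `transformRow` and `price_cents` are hypothetical names used only for illustration.

```javascript
// Pure helper: returns the transformed row, or null to skip it.
function transformRow(row) {
  if (row.availability !== "in stock") return null; // reject out-of-stock variants
  return { ...row, price_cents: Math.round(row.price * 100) };
}

// Assumed shape: the scraped row arrives as `data`.
async function extendOutputFunction({ data }) {
  return transformRow(data);
}
```

Keeping the logic in a plain helper makes it easy to check locally before pasting the function into the input.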
Use extendScraperFunction to hook into different stages of the crawl lifecycle.
Enable fetchHtml to get the full HTML page alongside the JSON API response.
The HTML body is available in request.userData.body inside the output function.
In extendOutputFunction:
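A minimal sketch, assuming the output function receives the row as `data` and the request as `request` (the argument shape is an assumption; only `request.userData.body` comes from the description above). `extractPageTitle` and the `page_title` field are hypothetical names.

```javascript
// Pull the text of the <title> element out of raw HTML, or null if absent.
function extractPageTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html || "");
  return match ? match[1].trim() : null;
}

async function extendOutputFunction({ data, request }) {
  // The body is only expected to be present when fetchHtml is enabled.
  const html = request?.userData?.body;
  return { ...data, page_title: extractPageTitle(html) };
}
```

When fetchHtml is disabled the body is missing, so `page_title` comes back as null rather than throwing.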
Set debugLog: true to save failed responses (those missing a product title) to storage for inspection.
| Parameter | Type | Default | Description |
|---|---|---|---|
| startUrl | array | required | Shopify store URLs. Also the b (split) field for concurrency. |
| maxRequestsPerCrawl | integer | 0 | Max products to crawl. 0 = unlimited. |
| maxConcurrency | integer | 20 | Max parallel requests (1-20). |
| maxRequestRetries | integer | 3 | Retries on failure before giving up. |
| checkForBanner | boolean | true | Verify robots.txt contains "Shopify" before crawling (non-Shopify stores still proceed). |
| fetchHtml | boolean | false | Fetch HTML pages before JSON API calls (2x requests). |
| debugLog | boolean | false | Verbose logging; saves failed JSON responses for inspection. |
| extendOutputFunction | string | passthrough | JavaScript function (async) to transform/filter output rows. Return null to skip. |
| extendScraperFunction | string | no-op | JavaScript function (async) for scraper lifecycle hooks. |
| customData | object | {} | Arbitrary data accessible in both extend functions. |
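For orientation, here is one possible input built from the parameters in the table. The store URL is a placeholder, and the assumption that startUrl entries are plain strings is not confirmed by the table. `validateInput` is a hypothetical pre-flight check, not part of the scraper.

```javascript
// Example input; names and ranges follow the parameter table.
const input = {
  startUrl: ["https://example-store.myshopify.com"], // entry format assumed
  maxRequestsPerCrawl: 500, // 0 would mean unlimited
  maxConcurrency: 10,       // allowed range is 1-20
  maxRequestRetries: 3,
  checkForBanner: true,
  fetchHtml: false,         // true doubles the number of requests
  debugLog: false,
  customData: { currency: "EUR" },
};

// Hypothetical helper: returns a list of problems, empty when the input looks sane.
function validateInput(cfg) {
  const problems = [];
  if (!Array.isArray(cfg.startUrl) || cfg.startUrl.length === 0) {
    problems.push("startUrl must be a non-empty array");
  }
  if (!Number.isInteger(cfg.maxConcurrency) || cfg.maxConcurrency < 1 || cfg.maxConcurrency > 20) {
    problems.push("maxConcurrency must be an integer from 1 to 20");
  }
  if (!Number.isInteger(cfg.maxRequestsPerCrawl) || cfg.maxRequestsPerCrawl < 0) {
    problems.push("maxRequestsPerCrawl must be a non-negative integer");
  }
  return problems;
}
```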
extendOutputFunction:
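A minimal sketch of reading customData inside the output function, assuming it is passed alongside the row as `{ data, customData }` (the argument shape is an assumption). Here customData carries a store-specific currency code that replaces the documented "USD" default; `applyCurrency` is a hypothetical helper.

```javascript
// Override the row's currency when customData supplies one.
function applyCurrency(row, customData) {
  const currency = (customData && customData.currency) || row.currency;
  return { ...row, currency };
}

async function extendOutputFunction({ data, customData }) {
  return applyCurrency(data, customData);
}
```

With customData set to `{ "currency": "EUR" }`, every row is emitted with currency "EUR"; with an empty customData object the row is returned unchanged.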
- When products.json is blocked, all requests go through the browser, which is slower (~1 req/sec per concurrent browser). With 5 concurrent browsers, expect ~5 products/sec.
- The currency field defaults to "USD"; multi-currency stores need custom parsing via extendOutputFunction.

On CoreClaw, all outbound HTTP requests go through the platform's SOCKS5 proxy.
The proxy address is read from PROXY_AUTH and PROXY_DOMAIN environment variables (set automatically by the platform).
The browser is connected via WebSocket CDP (ChromeWs env var + PROXY_AUTH auth).
Both are platform-injected — no manual configuration needed.
All online stores built on the Shopify platform can be scraped, regardless of theme or language version. The tool automatically detects and handles localized URLs.
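One way to picture localized URL handling is to keep the locale prefix when deriving the JSON endpoint for a product page. The sketch below is an assumption about how this could work, not the tool's actual logic; `toProductJsonUrl` and the locale pattern are hypothetical.

```javascript
// Optional locale prefix such as /es-US or /fr, followed by /products/<handle>.
const LOCALE_PRODUCT_PATH = /^(\/[a-z]{2}(?:-[A-Za-z]{2})?)?\/products\/([^/?#]+)/;

// Derive the per-product JSON URL, preserving any locale prefix.
function toProductJsonUrl(productUrl) {
  const url = new URL(productUrl);
  const match = LOCALE_PRODUCT_PATH.exec(url.pathname);
  if (!match) return null; // not a product page
  const prefix = match[1] || "";
  const handle = match[2].replace(/\.json$/, "");
  return `${url.origin}${prefix}/products/${handle}.json`;
}
```

For example, `https://shop.example.com/es-US/products/blue-tee?variant=1` maps to `https://shop.example.com/es-US/products/blue-tee.json`, while a non-product path returns null.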
This tool is designed specifically for Shopify. Non-Shopify sites may be attempted, but data structures may not be compatible. The checkForBanner parameter verifies whether robots.txt contains Shopify identifiers.
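The robots.txt check can be sketched as follows. This is an illustrative approximation: `inspectRobotsTxt` is a hypothetical helper, and the case-insensitive match on "Shopify" is an assumption. It also collects `Sitemap:` lines, which is the standard way robots.txt declares sitemaps.

```javascript
// Inspect robots.txt text: look for a Shopify marker and collect sitemap URLs.
function inspectRobotsTxt(robotsTxt) {
  const text = String(robotsTxt || "");
  const sitemaps = text
    .split(/\r?\n/)
    .map((line) => /^\s*sitemap:\s*(\S+)/i.exec(line))
    .filter(Boolean)
    .map((m) => m[1]);
  return {
    looksLikeShopify: /shopify/i.test(text),
    sitemaps,
  };
}
```

A robots.txt with no Shopify marker gives `looksLikeShopify: false`; per the parameter description, the crawl would still proceed in that case.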
The default of 20 concurrent requests works well for most scenarios. For stores with strict rate limiting, we recommend reducing it to 5-10.
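To see what lowering concurrency costs in time, a rough estimate can use the figure quoted earlier of about one request per second per concurrent browser when the browser fallback is in use. The helper below is a back-of-the-envelope sketch; `estimateCrawlSeconds` is a hypothetical name, and real throughput depends on the store.

```javascript
// Estimate crawl duration in seconds for the browser fallback path.
function estimateCrawlSeconds(productCount, concurrency, reqPerSecPerBrowser = 1) {
  if (!Number.isInteger(concurrency) || concurrency < 1 || concurrency > 20) {
    throw new RangeError("concurrency must be an integer from 1 to 20");
  }
  return Math.ceil(productCount / (concurrency * reqPerSecPerBrowser));
}
```

Under these assumptions, 1,000 products take about 200 seconds at a concurrency of 5 and about 50 seconds at the default of 20.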