

A scalable tool to automatically discover, parse, and extract structured SHEIN product data through three input modes (keyword, category URL, category ID). It supports multi-regional SHEIN sites (US/UK/DE/FR, etc.), customizable sorting rules, and extraction of core product attributes (price, rating, sales volume, badges, etc.), ideal for price tracking, competitor research, trend analysis, and listing monitoring.
SHEIN Product Discovery
This project discovers SHEIN products from three different input modes:
- `shein_products_by-keyword`
- `shein_products_by-category-url`
- `shein_products_by-category-id`

The worker opens the corresponding SHEIN page, attempts to pass SHEIN risk verification automatically, parses the product grid, and returns a structured product list.
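Typical use cases: price tracking, competitor research, trend analysis, and listing monitoring.

Each mode builds its page URL differently:

- `shein_products_by-keyword`: uses a keyword to build the SHEIN search URL.
- `shein_products_by-category-url`: uses a full category URL directly, then appends the common query parameters.
- `shein_products_by-category-id`: uses a category ID to build the category page URL as `https://{country}.shein.com/{category_id}.html`.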
The worker reads `type` and `value` from the input payload and connects to a remote Chromium instance through the `ChromeWs` websocket.

The worker depends on:
- `playwright`
- `selectolax`
- `httpx`
- `grpcio`
- `protobuf`

Environment variables used at runtime:
- `ChromeWs`: required, remote Chromium CDP websocket host
- `PROXY_AUTH`: optional, authentication prefix added to the websocket URL

If `ChromeWs` is missing, the worker returns a failure result.
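
The environment handling described above can be sketched roughly as follows. This is a minimal illustration, not the worker's actual code: the function name `resolve_browser_endpoint`, the `ws://<auth>@<host>` layout, and the `error_code` chosen for a missing `ChromeWs` are all assumptions.

```python
import os


def resolve_browser_endpoint(env=None):
    """Return (ws_url, None) on success, or (None, failure_result) if ChromeWs is missing."""
    env = os.environ if env is None else env
    host = (env.get("ChromeWs") or "").strip()
    if not host:
        # Failure envelope mirrors the documented output fields.
        # The error_code value used here is an assumption.
        return None, {
            "code": 0,
            "count": 0,
            "products": [],
            "error": "ChromeWs environment variable is required",
            "error_code": "BROWSER_CONNECT_FAILED",
        }
    auth = (env.get("PROXY_AUTH") or "").strip()
    # Assumed layout: ws://<auth>@<host>. The real worker may compose this differently.
    prefix = f"{auth}@" if auth else ""
    return f"ws://{prefix}{host}", None
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise without touching the real environment.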
The input schema is defined in `input_schema.json`.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `type` | string | Yes | Discovery mode. One of `shein_products_by-keyword`, `shein_products_by-category-url`, `shein_products_by-category-id`. |
| `value` | string | Yes | Input value for the selected mode: a keyword, category URL, or category ID. |
| `flow_retry_num` | integer | No | Total retry count for the full browser workflow. Default: `3`. |
| `country` | string | No | SHEIN site region. Used by keyword mode and category ID mode. Default: `us`. |
| `sort` | string | No | Sort option for the search result or category page. Default: `recommend`. |
| `page` | integer | No | Page number. Starts from 1. Default: `1`. |
| `limit` | integer | No | Number of products requested per page. Default: `20`. |
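
As a rough sketch of how the parameters in the table could be validated and defaulted before the browser workflow starts (the helper name `normalize_input` and the choice to raise `ValueError` are assumptions, not taken from the worker):

```python
VALID_TYPES = {
    "shein_products_by-keyword",
    "shein_products_by-category-url",
    "shein_products_by-category-id",
}

# Defaults as documented in the parameter table.
DEFAULTS = {"flow_retry_num": 3, "country": "us", "sort": "recommend", "page": 1, "limit": 20}


def normalize_input(payload):
    """Check the required fields and fill in the documented defaults."""
    mode = payload.get("type")
    value = str(payload.get("value") or "").strip()
    if mode not in VALID_TYPES:
        raise ValueError(f"unsupported type: {mode!r}")
    if not value:
        raise ValueError("value is required")
    # Only optional keys that were actually supplied override the defaults.
    supplied = {k: v for k, v in payload.items() if k in DEFAULTS and v is not None}
    params = {**DEFAULTS, **supplied}
    for key in ("flow_retry_num", "page", "limit"):
        params[key] = int(params[key])
    if params["page"] < 1:
        raise ValueError("page starts from 1")
    return {"type": mode, "value": value, **params}
```

A caller could map the `ValueError` to the documented `400` error code, although how the worker actually reports invalid input is not shown here.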

Supported `country` values:

| Value | Site |
|---|---|
| `us` | United States |
| `uk` | United Kingdom |
| `de` | Germany |
| `fr` | France |
| `it` | Italy |
| `es` | Spain |
| `ca` | Canada |
| `au` | Australia |
| `mx` | Mexico |
| `jp` | Japan |

The current code maps `sort` values as follows:

| Value | Description | SHEIN sort param |
|---|---|---|
| `recommend` | Recommended | no sort parameter |
| `most_popular` | Most Popular | `8` |
| `new_arrivals` | New Arrivals | `9` |
| `top_rated` | Top Rated | `7` |
| `price_low` | Price Low to High | `10` |
| `price_high` | Price High to Low | `11` |
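
The mapping above, combined with the category ID URL template `https://{country}.shein.com/{category_id}.html`, might be applied along these lines. The query parameter names `sort` and `page` and both function names are assumptions made for illustration; the worker's real parameter names are not documented here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Sort values and SHEIN sort codes, copied from the table above.
SORT_PARAMS = {
    "recommend": None,  # no sort parameter is sent
    "most_popular": 8,
    "new_arrivals": 9,
    "top_rated": 7,
    "price_low": 10,
    "price_high": 11,
}


def category_id_url(country, category_id):
    """Build the category page URL from the documented template."""
    return f"https://{country}.shein.com/{category_id}.html"


def with_query(url, sort="recommend", page=1):
    """Append sort and page parameters, keeping any query string already present."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    code = SORT_PARAMS[sort]  # raises KeyError for an unknown sort value
    if code is not None:
        query["sort"] = str(code)  # parameter name "sort" is an assumption
    query["page"] = str(page)  # parameter name "page" is an assumption
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Because `with_query` preserves existing parameters, the same helper would also suit category URL mode, where the input URL may already carry a query string.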

The output schema is defined in `output_schema.json`.

| Field | Type | Description |
|---|---|---|
| `type` | string | Request mode used for the current run. |
| `url` | string | Final SHEIN URL opened by the worker. |
| `code` | number | `1` for success, `0` for failure. |
| `count` | number | Number of extracted products. |
| `products` | array | Parsed product list. |
| `error` | string | Error message when the request fails. Empty on success. |
| `error_code` | string | Failure code when the request fails. Empty on success. |

Each item in `products` represents one product card from the SHEIN page.

| Field | Type | Description | Page / Parsing Position |
|---|---|---|---|
| `goods_id` | string | Unique SHEIN product identifier. | Taken from `data-id` when available, otherwise parsed from the product URL suffix `-p-<id>.html`. |
| `product_url` | string | Full product detail page URL. | Link target of the product card. |
| `title` | string | Product title shown on the listing. | Product card title text, or fallback from `data-title` / image `alt` / `aria-label`. |
| `main_image` | string | Main thumbnail image URL. | Main product image inside the card. |
| `price` | number or null | Current selling price in site currency. | Sale price shown on the card, parsed from card attributes or visible sale price text. |
| `price_usd` | number or null | Current selling price in USD when available. | Parsed from card attributes when available. |
| `currency` | string | ISO-like currency code such as `USD`, `GBP`, `EUR`. | Derived from the visible sale price symbol. |
| `original_price` | number or null | Original or strikethrough price before discount in site currency. | Visible strikethrough price. |
| `original_price_usd` | number or null | Original or strikethrough price in USD when available. | Parsed from card attributes when available. |
| `discount_percent` | number or null | Discount percentage. | Discount label from card attributes. |
| `rating` | number or null | Average rating on a 0-5 scale. | Star rating area below price. |
| `reviews_count` | number or null | Total review count. | Review count shown next to the rating. |
| `position` | number | Position in the current result list, starting from 1. | Product card order in the parsed grid. |
| `sold_count` | number or null | Sold quantity estimate. | Sales label such as `200+ sold` or `1.5k+ sold`. |
| `is_local` | boolean | Whether the product is marked as local stock or local shipping. | Derived from local labels or local attributes. |
| `is_trending` | boolean | Whether the product is marked as a trending item. | Derived from trend label attributes. |
| `free_shipping` | boolean | Whether free shipping text is present. | Detected from the full product card text. |
| `quick_ship` | boolean | Whether QuickShip is available. | Derived from QuickShip attributes or visible text. |
| `badges` | array | Marketing or ranking badges. | Badge text such as `BIG DEALS`, `Bestseller`, `#1`, and similar labels. |
| `color_count` | number or null | Number of available color variants. | Parsed from the color count area on the card. |
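
Two of the parsing rules in the table lend themselves to a short illustration: the `goods_id` fallback from the URL suffix `-p-<id>.html`, and turning a sales label such as `1.5k+ sold` into a number. The sketch below is one possible approach under stated assumptions (a numeric ID in the URL, and `k`/`m` as the only magnitude suffixes); the function names are hypothetical and the worker's own parser may differ.

```python
import re

# Assumes the <id> part of "-p-<id>.html" is numeric.
GOODS_ID_RE = re.compile(r"-p-(\d+)\.html")
# Matches labels such as "200+ sold" or "1.5k+ sold".
SOLD_RE = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*([km]?)\+?\s*sold", re.IGNORECASE)


def goods_id_from(data_id, product_url):
    """Prefer the data-id attribute, otherwise fall back to the URL suffix."""
    if data_id:
        return str(data_id)
    match = GOODS_ID_RE.search(product_url or "")
    return match.group(1) if match else ""


def parse_sold_count(label):
    """Convert a sales label to an integer estimate, or None if no label is present."""
    match = SOLD_RE.search(label or "")
    if not match:
        return None
    number = float(match.group(1).replace(",", ""))
    multiplier = {"": 1, "k": 1_000, "m": 1_000_000}[match.group(2).lower()]
    return int(round(number * multiplier))
```

With these definitions, `parse_sold_count("1.5k+ sold")` returns `1500` and `parse_sold_count("200+ sold")` returns `200`. A card without a sales label yields `None`, which matches the nullable `sold_count` field.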

- `price_usd` and `original_price_usd` depend on whether SHEIN provides US price attributes for the card.
- `original_price`, `discount_percent`, `rating`, `reviews_count`, `sold_count`, and `color_count` may be null when the card does not show the corresponding element.
- `badges` may be an empty array.

| Error Code | Description |
|---|---|
| `400` | Invalid or missing input parameters. |
| `500` | Internal execution error. |
| `BROWSER_CONNECT_FAILED` | Failed to connect to the remote Chromium instance. |
| `PAGE_OPEN_FAILED` | Failed to open the SHEIN page. |
| `SHEIN_VERIFY_FAILED` | SHEIN verification appeared and could not be passed. |
| `PRODUCT_LIST_NOT_FOUND` | Product list container was not found on the page. |
| `PRODUCT_EXTRACT_FAILED` | Product extraction failed after page load. |

- `shein_products_by-category-url` uses the input URL directly, then appends the worker query parameters.
- `shein_products_by-category-id` builds the category page as `https://{country}.shein.com/{category_id}.html`.
- … `https://us.shein.com` in the current parser implementation.