SHEIN Product Scraper (Keyword/Category-Driven)

yankun guo

yankun-guo/shein_keyword

A scalable tool to automatically discover, parse, and extract structured SHEIN product data through three input modes (keyword, category URL, category ID). It supports multi-regional SHEIN sites (US/UK/DE/FR, etc.), customizable sorting rules, and extraction of core product attributes (price, rating, sales volume, badges, etc.), ideal for price tracking, competitor research, trend analysis, and listing monitoring.

SHEIN Product Discovery

Overview

This project discovers SHEIN products through one of three input modes:

  • shein_products_by-keyword
  • shein_products_by-category-url
  • shein_products_by-category-id

The worker opens the corresponding SHEIN page, attempts to pass SHEIN risk verification automatically, parses the product grid, and returns a structured product list.

Typical use cases:

  • Product discovery by search term
  • Product discovery by category landing page
  • Product discovery by category ID
  • Listing monitoring
  • Price tracking
  • Competitor research
  • Trend and badge analysis

Discovery Modes

1. shein_products_by-keyword

Use a keyword and build the SHEIN search URL as:

```text
https://{country}.shein.com/pdsearch/{keyword}/?sort=...&page=...&limit=...
```

2. shein_products_by-category-url

Use a full category URL directly, then append the common query parameters:

```text
{category_url}?sort=...&page=...&limit=...
```

3. shein_products_by-category-id

Use a category ID and build the category page URL as:

```text
https://{country}.shein.com/{category_id}.html?sort=...&page=...&limit=...
```
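
The three URL templates can be sketched with the standard library as follows. This is a minimal illustration, not the worker's actual code: `build_target_url` is a hypothetical helper, and merging the worker parameters into a query string that already exists on a category URL is an assumption about what "append" should mean in that case.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def build_target_url(mode, value, country="us", query=None):
    """Build a SHEIN listing URL for one of the three discovery modes.

    Hypothetical helper; `query` holds the common parameters
    (sort, page, limit) that are added to every mode.
    """
    if mode == "shein_products_by-keyword":
        # Percent-encode the keyword so spaces and slashes stay in one path segment.
        base = f"https://{country}.shein.com/pdsearch/{quote(value, safe='')}/"
    elif mode == "shein_products_by-category-id":
        base = f"https://{country}.shein.com/{value}.html"
    elif mode == "shein_products_by-category-url":
        base = value  # used as given; country is ignored in this mode
    else:
        raise ValueError(f"unsupported mode: {mode}")

    parts = urlsplit(base)
    # Assumption: keep any parameters already on the URL, then add ours.
    merged = dict(parse_qsl(parts.query))
    merged.update(query or {})
    return urlunsplit(parts._replace(query=urlencode(merged)))


print(build_target_url("shein_products_by-keyword", "dress", "us",
                       {"page": 1, "limit": 20}))
```

Using `urlsplit` and `urlencode` instead of string concatenation avoids a second `?` when a category URL already carries query parameters.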

Workflow

  1. Read type and value from the input payload.
  2. Build the target SHEIN URL according to the selected mode.
  3. Connect to a remote Chromium instance through ChromeWs.
  4. Open the SHEIN page and attempt to pass SHEIN verification automatically when required.
  5. Wait until the SHEIN product page is actually ready.
  6. Parse product cards from the page.
  7. Return a normalized product array.
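
The steps above run as one unit, and the `flow_retry_num` input bounds how often that unit is attempted. The sketch below shows one way such a wrapper could look. `run_with_retries` and `failure` are hypothetical names, the `flow` callable stands in for steps 2 to 7, and treating `flow_retry_num` as the total number of attempts is an assumption.

```python
def failure(error, error_code):
    """Build a failure result in the documented top-level response shape."""
    return {"code": 0, "count": 0, "products": [],
            "error": error, "error_code": error_code}


def run_with_retries(flow, attempts=3):
    """Call flow() up to `attempts` times; return the first success or the last failure.

    Assumption: an uncaught exception inside the flow maps to error code 500.
    """
    last = failure("workflow was not attempted", "500")
    for _ in range(max(1, attempts)):
        try:
            result = flow()
        except Exception as exc:  # keep the worker alive and report the error
            result = failure(str(exc), "500")
        if result.get("code") == 1:
            return result
        last = result
    return last
```

Returning the last failure rather than raising keeps the output in the same shape for both successful and failed runs.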

Runtime Requirements

The worker depends on:

  • playwright
  • selectolax
  • httpx
  • grpcio
  • protobuf

Environment variables used at runtime:

  • ChromeWs: required, remote Chromium CDP websocket host
  • PROXY_AUTH: optional, authentication prefix added to the websocket URL

If ChromeWs is missing, the worker returns a failure result.
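
A minimal sketch of how the two variables might be combined is shown below. The exact URL format is not documented here, so the `ws://` scheme, the `auth@host` layout, and the error code used for a missing `ChromeWs` are all assumptions; `resolve_cdp_endpoint` is a hypothetical helper.

```python
import os


def resolve_cdp_endpoint(env=None):
    """Return (endpoint, None) on success or (None, failure_result) if ChromeWs is unset."""
    env = os.environ if env is None else env
    host = (env.get("ChromeWs") or "").strip()
    if not host:
        # Assumption: the documented failure result uses this error code.
        return None, {"code": 0, "count": 0, "products": [],
                      "error": "ChromeWs is not set",
                      "error_code": "BROWSER_CONNECT_FAILED"}
    auth = (env.get("PROXY_AUTH") or "").strip()
    prefix = f"{auth}@" if auth else ""
    # Assumption: PROXY_AUTH is placed before the host, userinfo style.
    return f"ws://{prefix}{host}", None


endpoint, error = resolve_cdp_endpoint({"ChromeWs": "chrome.example:9222"})
print(endpoint or error)
```

With Playwright for Python, an endpoint of this kind would typically be passed to `chromium.connect_over_cdp(endpoint)`.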

Input Parameters

The input schema is defined in input_schema.json.

Request Example

```json
{
  "type": "shein_products_by-keyword",
  "value": "dress",
  "country": "us",
  "sort": "recommend",
  "page": 1,
  "limit": 20,
  "flow_retry_num": 3
}
```

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | Discovery mode. One of shein_products_by-keyword, shein_products_by-category-url, shein_products_by-category-id. |
| value | string | Yes | Input value for the selected mode: a keyword, category URL, or category ID. |
| flow_retry_num | integer | No | Total retry count for the full browser workflow. Default: 3. |
| country | string | No | SHEIN site region. Used by keyword mode and category ID mode. Default: us. |
| sort | string | No | Search result or category page sort option. Default: recommend. |
| page | integer | No | Page number. Starts from 1. Default: 1. |
| limit | integer | No | Number of products requested per page. Default: 20. |
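
The defaults in the table can be applied with a small normalization step such as the one below. This is a sketch under stated assumptions: the authoritative rules live in input_schema.json, `normalize_input` is a hypothetical helper, and rejecting values below 1 is an assumption.

```python
MODES = {
    "shein_products_by-keyword",
    "shein_products_by-category-url",
    "shein_products_by-category-id",
}
DEFAULTS = {"flow_retry_num": 3, "country": "us", "sort": "recommend",
            "page": 1, "limit": 20}


def normalize_input(payload):
    """Validate the required fields and fill in the documented defaults."""
    mode, value = payload.get("type"), payload.get("value")
    if mode not in MODES:
        raise ValueError(f"type must be one of {sorted(MODES)}")
    if not isinstance(value, str) or not value.strip():
        raise ValueError("value must be a non-empty string")

    # Only known optional keys are taken from the payload; None means "use default".
    given = {k: v for k, v in payload.items() if k in DEFAULTS and v is not None}
    params = {**DEFAULTS, **given}
    for key in ("flow_retry_num", "page", "limit"):
        params[key] = int(params[key])
        if params[key] < 1:  # assumption: all three counters start at 1
            raise ValueError(f"{key} must be >= 1")
    return {"type": mode, "value": value.strip(), **params}


print(normalize_input({"type": "shein_products_by-keyword", "value": "dress"}))
```

A `ValueError` raised here would correspond to the documented 400 error code.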

Supported Countries

| Value | Site |
| --- | --- |
| us | United States |
| uk | United Kingdom |
| de | Germany |
| fr | France |
| it | Italy |
| es | Spain |
| ca | Canada |
| au | Australia |
| mx | Mexico |
| jp | Japan |

Sort Options

The current code maps sort values as follows:

| Value | Description | SHEIN sort param |
| --- | --- | --- |
| recommend | Recommended | no sort parameter |
| most_popular | Most Popular | 8 |
| new_arrivals | New Arrivals | 9 |
| top_rated | Top Rated | 7 |
| price_low | Price Low to High | 10 |
| price_high | Price High to Low | 11 |
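
The mapping above can be expressed as a lookup table. The sketch below turns the user-facing sort value into the common query parameters and omits `sort` entirely for `recommend`. `SORT_CODES` and `listing_query` are hypothetical names, and any additional parameters the worker adds to the final URL are not modeled.

```python
from urllib.parse import urlencode

# User-facing sort value -> SHEIN sort code (None means "send no sort parameter").
SORT_CODES = {
    "recommend": None,
    "most_popular": 8,
    "new_arrivals": 9,
    "top_rated": 7,
    "price_low": 10,
    "price_high": 11,
}


def listing_query(sort="recommend", page=1, limit=20):
    """Return the common query parameters as an ordered dict."""
    if sort not in SORT_CODES:
        raise ValueError(f"unsupported sort value: {sort}")
    query = {}
    code = SORT_CODES[sort]
    if code is not None:
        query["sort"] = code
    query["page"] = page
    query["limit"] = limit
    return query


print(urlencode(listing_query("price_low", page=2, limit=40)))
```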

Output Structure

The output schema is defined in output_schema.json.

Response Example

```json
{
  "type": "shein_products_by-keyword",
  "url": "https://us.shein.com/pdsearch/dress/?source=sort&sourceStatus=4&page=1&force_suggest=1&limit=20",
  "code": 1,
  "count": 20,
  "products": [
    {
      "goods_id": "123456789",
      "product_url": "https://us.shein.com/example-p-123456789.html",
      "title": "Mock Neck Bodycon Dress",
      "main_image": "https://img.ltwebstatic.com/...",
      "price": 7.51,
      "price_usd": 7.51,
      "currency": "USD",
      "original_price": 17.18,
      "original_price_usd": 17.18,
      "discount_percent": 56,
      "rating": 4.5,
      "reviews_count": 1300,
      "position": 1,
      "sold_count": 1500,
      "is_local": true,
      "is_trending": false,
      "free_shipping": true,
      "quick_ship": true,
      "badges": [
        "Bestseller",
        "#1"
      ],
      "color_count": 12
    }
  ],
  "error": "",
  "error_code": ""
}
```

Top-Level Response Fields

| Field | Type | Description |
| --- | --- | --- |
| type | string | Request mode used for the current run. |
| url | string | Final SHEIN URL opened by the worker. |
| code | number | 1 for success, 0 for failure. |
| count | number | Number of extracted products. |
| products | array | Parsed product list. |
| error | string | Error message when the request fails. Empty on success. |
| error_code | string | Failure code when the request fails. Empty on success. |
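
A caller would typically branch on `code` before touching `products`. The sketch below shows one way to do that on the consumer side; `WorkerError` and `products_from_response` are hypothetical names and not part of the worker.

```python
import json


class WorkerError(RuntimeError):
    """Raised when the worker reports code 0."""

    def __init__(self, error_code, message):
        super().__init__(f"[{error_code}] {message}")
        self.error_code = error_code


def products_from_response(raw):
    """Return the product list, or raise WorkerError for a failed run."""
    data = json.loads(raw) if isinstance(raw, str) else raw
    if data.get("code") != 1:
        raise WorkerError(data.get("error_code", ""), data.get("error", ""))
    return data.get("products", [])


sample = '{"code": 1, "count": 1, "products": [{"goods_id": "123456789"}]}'
for product in products_from_response(sample):
    print(product["goods_id"])
```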

Product Fields

Each item in products represents one product card from the SHEIN page.

| Field | Type | Description | Page / Parsing Position |
| --- | --- | --- | --- |
| goods_id | string | Unique SHEIN product identifier. | Taken from data-id when available, otherwise parsed from the product URL suffix -p-<id>.html. |
| product_url | string | Full product detail page URL. | Link target of the product card. |
| title | string | Product title shown on the listing. | Product card title text, or fallback from data-title / image alt / aria-label. |
| main_image | string | Main thumbnail image URL. | Main product image inside the card. |
| price | number or null | Current selling price in site currency. | Sale price shown on the card, parsed from card attributes or visible sale price text. |
| price_usd | number or null | Current selling price in USD when available. | Parsed from card attributes when available. |
| currency | string | ISO-like currency code such as USD, GBP, EUR. | Derived from the visible sale price symbol. |
| original_price | number or null | Original or strikethrough price before discount in site currency. | Visible strikethrough price. |
| original_price_usd | number or null | Original or strikethrough price in USD when available. | Parsed from card attributes when available. |
| discount_percent | number or null | Discount percentage. | Discount label from card attributes. |
| rating | number or null | Average rating on a 0-5 scale. | Star rating area below price. |
| reviews_count | number or null | Total review count. | Review count shown next to the rating. |
| position | number | Position in the current result list, starting from 1. | Product card order in the parsed grid. |
| sold_count | number or null | Sold quantity estimate. | Sales label such as 200+ sold or 1.5k+ sold. |
| is_local | boolean | Whether the product is marked as local stock or local shipping. | Derived from local labels or local attributes. |
| is_trending | boolean | Whether the product is marked as a trending item. | Derived from trend label attributes. |
| free_shipping | boolean | Whether free shipping text is present. | Detected from the full product card text. |
| quick_ship | boolean | Whether QuickShip is available. | Derived from QuickShip attributes or visible text. |
| badges | array | Marketing or ranking badges. | Badge text such as BIG DEALS, Bestseller, #1, and similar labels. |
| color_count | number or null | Number of available color variants. | Parsed from the color count area on the card. |
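
Two of the parsing rules in the table, the `goods_id` fallback and the `sold_count` label, can be illustrated with regular expressions. This is a sketch of the idea only: the patterns and function names are hypothetical, and support for an `m` (million) suffix is an assumption that the table does not mention.

```python
import re

# Matches the "-p-<id>.html" suffix of a product URL.
_ID_RE = re.compile(r"-p-(\d+)\.html")
# Matches labels such as "200+ sold" or "1.5k+ sold".
_SOLD_RE = re.compile(r"(\d+(?:\.\d+)?)\s*([km]?)\+?\s*sold", re.IGNORECASE)
_MULTIPLIER = {"": 1, "k": 1_000, "m": 1_000_000}


def goods_id_from(data_id, product_url):
    """Prefer the card's data-id; otherwise fall back to the URL suffix."""
    if data_id:
        return data_id
    match = _ID_RE.search(product_url or "")
    return match.group(1) if match else None


def parse_sold_count(label):
    """Convert a sales label to an integer estimate, or None if absent."""
    match = _SOLD_RE.search(label or "")
    if not match:
        return None
    number, suffix = match.groups()
    return int(round(float(number) * _MULTIPLIER[suffix.lower()]))


print(goods_id_from(None, "https://us.shein.com/example-p-123456789.html"))
print(parse_sold_count("1.5k+ sold"))
```

Because the labels carry a trailing `+`, the resulting number is a lower bound rather than an exact count.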

Notes About Field Availability

  • Not every product card exposes every field.
  • price_usd and original_price_usd depend on whether SHEIN provides US price attributes for the card.
  • original_price, discount_percent, rating, reviews_count, sold_count, and color_count may be null when the card does not show the corresponding element.
  • badges may be an empty array.
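
Because several fields can be null, downstream code should not assume they are numbers. The sketch below shows two null-safe helpers on the consumer side. Both names are hypothetical, and the derived discount uses plain rounding, which may not match how SHEIN computes its own label.

```python
from statistics import mean


def effective_discount(product):
    """Use discount_percent when present; otherwise derive it from the two prices."""
    if product.get("discount_percent") is not None:
        return product["discount_percent"]
    price = product.get("price")
    original = product.get("original_price")
    if price is None or not original:
        return None  # not enough data on the card
    # Assumption: round to the nearest whole percent.
    return round((1 - price / original) * 100)


def average_rating(products):
    """Average the ratings that exist; return None when no card has one."""
    ratings = [p["rating"] for p in products if p.get("rating") is not None]
    return mean(ratings) if ratings else None


cards = [{"rating": 4.5}, {"rating": None}, {"rating": 3.5}]
print(average_rating(cards))
```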

Error Codes

| Error Code | Description |
| --- | --- |
| 400 | Invalid or missing input parameters. |
| 500 | Internal execution error. |
| BROWSER_CONNECT_FAILED | Failed to connect to the remote Chromium instance. |
| PAGE_OPEN_FAILED | Failed to open the SHEIN page. |
| SHEIN_VERIFY_FAILED | SHEIN verification appeared and could not be passed. |
| PRODUCT_LIST_NOT_FOUND | Product list container was not found on the page. |
| PRODUCT_EXTRACT_FAILED | Product extraction failed after page load. |
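
On the caller side, the codes can be modeled as an enumeration so that a retry decision is explicit. The set of retryable codes below is purely an assumption for illustration; the worker's documentation does not say which failures are transient.

```python
from enum import Enum


class ErrorCode(str, Enum):
    INVALID_INPUT = "400"
    INTERNAL = "500"
    BROWSER_CONNECT_FAILED = "BROWSER_CONNECT_FAILED"
    PAGE_OPEN_FAILED = "PAGE_OPEN_FAILED"
    SHEIN_VERIFY_FAILED = "SHEIN_VERIFY_FAILED"
    PRODUCT_LIST_NOT_FOUND = "PRODUCT_LIST_NOT_FOUND"
    PRODUCT_EXTRACT_FAILED = "PRODUCT_EXTRACT_FAILED"


# Assumption: browser, network, and verification failures are worth a new run;
# invalid input is not, because the same payload would fail again.
RETRYABLE = {
    ErrorCode.BROWSER_CONNECT_FAILED,
    ErrorCode.PAGE_OPEN_FAILED,
    ErrorCode.SHEIN_VERIFY_FAILED,
}


def should_retry(error_code):
    """Return True when a failed run looks transient under the assumed policy."""
    try:
        return ErrorCode(error_code) in RETRYABLE
    except ValueError:  # unknown or empty code
        return False


print(should_retry("SHEIN_VERIFY_FAILED"))
```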

Important Implementation Notes

  • shein_products_by-category-url uses the input URL directly, then appends the worker query parameters.
  • shein_products_by-category-id builds the category page as https://{country}.shein.com/{category_id}.html.
  • Relative product detail links are currently resolved against https://us.shein.com by the parser, regardless of the requested country.
  • The actual returned fields are determined by main.py.
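
The link normalization described above behaves like `urljoin` with a fixed base. The sketch below reproduces that behavior and adds a caller-side workaround that moves such links to the requested region. Both function names are hypothetical, and the workaround assumes the `{country}.shein.com` host pattern used elsewhere in this document.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

PARSER_BASE = "https://us.shein.com"  # fixed base described in the notes above


def normalize_product_url(href):
    """Resolve a relative product link against the fixed US base."""
    return urljoin(PARSER_BASE, href)


def rebase_to_country(url, country):
    """Caller-side workaround: move a us.shein.com link to another region."""
    parts = urlsplit(url)
    if parts.netloc != "us.shein.com":
        return url  # leave absolute links to other hosts untouched
    return urlunsplit(parts._replace(netloc=f"{country}.shein.com"))


link = normalize_product_url("/example-p-123456789.html")
print(rebase_to_country(link, "de"))
```

Links that were already absolute on the page pass through `urljoin` unchanged, so only relative links are affected by the fixed base.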

Pricing

Failed results don't count

Rating

4.7

Developer

yankun guo

Worker Stats

176 Total runs
Success rate: 99.43%
Last updated: Jun 08, 2026

Categories

Other
