Techforce Global

TikTok Data Extractor

q9w5f5h8/tiktok-data-extractor

Extract comprehensive TikTok data with a single click: profiles with detailed metrics (followers, engagement, verification status), video analytics (views, likes, comments, hashtags), and hashtag trending data. Built with anti-detection technology for reliable scraping.


TikTok Multi-Mode Scraping Worker

Overview

This is a TikTok data collection worker that follows the CoreClaw Worker specification.

It collects TikTok data through a remote browser connection and supports four collection modes:

  • author
  • video
  • search
  • tag

main.py orchestrates the worker: it normalizes each request, dispatches it to the scraping module that matches collection_type, and sends the standardized results to the platform via CoreSDK.Result.push_data().
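The dispatch step can be pictured with a small lookup table. This is a minimal sketch, not the code in main.py: the four handler functions are hypothetical placeholders for the entry points of rep_author.py, rep_video.py, rep_search.py, and rep_tag.py, whose real names are not documented here.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for the scraping modules; each returns a list of rows.
def scrape_author(value: str) -> List[dict]:
    return [{"input_type": "author", "input_value": value}]

def scrape_video(value: str) -> List[dict]:
    return [{"input_type": "video", "input_value": value}]

def scrape_search(value: str) -> List[dict]:
    return [{"input_type": "search", "input_value": value}]

def scrape_tag(value: str) -> List[dict]:
    return [{"input_type": "tag", "input_value": value}]

# One handler per supported collection_type.
HANDLERS: Dict[str, Callable[[str], List[dict]]] = {
    "author": scrape_author,
    "video": scrape_video,
    "search": scrape_search,
    "tag": scrape_tag,
}

def dispatch(collection_type: str, value: str) -> List[dict]:
    """Route one target to the handler registered for its collection_type."""
    try:
        handler = HANDLERS[collection_type]
    except KeyError:
        raise ValueError(f"unsupported collection_type: {collection_type!r}") from None
    return handler(value)
```

A table lookup keeps the set of supported modes in one place, so an unknown mode fails fast with a clear error.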

Project Structure

text
main.py                 # Main entry: input parsing, mode dispatch, retry, result limiting, and result push
rep_author.py           # Author profile scraping
rep_video.py            # Video detail scraping
rep_search.py           # User search scraping, including captcha handling
rep_tag.py              # Tag page scraping
input_schema.json       # Platform input parameter definition
output_schema.json      # Platform output field definition
sdk.py                  # CoreClaw SDK main module
sdk_pb2.py              # Generated protobuf file
sdk_pb2_grpc.py         # Generated gRPC file
requirements.txt        # Python dependency manifest

Supported Collection Modes

| Mode | Description | Example Input |
| --- | --- | --- |
| author | Author profile scraping | bellapoarch, @bellapoarch, https://www.tiktok.com/@bellapoarch |
| video | Video detail scraping | https://www.tiktok.com/@user/video/1234567890 |
| search | User search scraping | apple |
| tag | Tag page scraping | fyp |
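Because author mode accepts three input shapes, some step has to reduce them to a single form. The helper below is a sketch of one way to do that; author_username is a hypothetical name, and the worker's actual parsing rules are not documented here.

```python
from urllib.parse import urlparse

def author_username(target: str) -> str:
    """Return the bare username for 'name', '@name', or a profile URL."""
    target = target.strip()
    if target.startswith(("http://", "https://")):
        # Profile URLs look like https://www.tiktok.com/@name;
        # keep only the first path segment.
        path = urlparse(target).path
        target = path.strip("/").split("/")[0]
    return target.lstrip("@")
```

All three example inputs from the table reduce to the same username, bellapoarch.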

Input Parameters

The worker reads platform input via CoreSDK.Parameter.get_input_json_dict().

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| collection_type | string | Yes | author | Collection mode. One of: author, video, search, tag |
| targets | array | Yes | - | List of targets to collect. Depending on the mode, this can be a username, URL, search keyword, or tag word |
| max_results | integer | No | 10 | Maximum number of rows returned for search and tag modes |
| retry_times | integer | No | 3 | Maximum retry count for retryable network-related errors |
| retry_delay_seconds | number | No | 1 | Delay in seconds between retries |
| page_timeout_ms | integer | No | 180000 | Page timeout in milliseconds |
| wait_after_load_ms | integer | No | 3000 | Additional wait time after page load, in milliseconds |
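The defaults in the table can be applied to a raw input dictionary as sketched below. The function apply_defaults and its validation rules are assumptions made for illustration. In the worker the raw dictionary would come from CoreSDK.Parameter.get_input_json_dict(); the sketch takes a plain dict so that it runs without the SDK.

```python
from typing import Any, Dict

# Default values copied from the parameter table.
DEFAULTS: Dict[str, Any] = {
    "collection_type": "author",
    "max_results": 10,
    "retry_times": 3,
    "retry_delay_seconds": 1,
    "page_timeout_ms": 180000,
    "wait_after_load_ms": 3000,
}
VALID_TYPES = {"author", "video", "search", "tag"}

def apply_defaults(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user input over the documented defaults and check required fields."""
    provided = {key: value for key, value in raw.items() if value is not None}
    params = {**DEFAULTS, **provided}
    if params["collection_type"] not in VALID_TYPES:
        raise ValueError(f"invalid collection_type: {params['collection_type']!r}")
    if not params.get("targets"):
        raise ValueError("targets must be a non-empty list")
    return params
```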

Example Requests

User search

json
{
  "collection_type": "search",
  "targets": [
    { "string": "apple" }
  ],
  "max_results": 10
}

Tag scraping

json
{
  "collection_type": "tag",
  "targets": [
    { "string": "fyp" }
  ],
  "max_results": 10
}

Author scraping

json
{
  "collection_type": "author",
  "targets": [
    { "string": "bellapoarch" }
  ]
}

Video scraping

json
{
  "collection_type": "video",
  "targets": [
    { "string": "https://www.tiktok.com/@user/video/1234567890" }
  ]
}

Output Schema

This worker currently uses a shared superset output schema across all four modes. The main output fields include:

  • input_type
  • input_value
  • url
  • profile_url
  • title
  • desc
  • entity_id
  • video_id
  • author_id
  • username
  • nickname
  • tag
  • create_time
  • duration
  • play_count
  • digg_count
  • comment_count
  • share_count
  • collect_count
  • verified
  • signature
  • avatar_url
  • original_avatar_url
  • private_account
  • following_count
  • friends_count
  • fans_count
  • heart_count
  • video_count
  • music_name
  • music_author
  • music_id
  • music_play_url
  • cover_url
  • width
  • height
  • status
  • error
  • data_json
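A shared superset schema means every row carries the same keys, whichever mode produced it. The sketch below builds such a row. The helper make_row is hypothetical, and filling unset fields with None is an assumption; the worker may use empty strings or zeros instead.

```python
from typing import Any, Dict

# Field names copied from the list above, in the same order.
OUTPUT_FIELDS = (
    "input_type", "input_value", "url", "profile_url", "title", "desc",
    "entity_id", "video_id", "author_id", "username", "nickname", "tag",
    "create_time", "duration", "play_count", "digg_count", "comment_count",
    "share_count", "collect_count", "verified", "signature", "avatar_url",
    "original_avatar_url", "private_account", "following_count",
    "friends_count", "fans_count", "heart_count", "video_count",
    "music_name", "music_author", "music_id", "music_play_url", "cover_url",
    "width", "height", "status", "error", "data_json",
)

def make_row(**values: Any) -> Dict[str, Any]:
    """Return a row containing every output field; unset fields are None."""
    unknown = set(values) - set(OUTPUT_FIELDS)
    if unknown:
        raise KeyError(f"unknown output fields: {sorted(unknown)}")
    row: Dict[str, Any] = dict.fromkeys(OUTPUT_FIELDS)
    row.update(values)
    return row
```

An author row built this way still has a video_id key, set to None, so downstream consumers can rely on one column layout for all four modes.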

Result Limiting Rules

max_results is enforced centrally inside main.py:

  • search: returns at most max_results rows
  • tag: returns at most max_results rows
  • author: no row truncation
  • video: no row truncation

If max_results is not provided, the default value is 10.
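The limiting rules above amount to a slice applied only in search and tag modes. A minimal sketch, with limit_rows as a hypothetical helper name:

```python
from typing import List

# Only these modes are truncated to max_results.
LIMITED_MODES = {"search", "tag"}

def limit_rows(collection_type: str, rows: List[dict], max_results: int = 10) -> List[dict]:
    """Truncate rows for search and tag modes; pass other modes through."""
    if collection_type in LIMITED_MODES:
        return rows[: max(max_results, 0)]
    return rows
```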

Environment Variables

This worker relies on a remote browser connection. The following environment variables are supported:

| Env Variable | Required | Default | Description |
| --- | --- | --- | --- |
| BROWSER_WS | No | - | When set, this WebSocket endpoint is used directly to connect to the browser |
| ChromeWs | No | chrome-ws-inner.coreclaw.com | Browser host address |
| PROXY_AUTH | No | - | Browser connection authentication credential |
| PROXY_DOMAIN | No | - | Currently used mainly for logging |

The browser endpoint is resolved in the following order:

  1. If BROWSER_WS is set, it is used directly.
  2. If BROWSER_WS is not set but PROXY_AUTH is set, the endpoint becomes ws://{PROXY_AUTH}@{ChromeWs}.
  3. If neither is set, the endpoint becomes ws://{ChromeWs}.
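The resolution order can be written as a short function. This is a sketch of the three rules above, not the worker's code; resolve_browser_endpoint is a hypothetical name, and the optional env argument exists only so that the logic can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional

DEFAULT_CHROME_WS = "chrome-ws-inner.coreclaw.com"

def resolve_browser_endpoint(env: Optional[Mapping[str, str]] = None) -> str:
    """Apply the documented precedence: BROWSER_WS, then PROXY_AUTH, then host only."""
    env = os.environ if env is None else env
    browser_ws = env.get("BROWSER_WS")
    if browser_ws:
        return browser_ws  # rule 1: explicit endpoint wins
    host = env.get("ChromeWs") or DEFAULT_CHROME_WS
    proxy_auth = env.get("PROXY_AUTH")
    if proxy_auth:
        return f"ws://{proxy_auth}@{host}"  # rule 2: credentials plus host
    return f"ws://{host}"  # rule 3: host only
```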

Processing Flow

1. Input normalization

normalize_request_items() converts the platform input into an internal task object:

json
{
  "input_type": "search",
  "input_value": "apple"
}
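A minimal sketch of this normalization, assuming each entry in targets is either an object with a "string" key (as in the example requests) or a plain string. The function is named normalize_items to keep it distinct from the real normalize_request_items(), whose implementation is not shown here; skipping blank entries is also an assumption.

```python
from typing import Any, Dict, List

def normalize_items(payload: Dict[str, Any]) -> List[Dict[str, str]]:
    """Turn platform input into a flat list of internal task objects."""
    collection_type = payload.get("collection_type", "author")
    items: List[Dict[str, str]] = []
    for target in payload.get("targets") or []:
        # Accept {"string": "apple"} as well as a bare "apple".
        value = target.get("string") if isinstance(target, dict) else target
        value = str(value or "").strip()
        if value:  # assumption: blank targets are dropped
            items.append({"input_type": collection_type, "input_value": value})
    return items
```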

2. Sequential execution

The worker processes tasks sequentially and does not run collection jobs in parallel.

run() iterates over the normalized task list and calls process_item() for each entry.
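The sequential loop might look like the sketch below. It is illustrative only: the process_item callable is passed in as a parameter, and recording a failure as a row with status "failed" and an error message is an assumption based on the status and error output fields, not documented behavior.

```python
from typing import Callable, Dict, List

def run(items: List[Dict[str, str]],
        process_item: Callable[[Dict[str, str]], List[dict]]) -> List[dict]:
    """Process tasks strictly one at a time, in input order."""
    rows: List[dict] = []
    for item in items:
        try:
            rows.extend(process_item(item))
        except Exception as exc:
            # Assumption: one failing target does not abort the remaining ones.
            rows.append({**item, "status": "failed", "error": str(exc)})
    return rows
```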

3. Retry mechanism

The worker only retries recognizable network-related errors. Common retryable error markers include:

  • network is unreachable
  • failed to establish a new connection
  • max retries exceeded
  • name or service not known
  • temporary failure in name resolution
  • connection refused
  • connection timed out
  • read timed out
  • timeout 30000ms exceeded
  • timeout 180000ms exceeded
  • target page, context or browser has been closed
  • browser closed
  • websocket
  • socket hang up
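Marker-based retry can be sketched as follows. Several points are assumptions rather than documented behavior: the case-insensitive substring match, the reading of retry_times as the number of retries after the first attempt, and the helper names is_retryable and run_with_retry.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Markers copied from the list above; matched against the lowercased error text.
RETRYABLE_MARKERS = (
    "network is unreachable", "failed to establish a new connection",
    "max retries exceeded", "name or service not known",
    "temporary failure in name resolution", "connection refused",
    "connection timed out", "read timed out", "timeout 30000ms exceeded",
    "timeout 180000ms exceeded",
    "target page, context or browser has been closed", "browser closed",
    "websocket", "socket hang up",
)

def is_retryable(error: BaseException) -> bool:
    """True when the error message contains a known network-related marker."""
    message = str(error).lower()
    return any(marker in message for marker in RETRYABLE_MARKERS)

def run_with_retry(task: Callable[[], T], retry_times: int = 3,
                   retry_delay_seconds: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Call task, retrying only on retryable errors, up to retry_times retries."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception as exc:
            if attempt >= retry_times or not is_retryable(exc):
                raise  # non-retryable, or retries exhausted
            attempt += 1
            sleep(retry_delay_seconds)
```

Errors that match no marker, such as a parsing failure, are raised immediately instead of being retried.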

Current Implementation Notes

  • All four modes share one common output field structure.
  • search mode automatically converts a keyword into a search URL.
  • tag mode automatically converts a tag word into a tag page URL.
  • rep_search.py includes captcha handling logic.
  • The current version prioritizes output stability and structural completeness for platform integration.
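The keyword-to-URL conversion for search and tag modes could look like the sketch below. The URL patterns are assumptions based on TikTok's public web URLs and may differ from what rep_search.py and rep_tag.py build; search_url and tag_url are hypothetical names.

```python
from urllib.parse import quote

def search_url(keyword: str) -> str:
    """Build a user-search URL for a keyword (assumed URL pattern)."""
    return f"https://www.tiktok.com/search/user?q={quote(keyword.strip(), safe='')}"

def tag_url(tag: str) -> str:
    """Build a tag page URL, tolerating a leading '#' (assumed URL pattern)."""
    return f"https://www.tiktok.com/tag/{quote(tag.strip().lstrip('#'), safe='')}"
```

Percent-encoding the input keeps keywords with spaces or non-ASCII characters valid inside the URL.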

Quick Start

  1. Upload this worker via CoreClaw CLI or the platform UI.
  2. Choose the collection mode.
  3. Provide usernames, video URLs, search keywords, or tag words in targets.
  4. Run the task and retrieve standardized results.


Developer

Techforce Global

Worker Stats

22 Total runs
Success rate: 100.00%
Last updated: Jun 17, 2026
