TikTok Data Extractor

q9w5f5h8/tiktok-data-extractor

Extract comprehensive TikTok data with a single click: profiles with detailed metrics (followers, engagement, verification status), video analytics (views, likes, comments, hashtags), and hashtag trending data. Built with anti-detection technology for reliable scraping.

TikTok Multi-Mode Scraping Worker

Overview

This is a TikTok data collection worker that follows the CoreClaw Worker specification.

It collects TikTok data through a remote browser connection and supports four collection modes:

author
video
search
tag

main.py orchestrates the worker: it normalizes each request, dispatches it by collection_type to the matching scraping module, and sends the standardized results to the platform via CoreSDK.Result.push_data().
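The dispatch step can be pictured as a lookup table keyed by collection_type. The sketch below is only an illustration: the handler functions are hypothetical stand-ins for whatever rep_author.py, rep_video.py, rep_search.py, and rep_tag.py actually expose, and the real main.py may be organized differently.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for the real scraping modules; each returns a list of rows.
def scrape_author(value: str) -> List[dict]:
    return [{"input_type": "author", "input_value": value}]

def scrape_video(value: str) -> List[dict]:
    return [{"input_type": "video", "input_value": value}]

def scrape_search(value: str) -> List[dict]:
    return [{"input_type": "search", "input_value": value}]

def scrape_tag(value: str) -> List[dict]:
    return [{"input_type": "tag", "input_value": value}]

# One entry per supported collection mode.
HANDLERS: Dict[str, Callable[[str], List[dict]]] = {
    "author": scrape_author,
    "video": scrape_video,
    "search": scrape_search,
    "tag": scrape_tag,
}

def dispatch(collection_type: str, value: str) -> List[dict]:
    """Route one target to the handler registered for its collection mode."""
    handler = HANDLERS.get(collection_type)
    if handler is None:
        raise ValueError(f"unsupported collection_type: {collection_type!r}")
    return handler(value)
```

A table like this keeps mode validation and routing in one place, so adding a fifth mode would mean adding one entry rather than another branch.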

Project Structure

```text
main.py                 # Main entry: input parsing, mode dispatch, retry, result limiting, and result push
rep_author.py           # Author profile scraping
rep_video.py            # Video detail scraping
rep_search.py           # User search scraping, including captcha handling
rep_tag.py              # Tag page scraping
input_schema.json       # Platform input parameter definition
output_schema.json      # Platform output field definition
sdk.py                  # CoreClaw SDK main module
sdk_pb2.py              # Generated protobuf file
sdk_pb2_grpc.py         # Generated gRPC file
requirements.txt        # Python dependency manifest
```

Supported Collection Modes

Mode	Description	Example Input
`author`	Author profile scraping	`bellapoarch`, `@bellapoarch`, `https://www.tiktok.com/@bellapoarch`
`video`	Video detail scraping	`https://www.tiktok.com/@user/video/1234567890`
`search`	User search scraping	`apple`
`tag`	Tag page scraping	`fyp`
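The author mode accepts three spellings of the same account. One way to reduce them to a bare username is sketched below; the function name normalize_author is hypothetical and the worker's own parsing may differ.

```python
import re
from urllib.parse import urlparse

def normalize_author(target: str) -> str:
    """Return the bare username for a handle, @handle, or profile URL."""
    target = target.strip()
    if target.startswith(("http://", "https://")):
        # Profile URLs carry the handle as the first path segment: /@username
        match = re.match(r"^/@([^/?#]+)", urlparse(target).path)
        if match is None:
            raise ValueError(f"not a TikTok profile URL: {target!r}")
        return match.group(1)
    # Plain handles may or may not have a leading "@".
    return target.lstrip("@")
```

All three example inputs from the table reduce to the same value, `bellapoarch`.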

Input Parameters

The worker reads platform input via CoreSDK.Parameter.get_input_json_dict().

Parameter	Type	Required	Default	Description
`collection_type`	`string`	Yes	`author`	Collection mode. One of: `author`, `video`, `search`, `tag`
`targets`	`array`	Yes	-	List of targets to collect. Depending on the mode, this can be a username, URL, search keyword, or tag word
`max_results`	`integer`	No	`10`	Maximum number of rows returned for `search` and `tag` modes
`retry_times`	`integer`	No	`3`	Maximum retry count for retryable network-related errors
`retry_delay_seconds`	`number`	No	`1`	Delay in seconds between retries
`page_timeout_ms`	`integer`	No	`180000`	Page timeout in milliseconds
`wait_after_load_ms`	`integer`	No	`3000`	Additional wait time after page load, in milliseconds
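As a rough illustration of how the defaults in the table might be applied to the input dictionary, consider the sketch below. The helper apply_defaults is hypothetical and the validation rules are assumptions; only the parameter names and default values come from the table.

```python
# Defaults copied from the parameter table above.
DEFAULTS = {
    "collection_type": "author",
    "max_results": 10,
    "retry_times": 3,
    "retry_delay_seconds": 1,
    "page_timeout_ms": 180000,
    "wait_after_load_ms": 3000,
}
VALID_TYPES = {"author", "video", "search", "tag"}

def apply_defaults(raw: dict) -> dict:
    """Merge user input over the defaults and run basic sanity checks."""
    # Explicit None values are treated as "not provided" (an assumption).
    provided = {key: value for key, value in raw.items() if value is not None}
    params = {**DEFAULTS, **provided}
    if params["collection_type"] not in VALID_TYPES:
        raise ValueError(f"unknown collection_type: {params['collection_type']!r}")
    if not params.get("targets"):
        raise ValueError("targets must be a non-empty list")
    return params
```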

Example Requests

User search

```json
{
  "collection_type": "search",
  "targets": [
    { "string": "apple" }
  ],
  "max_results": 10
}
```

Tag scraping

```json
{
  "collection_type": "tag",
  "targets": [
    { "string": "fyp" }
  ],
  "max_results": 10
}
```

Author scraping

```json
{
  "collection_type": "author",
  "targets": [
    { "string": "bellapoarch" }
  ]
}
```

Video scraping

```json
{
  "collection_type": "video",
  "targets": [
    { "string": "https://www.tiktok.com/@user/video/1234567890" }
  ]
}
```

Output Schema

This worker currently uses a shared superset output schema across all four modes. The main output fields include:

input_type
input_value
url
profile_url
title
desc
entity_id
video_id
author_id
username
nickname
tag
create_time
duration
play_count
digg_count
comment_count
share_count
collect_count
verified
signature
avatar_url
original_avatar_url
private_account
following_count
friends_count
fans_count
heart_count
video_count
music_name
music_author
music_id
music_play_url
cover_url
width
height
status
error
data_json
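Because every mode shares the same field set, each row can be built from one template so that the columns stay stable across modes. The sketch below assumes that fields a mode does not populate are left as None; the README does not state what the worker actually writes for unused fields, and make_row is a hypothetical helper.

```python
# Field names copied from the list above.
OUTPUT_FIELDS = (
    "input_type", "input_value", "url", "profile_url", "title", "desc",
    "entity_id", "video_id", "author_id", "username", "nickname", "tag",
    "create_time", "duration", "play_count", "digg_count", "comment_count",
    "share_count", "collect_count", "verified", "signature", "avatar_url",
    "original_avatar_url", "private_account", "following_count",
    "friends_count", "fans_count", "heart_count", "video_count", "music_name",
    "music_author", "music_id", "music_play_url", "cover_url", "width",
    "height", "status", "error", "data_json",
)

def make_row(input_type: str, input_value: str, **known) -> dict:
    """Build a full-width row; fields not supplied stay None (an assumption)."""
    unknown = set(known) - set(OUTPUT_FIELDS)
    if unknown:
        raise KeyError(f"fields not in the output schema: {sorted(unknown)}")
    row = dict.fromkeys(OUTPUT_FIELDS)
    row.update(input_type=input_type, input_value=input_value, **known)
    return row
```

Rejecting unknown keys up front is one way to catch a misspelled field name before it silently produces an extra column.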

Result Limiting Rules

max_results is enforced centrally inside main.py:

search: returns at most max_results rows
tag: returns at most max_results rows
author: no row truncation
video: no row truncation

If max_results is not provided, the default value is 10.
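The limiting rules amount to a conditional slice. A minimal sketch, with limit_rows as a hypothetical name rather than the function main.py actually uses:

```python
from typing import List

# Only these modes are truncated, per the rules above.
LIMITED_MODES = {"search", "tag"}

def limit_rows(mode: str, rows: List[dict], max_results: int = 10) -> List[dict]:
    """Truncate rows for search and tag; pass author and video through unchanged."""
    if mode in LIMITED_MODES:
        return rows[:max(max_results, 0)]
    return rows
```

Clamping a negative max_results to zero is a choice made for this sketch; the README does not say how the worker treats invalid values.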

Environment Variables

This worker relies on a remote browser connection. The following environment variables are supported:

Env Variable	Required	Default	Description
`BROWSER_WS`	No	-	When set, this WebSocket endpoint is used directly to connect to the browser
`ChromeWs`	No	`chrome-ws-inner.coreclaw.com`	Browser host address
`PROXY_AUTH`	No	-	Browser connection authentication credential
`PROXY_DOMAIN`	No	-	Currently used mainly for logging

The browser endpoint is resolved in the following order:

1. If BROWSER_WS is set, it is used directly.
2. If BROWSER_WS is not set but PROXY_AUTH is set, the endpoint becomes ws://{PROXY_AUTH}@{ChromeWs}.
3. If neither is set, the endpoint becomes ws://{ChromeWs}.
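The resolution order above translates directly into a small function. The name resolve_browser_endpoint is hypothetical, and treating an empty variable as unset is an assumption of this sketch; the environment variable names and the default host come from the table.

```python
import os
from typing import Mapping, Optional

DEFAULT_CHROME_WS = "chrome-ws-inner.coreclaw.com"

def resolve_browser_endpoint(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the browser WebSocket endpoint following the documented order."""
    env = os.environ if env is None else env
    browser_ws = env.get("BROWSER_WS")
    if browser_ws:
        # 1. An explicit endpoint wins.
        return browser_ws
    host = env.get("ChromeWs") or DEFAULT_CHROME_WS
    auth = env.get("PROXY_AUTH")
    if auth:
        # 2. Credentials are embedded in the URL.
        return f"ws://{auth}@{host}"
    # 3. Fall back to the bare host.
    return f"ws://{host}"
```

Passing a plain dictionary instead of os.environ makes the three branches easy to check without touching the real environment.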

Processing Flow

1. Input normalization

normalize_request_items() converts the platform input into an internal task object:

```json
{
  "input_type": "search",
  "input_value": "apple"
}
```
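A guess at the shape of this step, based only on the example requests and the task object shown above, is sketched below. It is not the worker's actual normalize_request_items(); accepting plain strings alongside {"string": ...} objects and skipping blank targets are both assumptions.

```python
from typing import Dict, List

def normalize_request_items_sketch(payload: dict) -> List[Dict[str, str]]:
    """Turn the platform payload into one task object per non-empty target."""
    input_type = payload.get("collection_type") or "author"
    tasks = []
    for target in payload.get("targets") or []:
        # Targets arrive as {"string": "..."} in the example requests.
        raw = target.get("string") if isinstance(target, dict) else target
        value = str(raw or "").strip()
        if value:
            tasks.append({"input_type": input_type, "input_value": value})
    return tasks
```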

2. Sequential execution

The worker processes tasks sequentially and does not run collection jobs in parallel.

run() iterates over the normalized task list and calls process_item() for each entry.
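The sequential loop could look roughly like the sketch below. The signature is hypothetical: process_item and push are passed in as callables so the example stays self-contained, and recording a failure as a row with status "failed" is an assumption rather than documented behavior.

```python
from typing import Callable, Dict, Iterable, List

def run(
    tasks: Iterable[Dict[str, str]],
    process_item: Callable[[Dict[str, str]], List[dict]],
    push: Callable[[dict], None],
) -> int:
    """Process tasks one at a time, in order, and push every resulting row."""
    pushed = 0
    for task in tasks:
        try:
            rows = process_item(task)
        except Exception as error:
            # Assumed behavior: report the failure as a row instead of aborting the run.
            rows = [{**task, "status": "failed", "error": str(error)}]
        for row in rows:
            push(row)
            pushed += 1
    return pushed
```

Running strictly in sequence keeps a single browser session in use at a time, at the cost of throughput on long target lists.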

3. Retry mechanism

The worker only retries recognizable network-related errors. Common retryable error markers include:

network is unreachable
failed to establish a new connection
max retries exceeded
name or service not known
temporary failure in name resolution
connection refused
connection timed out
read timed out
timeout 30000ms exceeded
timeout 180000ms exceeded
target page, context or browser has been closed
browser closed
websocket
socket hang up
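One plausible way to implement this check is a case-insensitive substring match on the error message, wrapped in a bounded retry loop. The sketch below works under that assumption; the helper names are hypothetical, and reading retry_times as "retries after the first attempt" is an interpretation of the parameter table.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Markers copied from the list above.
RETRYABLE_MARKERS = (
    "network is unreachable", "failed to establish a new connection",
    "max retries exceeded", "name or service not known",
    "temporary failure in name resolution", "connection refused",
    "connection timed out", "read timed out", "timeout 30000ms exceeded",
    "timeout 180000ms exceeded",
    "target page, context or browser has been closed", "browser closed",
    "websocket", "socket hang up",
)

def is_retryable(error: Exception) -> bool:
    """True if the error message contains any known network-related marker."""
    message = str(error).lower()
    return any(marker in message for marker in RETRYABLE_MARKERS)

def run_with_retry(
    task: Callable[[], T],
    retry_times: int = 3,
    retry_delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call task, retrying up to retry_times times on retryable errors only."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception as error:
            if attempt >= retry_times or not is_retryable(error):
                raise
            attempt += 1
            sleep(retry_delay_seconds)
```

Errors that do not match a marker, such as a parsing failure, are re-raised immediately, since repeating the request is unlikely to change the outcome.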

Current Implementation Notes

All four modes share one common output field structure.
search mode automatically converts a keyword into a search URL.
tag mode automatically converts a tag word into a tag page URL.
rep_search.py includes captcha handling logic.
The current version prioritizes output stability and structural completeness for platform integration.
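The keyword-to-URL and tag-to-URL conversions mentioned in the notes above might look like the following. The URL patterns are assumptions based on TikTok's public web paths, not values taken from rep_search.py or rep_tag.py, and both function names are hypothetical.

```python
from urllib.parse import quote

# Assumed base; the worker's actual URL templates are not shown in this README.
TIKTOK_BASE = "https://www.tiktok.com"

def build_search_url(keyword: str) -> str:
    """Build a user-search URL for a keyword (assumed pattern)."""
    return f"{TIKTOK_BASE}/search/user?q={quote(keyword.strip(), safe='')}"

def build_tag_url(tag: str) -> str:
    """Build a tag page URL, tolerating a leading '#' (assumed pattern)."""
    return f"{TIKTOK_BASE}/tag/{quote(tag.strip().lstrip('#'), safe='')}"
```

Percent-encoding the input keeps keywords with spaces or non-ASCII characters from producing a malformed URL.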

Quick Start

1. Upload this worker via CoreClaw CLI or the platform UI.
2. Choose the collection mode.
3. Provide usernames, video URLs, search keywords, or tag words in targets.
4. Run the task and retrieve standardized results.

Pricing

Failed results don't count

Rating

5.0

Developer

Techforce Global

Worker Stats

22 Total runs

Success rate: 100.00%

Last updated: Jun 17, 2026
