Unified YouTube Scraper | Bulk Extract Video Metadata, Subtitles & Comments

marianne-turner/youtube-scraper

Auto-detect URLs or keywords to bulk extract full metadata for YouTube videos, channels & playlists. Download subtitles & comments, export to JSON/CSV. Perfect for market research & competitor analysis.

Unified YouTube Scraper. Give it YouTube URLs or search keywords and it
auto-detects the input type and returns videos with full metadata. One scraper
covers every YouTube data type:

Input	What it returns
Video URL (`watch?v=`, `youtu.be`, `/shorts/`)	That single video with full detail
Channel URL (`/@handle`, `/channel/UC...`, `/c/`, `/user/`)	The channel's videos / shorts / streams
Playlist URL (`?list=`)	Every video in the playlist
Search results URL (`/results?search_query=`)	Matching videos
Search keyword	Matching videos / Shorts / streams (respecting the filters)
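The auto-detection in the table above can be approximated with a small classifier. This is only a sketch of the idea, not the scraper's actual code: the function name `classify_input`, the returned labels, and the rule that a `watch?v=` URL carrying a `list=` parameter counts as a single video are all assumptions made for illustration.

```python
from urllib.parse import urlparse, parse_qs


def classify_input(value: str) -> str:
    """Guess the input type: video, channel, playlist, search, keyword, or unknown.

    Hypothetical helper; the real scraper may order or name these rules differently.
    """
    text = value.strip()
    # Anything that is not an http(s) URL is treated as a search keyword.
    if not text.lower().startswith(("http://", "https://")):
        return "keyword"

    parsed = urlparse(text)
    host = parsed.netloc.lower()
    path = parsed.path
    query = parse_qs(parsed.query)

    # Single-video forms: youtu.be/<id>, /watch?v=<id>, /shorts/<id>
    if host.endswith("youtu.be"):
        return "video"
    if path == "/watch" and "v" in query:
        return "video"  # assumption: v= wins even when list= is also present
    if path.startswith("/shorts/"):
        return "video"

    # Search results page
    if path == "/results" and "search_query" in query:
        return "search"

    # Playlist: any remaining URL that carries ?list=
    if "list" in query:
        return "playlist"

    # Channel forms: /@handle, /channel/UC..., /c/name, /user/name
    if path.startswith(("/@", "/channel/", "/c/", "/user/")):
        return "channel"

    return "unknown"
```

For example, `classify_input("https://www.youtube.com/@MrBeast")` yields `"channel"`, while a bare string such as `"python tutorial"` yields `"keyword"`.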

Per-type result caps are independent and applied per search term and per
channel: maxResults (videos), maxResultsShorts, maxResultStreams.

Search terms support the full YouTube search filters — Sorting order,
Date filter, Length filter, Video type filter and Features
(HD, Subtitles/CC, Creative Commons, 3D, Live, Purchased, 4K, 360°, Location,
HDR, VR180) — all encoded into the search sp parameter. Channels additionally
support publishedAfter (date) and Channel Sort By.

Optional add-ons (per video): subtitles / transcript (downloadSubtitles,
with subtitleFormat srt/vtt/plaintext/json and preferAutoGeneratedSubtitles)
and comments (includeComments).

What fields can it extract?

Video: title, ID, URL, thumbnail, views, publish date, likes, duration, comments count
Channel: name, URL, subscriber count
Description text + links, monetization status, comment-disabled status, location
Subtitles availability + downloaded transcript (segments + plain text)
Top-level comments (text, likes, replies, author, date)

What output formats are supported?

JSON, CSV.

Table columns = one column per logical field (stable column count — exactly the fields in the Field Dictionary below). Array/object fields (subtitles, descriptionLinks, hashtags, collaborators, aboutChannelInfo, comments, …) stay as a single column: the console collapses them to "N items" / "N fields" (click to expand); CSV/XLSX export serializes the whole value as a JSON string in that one cell.

Long-subtitle note: the full SRT is inlined inside the subtitles cell's JSON. CSV has no per-cell length limit (full subtitles for long videos); Excel/XLSX caps cells at 32,767 chars, so very long subtitles get truncated in XLSX — export as CSV for complete long subtitles.
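To illustrate the export rule described above (array/object fields serialized as one JSON string per cell, and the 32,767-character XLSX cell cap), here is a minimal sketch. The helper names are hypothetical, and how the real exporter renders booleans and nulls is an assumption.

```python
import csv
import io
import json

XLSX_CELL_LIMIT = 32_767  # Excel's per-cell character cap mentioned above


def to_row(record: dict, columns: list) -> list:
    """Flatten one record into string cells, one cell per logical field."""
    row = []
    for name in columns:
        value = record.get(name)
        if isinstance(value, (list, dict)):
            # Array/object fields stay in a single cell as a JSON string.
            row.append(json.dumps(value, ensure_ascii=False))
        elif value is None:
            row.append("")  # assumption: nulls become empty cells
        else:
            row.append(str(value))
    return row


def write_csv(records: list, columns: list) -> str:
    """Return CSV text with a header line followed by one line per record."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(columns)
    for record in records:
        writer.writerow(to_row(record, columns))
    return buffer.getvalue()


def cells_over_xlsx_limit(row: list, columns: list) -> list:
    """Name the columns whose cell would be truncated in an XLSX export."""
    return [name for name, cell in zip(columns, row) if len(cell) > XLSX_CELL_LIMIT]
```

Running `cells_over_xlsx_limit` on a row before choosing the export format is one way to decide whether CSV is needed for a given video.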

Result

This is an example of what the results look like.

json

[
  {
    "title": "I Built 100 Wells In Africa",
    "id": "0e3GPea1Tyg",
    "url": "https://www.youtube.com/watch?v=0e3GPea1Tyg",
    "type": "video",
    "sourceType": "channel",
    "input": "https://www.youtube.com/@MrBeast",
    "thumbnailUrl": "https://i.ytimg.com/vi/0e3GPea1Tyg/maxresdefault.jpg",
    "viewCount": "182,418,030 views",
    "date": "Nov 26, 2022",
    "likes": "4.2M",
    "commentsCount": "120K",
    "duration": "0:08:01",
    "channelName": "MrBeast",
    "channelUrl": "https://www.youtube.com/@MrBeast",
    "numberOfSubscribers": "320M subscribers",
    "text": "We built 100 wells in Africa to provide clean water ...",
    "descriptionLinks": ["https://www.beastphilanthropy.org/"],
    "subtitles": true,
    "subtitleLanguage": "en",
    "subtitlesText": "Today we're building 100 wells across Africa ...",
    "transcript": [
      {"start": 0.12, "dur": 3.44, "text": "Today we're building 100 wells"}
    ],
    "comments": [
      {"cid": "Ug...", "replyToCid": null, "type": "comment", "publishedTimeText": "1 year ago", "pageUrl": "https://www.youtube.com/watch?v=...", "videoId": "dQw4w9WgXcQ", "comment": "This is amazing", "author": "@viewer", "authorIsChannelOwner": false, "voteCount": 12000, "replyCount": 30, "hasCreatorHeart": false, "title": "We built 100 wells in Africa", "commentsCount": 48000}
    ],
    "isMonetized": true,
    "commentsTurnedOff": false,
    "location": null,
    "error": null,
    "error_code": null,
    "warning": null,
    "warning_code": null,
    "success": true
  }
]
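Several fields in the example above are display strings rather than numbers (`viewCount`, `likes`, `commentsCount`, `numberOfSubscribers`, `duration`). If numeric values are needed downstream, they can be derived along these lines. This is a hedged sketch: the helper names are hypothetical, abbreviated values such as "4.2M" are lossy by nature, and the text may differ for non-English locales.

```python
import re

_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text):
    """Convert count text such as '182,418,030 views' or '4.2M' to an int.

    Returns None when the text is empty or contains no number.
    """
    if not text:
        return None
    # A number with optional thousands separators and decimals,
    # optionally followed by a standalone K/M/B suffix.
    match = re.search(r"(\d[\d,]*(?:\.\d+)?)\s*([KMB]\b)?", text.strip(), re.IGNORECASE)
    if not match:
        return None
    number = float(match.group(1).replace(",", ""))
    suffix = (match.group(2) or "").upper()
    return int(round(number * _SUFFIXES.get(suffix, 1)))


def parse_duration(text: str) -> int:
    """Convert 'H:MM:SS' (or 'MM:SS') to a total number of seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds
```

Applied to the sample record, `parse_count("182,418,030 views")` gives 182418030 and `parse_duration("0:08:01")` gives 481.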

Field Dictionary

Field	Type	Description
title	string	Video title
id	string	YouTube video ID
url	string	Canonical watch URL
type	string	Item type: video, short, or stream
sourceType	string	How the video was discovered: video, search, channel, playlist
input	string	The original input (URL or keyword) that produced this video
thumbnailUrl	string	Highest-resolution thumbnail URL
viewCount	string	View count text
date	string	Publish date or relative published time
likes	string	Like count text
commentsCount	string	Number of comments
duration	string	Video duration (H:MM:SS)
channelName	string	Channel name
channelUrl	string	Channel URL
numberOfSubscribers	string	Channel subscriber count text
text	string	Video description text
descriptionLinks	array	Links extracted from the description
subtitles	boolean	Whether subtitles / captions are available
subtitleLanguage	string	Language code of the downloaded subtitle track
subtitlesText	string	Full subtitle transcript in the chosen `subtitleFormat` (srt/vtt/plaintext/json)
transcript	array	Subtitle segments `[{start, dur, text}]`
comments	array	Top-level comments (only when Include Comments is on) `[{cid, replyToCid, type, publishedTimeText, pageUrl, videoId, comment, author, authorIsChannelOwner, voteCount, replyCount, hasCreatorHeart, title, commentsCount}]`
isMonetized	boolean	Whether the video appears monetized
commentsTurnedOff	boolean	Whether comments are disabled
liveStatus	string	Live status: is_live / is_upcoming / was_live / not_live
availableQualities	array	Available video qualities, e.g. `["2160p","1080p","720p"]` (from the player response)
location	string	Recording location; null if the video has none
error	string	Error message (null on success)
error_code	string	Error code (null on success)
warning	string	Non-blocking warning message
warning_code	string	Non-blocking warning code
success	boolean	Whether this record was scraped successfully

Input

Param	Type	Required	Default	Description
startURLs	array	No	`[]` (empty)	Direct video / channel / playlist / search / shorts URLs (search filters do NOT apply here). When set, URLs take priority and search terms are ignored.
searchKeywords	array	No	`["python tutorial"]`	Keywords to search; each resolves to matching results. Ignored when `startURLs` is provided.
maxResults	integer	No	10	Max regular videos per search term / channel (0 = skip videos)
maxResultsShorts	integer	No	0	Max Shorts per search term / channel (0 = skip)
maxResultStreams	integer	No	0	Max streams per search term / channel (0 = skip)
sortBy	string	No	relevance	Search sort: relevance / date / views / rating
dateFilter	string	No	any	Upload date: any / hour / today / week / month / year
lengthFilter	string	No	any	Duration: any / short (<4m) / medium (4–20m) / long (>20m)
videoTypeFilter	string	No	any	Search type chip: any / video / channel / playlist / movie
features	array	No	`[]`	Search features: hd, subtitles, creativeCommons, 3d, live, purchased, 4k, 360, location, hdr, vr180
downloadSubtitles	boolean	No	false	Fetch each video's subtitles / transcript
subtitleLanguages	array	No	`[]`	Preferred subtitle language codes (e.g. en, es)
preferAutoGeneratedSubtitles	boolean	No	false	Prefer the auto-generated (ASR) track over manual captions
subtitleFormat	string	No	srt	`subtitlesText` format: srt / vtt / plaintext / json
publishedAfter	string	No	`""`	Channel only: keep videos published on/after YYYY-MM-DD
channelSortBy	string	No	latest	Channel videos order: latest / popular / oldest
includeComments	boolean	No	false	Fetch top-level comments with author + interaction details
maxComments	integer	No	20	Max top-level comments per video
maxConcurrency	integer	No	8	Videos enriched in parallel (raise if your proxy sustains many concurrent connections; lower on empty-response churn)
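perVideoTimeoutSecs	integer	No	30	Time budget in seconds for a video's optional add-ons (subtitles / comments); 0 = no limit. On timeout the add-ons are left empty and the video is still emitted; the timeout is recorded as a 504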

At least one of startURLs or searchKeywords must be provided. If both are set, URLs take priority and search terms are ignored — which is why the defaults pre-fill searchKeywords and leave startURLs empty.
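The precedence rule above can be checked before submitting a run. The following is a minimal client-side sketch, assuming the input is a plain dictionary keyed by the parameter names in the table; the function name and the shape of its return value are hypothetical.

```python
def resolve_inputs(run_input: dict) -> dict:
    """Apply the documented rule: URLs win, keywords are the fallback."""
    urls = [u.strip() for u in (run_input.get("startURLs") or []) if u and u.strip()]
    keywords = [k.strip() for k in (run_input.get("searchKeywords") or []) if k and k.strip()]

    if not urls and not keywords:
        raise ValueError("Provide at least one of startURLs or searchKeywords")

    # Per-type caps are documented as non-negative integers (0 = skip that type).
    for cap in ("maxResults", "maxResultsShorts", "maxResultStreams"):
        if int(run_input.get(cap, 0)) < 0:
            raise ValueError(f"{cap} must be >= 0")

    if urls:
        # Keywords are ignored when URLs are present; report them so the
        # caller notices the silently dropped terms.
        return {"mode": "urls", "targets": urls, "ignored": keywords}
    return {"mode": "keywords", "targets": keywords, "ignored": []}
```

Surfacing the `ignored` list is a design choice of this sketch: it makes it visible when search terms were supplied but will not be used.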

Proxy

Requires PROXY_AUTH (username:password) and PROXY_DOMAIN (host:port)
environment variables; the request proxy is built as
socks5://{PROXY_AUTH}@{PROXY_DOMAIN}.
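As a rough illustration of the template above, the proxy URL can be assembled as follows. The function name and the error handling are assumptions; only the two variable names and the `socks5://{PROXY_AUTH}@{PROXY_DOMAIN}` shape come from the text.

```python
import os


def build_proxy_url(env=None) -> str:
    """Build socks5://{PROXY_AUTH}@{PROXY_DOMAIN} from environment variables."""
    env = os.environ if env is None else env
    auth = env.get("PROXY_AUTH", "").strip()      # expected form: username:password
    domain = env.get("PROXY_DOMAIN", "").strip()  # expected form: host:port

    missing = [name for name, value in (("PROXY_AUTH", auth), ("PROXY_DOMAIN", domain)) if not value]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))

    return f"socks5://{auth}@{domain}"
```

Passing a dictionary instead of relying on `os.environ` makes the helper easy to exercise in tests.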

Locale coherence (anti-detection): the scraper sends a self-consistent locale
(accept-language + PREF timezone + hl/gl) selected by the GEO env var
(US/GB/DE/FR/JP/BR/IN, default US); ACCEPT_LANGUAGE can override the
language. With a rotating multi-country proxy, pin the egress to the SAME country as
GEO (most rotating proxies accept a country-XX token in the proxy username) — a
fixed locale over random-country IPs (timezone/language ≠ IP country) is a strong bot
signal. InnerTube API calls (youtubei/v1/*) use a real fetch/XHR header profile
(accept: */*, origin, x-youtube-client-*, JSON content-type) rather than a page
navigation one, and the request identity (TLS + UA + client-hints) stays consistent
per process. A CONSENT cookie is sent to skip EU consent walls.

Notes

This scraper reuses the repository's proven InnerTube fetch + video-detail /
comment parsers. The list-resolution paths are new and were validated against
live YouTube data: search → videos, channel → videos, and playlist → videos
all extract correctly. YouTube periodically migrates its videoRenderer /
lockupViewModel structures, so re-check on first run after long gaps.

Per-type caps & filters: maxResults / maxResultsShorts / maxResultStreams
are counted independently and applied per search term and per channel. The search
filters (sort / date / length / type / features) apply to search terms only — they
are encoded into the search sp protobuf — and are ignored for direct URLs.

Channel date range: publishedAfter filters a channel's videos by publish date
using a hybrid strategy — a cheap relative-time early-stop while paginating plus an
exact ISO-date filter from each video's detail. channelSortBy=latest is exact;
popular / oldest are best-effort via the sort chip and fall back to latest when
YouTube does not expose it.
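The hybrid strategy described above might look roughly like this. It is a sketch under stated assumptions: relative times are English strings such as "3 weeks ago", the listing is sorted newest first, and the unit lengths and slack are deliberately conservative guesses so the early stop never fires on a video that could still pass the exact filter.

```python
import re
from datetime import date, timedelta

# Conservative lower bounds in days for each relative-time unit (assumed values).
_UNIT_DAYS = {"second": 0, "minute": 0, "hour": 0, "day": 1, "week": 7, "month": 28, "year": 365}


def min_age_days(relative_text: str):
    """Lower bound on a video's age from text like '3 weeks ago', or None."""
    match = re.search(r"(\d+)\s+(second|minute|hour|day|week|month|year)s?\s+ago", relative_text)
    if not match:
        return None
    return int(match.group(1)) * _UNIT_DAYS[match.group(2)]


def clearly_older(relative_text: str, cutoff: date, today: date, slack_days: int = 2) -> bool:
    """Cheap early-stop test: True only if the video is certainly before cutoff."""
    age = min_age_days(relative_text)
    if age is None:
        return False  # unknown format: keep paginating and let the exact filter decide
    latest_possible = today - timedelta(days=age)
    return latest_possible < cutoff - timedelta(days=slack_days)


def passes_exact_filter(iso_date: str, cutoff: date) -> bool:
    """Exact check on the ISO publish date taken from the video's detail."""
    return date.fromisoformat(iso_date[:10]) >= cutoff
```

With newest-first ordering, pagination can stop at the first item for which `clearly_older` is true, and every remaining candidate is then confirmed with `passes_exact_filter`.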

Not supported: Apify's "Save subtitles to key-value store" — this platform has no
key-value store, so subtitles are returned only as the subtitlesText / transcript
fields.

Run model: runs as a single CoreClaw task (no per-URL subtask split), matching Apify — all search terms and all direct URLs are processed once in one run, with concurrency handled internally (the input schema sets no b).

Performance: inputs (search terms / channels / URLs) are resolved in parallel and
videos are enriched on a pipeline — enrichment starts as soon as the first targets are
found, overlapping ongoing resolution. Enrichment runs maxConcurrency videos at once
(default 8). Each video fans out several sub-requests (watch page, subtitles, comments),
so a very high concurrency can saturate a single proxy endpoint and trigger empty
responses regardless of how large the rotating IP pool is — raise maxConcurrency only
if your proxy sustains many concurrent connections, and lower it if you see empty-response
/ 404 churn. perVideoTimeoutSecs (default 30) bounds only the optional add-ons
(subtitles / comments): if they exceed the budget they are left empty and the video
record is still emitted with its full detail — a stuck add-on never discards a video.
Subtitles and comments for a video run concurrently and both reuse the watch page already
fetched for its detail (no duplicate page fetches).
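The concurrency and timeout behaviour described above can be sketched with `asyncio`. Everything here is illustrative: `fetch_detail` and `fetch_addons` are hypothetical callables supplied by the caller, the warning text is invented, and the real scraper's pipeline overlaps resolution and enrichment in ways this sketch does not model.

```python
import asyncio


async def enrich_video(video_id, fetch_detail, fetch_addons, timeout_secs):
    """Fetch full detail, then add-ons under a time budget (0 = no limit)."""
    record = await fetch_detail(video_id)
    try:
        if timeout_secs > 0:
            addons = await asyncio.wait_for(fetch_addons(video_id), timeout=timeout_secs)
        else:
            addons = await fetch_addons(video_id)
        record.update(addons)
    except asyncio.TimeoutError:
        # A stuck add-on never discards the video: keep the detail, note a warning.
        record["warning"] = "add-ons timed out"  # hypothetical message
    return record


async def enrich_all(video_ids, fetch_detail, fetch_addons, max_concurrency=8, timeout_secs=30):
    """Enrich videos with at most max_concurrency running at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(video_id):
        async with semaphore:
            return await enrich_video(video_id, fetch_detail, fetch_addons, timeout_secs)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(worker(v) for v in video_ids))
```

The semaphore caps in-flight videos rather than in-flight requests, which mirrors the note that each video fans out several sub-requests and so multiplies the load on the proxy.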

Subtitles / transcript: caption availability and language (subtitles,
subtitleLanguage) are detected reliably. For the transcript body the scraper
uses a yt-dlp-style cascade: (1) the watch page's /api/timedtext track (json3 → xml);
YouTube increasingly gates this behind a PO / BotGuard token and returns an empty body,
so on miss it (2) re-fetches the player response via several mobile / embedded / TV
InnerTube clients (ANDROID_VR → TVHTML5 → IOS → MWEB → WEB_EMBEDDED_PLAYER →
ANDROID) whose caption baseUrls are usually not PO-gated (each client is gated
independently, so trying more raises the odds of hitting an un-gated track; the first one
that yields text wins) — this is how yt-dlp recovers captions without a browser, and it
brings back the transcript for most gated videos; and finally (3) the InnerTube
get_transcript endpoint. Every step is plain InnerTube over curl_cffi — no browser,
no JS runtime, no PO-token provider. If all steps miss, subtitlesText / transcript
come back empty and the scraper degrades gracefully. The few videos gated across all
clients would still need a PO-token provider or a browser engine (e.g. Camoufox), which
remains out of scope for this version. Client constants mirror yt-dlp master and may need
refreshing if YouTube rotates client versions.
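The three-step cascade above reduces to a first-hit-wins fallback chain. The sketch below shows only that control flow; the three fetcher callables are hypothetical placeholders (each returns transcript text or an empty value), and no real InnerTube request is made here. The client order is copied from the text.

```python
# Client order as listed above; the first client that yields text wins.
CLIENT_ORDER = ["ANDROID_VR", "TVHTML5", "IOS", "MWEB", "WEB_EMBEDDED_PLAYER", "ANDROID"]


def fetch_transcript(video_id, fetch_timedtext, fetch_via_client, fetch_get_transcript):
    """Return (text, source); source names the step that produced the text.

    All three fetchers are caller-supplied stand-ins for the network steps.
    """
    # Step 1: the watch page's /api/timedtext track (may come back empty when gated).
    text = fetch_timedtext(video_id)
    if text:
        return text, "timedtext"

    # Step 2: retry through alternative InnerTube clients, in order.
    for client in CLIENT_ORDER:
        text = fetch_via_client(video_id, client)
        if text:
            return text, client

    # Step 3: the get_transcript endpoint as a last resort.
    text = fetch_get_transcript(video_id)
    if text:
        return text, "get_transcript"

    # All steps missed: degrade gracefully with an empty transcript.
    return "", None
```

Returning the source alongside the text is an addition of this sketch; it makes it easy to log which step recovered a gated transcript.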

Pricing

Failed results don't count

Rating

5.0

Developer

Marianne Turner

Worker Stats

66 Total runs

Success rate: 90.62%

Last updated: Jun 08, 2026
