
Use YouTube Scraper to extract YouTube videos, Shorts, channels, playlists, search results, subtitles, transcripts, and comments. Export structured data in JSON or CSV for YouTube SEO, competitor analysis, content research, and automation workflows.
Unified YouTube Scraper. Give it YouTube URLs or search keywords and it auto-detects the input type and returns videos with full metadata. One scraper covers every YouTube data type:
| Input | What it returns |
|---|---|
| Video URL (watch?v=, youtu.be, /shorts/) | That single video with full detail |
| Channel URL (/@handle, /channel/UC..., /c/, /user/) | The channel's videos / shorts / streams |
| Playlist URL (?list=) | Every video in the playlist |
| Search results URL (/results?search_query=) | Matching videos |
| Search keyword | Matching videos / Shorts / streams (respecting the filters) |
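
The input types in the table above can be told apart from the URL shape alone. The following is a minimal sketch of such a classifier, not the scraper's actual detection code: the function name `classify_input`, the `"unknown"` label, and the rule that `watch?v=` takes precedence over `?list=` are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlparse

# Path prefixes that the table above lists for channel URLs.
CHANNEL_PREFIXES = ("/@", "/channel/", "/c/", "/user/")


def classify_input(value: str) -> str:
    """Return 'video', 'channel', 'playlist', 'search', 'keyword' or 'unknown'."""
    text = value.strip()
    # Anything that is not an http(s) URL is treated as a search keyword.
    if not text.lower().startswith(("http://", "https://")):
        return "keyword"

    parsed = urlparse(text)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parsed.path
    query = parse_qs(parsed.query)

    # Short links always point at a single video.
    if host == "youtu.be" and path.strip("/"):
        return "video"
    if host != "youtube.com" and not host.endswith(".youtube.com"):
        return "unknown"

    # Assumption: a watch URL wins even when it also carries a list parameter.
    if (path == "/watch" and "v" in query) or path.startswith("/shorts/"):
        return "video"
    if path == "/results" and "search_query" in query:
        return "search"
    if "list" in query:
        return "playlist"
    if path.startswith(CHANNEL_PREFIXES):
        return "channel"
    return "unknown"
```

For example, `classify_input("https://www.youtube.com/@MrBeast")` returns `"channel"`, while `classify_input("python tutorial")` returns `"keyword"`.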
Per-type result caps are independent and applied per search term and per channel: maxResults (videos), maxResultsShorts, maxResultStreams.
Search terms support the full YouTube search filters — Sorting order, Date filter, Length filter, Video type filter and Features (HD, Subtitles/CC, Creative Commons, 3D, Live, Purchased, 4K, 360°, Location, HDR, VR180) — all encoded into the search sp parameter. Channels additionally support publishedAfter (date) and Channel Sort By.
Optional add-ons (per video): subtitles / transcript (downloadSubtitles, with subtitleFormat srt/vtt/plaintext/json and preferAutoGeneratedSubtitles) and comments (includeComments).
Results can be exported as JSON or CSV.
Each result record contains the following fields.
| Field | Type | Description |
|---|---|---|
| title | string | Video title |
| id | string | YouTube video ID |
| url | string | Canonical watch URL |
| type | string | Item type: video, short, or stream |
| sourceType | string | How the video was discovered: video, search, channel, playlist |
| input | string | The original input (URL or keyword) that produced this video |
| thumbnailUrl | string | Highest-resolution thumbnail URL |
| viewCount | string | View count text |
| date | string | Publish date or relative published time |
| likes | string | Like count text |
| commentsCount | string | Number of comments |
| duration | string | Video duration (H:MM:SS) |
| channelName | string | Channel name |
| channelUrl | string | Channel URL |
| numberOfSubscribers | string | Channel subscriber count text |
| text | string | Video description text |
| descriptionLinks | array | Links extracted from the description |
| subtitles | boolean | Whether subtitles / captions are available |
| subtitleLanguage | string | Language code of the downloaded subtitle track |
| subtitlesText | string | Full subtitle transcript in the chosen subtitleFormat (srt/vtt/plaintext/json) |
| transcript | array | Subtitle segments [{start, dur, text}] |
| comments | array | Top-level comments [{comment_id, text, likes, replies, author, authorChannel, date}] |
| isMonetized | boolean | Whether the video appears monetized |
| commentsTurnedOff | boolean | Whether comments are disabled |
| liveStatus | string | Live status: is_live / is_upcoming / was_live / not_live |
| availableQualities | array | Available video qualities, e.g. ["2160p","1080p","720p"] (from the player response) |
| location | string | Recording location (Apify-style; null if the video has none) |
| error | string | Error message (null on success) |
| error_code | string | Error code (null on success) |
| warning | string | Non-blocking warning message |
| warning_code | string | Non-blocking warning code |
| success | boolean | Whether this record was scraped successfully |
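
Because most fields arrive as strings, downstream code usually needs a little post-processing. The sketch below assumes the JSON export has already been loaded into a list of dictionaries; the helper names `duration_to_seconds` and `split_records` are hypothetical and not part of the scraper. It also assumes that `duration` may appear as `M:SS` for short items, in addition to the documented `H:MM:SS`.

```python
def duration_to_seconds(duration: str) -> int:
    """Convert 'H:MM:SS' (or 'M:SS') into a number of seconds."""
    total = 0
    for part in duration.split(":"):
        # Each further component is worth 60x less than the one before it.
        total = total * 60 + int(part)
    return total


def split_records(records):
    """Separate successfully scraped records from failed ones via `success`."""
    ok = [r for r in records if r.get("success")]
    failed = [r for r in records if not r.get("success")]
    return ok, failed
```

A failed record can then be inspected through its `error` and `error_code` fields, while `warning` and `warning_code` flag non-blocking issues on records that still succeeded.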
| Param | Type | Required | Default | Description |
|---|---|---|---|---|
| startURLs | array | No | @MrBeast, a watch URL | Direct video / channel / playlist / search / shorts URLs (search filters do NOT apply here) |
| searchKeywords | array | No | python tutorial | Keywords to search; each resolves to matching results |
| maxResults | integer | No | 10 | Max regular videos per search term / channel (0 = skip videos) |
| maxResultsShorts | integer | No | 0 | Max Shorts per search term / channel (0 = skip) |
| maxResultStreams | integer | No | 0 | Max streams per search term / channel (0 = skip) |
| sortBy | string | No | relevance | Search sort: relevance / date / views / rating |
| dateFilter | string | No | any | Upload date: any / hour / today / week / month / year |
| lengthFilter | string | No | any | Duration: any / short (<4m) / medium (4–20m) / long (>20m) |
| videoTypeFilter | string | No | any | Search type chip: any / video / channel / playlist / movie |
| features | array | No | [] | Search features: hd, subtitles, creativeCommons, 3d, live, purchased, 4k, 360, location, hdr, vr180 |
| downloadSubtitles | boolean | No | false | Fetch each video's subtitles / transcript |
| subtitleLanguages | array | No | [] | Preferred subtitle language codes (e.g. en, es) |
| preferAutoGeneratedSubtitles | boolean | No | false | Prefer the auto-generated (ASR) track over manual captions |
| subtitleFormat | string | No | srt | subtitlesText format: srt / vtt / plaintext / json |
| publishedAfter | string | No | "" | Channel only: keep videos published on/after YYYY-MM-DD |
| channelSortBy | string | No | latest | Channel videos order: latest / popular / oldest |
| includeComments | boolean | No | false | Fetch top-level comments (extra, beyond Apify's base actor) |
| maxComments | integer | No | 20 | Max top-level comments per video |
| maxConcurrency | integer | No | 8 | Videos enriched in parallel (raise if your proxy sustains many concurrent connections; lower on empty-response churn) |
| perVideoTimeoutSecs | integer | No | 30 | Abandon a video if enrichment exceeds this many seconds (0 = no limit); recorded as a 504 |
At least one of startURLs or searchKeywords must be provided.
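
A caller may want to check an input locally before submitting a run. The following is a minimal sketch that merges the documented defaults and enforces the rule above; `build_input` is a hypothetical helper, and only a subset of the parameters from the table is covered.

```python
# Defaults and allowed values copied from the parameter table (subset).
DEFAULTS = {
    "maxResults": 10,
    "maxResultsShorts": 0,
    "maxResultStreams": 0,
    "sortBy": "relevance",
    "dateFilter": "any",
    "lengthFilter": "any",
    "subtitleFormat": "srt",
    "channelSortBy": "latest",
    "maxComments": 20,
    "maxConcurrency": 8,
    "perVideoTimeoutSecs": 30,
}
ALLOWED = {
    "sortBy": {"relevance", "date", "views", "rating"},
    "dateFilter": {"any", "hour", "today", "week", "month", "year"},
    "lengthFilter": {"any", "short", "medium", "long"},
    "subtitleFormat": {"srt", "vtt", "plaintext", "json"},
    "channelSortBy": {"latest", "popular", "oldest"},
}


def build_input(**params):
    """Merge defaults into `params` and validate the result."""
    merged = {**DEFAULTS, **params}
    # At least one source of videos must be supplied.
    if not merged.get("startURLs") and not merged.get("searchKeywords"):
        raise ValueError("Provide at least one of startURLs or searchKeywords")
    for key, allowed in ALLOWED.items():
        if merged[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")
    return merged
```

For example, `build_input(searchKeywords=["python tutorial"], maxResultsShorts=5)` returns a complete input dictionary, whereas `build_input()` raises `ValueError`.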
Requires PROXY_AUTH (username:password) and PROXY_DOMAIN (host:port) environment variables; the request proxy is built as socks5://{PROXY_AUTH}@{PROXY_DOMAIN}.
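
As an illustration of the format above, the proxy URL could be assembled as follows. This is a sketch only: `build_proxy_url` is a hypothetical name, and the `env` argument exists purely so the function can be exercised without touching the real environment.

```python
import os


def build_proxy_url(env=None) -> str:
    """Build socks5://{PROXY_AUTH}@{PROXY_DOMAIN} from environment variables."""
    env = os.environ if env is None else env
    auth = env.get("PROXY_AUTH")      # expected form: username:password
    domain = env.get("PROXY_DOMAIN")  # expected form: host:port
    if not auth or not domain:
        raise RuntimeError("PROXY_AUTH and PROXY_DOMAIN must both be set")
    return f"socks5://{auth}@{domain}"
```

Failing fast on a missing variable avoids starting a run that would otherwise send every request without a proxy.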
Locale coherence (anti-detection): the scraper sends a self-consistent locale (accept-language + PREF timezone + hl/gl) selected by the GEO env var (US/GB/DE/FR/JP/BR/IN, default US); ACCEPT_LANGUAGE can override the language. With a rotating multi-country proxy, pin the egress to the SAME country as GEO (most rotating proxies accept a country-XX token in the proxy username) — a fixed locale over random-country IPs (timezone/language ≠ IP country) is a strong bot signal. InnerTube API calls (youtubei/v1/*) use a real fetch/XHR header profile (accept: */*, origin, x-youtube-client-*, JSON content-type) rather than a page navigation one, and the request identity (TLS + UA + client-hints) stays consistent per process. A CONSENT cookie is sent to skip EU consent walls.
This scraper reuses the repository's proven InnerTube fetch + video-detail / comment parsers. The list-resolution paths are new and were validated against live YouTube data: search → videos, channel → videos, and playlist → videos all extract correctly. YouTube periodically migrates its videoRenderer / lockupViewModel structures, so re-check on first run after long gaps.
Per-type caps & filters: maxResults / maxResultsShorts / maxResultStreams are counted independently and applied per search term and per channel. The search filters (sort / date / length / type / features) apply to search terms only — they are encoded into the search sp protobuf — and are ignored for direct URLs.
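
The independent counting described above can be sketched as follows. `apply_caps` is a hypothetical helper that would be called once per search term or channel; it assumes each resolved item already carries the `type` field (video, short or stream) from the output table.

```python
from collections import Counter

# Which input parameter caps which item type.
CAP_PARAM = {
    "video": "maxResults",
    "short": "maxResultsShorts",
    "stream": "maxResultStreams",
}


def apply_caps(items, caps):
    """Keep items in order, counting each type against its own cap."""
    taken = Counter()
    kept = []
    for item in items:
        kind = item["type"]
        limit = caps.get(CAP_PARAM[kind], 0)  # 0 means: skip this type
        if taken[kind] < limit:
            taken[kind] += 1
            kept.append(item)
    return kept
```

With `maxResults=2`, `maxResultsShorts=1` and `maxResultStreams=0`, a listing of three videos, one Short and one stream yields two videos and one Short: reaching the video cap does not consume the Shorts budget.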
Channel date range: publishedAfter filters a channel's videos by publish date using a hybrid strategy — a cheap relative-time early-stop while paginating plus an exact ISO-date filter from each video's detail. channelSortBy=latest is exact; popular / oldest are best-effort via the sort chip and fall back to latest when YouTube does not expose it.
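
The hybrid strategy above can be sketched roughly as below. Everything here is an assumption made for illustration, not the scraper's code: the listing is taken to be newest-first, each listing item is assumed to carry a relative-time string under a hypothetical `relative` key, and `fetch_iso_date` stands in for whatever reads the exact `YYYY-MM-DD` date from the video detail. The early stop uses a deliberately low estimate of each unit's length so that it only fires when a video is certainly older than the cutoff.

```python
import re
from datetime import date

# Lower bounds on unit length, so the early stop never fires too soon.
_MIN_DAYS = {"day": 1, "week": 7, "month": 28, "year": 365}


def min_age_days(relative: str) -> int:
    """Smallest plausible age in days for text such as '3 weeks ago'."""
    match = re.search(r"(\d+)\s+(day|week|month|year)s?\s+ago", relative)
    if not match:
        return 0  # hours, minutes or unparseable text: treat as fresh
    return int(match.group(1)) * _MIN_DAYS[match.group(2)]


def filter_channel(listing, fetch_iso_date, published_after: date, today: date):
    """Return ids of videos published on or after `published_after`."""
    budget = (today - published_after).days
    kept = []
    for item in listing:  # assumed newest-first (channelSortBy=latest)
        # Cheap check: stop paginating once an item is certainly too old.
        if min_age_days(item["relative"]) > budget:
            break
        # Exact check against the ISO date taken from the video detail.
        if date.fromisoformat(fetch_iso_date(item["id"])) >= published_after:
            kept.append(item["id"])
    return kept
```

The cheap check saves detail requests for the long tail of old uploads, while the exact check removes the borderline items that relative text such as "4 weeks ago" cannot resolve.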
Not supported: Apify's "Save subtitles to key-value store" — this platform has no key-value store, so subtitles are returned only as the subtitlesText / transcript fields.
Run model: runs as a single task (no per-URL subtask split), matching Apify — all search terms and all direct URLs are processed once in one run, with concurrency handled internally.
Performance: inputs (search terms / channels / URLs) are resolved in parallel and videos are enriched on a pipeline — enrichment starts as soon as the first targets are found, overlapping ongoing resolution. Enrichment runs maxConcurrency videos at once (default 8). Each video fans out several sub-requests (watch page, subtitles, comments), so a very high concurrency can saturate a single proxy endpoint and trigger empty responses regardless of how large the rotating IP pool is — raise maxConcurrency only if your proxy sustains many concurrent connections, and lower it if you see empty-response / 404 churn. perVideoTimeoutSecs (default 30) bounds only the optional add-ons (subtitles / comments): if they exceed the budget they are left empty and the video record is still emitted with its full detail — a stuck add-on never discards a video. Subtitles and comments for a video run concurrently and both reuse the watch page already fetched for its detail (no duplicate page fetches).
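
The concurrency and timeout behaviour described above can be approximated with standard `asyncio` primitives. This is a simplified sketch under stated assumptions, not the scraper's pipeline: `fetch_detail` and `fetch_addons` are hypothetical coroutines supplied by the caller, and the overlap between resolution and enrichment is left out.

```python
import asyncio


async def enrich(video_id, fetch_detail, fetch_addons, timeout_secs):
    """Fetch full detail, then add-ons within a time budget."""
    record = await fetch_detail(video_id)
    try:
        # 0 means "no limit", which asyncio.wait_for expresses as None.
        addons = await asyncio.wait_for(
            fetch_addons(video_id), timeout=timeout_secs or None
        )
    except asyncio.TimeoutError:
        addons = {}  # a stuck add-on is dropped, the record is still emitted
    return {**record, **addons}


async def enrich_all(video_ids, fetch_detail, fetch_addons,
                     max_concurrency=8, timeout_secs=30):
    """Enrich videos with at most `max_concurrency` in flight at once."""
    gate = asyncio.Semaphore(max_concurrency)

    async def guarded(video_id):
        async with gate:
            return await enrich(video_id, fetch_detail, fetch_addons, timeout_secs)

    # gather preserves the order of video_ids in its result list.
    return await asyncio.gather(*(guarded(v) for v in video_ids))
```

The semaphore caps how many videos are processed simultaneously, which is the knob that `maxConcurrency` exposes; lowering it reduces pressure on a single proxy endpoint.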
Subtitles / transcript: caption availability and language (subtitles, subtitleLanguage) are detected reliably. For the transcript body the scraper uses a yt-dlp-style cascade:

1. The watch page's /api/timedtext track (json3 → xml). YouTube increasingly gates this behind a PO / BotGuard token and returns an empty body.
2. On a miss, the player response is re-fetched via several mobile / embedded / TV InnerTube clients (ANDROID_VR → TVHTML5 → IOS → MWEB → WEB_EMBEDDED_PLAYER → ANDROID), whose caption baseUrls are usually not PO-gated. Each client is gated independently, so trying more raises the odds of hitting an un-gated track; the first one that yields text wins. This is how yt-dlp recovers captions without a browser, and it brings back the transcript for most gated videos.
3. Finally, the InnerTube **get_transcript** endpoint.

Every step is plain InnerTube over curl_cffi: no browser, no JS runtime, no PO-token provider. If all steps miss, subtitlesText / transcript come back empty and the scraper degrades gracefully. The few videos gated across all clients would still need a PO-token provider or a browser engine (e.g. Camoufox), which remains out of scope for this version. Client constants mirror yt-dlp master and may need refreshing if YouTube rotates client versions.
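
The control flow of the transcript cascade described above can be sketched as a simple chain of fallbacks. The three fetcher callables are hypothetical stand-ins for the real network calls and are assumed to return an empty string on a miss; only the client order is taken from the text.

```python
# Client order as listed in the cascade description.
CLIENT_ORDER = [
    "ANDROID_VR", "TVHTML5", "IOS", "MWEB", "WEB_EMBEDDED_PLAYER", "ANDROID",
]


def fetch_transcript(video_id, fetch_timedtext, fetch_via_client,
                     fetch_get_transcript):
    """Return (text, source); text is '' and source None if every step misses."""
    # Step 1: the timedtext track from the watch page.
    text = fetch_timedtext(video_id)
    if text:
        return text, "timedtext"

    # Step 2: alternative InnerTube clients; the first with text wins.
    for client in CLIENT_ORDER:
        text = fetch_via_client(video_id, client)
        if text:
            return text, client

    # Step 3: the get_transcript endpoint as the last resort.
    text = fetch_get_transcript(video_id)
    if text:
        return text, "get_transcript"

    # Graceful degradation: the caller emits the record with empty subtitles.
    return "", None
```

Returning the source alongside the text makes it easy to log which step recovered a transcript, which helps spot when a client starts being gated.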