Tutorial · Python

How to Scrape YouTube Comments with Python

Hussein Hakizimana · 5 min read

The official YouTube Data API has a comments endpoint, but its quota runs out fast and its data model takes multiple requests to get everything you need. Scrapers work until they don't, because YouTube changes its page structure regularly.

This post shows how to pull comments in Python using the Stophy API. You get a full page of comments per request, continuation tokens for pagination, and a separate call for replies. No browser automation, no quota headaches.

Setup

Terminal
pip install httpx

Get a Stophy API key from stophy.dev/dashboard.
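The code in this post reads the key from the STOPHY_API_KEY environment variable. As a minimal sketch, a small helper can fail early with a readable message when the variable is missing, instead of a bare KeyError in the middle of a request. The helper name load_api_key is hypothetical and not part of the Stophy API.

```python
import os


def load_api_key(env_var: str = "STOPHY_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the examples."
        )
    return key
```

Calling load_api_key() once at startup is enough; the returned string can then be placed in the Authorization header.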

Fetch the first page

python
import os
import httpx

def get_comments(video_url: str, sort_by: str = "top") -> dict:
    # Request the first page of comments for one video.
    res = httpx.post(
        "https://api.stophy.dev/v1/video",
        headers={"Authorization": f"Bearer {os.environ['STOPHY_API_KEY']}"},
        json={"videoUrl": video_url, "type": "comments", "sortBy": sort_by},
    )
    res.raise_for_status()  # raise on 4xx/5xx responses
    return res.json()["data"]

data = get_comments("https://www.youtube.com/watch?v=YOUR_VIDEO_ID")

# Print each comment as "author: text".
for comment in data["items"]:
    print(f"{comment['author']}: {comment['text']}")

sort_by takes "top" or "latest". Each comment has author, text, likeCount, replyCount, publishedAt, isPinned, isHearted, isChannelOwner, and a repliesToken.
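Because each comment is a plain dict with the fields listed above, ordinary list operations are enough to work with a page of results. The sketch below picks the most-liked comments that are not pinned. It assumes likeCount arrives as an integer (or something int() accepts); the post does not state the field types, so check that against a real response. The function name top_liked and the sample data are hypothetical.

```python
def top_liked(comments: list, n: int = 5) -> list:
    """Return the n unpinned comments with the highest likeCount."""
    unpinned = [c for c in comments if not c.get("isPinned")]
    # Assumption: likeCount is an integer or an integer-like string.
    ranked = sorted(
        unpinned,
        key=lambda c: int(c.get("likeCount") or 0),
        reverse=True,
    )
    return ranked[:n]


# Made-up comments using the field names from this post.
sample = [
    {"author": "alice", "text": "Pinned note", "likeCount": 900, "isPinned": True},
    {"author": "bob", "text": "Nice video", "likeCount": 12, "isPinned": False},
    {"author": "carol", "text": "Great intro", "likeCount": 40, "isPinned": False},
]

for comment in top_liked(sample, n=2):
    print(f"{comment['likeCount']:>5}  {comment['author']}: {comment['text']}")
```

With real data, pass data["items"] in place of sample.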

Paginate through all comments

Pass continuationToken back in the next request to get the next page:

python
def get_all_comments(video_url: str, sort_by: str = "top") -> list:
    headers = {"Authorization": f"Bearer {os.environ['STOPHY_API_KEY']}"}
    all_comments = []
    token = None  # no token on the first request

    while True:
        body = {"videoUrl": video_url, "type": "comments", "sortBy": sort_by}
        if token:
            body["continuationToken"] = token

        res = httpx.post("https://api.stophy.dev/v1/video", headers=headers, json=body)
        res.raise_for_status()
        data = res.json()["data"]

        all_comments.extend(data["items"])
        # A missing or empty token means this was the last page.
        token = data.get("continuationToken")
        if not token:
            break

    return all_comments
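The loop above keeps requesting pages until no continuationToken comes back, which can mean a lot of requests on a video with many comments. One possible variation, sketched here, is a generator that yields comments lazily and stops after a fixed number of pages. It takes the page-fetching step as a parameter: fetch_page is a hypothetical callable that accepts the current token (None for the first page) and returns the "data" dict from the response. The max_pages cap is also an assumption of this sketch, not an API feature.

```python
from typing import Callable, Iterator, Optional


def iter_comments(
    fetch_page: Callable[[Optional[str]], dict],
    max_pages: int = 50,
) -> Iterator[dict]:
    """Yield comments page by page, stopping at the last page or max_pages."""
    token = None
    for _ in range(max_pages):
        data = fetch_page(token)
        yield from data.get("items", [])
        token = data.get("continuationToken")
        if not token:
            return  # no more pages


# Stand-in for the real request, so the sketch runs without network access.
fake_pages = {
    None: {"items": [{"text": "first"}, {"text": "second"}], "continuationToken": "t1"},
    "t1": {"items": [{"text": "third"}], "continuationToken": None},
}

for comment in iter_comments(lambda token: fake_pages[token]):
    print(comment["text"])
```

To use it against the API, fetch_page would wrap the same httpx.post call as above and return res.json()["data"]. Because the generator is lazy, breaking out of the for loop early also stops further requests.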

Fetch replies

Use the repliesToken from any top-level comment:

python
def get_replies(replies_token: str) -> list:
    # Replies are requested with type "replies" and the comment's repliesToken.
    res = httpx.post(
        "https://api.stophy.dev/v1/video",
        headers={"Authorization": f"Bearer {os.environ['STOPHY_API_KEY']}"},
        json={"type": "replies", "continuationToken": replies_token},
    )
    res.raise_for_status()
    return res.json()["data"]["items"]
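To turn a page of top-level comments into threads, each comment's repliesToken can be passed to the replies call and the result stored next to the comment. The sketch below is one way to do that. It assumes replyCount is an integer and that comments without replies have an empty or missing repliesToken, neither of which this post confirms. The function name attach_replies and the "replies" key are hypothetical, and the fetcher is passed in so that get_replies (or a stub) can be used.

```python
from typing import Callable


def attach_replies(comments: list, fetch_replies: Callable[[str], list]) -> list:
    """Return copies of the comments, each with a 'replies' list added."""
    threads = []
    for comment in comments:
        replies = []
        token = comment.get("repliesToken")
        # Assumption: replyCount is an integer; skip the request when it is 0.
        if token and int(comment.get("replyCount") or 0) > 0:
            replies = fetch_replies(token)
        threads.append({**comment, "replies": replies})
    return threads
```

With the function from this section, the call would be attach_replies(data["items"], get_replies). This makes one extra request per comment that has replies, so it is worth limiting the input list on busy videos.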

Full script: export top 200 comments to CSV

python
import csv
import os
import httpx

def fetch_comments(video_url: str, max_comments: int = 200) -> list:
    headers = {"Authorization": f"Bearer {os.environ['STOPHY_API_KEY']}"}
    results = []
    token = None

    # Keep requesting pages until we have enough comments or run out of pages.
    while len(results) < max_comments:
        body = {"videoUrl": video_url, "type": "comments", "sortBy": "top"}
        if token:
            body["continuationToken"] = token

        res = httpx.post("https://api.stophy.dev/v1/video", headers=headers, json=body)
        res.raise_for_status()
        data = res.json()["data"]

        results.extend(data["items"])
        token = data.get("continuationToken")
        if not token:
            break

    # The last page may overshoot, so trim to the requested size.
    return results[:max_comments]

def save_csv(comments: list, filename: str) -> None:
    fields = ["author", "text", "likeCount", "replyCount", "publishedAt", "isPinned"]
    with open(filename, "w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" drops comment keys that are not in fields.
        writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(comments)

if __name__ == "__main__":
    comments = fetch_comments("https://www.youtube.com/watch?v=YOUR_VIDEO_ID", 200)
    save_csv(comments, "comments.csv")
    print(f"Saved {len(comments)} comments to comments.csv")
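Reading the file back is a quick way to check the export. Everything in a CSV comes back as a string, so the sketch below converts the numeric and boolean columns again. It assumes likeCount and replyCount were integers and isPinned was a Python bool when written (the csv module writes those as "True" and "False"); the function name load_csv is hypothetical.

```python
import csv


def load_csv(filename: str) -> list:
    """Load comments written by save_csv and restore basic column types."""
    with open(filename, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        # Assumption: the count columns hold integers or are empty.
        row["likeCount"] = int(row["likeCount"] or 0)
        row["replyCount"] = int(row["replyCount"] or 0)
        row["isPinned"] = row["isPinned"] == "True"
    return rows
```

Opening the file with newline="" on both the write and the read side keeps comments that contain line breaks intact.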

The docs have the full reference.