
How to Get a YouTube Transcript in Python

Hussein Hakizimana·May 28, 2026·4 min read

Getting YouTube transcript data in Python usually means one of two things: youtube-transcript-api, which relies on undocumented YouTube internals and can break when YouTube changes them, or the official YouTube Data API, which only lets you download caption tracks for videos you are authorized to manage. This post shows a simpler approach.

We'll use Stophy. Send a video URL, get back the full transcript and timestamps. No scraping, no fragile dependencies.

Setup

You need Python 3.8+ and httpx:

Terminal

pip install httpx

Grab a Stophy API key from stophy.dev/dashboard.

Get the transcript

python

import os
import httpx

def get_transcript(video_url: str) -> dict:
    # The API key is read from the STOPHY_API_KEY environment variable.
    res = httpx.post(
        "https://api.stophy.dev/v1/video",
        headers={"Authorization": f"Bearer {os.environ['STOPHY_API_KEY']}"},
        json={"videoUrl": video_url, "type": "transcript"},
    )
    res.raise_for_status()  # raise on 4xx/5xx responses
    return res.json()["data"]

data = get_transcript("https://www.youtube.com/watch?v=YOUR_VIDEO_ID")
print(data["text"])

data["text"] is the full transcript as one string — pass it straight to an LLM, write it to a file, whatever you need.

Working with timestamps

The segments list gives you each chunk with its timing:

python

for segment in data["segments"]:
    minutes = int(segment["start"] // 60)
    seconds = int(segment["start"] % 60)
    print(f"[{minutes:02d}:{seconds:02d}] {segment['text']}")

Each segment has text, start (seconds from the beginning), and duration. Useful when you're chunking for embeddings and want to keep position data, or when you need to link to a specific moment.
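As a rough sketch of the chunking idea, the helper below groups consecutive segments into chunks of about max_chars characters and keeps each chunk's start and end time. chunk_segments, link_to_moment, and the 500-character default are illustrative names and values, not part of the Stophy API. The t= query parameter is YouTube's usual way to start playback at an offset, and the sample segments are made up.

```python
from urllib.parse import urlencode


def chunk_segments(segments, max_chars=500):
    """Group consecutive segments into chunks of roughly max_chars characters.

    Each chunk keeps the start time of its first segment and the end time
    (start + duration) of its last one. A single segment longer than
    max_chars becomes a chunk on its own.
    """
    chunks = []
    current = None
    for seg in segments:
        end = seg["start"] + seg["duration"]
        # Close the current chunk if adding this segment would exceed the limit.
        if current is not None and len(current["text"]) + 1 + len(seg["text"]) > max_chars:
            chunks.append(current)
            current = None
        if current is None:
            current = {"start": seg["start"], "end": end, "text": seg["text"]}
        else:
            current["text"] += " " + seg["text"]
            current["end"] = end
    if current is not None:
        chunks.append(current)
    return chunks


def link_to_moment(video_id, start):
    """Build a watch URL that starts playback at `start` seconds (rounded down)."""
    query = urlencode({"v": video_id, "t": f"{int(start)}s"})
    return f"https://www.youtube.com/watch?{query}"


# Made-up sample data in the same shape as data["segments"].
sample = [
    {"text": "Hello and welcome.", "start": 0.0, "duration": 2.5},
    {"text": "Today we look at transcripts.", "start": 2.5, "duration": 3.0},
    {"text": "Let's get started.", "start": 5.5, "duration": 2.0},
]

for chunk in chunk_segments(sample, max_chars=50):
    print(link_to_moment("YOUR_VIDEO_ID", chunk["start"]), chunk["text"])
```

Splitting on segment boundaries means a chunk never cuts a caption in half, at the cost of chunks that vary in length. If you need fixed-size chunks for an embedding model, you would likely split on tokens instead and map each chunk back to the nearest segment start.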

Check if it's auto-generated

python

lang = data["language"]
print(lang["isAutoGenerated"])  # True or False

Auto-generated captions work fine for most videos. For anything where accuracy really matters (interviews, medical content, technical talks), it's good to know what you're dealing with.
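One way to act on that flag, sketched here as an assumption about your own pipeline rather than anything Stophy provides, is to prepend a short note to the text before handing it to an LLM or a human reviewer. with_caption_note and the note wording are hypothetical. The only response fields used are text and language.isAutoGenerated.

```python
AUTO_NOTE = "[Note: auto-generated captions, may contain errors]\n\n"


def with_caption_note(data: dict) -> str:
    """Return the transcript text, prefixed with a warning if captions are auto-generated.

    A missing "language" field is treated the same as isAutoGenerated being false.
    """
    language = data.get("language") or {}
    if language.get("isAutoGenerated"):
        return AUTO_NOTE + data["text"]
    return data["text"]


# Made-up payload in the same shape as the response's "data" object.
example = {"text": "welcome back to the channel", "language": {"isAutoGenerated": True}}
print(with_caption_note(example))
```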

Some videos have no transcript

The API returns 200 with an empty field rather than an error:

python

if "empty" in data:
    print("No transcript:", data["empty"]["code"])
else:
    print(data["text"])

Full script

python

import os
from typing import Optional

import httpx

def get_transcript(video_url: str) -> Optional[str]:
    """Return the transcript text, or None if the video has no transcript."""
    res = httpx.post(
        "https://api.stophy.dev/v1/video",
        headers={"Authorization": f"Bearer {os.environ['STOPHY_API_KEY']}"},
        json={"videoUrl": video_url, "type": "transcript"},
    )
    res.raise_for_status()  # raise on 4xx/5xx responses
    data = res.json()["data"]

    # A 200 response with an "empty" field means there is no transcript.
    if "empty" in data:
        return None

    return data["text"]

if __name__ == "__main__":
    transcript = get_transcript("https://www.youtube.com/watch?v=YOUR_VIDEO_ID")
    if transcript:
        print(transcript)
    else:
        print("No transcript for this video.")

The docs have the full response shape and the rest of the endpoints.
