Transcripts

YouTube Transcript API

Pull timestamped transcripts from any YouTube video, clean and ready for your AI agent.

Extract the complete spoken text from any YouTube video as timestamped segments. Each segment includes a start time and duration precise to hundredths of a second. The full transcript is also returned as a single string, so you can drop it straight into a prompt without any preprocessing.

Language is detected automatically. You get the language code, the track name, and whether captions were uploaded by the creator or generated by YouTube. One credit per request.

Endpoint

POST https://api.stophy.dev/v1/video

Request
curl -X POST https://api.stophy.dev/v1/video \
  -H "Authorization: Bearer $STOPHY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"videoUrl":"https://youtu.be/YkOSUVzOAA4","type":"transcript"}'
Response
{
  "videoId": "YkOSUVzOAA4",
  "language": { "code": "en", "name": "English (auto-generated)", "isAutoGenerated": true },
  "segments": [
    { "start": 0.04, "duration": 4.08, "text": "about a year and a half ago I made my" },
    { "start": 1.68, "duration": 3.20, "text": "first ever video where I coined the T3" },
    { "start": 4.12, "duration": 3.60, "text": "stack in that video I made an app where" }
  ],
  "text": "about a year and a half ago I made my first ever video..."
}
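
The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the endpoint, headers, and body fields are taken from the curl example above, while the function names `build_transcript_request` and `fetch_transcript` are hypothetical, and the sketch assumes a successful call returns the JSON shape shown in the sample response.

```python
import json
import urllib.request

# Endpoint copied from the curl example above.
API_URL = "https://api.stophy.dev/v1/video"


def build_transcript_request(video_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a transcript."""
    body = json.dumps({"videoUrl": video_url, "type": "transcript"}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def fetch_transcript(video_url: str, api_key: str) -> dict:
    """Send the request and decode the JSON body (hypothetical helper).

    Assumes a successful response is a JSON object like the sample above;
    urllib raises HTTPError for 4xx/5xx status codes.
    """
    request = build_transcript_request(video_url, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Keeping request construction separate from the network call makes the payload easy to inspect or unit-test without spending a credit.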

What you can do

Sub-second timestamps

Every segment carries a start time and duration in seconds, precise to hundredths of a second. Deep-link to a moment, generate chapters, or sync text to playback.
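
As a rough illustration of working with these timestamps, the sketch below turns a segment's start time into a display label and a deep link. The helper names are hypothetical, and the link format assumes YouTube's usual `?t=<whole seconds>` URL parameter, which is not part of this API.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as M:SS, or H:MM:SS for videos longer than an hour."""
    total = int(seconds)  # drop the fractional part for display
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def deep_link(video_id: str, start: float) -> str:
    """Build a link that opens the video at the segment's start.

    Assumption: the t parameter takes whole seconds, so the start is truncated.
    """
    return f"https://youtu.be/{video_id}?t={int(start)}"


def segment_end(segment: dict) -> float:
    """End time of a segment, rounded back to hundredths of a second."""
    return round(segment["start"] + segment["duration"], 2)
```

For the third segment in the sample response, `deep_link("YkOSUVzOAA4", 4.12)` yields a link that starts playback at second 4, and `segment_end` gives 7.72.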

Full text included

The complete transcript as a single string alongside the segment array. No joining logic needed before passing it to a model.

Language and caption source

Language code, track name, and an isAutoGenerated flag on every response.
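
One way to use these fields is to label a transcript's provenance before passing it downstream. The following is a minimal sketch that assumes the `language` object has the `code`, `name`, and `isAutoGenerated` keys shown in the sample response; the function name `describe_captions` is hypothetical.

```python
def describe_captions(response: dict) -> str:
    """Summarize the caption track and where it came from."""
    language = response["language"]
    if language["isAutoGenerated"]:
        source = "auto-generated by YouTube"
    else:
        source = "uploaded by the creator"
    return f"{language['name']} [{language['code']}], {source}"
```

A label like this can be prepended to a prompt so a model knows to expect the punctuation gaps and misheard words that are common in auto-generated captions.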

Common use cases

  • Summarize a conference talk or tutorial and publish it as a blog post
  • Index spoken content across a video library so users can search inside it
  • Generate chapter markers from segment timestamps
  • Extract timestamped quotes for clips or citations
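
Two of these use cases, quote extraction and chapter markers, can be sketched directly on the `segments` array. Both helpers are hypothetical, and the fixed-length windowing is a deliberately naive stand-in for real chapter detection, not something the API provides.

```python
def find_quotes(segments: list[dict], phrase: str) -> list[tuple[float, str]]:
    """Return (start, text) for every segment containing the phrase, ignoring case."""
    needle = phrase.lower()
    return [(s["start"], s["text"]) for s in segments if needle in s["text"].lower()]


def chapter_candidates(
    segments: list[dict], window_seconds: float = 300.0
) -> list[tuple[float, str]]:
    """Pick the first segment of each fixed-length time window as a chapter candidate.

    Naive heuristic: window boundaries are arbitrary and ignore topic changes.
    """
    chapters = []
    current_window = None
    for segment in segments:
        window = int(segment["start"] // window_seconds)
        if window != current_window:
            chapters.append((segment["start"], segment["text"]))
            current_window = window
    return chapters
```

Note that `find_quotes` matches within a single segment only. A phrase split across two segments, such as "T3 stack" in the sample response, will not match, so searching the top-level `text` string first and then locating the nearest segment may work better for multi-word queries.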

Frequently asked questions

What does a transcript response look like?
You get a videoId, a language object (code, name, and an isAutoGenerated flag), a segments array in which each item has start, duration, and text, and, at the top level, the full transcript as a single concatenated string in the text field.
Does it work on videos without manual captions?
Yes. The API returns YouTube's auto-generated captions when creator captions are not available, and tells you which kind you received via the isAutoGenerated flag. If a video has neither, it returns a clear status instead of an error.
Is the full transcript returned in one response?
Yes. The entire transcript is returned in one call regardless of video length. Even a three-hour tutorial with thousands of segments needs no pagination.