A guide to creating AI-generated transcripts with summaries and key quotes from talks and podcasts using Whisper and Claude.
See https://wesmckinney.com/presentations for example.
Overview
This workflow takes a video or podcast URL and produces a formatted markdown transcript with:
- YAML frontmatter for metadata
- A first-person summary of key topics
- Extracted "money quotes"
- Full verbatim transcript with speaker labels
Prerequisites
Required Tools
# yt-dlp for downloading audio from YouTube/Vimeo/etc brew install yt-dlp # ffmpeg for audio compression (if needed) brew install ffmpeg # Python 3.8+ with these packages pip install openai pyyaml
API Keys
- OpenAI API key for Whisper transcription (set as
OPENAI_API_KEYenvironment variable)
Complete Workflow
Step 1: Get Video Metadata
First, extract metadata from the video URL:
yt-dlp --print "%(title)s|||%(upload_date)s|||%(duration)s" "VIDEO_URL"
Important: The upload date may differ from the actual talk date. Verify the actual event date from the video description or event website.
Step 2: Download Audio
# Download as MP3 (quality 5 is good balance of size/quality) yt-dlp -x --audio-format mp3 --audio-quality 5 -o "/tmp/talk-audio.%(ext)s" "VIDEO_URL"
Or use the provided script:
./scripts/download-audio.sh "VIDEO_URL" "output-name"
Step 3: Check File Size
Whisper API has a 25MB limit. Check and compress if needed:
ls -lh /tmp/talk-audio.mp3
# If > 25MB, compress:
ffmpeg -i /tmp/talk-audio.mp3 -b:a 64k -ac 1 /tmp/talk-audio-compressed.mp3Step 4: Transcribe with Whisper
Using the OpenAI Whisper API:
python scripts/transcribe.py /tmp/talk-audio.mp3 > /tmp/raw-transcript.txtOr manually via the API:
from openai import OpenAI client = OpenAI() with open("/tmp/talk-audio.mp3", "rb") as audio_file: transcript = client.audio.transcriptions.create( model="whisper-1", file=audio_file, response_format="text" ) print(transcript)
Step 5: Save Raw Transcript
Always save the raw transcript before formatting:
cp /tmp/raw-transcript.txt transcripts/raw/YYYY-MM-DD-event-slug-raw.txt
Step 6: Generate Summary with Claude
Use Claude to create a first-person summary. Provide the raw transcript and this prompt:
Please analyze this transcript and create:
1. A SUMMARY section written in first person ("I", "my") that covers the key topics discussed. Organize into subsections if there are distinct topics. Be factual - avoid grandiose language like "groundbreaking" or "revolutionary".
2. A KEY QUOTES section with 3-5 impactful direct quotes from the transcript, formatted as blockquotes with context.
3. A cleaned TRANSCRIPT with speaker names in bold followed by colons.
Raw transcript:
[paste transcript here]
Step 7: Create Formatted Transcript File
Create the file at transcripts/YYYY-MM-DD-event-slug.md:
--- title: "Talk Title" date: YYYY-MM-DD event: "Event Name" location: "City, State/Country" video_url: "https://..." video_type: "Talk" transcribed: YYYY-MM-DD --- *This transcript and summary were AI-generated and may contain errors.* ## Summary [First-person summary here] ## Key Quotes > "Quote text here" — Context or speaker ## Transcript **Speaker Name:** Dialogue text... **Other Speaker:** Response text...
See templates/transcript-template.md for a complete template.
Step 8: Add to talks.yml (if applicable)
If you're integrating with a Quarto blog like the original:
- date: 'YYYY-MM-DD' type: podcast # or: talk, interview, keynote, tutorial role: guest # for podcasts: guest or co-host event: "Event Name" title: "Talk Title" location: Remote links: - type: Video url: https://...
The date must match the transcript filename exactly for auto-linking to work.
Step 9: Preview and Verify
quarto preview # Check that transcript renders correctly # Verify links work
File Naming Convention
Pattern: YYYY-MM-DD-event-slug.md
- Use the actual talk date, not the upload date
- Use lowercase with hyphens for the slug
- Keep slugs short but descriptive
Examples:
2024-05-15-talk-python-to-me-pandas.md2023-09-20-pycon-keynote.md2022-03-21-gresearch-interview.md
YAML Frontmatter Reference
| Field | Required | Description |
|---|---|---|
title |
Yes | Talk or episode title |
date |
Yes | Actual talk date (YYYY-MM-DD) |
event |
Yes | Event, conference, or podcast name |
location |
Yes | City, State/Country or "Remote" |
video_url |
No* | URL to video/podcast |
video_type |
No* | Talk, Keynote, Podcast, Interview, Tutorial |
slides_url |
No* | URL to slides (if no video) |
transcribed |
No | Date transcript was created |
*Use either video_url + video_type OR slides_url, not both.
Summary Writing Guidelines
Voice and Tone
- First person: Write as if you gave the talk ("I discussed...", "My approach...")
- Factual: Focus on what was actually said, not interpretation
- No puffery: Avoid "groundbreaking", "revolutionary", "transformative", etc.
- Organized: Use subsections (
### Topic) for distinct themes
Structure
## Summary Brief overview paragraph of the main topics covered. ### First Major Topic Details about this topic... ### Second Major Topic Details about this topic...
Key Quotes Formatting
## Key Quotes > "The exact quote from the transcript" — Context about when/why this was said > "Another impactful quote" — Speaker attribution if multiple speakers
Transcript Formatting
- Speaker names in bold followed by colon
- Each speaker turn on its own paragraph
- Preserve natural speech (can clean up minor filler words)
- Use
*[brackets]*for non-speech elements:*[laughter]*,*[applause]*
## Transcript **Host:** Welcome to the show. Today we're talking about data science. **Guest:** Thanks for having me. I'm excited to discuss this topic. *[Brief pause]* **Host:** Let's start with your background.
Quality Checklist
Before publishing, verify:
- Filename follows
YYYY-MM-DD-event-slug.mdpattern - Date is actual talk date (not upload date)
- All required YAML fields present
- AI disclaimer included
- Summary is in first person
- Summary avoids grandiose language
- Key quotes use blockquote format
- Speaker names are bolded in transcript
- No obvious transcription errors or gaps
- Raw transcript saved in
raw/folder
Troubleshooting
Whisper API 25MB Limit
Compress audio to reduce file size:
ffmpeg -i input.mp3 -b:a 64k -ac 1 output.mp3
Missing Transcript Sections
If Whisper output has gaps (repeated symbols, [inaudible]):
- Re-run transcription on that section
- Manually transcribe from video
- Note gaps with
[inaudible]markers
Directory Structure
your-project/
├── transcripts/
│ ├── raw/ # Raw transcript backups
│ │ └── YYYY-MM-DD-*.txt
│ ├── _metadata.yml # Quarto metadata (optional)
│ ├── transcript-styles.css # Custom styles (optional)
│ └── YYYY-MM-DD-*.md # Formatted transcripts
├── scripts/
│ ├── download-audio.sh
│ ├── transcribe.py
│ └── format-transcript.py
└── templates/
└── transcript-template.md
Cost Considerations
- Whisper API: ~$0.006 per minute of audio
- Claude: Varies by usage for summary generation
A 1-hour talk costs approximately $0.36 for Whisper transcription.
License
These scripts and templates are provided as-is for educational purposes.