YouTube Collector

Overview

Monitors YouTube channels for new videos, fetches transcripts (auto-generated or manual), and submits them as signals for embedding and clustering.

Schedule

Every 6 hours via Dagu.

Sources

65 channels: 12 trusted, 10 pro-Kremlin proxy, 23 unverified commentators, 12 unverified independent, 8 unverified media.

Full source list: Media Monitoring → YouTube channels.

Processing

  1. Check each channel’s RSS feed for new video IDs
  2. Fetch the transcript of each new video via the YouTube transcript API
  3. Extract title, description, transcript text, and publication date
  4. Tag with channel handle, category, tier, and language
  5. Submit to the ingest API
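
The steps above can be sketched roughly as follows. This is a minimal illustration, not the collector's actual code. The feed URL pattern and the Atom/yt/media namespaces are those of YouTube's public channel feeds. Everything else is an assumption made for the example: the function names, the shape of the signal dictionary, and the fetch_transcript and submit callables, which stand in for the transcript API and the ingest API.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

# XML namespaces used by YouTube's Atom channel feeds.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "yt": "http://www.youtube.com/xml/schemas/2015",
    "media": "http://search.yahoo.com/mrss/",
}


def feed_url(channel_id: str) -> str:
    """Public RSS (Atom) feed URL for a channel ID."""
    return f"https://www.youtube.com/feeds/videos.xml?channel_id={channel_id}"


def new_video_entries(feed_xml: str, seen_ids: set) -> list:
    """Steps 1 and 3: parse the feed and keep entries not seen before."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", NS):
        video_id = entry.findtext("yt:videoId", namespaces=NS)
        if not video_id or video_id in seen_ids:
            continue
        entries.append({
            "video_id": video_id,
            "title": entry.findtext("atom:title", default="", namespaces=NS),
            "description": entry.findtext(
                "media:group/media:description", default="", namespaces=NS
            ),
            "published": entry.findtext("atom:published", default="", namespaces=NS),
        })
    return entries


def build_signal(entry: dict, channel: dict, transcript: str) -> dict:
    """Step 4: attach channel tags. The key names here are hypothetical."""
    return {
        **entry,
        "transcript": transcript,
        "channel_handle": channel["handle"],
        "category": channel["category"],
        "tier": channel["tier"],
        "language": channel["language"],
    }


def collect(
    feed_xml: str,
    channel: dict,
    seen_ids: set,
    fetch_transcript: Callable[[str], Optional[str]],
    submit: Callable[[dict], None],
) -> int:
    """Steps 2 and 5: fetch transcripts and submit signals; returns the count."""
    submitted = 0
    for entry in new_video_entries(feed_xml, seen_ids):
        transcript = fetch_transcript(entry["video_id"])
        if transcript is None:
            # Assumption: videos without any transcript are skipped and retried later.
            continue
        submit(build_signal(entry, channel, transcript))
        seen_ids.add(entry["video_id"])
        submitted += 1
    return submitted


# Tiny hand-written feed so the sketch runs without network access.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:yt="http://www.youtube.com/xml/schemas/2015"
      xmlns:media="http://search.yahoo.com/mrss/">
  <entry>
    <yt:videoId>abc123</yt:videoId>
    <title>First video</title>
    <published>2024-01-01T00:00:00+00:00</published>
    <media:group><media:description>First description</media:description></media:group>
  </entry>
  <entry>
    <yt:videoId>def456</yt:videoId>
    <title>Second video</title>
    <published>2024-01-02T00:00:00+00:00</published>
    <media:group><media:description>Second description</media:description></media:group>
  </entry>
</feed>"""
```

Passing the transcript fetcher and the submit function in as callables keeps the feed parsing and tagging logic testable offline, and avoids tying the sketch to a particular transcript library or ingest endpoint.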

Configuration

dagu/config/watchlists/youtube_channels.yaml defines every channel: handle, name, YouTube channel ID, category, tier, language, notes, and rationality scores.
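
As an illustration of how one watchlist entry might be represented once loaded, here is a small sketch. The field list comes from the description above. The key names (for example channel_id and rationality_scores), the types, which fields are required, and the sample values are all assumptions, since the actual YAML schema is not shown here.

```python
from dataclasses import dataclass, field

# Assumption: these fields are mandatory; notes and scores are optional.
REQUIRED_FIELDS = ("handle", "name", "channel_id", "category", "tier", "language")


@dataclass
class ChannelEntry:
    handle: str
    name: str
    channel_id: str
    category: str
    tier: str
    language: str
    notes: str = ""
    # Assumption: scores are stored as a mapping of score name to number.
    rationality_scores: dict = field(default_factory=dict)


def parse_channel(raw: dict) -> ChannelEntry:
    """Validate one already-loaded watchlist entry and convert it."""
    missing = [key for key in REQUIRED_FIELDS if not raw.get(key)]
    if missing:
        raise ValueError("channel entry missing fields: " + ", ".join(missing))
    return ChannelEntry(
        handle=raw["handle"],
        name=raw["name"],
        channel_id=raw["channel_id"],
        category=raw["category"],
        tier=raw["tier"],
        language=raw["language"],
        notes=raw.get("notes", ""),
        rationality_scores=dict(raw.get("rationality_scores", {})),
    )


# Hypothetical entry; none of these values come from the real watchlist.
example = parse_channel({
    "handle": "@example_channel",
    "name": "Example Channel",
    "channel_id": "UC_PLACEHOLDER",
    "category": "example-category",
    "tier": "example-tier",
    "language": "en",
})
```

Validating entries at load time means a malformed entry fails with a clear message before the collector starts polling feeds.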