Adapted from the blog post "Build an AI-powered interactive video transcript with Mux Player CuePoints".
This Val exposes an HTTP endpoint that takes a Mux Asset ID and a list of speakers, then:
- Uses Mux's auto-generated captions to generate a CuePoints object for Mux Player
- Uses AssemblyAI for speaker labeling (diarization)
- Uses GPT-4o to format text
Fork it and use it as a foundation for your own interactive video transcript project.
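As a rough illustration of the first step, here is a minimal sketch of turning WebVTT caption text into cue points. It is not the Val's actual code: the `CuePoint` shape (`startTime`, `endTime`, `value`) is an assumption about what Mux Player accepts, and `vttToCuePoints` and `vttTimeToSeconds` are hypothetical helper names, so check the Mux Player docs before relying on the field names.

```typescript
// Assumed cue point shape; verify the exact fields against the Mux Player docs.
interface CuePoint<T> {
  startTime: number; // seconds
  endTime?: number; // seconds
  value: T;
}

// Convert a WebVTT timestamp ("HH:MM:SS.mmm" or "MM:SS.mmm") to seconds.
function vttTimeToSeconds(timestamp: string): number {
  const parts = timestamp.trim().split(":").map(Number);
  return parts.reduce((total, part) => total * 60 + part, 0);
}

// Parse WebVTT text into one cue point per caption cue.
function vttToCuePoints(vtt: string): CuePoint<{ text: string }>[] {
  const cuePoints: CuePoint<{ text: string }>[] = [];
  // Cues are separated by blank lines; the first block is the "WEBVTT" header.
  const blocks = vtt.replace(/\r\n/g, "\n").split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n").map((line) => line.trim()).filter(Boolean);
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) continue; // header or other non-cue block
    const [start, rest] = lines[timingIndex].split("-->");
    const end = rest.trim().split(/\s+/)[0]; // drop cue settings such as "align:start"
    const text = lines.slice(timingIndex + 1).join(" ");
    if (!text) continue;
    cuePoints.push({
      startTime: vttTimeToSeconds(start),
      endTime: vttTimeToSeconds(end),
      value: { text },
    });
  }
  return cuePoints;
}
```

The speaker labels and GPT-4o formatting would then be layered onto each cue's `value`; that part depends on the AssemblyAI and OpenAI responses and is left out here.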
Required environment variables:
- Mux access token details (`MUX_TOKEN_ID`, `MUX_TOKEN_SECRET`). This endpoint requires an existing Mux asset that's ready and has an audio-only static rendition associated with it. You can run this val to create a new one for testing.
- AssemblyAI API key (`ASSEMBLYAI_API_KEY`). Get it from the AssemblyAI dashboard.
- OpenAI API key (`OPENAI_API_KEY`). Get it from the OpenAI dashboard.
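If you fork the Val, it can help to fail early when one of these variables is missing. The following is a small sketch, not part of the original Val; `findMissingEnvVars` is a hypothetical helper that takes a lookup function so it works with any runtime.

```typescript
// The four variables listed above.
const REQUIRED_ENV_VARS = [
  "MUX_TOKEN_ID",
  "MUX_TOKEN_SECRET",
  "ASSEMBLYAI_API_KEY",
  "OPENAI_API_KEY",
] as const;

// Return the names of required variables that are unset or blank.
// `getEnv` is injected so the check does not depend on a specific runtime.
function findMissingEnvVars(
  getEnv: (name: string) => string | undefined,
): string[] {
  return REQUIRED_ENV_VARS.filter((name) => !getEnv(name)?.trim());
}

// Possible usage inside a val (assuming the Deno runtime):
//   const missing = findMissingEnvVars((name) => Deno.env.get(name));
//   if (missing.length > 0) {
//     throw new Error(`Missing environment variables: ${missing.join(", ")}`);
//   }
```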
Make a POST request to the Val's endpoint with the following body, replacing the values with your own asset ID and the list of speakers. Speakers are listed in order of appearance.
{ "asset_id": "00OZ8VnQ01wDNQDdI8Qw3kf01FkGTtkMq2CW901ltq64Jyc", "speakers": ["Matt", "Nick"] }
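For reference, a request like the one above could be built as follows. This is only a sketch: `buildTranscriptRequestInit` is a hypothetical helper, and the endpoint URL in the usage comment is a placeholder you would replace with your own Val's URL.

```typescript
// Body shape taken from the example above.
interface TranscriptRequestBody {
  asset_id: string;
  speakers: string[]; // in order of appearance
}

// Build the options for a JSON POST request to the Val's endpoint.
function buildTranscriptRequestInit(body: TranscriptRequestBody): {
  method: string;
  headers: Record<string, string>;
  body: string;
} {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Possible usage (the URL is a placeholder, not a real endpoint):
//   const init = buildTranscriptRequestInit({
//     asset_id: "00OZ8VnQ01wDNQDdI8Qw3kf01FkGTtkMq2CW901ltq64Jyc",
//     speakers: ["Matt", "Nick"],
//   });
//   const response = await fetch("https://your-val-endpoint.example", init);
```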
This is just a demo, so it's obviously not battle-hardened. The biggest issue is that it does this whole process synchronously, so if any step takes longer than the Val's timeout, you're hosed.
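One way to at least get a clear error when a step runs long is to wrap each step in a timeout. The sketch below is an assumption about how you might do that, not something the Val does; `withTimeout` is a hypothetical helper. It only stops waiting and does not cancel the underlying work, so it makes the failure easier to diagnose rather than removing the synchronous bottleneck.

```typescript
// Reject if `work` does not settle within `ms` milliseconds.
// `label` names the step in the error message.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms,
    );
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Possible usage (transcribeWithAssemblyAI is a placeholder for your own step):
//   const transcript = await withTimeout(transcribeWithAssemblyAI(audioUrl), 30_000, "diarization");
```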