Real-time speech transcription using OpenAI's Realtime API - a demonstration of transcription-only mode without AI responses.
- Real-time Transcription: Speech-to-text conversion as you speak
- Multiple Models: Choose between GPT-4o Transcribe, GPT-4o Mini Transcribe, or Whisper-1
- Language Support: Transcribe in multiple languages (English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean)
- Voice Activity Detection (VAD): Automatic detection of speech segments
- Logprobs Support: Optional confidence scores for transcriptions
- Split View: See transcriptions and event logs side-by-side
This app uses OpenAI's Realtime API in transcription-only mode:
- Your voice is captured via WebRTC
- Audio is streamed to OpenAI's transcription service
- Transcriptions are returned in real-time
- No AI responses are generated (transcription only)
- GPT-4o Transcribe: Streaming transcription with incremental updates
- GPT-4o Mini Transcribe: Smaller model, streaming transcription
- Whisper-1: Complete transcription after each speech segment (no streaming)
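The streaming difference above matters when rendering partial results in the UI. A minimal sketch of that distinction (model IDs are the ones this app exposes; the behavior is as described above):

```typescript
// Whether a given transcription model streams partial (delta) results.
// whisper-1 returns one completed transcript per speech segment, with no deltas.
function streamsPartials(model: string): boolean {
  return model !== "whisper-1";
}

console.log(streamsPartials("gpt-4o-transcribe")); // true
console.log(streamsPartials("whisper-1"));         // false
```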
- Model: Select the transcription model
- Language: Choose the primary language for better accuracy
- VAD: Enable/disable automatic voice activity detection
- Logprobs: Include confidence scores (for advanced use)
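The options above map onto the transcription-session payload sent to OpenAI. A hedged sketch of that mapping — field names follow the Realtime API, but treat the exact shapes as illustrative rather than authoritative:

```typescript
// Sketch: translate the UI options into a Realtime transcription-session config.
interface Options {
  model: "gpt-4o-transcribe" | "gpt-4o-mini-transcribe" | "whisper-1";
  language?: string;   // e.g. "en", "es" — optional accuracy hint
  vad: boolean;        // automatic voice activity detection
  logprobs: boolean;   // include confidence scores
}

function buildSessionConfig(opts: Options) {
  return {
    input_audio_format: "pcm16",
    input_audio_transcription: {
      model: opts.model,
      ...(opts.language ? { language: opts.language } : {}),
    },
    // null disables server-side VAD; audio must then be committed manually
    turn_detection: opts.vad ? { type: "server_vad" } : null,
    ...(opts.logprobs
      ? { include: ["item.input_audio_transcription.logprobs"] }
      : {}),
  };
}

const cfg = buildSessionConfig({ model: "whisper-1", language: "en", vad: true, logprobs: false });
console.log(cfg.turn_detection?.type); // "server_vad"
```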
- Click "Start" to begin transcription
- Allow microphone access when prompted
- Start speaking - transcriptions appear in real-time
- Partial transcriptions update as you speak
- Final transcriptions are marked in green
- Click "Stop" to end the session
- `GET /` - Serves the transcription interface
- `POST /rtc` - Creates a WebRTC transcription session
- `POST /observer/:callId` - WebSocket observer for transcription events
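For example, a client could derive the observer endpoint from a call ID like this. This helper is hypothetical — only the path shape follows the endpoints listed above, and `example.val.run` is a placeholder host:

```typescript
// Hypothetical helper: build a WebSocket URL for the /observer/:callId endpoint.
function observerUrl(base: string, callId: string): string {
  // Swap http(s) for ws(s) and append the observer path.
  return `${new URL(base).origin.replace(/^http/, "ws")}/observer/${encodeURIComponent(callId)}`;
}

console.log(observerUrl("https://example.val.run", "call_123"));
// "wss://example.val.run/observer/call_123"
```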
Set in your Val Town environment:
- `OPENAI_API_KEY` - Your OpenAI API key (required)
```bash
# Install Deno
curl -fsSL https://deno.land/install.sh | sh

# Run locally
deno run --allow-all main.tsx
```
- Fork/remix this val on Val Town
- Add your `OPENAI_API_KEY` to Val Town secrets
- Your app will be available at `https://[your-val-name].val.run`
The app uses OpenAI's Realtime API in transcription mode:
- Session type: `transcription` (not `realtime`)
- Audio format: PCM16
- Noise reduction: Near-field (optimized for close microphones)
- WebRTC data channel for receiving transcription events
- `input_audio_buffer.committed` - Audio chunk received
- `conversation.item.input_audio_transcription.delta` - Partial transcription
- `conversation.item.input_audio_transcription.completed` - Final transcription
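A client receiving these events over the data channel might accumulate deltas into a partial transcript and replace it when the final transcript arrives. The event shapes below are simplified sketches, not the full API payloads:

```typescript
// Minimal dispatcher for the transcription events listed above.
type TranscriptionEvent = {
  type: string;
  delta?: string;       // present on *.delta events
  transcript?: string;  // present on *.completed events
};

function handleEvent(ev: TranscriptionEvent, partial: string): string {
  switch (ev.type) {
    case "conversation.item.input_audio_transcription.delta":
      return partial + (ev.delta ?? "");   // grow the partial transcript
    case "conversation.item.input_audio_transcription.completed":
      return ev.transcript ?? partial;     // replace with the final text
    case "input_audio_buffer.committed":
      return partial;                      // audio acknowledged; no text change
    default:
      return partial;
  }
}

let text = "";
text = handleEvent({ type: "conversation.item.input_audio_transcription.delta", delta: "Hello" }, text);
text = handleEvent({ type: "conversation.item.input_audio_transcription.delta", delta: " world" }, text);
text = handleEvent({ type: "conversation.item.input_audio_transcription.completed", transcript: "Hello world." }, text);
console.log(text); // "Hello world."
```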
Built with OpenAI's Realtime API for transcription-only use cases.