Real-time speech transcription using OpenAI's Realtime API - a demonstration of transcription-only mode without AI responses.
- Real-time Transcription: Speech-to-text conversion as you speak
- Multiple Models: Choose between GPT-4o Transcribe, GPT-4o Mini Transcribe, or Whisper-1
- Language Support: Transcribe in multiple languages (English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean)
- Voice Activity Detection (VAD): Automatic detection of speech segments
- Logprobs Support: Optional confidence scores for transcriptions
- Split View: See transcriptions and event logs side-by-side
This app uses OpenAI's Realtime API in transcription-only mode:
- Your voice is captured via WebRTC
- Audio is streamed to OpenAI's transcription service
- Transcriptions are returned in real-time
- No AI responses are generated (transcription only)
- GPT-4o Transcribe: Streaming transcription with incremental updates
- GPT-4o Mini Transcribe: Smaller model, streaming transcription
- Whisper-1: Complete transcription after each speech segment (no streaming)
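The streaming difference above matters when rendering partial results in the UI. A minimal sketch of that distinction (model IDs are the ones this app exposes; the behavior is as described above):

```typescript
// Whether a given transcription model streams partial (delta) results.
// whisper-1 returns one completed transcript per speech segment, with no deltas.
function streamsPartials(model: string): boolean {
  return model !== "whisper-1";
}

console.log(streamsPartials("gpt-4o-transcribe")); // true
console.log(streamsPartials("whisper-1"));         // false
```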
- Model: Select the transcription model
- Language: Choose the primary language for better accuracy
- VAD: Enable/disable automatic voice activity detection
- Logprobs: Include confidence scores (for advanced use)
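The options above map onto the transcription-session payload sent to OpenAI. A hedged sketch of that mapping — field names follow the Realtime API, but treat the exact shapes as illustrative rather than authoritative:

```typescript
// Sketch: translate the UI options into a Realtime transcription-session config.
interface Options {
  model: "gpt-4o-transcribe" | "gpt-4o-mini-transcribe" | "whisper-1";
  language?: string;   // e.g. "en", "es" — optional accuracy hint
  vad: boolean;        // automatic voice activity detection
  logprobs: boolean;   // include confidence scores
}

function buildSessionConfig(opts: Options) {
  return {
    input_audio_format: "pcm16",
    input_audio_transcription: {
      model: opts.model,
      ...(opts.language ? { language: opts.language } : {}),
    },
    // null disables server-side VAD; audio must then be committed manually
    turn_detection: opts.vad ? { type: "server_vad" } : null,
    ...(opts.logprobs
      ? { include: ["item.input_audio_transcription.logprobs"] }
      : {}),
  };
}

const cfg = buildSessionConfig({ model: "whisper-1", language: "en", vad: true, logprobs: false });
console.log(cfg.turn_detection?.type); // "server_vad"
```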
- Click "Start" to begin transcription
- Allow microphone access when prompted
- Start speaking - transcriptions appear in real-time
- Partial transcriptions update as you speak
- Final transcriptions are marked in green
- Click "Stop" to end the session
- `GET /` - Serves the transcription interface
- `POST /rtc` - Creates a WebRTC transcription session
- `POST /observer/:callId` - WebSocket observer for transcription events
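For example, a client could derive the observer endpoint from a call ID like this. This helper is hypothetical — only the path shape follows the endpoints listed above, and `example.val.run` is a placeholder host:

```typescript
// Hypothetical helper: build a WebSocket URL for the /observer/:callId endpoint.
function observerUrl(base: string, callId: string): string {
  // Swap http(s) for ws(s) and append the observer path.
  return `${new URL(base).origin.replace(/^http/, "ws")}/observer/${encodeURIComponent(callId)}`;
}

console.log(observerUrl("https://example.val.run", "call_123"));
// "wss://example.val.run/observer/call_123"
```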
Set in your Val Town environment:
- `OPENAI_API_KEY` - Your OpenAI API key (required)
```bash
# Install Deno
curl -fsSL https://deno.land/install.sh | sh

# Run locally
deno run --allow-all main.tsx
```
- Fork/remix this val on Val Town
- Add your `OPENAI_API_KEY` to Val Town secrets
- Your app will be available at `https://[your-val-name].val.run`
The app uses OpenAI's Realtime API in transcription mode:
- Session type: `transcription` (not `realtime`)
- Audio format: PCM16
- Noise reduction: Near-field (optimized for close microphones)
- WebRTC data channel for receiving transcription events
- `input_audio_buffer.committed` - Audio chunk received
- `conversation.item.input_audio_transcription.delta` - Partial transcription
- `conversation.item.input_audio_transcription.completed` - Final transcription
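A client receiving these events over the data channel might accumulate deltas into a partial transcript and replace it when the final transcript arrives. The event shapes below are simplified sketches, not the full API payloads:

```typescript
// Minimal dispatcher for the transcription events listed above.
type TranscriptionEvent = {
  type: string;
  delta?: string;       // present on *.delta events
  transcript?: string;  // present on *.completed events
};

function handleEvent(ev: TranscriptionEvent, partial: string): string {
  switch (ev.type) {
    case "conversation.item.input_audio_transcription.delta":
      return partial + (ev.delta ?? "");   // grow the partial transcript
    case "conversation.item.input_audio_transcription.completed":
      return ev.transcript ?? partial;     // replace with the final text
    case "input_audio_buffer.committed":
      return partial;                      // audio acknowledged; no text change
    default:
      return partial;
  }
}

let text = "";
text = handleEvent({ type: "conversation.item.input_audio_transcription.delta", delta: "Hello" }, text);
text = handleEvent({ type: "conversation.item.input_audio_transcription.delta", delta: " world" }, text);
text = handleEvent({ type: "conversation.item.input_audio_transcription.completed", transcript: "Hello world." }, text);
console.log(text); // "Hello world."
```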
Built with OpenAI's Realtime API for transcription-only use cases.