All these components together make up a Realtime Session. You will use client events to update the state of the session, and listen for server events to react to state changes within the session.

Session lifecycle events
------------------------

**To play output audio back on a client device like a web browser, we recommend using WebRTC rather than WebSockets.** WebRTC is more robust when sending media to client devices over uncertain network conditions.

To work with audio output in server-to-server applications over a WebSocket, listen for [`response.audio.delta`](/docs/api-reference/realtime-server-events/response/audio/delta) events containing Base64-encoded chunks of audio data from the model. You will either need to buffer these chunks and write them out to a file, or stream them immediately to another destination such as [a phone call with Twilio](https://www.twilio.com/en-us/blog/twilio-openai-realtime-api-launch-integration).

Note that the [`response.audio.done`](/docs/api-reference/realtime-server-events/response/audio/done) and [`response.done`](/docs/api-reference/realtime-server-events/response/done) events do not contain audio data, only audio content transcriptions. To get the actual bytes, listen for the [`response.audio.delta`](/docs/api-reference/realtime-server-events/response/audio/delta) events.
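For example, a server-side consumer might buffer the deltas and write them out once the audio stream finishes. This is a hedged sketch, assuming an already-connected Node.js WebSocket client (`ws`, from the `ws` package) and the default `pcm16` output format:

```javascript
import fs from "node:fs";

const chunks = [];

ws.on("message", (message) => {
  const event = JSON.parse(message.toString());

  if (event.type === "response.audio.delta") {
    // event.delta is a Base64-encoded chunk of raw audio bytes
    chunks.push(Buffer.from(event.delta, "base64"));
  }

  if (event.type === "response.audio.done") {
    // No audio bytes here, just a signal that the audio stream is complete
    fs.writeFileSync("output.pcm", Buffer.concat(chunks));
  }
});
```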
# hello-transcription

Real-time speech transcription using OpenAI's Realtime API - a demonstration of transcription-only mode without AI responses.

## Features

## How It Works

This app uses OpenAI's Realtime API in transcription-only mode:

1. Your voice is captured via WebRTC
2. Audio is streamed to OpenAI's transcription service
3. Transcriptions are returned in real-time
4. No AI responses are generated (transcription only)

## Environment Variables

Set in your Val Town environment:

- `OPENAI_API_KEY` - Your OpenAI API key (required)

## Local Development

1. Fork/remix this val on Val Town
2. Add your `OPENAI_API_KEY` to Val Town secrets
3. Your app will be available at `https://[your-val-name].val.run`

## Technical Details

The app uses OpenAI's Realtime API in transcription mode (see the configuration sketch below):

- Session type: `transcription` (not `realtime`)
- Audio format: PCM16

## Credits

Built with OpenAI's Realtime API for transcription-only use cases.
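For reference, here is a hedged sketch of the kind of transcription-only session configuration the Technical Details above describe. The field names follow OpenAI's realtime transcription guide and may differ from the current API shape, and `dataChannel` is assumed to be the app's WebRTC data channel:

```javascript
// Hedged sketch: configure a transcription-only Realtime session.
// Verify field names against the current API reference; they have changed between versions.
const transcriptionConfig = {
  type: "transcription_session.update",
  session: {
    input_audio_format: "pcm16",              // matches the PCM16 format noted above
    input_audio_transcription: {
      model: "gpt-4o-transcribe",             // any supported transcription model
    },
    turn_detection: { type: "server_vad" },   // let the server segment speech into turns
  },
};

// Send the configuration over the WebRTC data channel once it opens.
dataChannel.send(JSON.stringify(transcriptionConfig));
```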
An excerpt of the app's `index.html`:

```html
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>OpenAI Realtime Transcription</title>
<style>
  * { margin: 0; padding: 0; box-sizing: border-box; }
  /* remaining styles omitted in this excerpt */
</style>
</head>
<body>
  <h1>📝 OpenAI Realtime Transcription</h1>
  <p class="description">
    Transcribe audio in real-time using OpenAI's Realtime API.
    <a href="/source" target="_blank">View source</a>
  </p>
```
# Hello-Transcription - OpenAI Realtime API Transcription Demo

## 🎯 Project Overview

Hello-Transcription demonstrates the transcription-only mode of OpenAI's Realtime API. Unlike the conversational mode, this implementation focuses purely on speech-to-text conversion without generating AI responses, making it ideal for subtitles, live captions, meeting transcriptions, and other transcription-focused use cases.

**Created:** September 2, 2025
**Platform:** Val Town
**API:** OpenAI Realtime API (Transcription Mode)
**Key Feature:** Real-time streaming transcription with multiple model support

### Tech Stack

- **Runtime:** Deno (Val Town platform)
- **Framework:** Hono (lightweight web framework)
- **Transcription:** OpenAI Realtime API in transcription mode
- **Connection:** WebRTC with data channel for events (see the event-handling sketch below)
- **Frontend:** Vanilla JavaScript with split-view interface

### How It Works

1. **Audio Input**

   ```
   User speaks → Microphone → WebRTC → OpenAI
   ```

### Local Development

```bash
# Create .env file
echo "OPENAI_API_KEY=sk-..." > .env

# Install Deno
```

### Troubleshooting

**Solutions:**

- Check microphone permissions
- Verify OPENAI_API_KEY is set
- Check browser console for errors
- Ensure WebRTC connection established

### Deployment

1. **Fork/Remix** - Fork or remix this val on Val Town
2. **Set Environment** - Add `OPENAI_API_KEY` in Val Town secrets
3. **Deploy**

### Environment Variables

- `OPENAI_API_KEY` - Required for OpenAI API access

## 📝 Future Enhancements

### Documentation

- [OpenAI Realtime Transcription Guide](https://platform.openai.com/docs/guides/realtime-transcription)
- [Realtime API Reference](https://platform.openai.com/docs/api-reference/realtime)
- [Voice Activity Detection Guide](https://platform.openai.com/docs/guides/realtime-vad)
- [Val Town Documentation](https://docs.val.town)

## 🎯 Summary

Hello-Transcription successfully demonstrates the transcription-only capabilities of OpenAI's Realtime API. Key achievements:

1. **Pure Transcription**: No AI responses, focused solely on speech-to-text
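To make the data-channel flow above concrete, here is a hedged sketch of handling transcription server events in the browser. The event names follow the Realtime API reference; `dataChannel` is the WebRTC data channel and `transcriptEl` is a hypothetical output element:

```javascript
dataChannel.addEventListener("message", (e) => {
  const event = JSON.parse(e.data);

  if (event.type === "conversation.item.input_audio_transcription.delta") {
    // Incremental transcript text for the current speech turn
    transcriptEl.textContent += event.delta;
  }

  if (event.type === "conversation.item.input_audio_transcription.completed") {
    // Final transcript for the completed turn
    transcriptEl.textContent = event.transcript + "\n";
  }
});
```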
If you want to connect a phone number to the Realtime API, use a SIP trunking provider (e.g., Twilio). This is a service that converts your phone call to IP traffic. After you purchase a phone number from your SIP trunking provider, follow the instructions below.

Start by creating a [webhook](/docs/guides/webhooks) for incoming calls at platform.openai.com. Then point your SIP trunk at the OpenAI SIP endpoint, using the project ID for which you configured the webhook, e.g., `sip:$PROJECT_ID@sip.api.openai.com;transport=tls`. To find your `$PROJECT_ID`, go to **Settings** > **General**. The page displays the project ID, which has a `proj_` prefix.

When OpenAI receives SIP traffic associated with your project, the webhook that you configured fires a [`realtime.call.incoming`](/docs/api-reference/webhook_events/realtime/call/incoming) event.

This webhook lets you accept or reject the call. When accepting the call, you provide the configuration (instructions, voice, etc.) for the Realtime API session. Once the call is established, you can open a WebSocket and monitor the session as usual. The APIs to accept, reject, and monitor the call are documented below.

URIs used for interacting with the Realtime API over SIP:

| Purpose | URI |
| --- | --- |
| SIP URI | `sip:$PROJECT_ID@sip.api.openai.com;transport=tls` |
| Accept URI | `https://api.openai.com/v1/realtime/calls/$CALL_ID/accept` |
| Reject URI | `https://api.openai.com/v1/realtime/calls/$CALL_ID/reject` |
| Refer URI | `https://api.openai.com/v1/realtime/calls/$CALL_ID/refer` |
| Events URI | `wss://api.openai.com/v1/realtime?call_id=$CALL_ID` |

Find your `$CALL_ID` in the `call_id` field of the `data` object in the webhook. See an example in the next section.

```python
import asyncio
import json
import os

import requests
import websockets
from flask import Flask, request, Response, jsonify, make_response
from openai import OpenAI, InvalidWebhookSignatureError

app = Flask(__name__)
client = OpenAI(webhook_secret=os.environ["OPENAI_WEBHOOK_SECRET"])

AUTH_HEADER = {
    "Authorization": "Bearer " + os.getenv("OPENAI_API_KEY")
}


async def monitor_call(call_id):
    # Observe server events for an accepted call over a WebSocket
    try:
        async with websockets.connect(
            "wss://api.openai.com/v1/realtime?call_id=" + call_id,
            additional_headers=AUTH_HEADER,
        ) as websocket:
            async for message in websocket:
                print(json.loads(message))
    except Exception as exc:
        print("WebSocket error:", exc)


@app.route("/webhook", methods=["POST"])
def webhook():
    # Verify the webhook signature and parse the event
    try:
        event = client.webhooks.unwrap(request.data, request.headers)
    except InvalidWebhookSignatureError:
        return make_response("Invalid signature", 400)

    if event.type == "realtime.call.incoming":
        # Accept the call, providing the Realtime session configuration
        requests.post(
            "https://api.openai.com/v1/realtime/calls/" + event.data.call_id + "/accept",
            headers={**AUTH_HEADER, "Content-Type": "application/json"},
            json={"type": "realtime", "model": "gpt-realtime"},  # add instructions, voice, etc.
        )
        # Then monitor the session, e.g. asyncio.run(monitor_call(event.data.call_id)),
        # ideally in a background task so the webhook response returns promptly.

    return Response(status=200)
```

It's also possible to redirect the call to another number. During the call, make a POST to the `refer` endpoint:

| Field | Value |
| --- | --- |
| URL | `https://api.openai.com/v1/realtime/calls/$CALL_ID/refer` |
| Payload | JSON with one key, `target_uri`: the value used in the `Refer-To` header. A tel URI works, for example `tel:+14152909007`. |
| Headers | `Authorization: Bearer YOUR_API_KEY` (substitute `YOUR_API_KEY` with a standard API key) |
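As an illustration of the redirect flow described above, a hedged JavaScript sketch of the `refer` request; `callId` comes from the `realtime.call.incoming` webhook, and the tel URI is just an example target:

```javascript
// Redirect an in-progress call by POSTing the target to the refer endpoint.
await fetch(`https://api.openai.com/v1/realtime/calls/${callId}/refer`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ target_uri: "tel:+14152909007" }),
});
```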
Our most advanced speech-to-speech model is [gpt-realtime](/docs/models/gpt-realtime). This model shows improvements in following complex instructions, calling tools, and producing speech that sounds natural and expressive. For more information, see the [announcement blog post](https://openai.com/index/introducing-gpt-realtime/).

Update your session to use a prompt
-----------------------------------

Here are top tips for prompting the realtime speech-to-speech model; a hedged `session.update` sketch for applying a prompt appears at the end of this section. For a more in-depth guide to prompting, see the [realtime prompting cookbook](https://cookbook.openai.com/examples/realtime_prompting_guide).

### General usage tips

Here are 10 tips for creating effective, consistently performing prompts with gpt-realtime. These are just an overview. For more details and full system prompt examples, see the [realtime prompting cookbook](https://cookbook.openai.com/examples/realtime_prompting_guide).

#### 1. Be precise. Kill conflicts.

You can include sample phrases for preambles to add variety and better tailor them to your use case.

There are several other ways to improve the model's behavior when performing tool calls and keeping the conversation going with the user. Ideally, the model calls the right tools proactively, checks for confirmation before any important write actions, and keeps the user informed along the way. For more specifics, see the [realtime prompting cookbook](https://cookbook.openai.com/examples/realtime_prompting_guide).

#### 9. Use LLMs to improve your prompt.

This guide is long but not exhaustive! For more in a specific area, see the following resources:

* [Realtime prompting cookbook](https://cookbook.openai.com/examples/realtime_prompting_guide): Full prompt examples and a deep dive into when and how to use them
* [Inputs and outputs](/docs/guides/realtime-inputs-outputs): Text and audio input requirements and output options
* [Managing conversations](/docs/guides/realtime-conversations): Learn to manage a conversation for the duration of a realtime session
* [MCP servers](/docs/guides/realtime-mcp): How to use MCP servers to access additional tools in realtime apps
* [Realtime transcription](/docs/guides/realtime-transcription): How to transcribe audio with the Realtime API
* [Voice agents](https://openai.github.io/openai-agents-js/guides/voice-agents/quickstart/): A quickstart for building a voice agent with the Agents SDK
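As referenced above, a prompt can be applied to an active session with the `session.update` client event. This is a hedged sketch assuming an already-connected WebSocket client (`ws`); the instruction text is purely illustrative:

```javascript
// Apply or refresh the session's instructions (the prompt) at runtime.
ws.send(JSON.stringify({
  type: "session.update",
  session: {
    instructions:
      "You are a friendly, concise voice assistant. " +
      "Confirm with the user before any important write action.",
  },
}));
```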
Build low-latency, multimodal LLM applications with the Realtime API.

The OpenAI Realtime API enables low-latency communication with [models](/docs/models) that natively support speech-to-speech interactions as well as multimodal inputs (audio, images, and text) and outputs (audio and text). These APIs can also be used for [realtime audio transcription](/docs/guides/realtime-transcription).

Voice agents
------------

One of the most common use cases for the Realtime API is building voice agents for speech-to-speech model interactions in the browser. Our recommended starting point for these types of applications is the [Agents SDK for TypeScript](https://openai.github.io/openai-agents-js/guides/voice-agents/), which uses a [WebRTC connection](/docs/guides/realtime-webrtc) to the Realtime model in the browser, and a [WebSocket](/docs/guides/realtime-websocket) when used on the server.

```js
import { RealtimeAgent, RealtimeSession } from "@openai/agents/realtime";

const agent = new RealtimeAgent({
  name: "Assistant",
  instructions: "You are a helpful assistant.",
});
```

[Follow the voice agent quickstart to build Realtime agents in the browser.](https://openai.github.io/openai-agents-js/guides/voice-agents/quickstart/)

To use the Realtime API directly outside the context of voice agents, check out the other connection options below.

Connection options
------------------

While building [voice agents with the Agents SDK](https://openai.github.io/openai-agents-js/guides/voice-agents/) is the fastest path to one specific type of application, the Realtime API provides an entire suite of flexible tools for a variety of use cases.

There are three primary supported interfaces for the Realtime API:

* [WebRTC](/docs/guides/realtime-webrtc), recommended for browser and client-side applications
* [WebSocket](/docs/guides/realtime-websocket), suited to server-to-server applications
* SIP, for connecting phone calls (covered later in this document)
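Before moving to those lower-level options, here is a hedged sketch, following the Agents SDK quickstart, of wrapping the agent defined in the Voice agents section above in a session and connecting from the browser; the ephemeral client key is a placeholder that your backend should mint:

```javascript
import { RealtimeSession } from "@openai/agents/realtime";

// Wrap the agent defined above in a session (connects over WebRTC in the browser).
const session = new RealtimeSession(agent);

// "<ephemeral-client-key>" is a placeholder; generate a short-lived client key server-side.
await session.connect({ apiKey: "<ephemeral-client-key>" });
```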
```javascript
const baseUrl = "https://api.openai.com/v1/realtime/calls";
const model = "gpt-realtime";

// "offer" is the SDP offer created from a local RTCPeerConnection
const sdpResponse = await fetch(`${baseUrl}?model=${model}`, {
  method: "POST",
  body: offer.sdp,
  headers: {
    Authorization: "Bearer " + process.env.OPENAI_API_KEY,
    "Content-Type": "application/sdp",
  },
});

// Connect to a WebSocket for the in-progress call ("callId" identifies the created call)
const url = "wss://api.openai.com/v1/realtime?call_id=" + callId;
const ws = new WebSocket(url, {
  headers: {
    Authorization: "Bearer " + process.env.OPENAI_API_KEY,
  },
});
```

### With SIP

1. A user connects to OpenAI via phone over SIP.
2. OpenAI sends a webhook to your application's backend webhook URL, notifying your app of the state of the session.

   ```text
   POST https://my_website.com/webhook_endpoint
   user-agent: OpenAI/1.0 (+https://platform.openai.com/docs/webhooks)
   content-type: application/json
   webhook-id: wh_685342e6c53c8190a1be43f081506c52 # unique id for idempotency
   ```

3. The application server opens a WebSocket connection to the Realtime API using the `call_id` value provided in the webhook, with a URL of the form `wss://api.openai.com/v1/realtime?call_id={callId}`. The WebSocket connection will live for the life of the SIP call (see the listening sketch after this list).
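As referenced in step 3 above, a hedged sketch of listening on that WebSocket for server events, using the Node.js `ws` client created earlier; the handler just logs completed responses:

```javascript
ws.on("message", (message) => {
  // Server events arrive as JSON; parse and react to the ones you care about.
  const event = JSON.parse(message.toString());
  if (event.type === "response.done") {
    console.log("Response finished:", event.response?.id);
  }
});
```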