Twilio Realtime Translation Demo
This is a server-side demo for two-person phone translation with Twilio Media Streams and gpt-realtime-translate.
Each caller dials the same Twilio number, says the language they want to hear, and waits for a second caller. The server pairs the calls and opens two Realtime Translation sessions:
Caller A audio -> Realtime Translation -> Caller B hears B's selected language
Caller B audio -> Realtime Translation -> Caller A hears A's selected language
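The pairing step can be pictured with a minimal sketch like the one below. The `Caller`, `TranslationRoute`, and `PairingQueue` names are hypothetical and chosen for illustration; the demo's own data structures may differ. The sketch assumes one waiting slot: the first caller waits, and the second caller triggers two routes, one per direction.

```typescript
// Hypothetical types for illustration; not taken from the demo source.
interface Caller {
  callSid: string;        // Twilio call identifier
  targetLanguage: string; // language this caller wants to hear
}

interface TranslationRoute {
  speaker: string;        // callSid whose audio is translated
  listener: string;       // callSid that hears the translated audio
  outputLanguage: string; // the listener's selected language
}

class PairingQueue {
  private waiting: Caller | null = null;

  // Returns null while the caller is waiting, or both routes once paired.
  join(caller: Caller): TranslationRoute[] | null {
    if (this.waiting === null) {
      this.waiting = caller;
      return null;
    }
    const first = this.waiting;
    this.waiting = null;
    return [
      // First caller speaks, second caller hears their own selected language.
      { speaker: first.callSid, listener: caller.callSid, outputLanguage: caller.targetLanguage },
      // Second caller speaks, first caller hears their own selected language.
      { speaker: caller.callSid, listener: first.callSid, outputLanguage: first.targetLanguage },
    ];
  }
}
```

Each returned route would correspond to one Realtime Translation session in the diagram above.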
Why WebSocket here
Twilio Media Streams are already server-side WebSocket connections. The server receives 8 kHz audio/x-mulaw audio from Twilio, converts it to 24 kHz PCM16 for Realtime Translation, converts the translated 24 kHz PCM16 output back to 8 kHz mulaw, and sends it back to Twilio as media messages.
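To make the inbound half of that conversion concrete, here is a minimal sketch that decodes G.711 mu-law bytes to PCM16 and upsamples 8 kHz to 24 kHz. The function names are hypothetical, and the linear interpolation is a simple stand-in chosen for readability; the demo's actual resampler is not shown here and may use a different method.

```typescript
// Decode one G.711 mu-law byte to a signed 16-bit PCM sample.
function decodeMulawByte(byte: number): number {
  const u = ~byte & 0xff;           // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Decode a whole mu-law payload (already base64-decoded) to PCM16.
function decodeMulaw(payload: Uint8Array): Int16Array {
  const pcm = new Int16Array(payload.length);
  for (let i = 0; i < payload.length; i++) {
    pcm[i] = decodeMulawByte(payload[i]);
  }
  return pcm;
}

// Upsample by 3 (8 kHz -> 24 kHz) with linear interpolation.
// Assumption: good enough for a sketch; a filtered resampler sounds better.
function upsample3x(input: Int16Array): Int16Array {
  const out = new Int16Array(input.length * 3);
  for (let i = 0; i < input.length; i++) {
    const a = input[i];
    const b = i + 1 < input.length ? input[i + 1] : a; // hold the last sample
    out[i * 3] = a;
    out[i * 3 + 1] = Math.round(a + (b - a) / 3);
    out[i * 3 + 2] = Math.round(a + (2 * (b - a)) / 3);
  }
  return out;
}
```

Twilio media messages carry the audio as a base64 string in media.payload, so in Node the input would typically come from something like Buffer.from(payload, "base64"), which is a Uint8Array. The outbound direction reverses these steps: downsample to 8 kHz, encode to mu-law, and base64-encode the result.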
Setup
From the cookbook repo root:
cd examples/voice_solutions/realtime_translation_guide/twilio-translation-demo
npm install
cp .env.example .env
Set the following values in .env:
OPENAI_API_KEY=your-openai-api-key
PUBLIC_URL=https://your-public-host.example.com
PORT=5050
OPENAI_TRANSLATION_MODEL=gpt-realtime-translate
TWILIO_AUTH_TOKEN=your-twilio-auth-token
ALLOWED_CALLER_NUMBERS=+15551234567,+15557654321
MAX_ACTIVE_CALLERS=4
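As an illustration of how ALLOWED_CALLER_NUMBERS and MAX_ACTIVE_CALLERS might gate incoming calls, here is a minimal sketch. `parseConfig` and `mayJoin` are hypothetical names, the fallback values simply mirror the example above, and treating an empty allowlist as "allow everyone" is an assumption rather than documented behavior of the demo.

```typescript
interface DemoConfig {
  allowedCallers: Set<string>; // E.164 numbers, e.g. "+15551234567"
  maxActiveCallers: number;
  port: number;
}

// Parse the relevant variables from an environment-like object.
function parseConfig(env: Record<string, string | undefined>): DemoConfig {
  const allowedCallers = new Set(
    (env["ALLOWED_CALLER_NUMBERS"] ?? "")
      .split(",")
      .map((n) => n.trim())
      .filter((n) => n.length > 0),
  );
  // Assumption: fall back to the example values when a variable is unset.
  const maxActiveCallers = Number.parseInt(env["MAX_ACTIVE_CALLERS"] ?? "4", 10);
  const port = Number.parseInt(env["PORT"] ?? "5050", 10);
  if (!Number.isInteger(maxActiveCallers) || maxActiveCallers < 2) {
    throw new Error("MAX_ACTIVE_CALLERS must be an integer of at least 2");
  }
  if (!Number.isInteger(port)) {
    throw new Error("PORT must be an integer");
  }
  return { allowedCallers, maxActiveCallers, port };
}

// Decide whether a caller may join, given the current number of active callers.
function mayJoin(config: DemoConfig, from: string, activeCount: number): boolean {
  // Assumption: an empty allowlist means no restriction.
  const allowed = config.allowedCallers.size === 0 || config.allowedCallers.has(from);
  return allowed && activeCount < config.maxActiveCallers;
}
```

In a Node server this would typically be called once at startup with process.env.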
Run:
npm run dev
Configure your Twilio phone number's Voice webhook to send:
POST https://your-public-host.example.com/incoming-call
For local testing, expose port 5050 through a public HTTPS/WSS tunnel, or deploy this server to a host that Twilio can reach on port 443.
Public deployment notes
Twilio must be able to reach the same public HTTPS/WSS origin for /incoming-call, /choose-language, and /media-stream. Use a stable public deployment for repeatable demos.
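Because the webhook and the media stream share one origin, the WebSocket URL can be derived from PUBLIC_URL. The sketch below shows one way to do that and to build a TwiML response that opens a bidirectional stream with Twilio's Connect and Stream verbs. The helper names are hypothetical, and the demo's actual TwiML may include more (for example, prompts or stream parameters).

```typescript
// Derive the WSS endpoint from the public HTTPS origin.
function mediaStreamUrl(publicUrl: string): string {
  const origin = publicUrl.replace(/\/+$/, ""); // drop trailing slashes
  // "https://..." becomes "wss://...", "http://..." becomes "ws://..."
  return origin.replace(/^http/, "ws") + "/media-stream";
}

// Build a minimal TwiML document that connects the call to the stream.
// Assumption: the URL contains no characters that need XML escaping.
function connectStreamTwiml(publicUrl: string): string {
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    "<Response><Connect>" +
    `<Stream url="${mediaStreamUrl(publicUrl)}" />` +
    "</Connect></Response>"
  );
}
```

With PUBLIC_URL=https://your-public-host.example.com, the stream URL becomes wss://your-public-host.example.com/media-stream.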
Set OPENAI_API_KEY and TWILIO_AUTH_TOKEN as deployment secrets. Set PUBLIC_URL to the deployed origin so Twilio signature validation uses the same public URL that Twilio requested.
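To see why PUBLIC_URL has to match the URL Twilio requested, it helps to look at how a Twilio webhook signature is computed: the full request URL is concatenated with the POST parameters sorted by name, signed with HMAC-SHA1 using the auth token, and base64-encoded. The sketch below follows that scheme with hypothetical function names; it is illustrative only, and in practice Twilio's helper library provides request validation.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compute the signature Twilio would send for this URL and form body.
function expectedSignature(
  authToken: string,
  url: string,
  params: Record<string, string>,
): string {
  // Append each parameter name and value, sorted by name, with no separators.
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  return createHmac("sha1", authToken).update(data, "utf8").digest("base64");
}

// Compare against the X-Twilio-Signature header in constant time.
function isValidTwilioRequest(
  authToken: string,
  url: string, // e.g. PUBLIC_URL + "/incoming-call"
  params: Record<string, string>,
  headerSignature: string,
): boolean {
  const expected = Buffer.from(expectedSignature(authToken, url, params));
  const received = Buffer.from(headerSignature);
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

If the server reconstructs the URL from a different host, scheme, or path than the one Twilio called, the signed string differs and validation fails even with the correct auth token.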
Use ALLOWED_CALLER_NUMBERS when you want the phone num