Walk Harness
The walk harness replays saved audio to make realtime eval runs comparable. It streams G.711 mu-law WAV files in fixed-size chunks, commits the user turn manually (VAD off), and records responses, tool calls, and latency metrics.
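At 8 kHz, G.711 mu-law stores one byte per sample, so a 20 ms chunk is 8000 × 0.020 = 160 bytes. The sketch below shows fixed-size chunking with a deterministic cadence under that assumption. iter_chunks and stream_chunks are hypothetical names, not the harness's actual functions, and send stands in for whatever call appends a chunk to the realtime input buffer.

```python
import time
from typing import Callable, Iterator

SAMPLE_RATE_HZ = 8000   # G.711 mu-law telephony rate
BYTES_PER_SAMPLE = 1    # mu-law encodes each sample in one byte


def chunk_size_bytes(chunk_ms: int = 20) -> int:
    """Bytes per chunk for mono 8 kHz mu-law audio."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000


def iter_chunks(audio: bytes, chunk_ms: int = 20) -> Iterator[bytes]:
    """Yield fixed-size chunks; the final chunk may be shorter."""
    size = chunk_size_bytes(chunk_ms)
    for start in range(0, len(audio), size):
        yield audio[start:start + size]


def stream_chunks(audio: bytes, send: Callable[[bytes], None], chunk_ms: int = 20) -> int:
    """Send chunks on a fixed schedule anchored to the start time.

    Returns the number of chunks sent. The hypothetical send callback is
    whatever appends a chunk to the realtime input buffer.
    """
    interval_s = chunk_ms / 1000
    start = time.monotonic()
    count = 0
    for i, chunk in enumerate(iter_chunks(audio, chunk_ms)):
        # Wait until chunk i's scheduled time: start + i * interval.
        delay = start + i * interval_s - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        send(chunk)
        count += 1
    return count
```

Scheduling each send against the start time, rather than sleeping a fixed interval after every send, keeps small timing errors from accumulating over a long clip.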
What it does
- Loads a CSV with the crawl columns (excluding expected_keywords) plus audio_path.
- Streams saved audio in fixed chunk sizes (default 20 ms) at a deterministic cadence.
- Commits the input buffer explicitly to avoid VAD variability.
- Captures transcript deltas, audio deltas, tool calls, and completion events.
- Writes results.csv and summary.json with accuracy and latency stats.
- Renders styled PNG plots under results/<run>/plots/ by default.
- Stores a JSONL stream of all realtime events per example under results/<run>/events/.
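With VAD off, a single user turn reduces to a fixed sequence of client events: configure the session without turn detection, append the audio chunks, commit the buffer, and request a response. A minimal sketch that builds that sequence as JSON-serializable dicts, assuming the OpenAI Realtime API's beta-style event names and session fields (session.update with input_audio_format and turn_detection, input_audio_buffer.append, input_audio_buffer.commit, response.create); verify them against the API version the harness actually targets. build_turn_events is a hypothetical helper, and in practice session.update would usually be sent once per session rather than once per turn.

```python
import base64
from typing import Iterable


def build_turn_events(chunks: Iterable[bytes]) -> list[dict]:
    """Client events for one manually committed user turn (assumed schema)."""
    events: list[dict] = [
        {
            "type": "session.update",
            "session": {
                "input_audio_format": "g711_ulaw",  # assumed beta field name
                "turn_detection": None,  # VAD off: the client decides when the turn ends
            },
        }
    ]
    for chunk in chunks:
        # Audio travels base64-encoded inside a JSON event.
        events.append({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })
    # The explicit commit replaces VAD end-of-speech detection.
    events.append({"type": "input_audio_buffer.commit"})
    events.append({"type": "response.create"})
    return events
```

Because the commit is issued by the client right after the last chunk, the end of the user turn lands at the same point in every run instead of depending on where a VAD decides speech stopped.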
Files
- walk_harness/generate_audio.py: Generates G.711 mu-law WAV files from the crawl CSV using TTS + ffmpeg.
- walk_harness/data/customer_service_synthetic.csv: Walk dataset with audio_path pointing to WAV files.
- walk_harness/run_realtime_evals.py: Runs the walk evals.
- walk_harness/results/<run>/events/*.jsonl: Event logs per datapoint.
- walk_harness/results/<run>/audio/<example_id>/output.wav: Assistant output audio per datapoint.
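Because every realtime event is logged, latency metrics can be recomputed from the events/*.jsonl files after a run. A minimal sketch, assuming each log line is a JSON object with a type field and a client-side receive timestamp t in seconds (t is a hypothetical field name), and assuming the beta-style server event names input_audio_buffer.committed and response.audio.delta. It measures time to first audio: the gap between the commit and the first audio delta after it. load_events, time_to_first_audio, and summarize are hypothetical helpers.

```python
import json
from pathlib import Path
from statistics import median
from typing import Optional

COMMIT_EVENT = "input_audio_buffer.committed"  # assumed server event name
AUDIO_DELTA_EVENT = "response.audio.delta"     # assumed server event name


def load_events(path: Path) -> list[dict]:
    """Read one JSONL event log (one JSON object per non-empty line)."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def time_to_first_audio(events: list[dict]) -> Optional[float]:
    """Seconds from the first commit to the first audio delta after it, or None."""
    commit_t = next((e["t"] for e in events if e.get("type") == COMMIT_EVENT), None)
    if commit_t is None:
        return None
    return next(
        (e["t"] - commit_t for e in events
         if e.get("type") == AUDIO_DELTA_EVENT and e["t"] >= commit_t),
        None,
    )


def summarize(events_dir: Path) -> dict:
    """Median time to first audio across all logs in a results/<run>/events/ directory."""
    values = [time_to_first_audio(load_events(p)) for p in sorted(events_dir.glob("*.jsonl"))]
    measured = [v for v in values if v is not None]
    return {
        "examples": len(values),
        "measured": len(measured),
        "median_ttfa_s": median(measured) if measured else None,
    }
```

Keeping the raw events and deriving metrics afterwards means a new latency definition can be applied to old runs without replaying any audio.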
How to run
- Install ffmpeg (required for mu-law WAV encoding), for example with Homebrew:
brew install ffmpeg
- Generate audio assets:
python walk_harness/generate_audio.py
- Run the eval harness:
python walk_harness/run_realtime_evals.py
- Quick smoke test:
python walk_harness/run_realtime_evals.py --max-examples 2
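For reference, one way to produce the 8 kHz mu-law WAV files this harness expects is ffmpeg's pcm_mulaw encoder. This is a sketch, not necessarily what generate_audio.py runs; mulaw_cmd and to_mulaw_wav are hypothetical helpers, and downmixing to mono is an assumption.

```python
import subprocess
from pathlib import Path


def mulaw_cmd(src: Path, dst: Path) -> list[str]:
    """ffmpeg arguments for a mono, 8 kHz, G.711 mu-law WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite dst if it already exists
        "-i", str(src),
        "-ac", "1",           # mono (assumption)
        "-ar", "8000",        # 8 kHz sample rate
        "-c:a", "pcm_mulaw",  # G.711 mu-law codec
        str(dst),
    ]


def to_mulaw_wav(src: Path, dst: Path) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(mulaw_cmd(src, dst), check=True, capture_output=True)
```

Passing -ar and -ac explicitly makes the output format independent of whatever sample rate and channel layout the TTS step produces.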
Inputs and audio format
- The dataset CSV must include an audio_path column pointing to WAV files.
- The harness currently expects G.711 mu-law audio at 8 kHz (g711_ulaw), a common telephony format that makes runs more realistic than clean PCM input.
- Output audio is saved as PCM16 WAV (typically 24 kHz) for easy playback.
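Before a run, it can help to confirm that every audio_path file really is 8 kHz mu-law. Python's standard-library wave module only reads PCM WAV files and does not accept mu-law (format tag 7), so the sketch below reads the RIFF fmt and data chunks directly. parse_wav and check_walk_input are hypothetical helpers, and requiring mono audio is an assumption.

```python
import struct
from pathlib import Path
from typing import NamedTuple

WAVE_FORMAT_MULAW = 7  # format tag for G.711 mu-law in the WAV fmt chunk


class WavInfo(NamedTuple):
    format_tag: int
    channels: int
    sample_rate: int
    bits_per_sample: int
    data: bytes


def parse_wav(raw: bytes) -> WavInfo:
    """Minimal RIFF/WAVE parser: returns the fmt fields and the raw data chunk."""
    if raw[:4] != b"RIFF" or raw[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    fmt = None
    pos = 12
    while pos + 8 <= len(raw):
        chunk_id, size = struct.unpack_from("<4sI", raw, pos)
        body = raw[pos + 8:pos + 8 + size]
        if chunk_id == b"fmt ":
            # Layout: tag (H), channels (H), sample rate (I), byte rate (I),
            # block align (H), bits per sample (H at offset 14).
            tag, channels, rate = struct.unpack_from("<HHI", body, 0)
            bits = struct.unpack_from("<H", body, 14)[0]
            fmt = (tag, channels, rate, bits)
        elif chunk_id == b"data":
            if fmt is None:
                raise ValueError("data chunk before fmt chunk")
            return WavInfo(*fmt, body)
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    raise ValueError("missing fmt or data chunk")


def check_walk_input(path: Path) -> bytes:
    """Return the mu-law payload, or raise if the file is not 8 kHz mono mu-law."""
    info = parse_wav(path.read_bytes())
    if (info.format_tag, info.channels, info.sample_rate) != (WAVE_FORMAT_MULAW, 1, 8000):
        raise ValueError(
            f"{path}: expected 8 kHz mono mu-law, got tag={info.format_tag}, "
            f"channels={info.channels}, rate={info.sample_rate}"
        )
    return info.data
```

The returned payload is the raw mu-law byte stream, which is what gets chunked and base64-encoded for streaming; the WAV header itself is never sent.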
Adapting the harness
- C