bootstrap-realtime-eval

Bootstrap a new realtime eval folder inside this cookbook repo by choosing the right harness from examples/evals/realtime_evals, scaffolding prompt/tools/data files, generating a useful README, and validating it with smoke, full eval, and test runs. Use when a user wants to start a new crawl, walk, or run realtime eval in this repository.

Bootstrap Realtime Eval

Use this skill when the user wants a new realtime eval scaffold under examples/evals/realtime_evals/.

This skill is repo-specific. Do not copy harness code into the generated folder. The generated eval should point at the shared harnesses already in:

  • examples/evals/realtime_evals/crawl_harness
  • examples/evals/realtime_evals/walk_harness
  • examples/evals/realtime_evals/run_harness

Inputs To Collect

Always ask the user for the minimum set of inputs needed to choose and scaffold the eval before you create files, run the scaffold script, or author starter data. Do not skip this step just because you can infer a default.

Ask for:

  • Eval name
  • Goal or scenario
  • Harness choice, or enough context to recommend one
  • System prompt path or inline text
  • Tools JSON path or tool descriptions
  • Data path or source materials
  • Desired graders
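The inputs above can be tracked as a simple checklist so that only the unanswered ones are asked for. The sketch below is illustrative only: `EvalInputs`, its field names, and `missing_inputs` are hypothetical and are not part of the repository or its scaffold script.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class EvalInputs:
    """Hypothetical checklist mirroring the 'Ask for' list; not a repo API."""
    eval_name: Optional[str] = None
    goal: Optional[str] = None            # goal or scenario
    harness: Optional[str] = None         # "crawl", "walk", or "run"
    system_prompt: Optional[str] = None   # path or inline text
    tools: Optional[str] = None           # tools JSON path or tool descriptions
    data: Optional[str] = None            # data path or source materials
    graders: Optional[str] = None         # desired graders


def missing_inputs(inputs: EvalInputs) -> List[str]:
    """Return the names of inputs the user has not supplied yet, in list order."""
    return [f.name for f in fields(inputs) if not getattr(inputs, f.name)]


# Example: the user has only given a name and a goal so far.
partial = EvalInputs(eval_name="order_status", goal="Check an order by phone")
still_needed = missing_inputs(partial)
```

Here `still_needed` holds the remaining questions, which can be asked in one short batch.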

If the user does not know which harness they want, explain the options briefly and recommend one. See references/harness-selection.md.

When the user asks for synthetic audio but does not specify a harness, default to crawl text-to-TTS unless they need the generated audio to carry particular noise, telephony artifacts, speaker characteristics, or other replay-specific properties. Use walk for those cases.
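The defaulting rule above can be expressed as a small decision function. This is a minimal sketch under stated assumptions: the function name and the idea of passing the replay-specific needs as a list of strings are hypothetical, and the real decision is still confirmed with the user.

```python
from typing import Iterable


def default_harness_for_synthetic_audio(replay_specific_needs: Iterable[str]) -> str:
    """Pick a default harness for a synthetic-audio request (hypothetical helper).

    replay_specific_needs lists properties the generated audio must carry,
    such as noise, telephony artifacts, or speaker characteristics.
    """
    # Ignore blank entries so an empty answer does not count as a need.
    needs = [need for need in replay_specific_needs if need.strip()]
    # Any replay-specific requirement means the audio must be replayed: use walk.
    # Otherwise the crawl text-to-TTS path is the default.
    return "walk" if needs else "crawl"


plain_request = default_harness_for_synthetic_audio([])
telephony_request = default_harness_for_synthetic_audio(["telephony artifacts"])
```

With no replay-specific needs the function returns the crawl default, and any stated need switches the recommendation to walk.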

Keep the questions concise and grouped into one short batch whenever possible.

If the user provides only user_text or a short task description, still ask the questions above first. If they answer only partially, infer the remaining low-risk details, call out the assumptions you made, and make the scaffold easy to revise later.

Workflow

  1. Ask the user for the required inputs first.

    • Do this before making files or selecting a final harness.
    • If the user already supplied some of the inputs, ask only for the missing ones.
    • If you recommend a harness, wait for the user response before scaffolding.
  2. Pick the harness.

    • crawl: single-turn text-to-TTS.
    • walk: replay saved audi