Image Generation + Editing Evals

This folder contains a lightweight vision eval harness plus example runners for image generation and image editing. The code mirrors the structure of examples/evals/realtime_evals/ so you can adapt it quickly.

Directory layout

  • vision_harness/: minimal shared library (types, runners, graders, evaluate loop)
  • generation_harness/: text-to-image evals (UI mockups + marketing flyer)
  • editing_harness/: image-edit evals (virtual try-on + logo edit)
  • shared/: reporting and optional rendering helpers

Quickstart

Requires Python 3.9 or later. Install the dependencies and set your API key:

pip install -r requirements.txt
export OPENAI_API_KEY="your_api_key"

Run a harness:

  • Generation: python generation_harness/run_imagegen_evals.py
  • Editing: python editing_harness/run_imagegen_evals.py

What the harness does

  1. Builds a small set of TestCase objects (prompt + criteria).
  2. Runs the image model for each case.
  3. Grades each output with an LLM judge using a strict JSON schema.
  4. Writes results and artifacts to results/<run_id>/.

The harness is intentionally small so you can copy parts of it into your own production eval setup.
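The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code in vision_harness/: the TestCase fields, the parse_verdict and evaluate helpers, and the judge keys (passed, score, reason) are all assumptions made for the example. The image model and the judge are passed in as plain callables so the sketch runs without any API calls.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TestCase:
    # Hypothetical shape; the real TestCase in vision_harness/ may differ.
    id: str
    prompt: str
    criteria: List[str] = field(default_factory=list)


# Assumed judge schema: exactly these keys, nothing more and nothing less.
REQUIRED_KEYS = {"passed", "score", "reason"}


def parse_verdict(raw: str) -> Dict:
    """Parse the judge's JSON reply and reject anything outside the assumed schema."""
    verdict = json.loads(raw)
    if not isinstance(verdict, dict) or set(verdict) != REQUIRED_KEYS:
        raise ValueError("judge reply does not match the expected keys")
    if not isinstance(verdict["passed"], bool):
        raise ValueError("'passed' must be a boolean")
    return verdict


def evaluate(
    cases: List[TestCase],
    run_model: Callable[[TestCase], bytes],
    judge: Callable[[TestCase, bytes], str],
) -> List[Dict]:
    """Run the model on each case, grade the output, and collect one record per case."""
    records = []
    for case in cases:
        image = run_model(case)                     # step 2: generate or edit
        verdict = parse_verdict(judge(case, image))  # step 3: strict grading
        records.append({"case_id": case.id, **verdict})
    return records
```

In a real run, run_model would call the image model and judge would call the LLM grader; rejecting malformed replies early keeps a bad judge response from silently counting as a pass or a fail.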

Example cases

Generation cases:

  • ui_checkout_mockup: mobile checkout screen with strict text + layout rules
  • coffee_flyer_generation: marketing flyer with exact copy constraints

Editing cases:

  • vto_jacket_tryon: virtual try-on with reference person + garment
  • logo_year_edit: precision logo text edit

Required assets (editing harness)

The editing harness expects these files in images/:

  • images/base_woman.png
  • images/jacket.png
  • images/logo_generation_1.png
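Because a missing asset would otherwise only surface partway through a run, it can help to check for the files up front. The helper below is a hypothetical sketch (missing_assets is not part of the harness); the file names are the three listed above.

```python
from pathlib import Path
from typing import List

# File names taken from the list above; the helper itself is illustrative.
REQUIRED_ASSETS = ["base_woman.png", "jacket.png", "logo_generation_1.png"]


def missing_assets(images_dir: Path) -> List[str]:
    """Return the required asset names that are not present in images_dir."""
    return [name for name in REQUIRED_ASSETS if not (images_dir / name).is_file()]
```

Calling missing_assets(Path("images")) before starting the editing harness returns an empty list when everything is in place, and the names of the absent files otherwise.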

Results layout

Each harness writes into its own results/ folder:

  • results/<run_id>/results.json: per-example outputs and grades
  • results/<run_id>/results.csv: tabular results
  • results/<run_id>/summary.json: aggregated metrics
  • results/<run_id>/artifacts/*.png: generated or edited images
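As a rough illustration of this layout, the sketch below writes the three report files and creates the artifacts folder for one run. The function name write_results, the record fields, and the summary keys (total, passed, pass_rate) are assumptions for the example; the reporting helpers in shared/ may structure things differently.

```python
import csv
import json
from pathlib import Path
from typing import Dict, List


def write_results(records: List[Dict], out_root: Path, run_id: str) -> Path:
    """Write results.json, results.csv, and summary.json under out_root/run_id."""
    run_dir = out_root / run_id
    (run_dir / "artifacts").mkdir(parents=True, exist_ok=True)  # images go here

    # Per-example outputs and grades.
    (run_dir / "results.json").write_text(json.dumps(records, indent=2))

    # Tabular view: one column per key seen in any record.
    fieldnames = sorted({key for record in records for key in record})
    with open(run_dir / "results.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)

    # Aggregated metrics (assumed: a simple pass rate).
    passed = sum(1 for record in records if record.get("passed"))
    summary = {
        "total": len(records),
        "passed": passed,
        "pass_rate": passed / len(records) if records else 0.0,
    }
    (run_dir / "summary.json").write_text(json.dumps(summary, indent=2))
    return run_dir
```

Keeping each run under its own run_id folder means earlier runs are never overwritten, which makes it easy to compare summaries across runs.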

Tip: keep results/ out of