Editing Harness
Best for: high-precision checks where edits must be exact and non-target content should remain unchanged.
This harness runs two example cases:
- vto_jacket_tryon: virtual try-on using a reference person and a garment
- logo_year_edit: precision logo text edit
Run
python editing_harness/run_imagegen_evals.py
Run a single case (use --cases):
python editing_harness/run_imagegen_evals.py --cases vto_jacket_tryon
Run multiple cases (comma-separated, no spaces):
python editing_harness/run_imagegen_evals.py --cases vto_jacket_tryon,logo_year_edit
Flags (when to use them)
- --cases: limit runs to specific case IDs. Valid values: vto_jacket_tryon, logo_year_edit.
- --model: image model under test (defaults to gpt-image-1.5).
- --judge-model: LLM used to grade outputs (defaults to gpt-5.2).
- --num-images: number of edited images to generate per case (defaults to 1).
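The flags above could be declared with argparse roughly as follows. This is a minimal sketch, not the harness's actual parser: the names VALID_CASES, parse_cases, and build_parser are hypothetical, and it assumes that omitting --cases runs every case, as the plain Run command suggests.

```python
import argparse

# Hypothetical: case ids taken from this README; the real harness may define them elsewhere.
VALID_CASES = ("vto_jacket_tryon", "logo_year_edit")


def parse_cases(value: str) -> list[str]:
    """Split a comma-separated --cases value (no spaces) and reject unknown ids."""
    cases = [c for c in value.split(",") if c]
    unknown = [c for c in cases if c not in VALID_CASES]
    if unknown:
        raise argparse.ArgumentTypeError(
            f"unknown case id(s): {', '.join(unknown)}; valid: {', '.join(VALID_CASES)}"
        )
    return cases


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and their defaults."""
    parser = argparse.ArgumentParser(description="Run image-editing evals (sketch).")
    # A list default is not passed through `type`, so omitting --cases selects all cases.
    parser.add_argument("--cases", type=parse_cases, default=list(VALID_CASES),
                        help="comma-separated case ids, no spaces")
    parser.add_argument("--model", default="gpt-image-1.5",
                        help="image model under test")
    parser.add_argument("--judge-model", default="gpt-5.2",
                        help="LLM used to grade outputs")
    parser.add_argument("--num-images", type=int, default=1,
                        help="number of edited images to generate per case")
    return parser
```

For example, build_parser().parse_args(["--cases", "logo_year_edit"]) yields args.cases == ["logo_year_edit"], while an unknown id makes argparse exit with a usage error.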
Inputs
This harness expects the following reference images in images/:
- images/base_woman.png
- images/jacket.png
- images/logo_generation_1.png
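Before a run, it can help to confirm that the reference images are actually in place. The sketch below is a hypothetical preflight helper (missing_inputs is not part of the harness) and assumes the images/ paths are resolved relative to the directory you launch the harness from.

```python
from pathlib import Path

# Reference images this README says the harness expects.
REQUIRED_INPUTS = (
    "images/base_woman.png",
    "images/jacket.png",
    "images/logo_generation_1.png",
)


def missing_inputs(root: Path = Path(".")) -> list[str]:
    """Hypothetical check: return the expected input paths that are not files under root."""
    return [p for p in REQUIRED_INPUTS if not (root / p).is_file()]
```

An empty list means every expected input was found; otherwise the returned paths show which images still need to be added.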
Outputs
Results are written under editing_harness/results/<run_id>/:
- results.json and results.csv with per-example scores
- summary.json with aggregated metrics
- artifacts/ for edited images
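To inspect a finished run programmatically, something like the following may work. It is a sketch under two assumptions: the most recently modified directory under editing_harness/results/ is the latest run (the run_id format is not documented here), and results.json and summary.json are plain JSON. The helper names are hypothetical, and no particular field layout is assumed.

```python
import json
from pathlib import Path

RESULTS_ROOT = Path("editing_harness/results")


def latest_run_dir(root: Path = RESULTS_ROOT) -> Path:
    """Return the most recently modified <run_id>/ directory under root (assumed ordering)."""
    runs = [p for p in root.iterdir() if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"no runs found under {root}")
    return max(runs, key=lambda p: p.stat().st_mtime)


def load_run(run_dir: Path) -> dict:
    """Load summary.json and results.json, and list the edited images in artifacts/."""
    return {
        "summary": json.loads((run_dir / "summary.json").read_text()),
        "results": json.loads((run_dir / "results.json").read_text()),
        "artifacts": sorted(p.name for p in (run_dir / "artifacts").iterdir()),
    }
```

Usage: load_run(latest_run_dir())["summary"] returns the aggregated metrics as whatever structure summary.json contains.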
Adapting the harness
- Replace the input images and edit prompts in run_imagegen_evals.py.
- Update the schemas and verdict rules to fit your workflow.
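As an illustration of what a schema plus verdict rule might look like, here is a hypothetical sketch; the harness's real schemas are not shown in this README. The fields JudgeScores, edit_applied, target_accuracy, non_target_preserved and the min_score threshold are invented to mirror the stated goal: edits must be exact and non-target content should remain unchanged.

```python
from dataclasses import dataclass


@dataclass
class JudgeScores:
    """Hypothetical per-image grading schema; replace the fields with your own criteria."""
    edit_applied: bool          # was the requested edit made at all?
    target_accuracy: int        # 1-5: how exactly the edited region matches the prompt
    non_target_preserved: int   # 1-5: how well everything else was left unchanged


def verdict(s: JudgeScores, min_score: int = 4) -> str:
    """Hypothetical rule: pass only if the edit happened and both scores reach min_score."""
    if not s.edit_applied:
        return "fail"
    if s.target_accuracy >= min_score and s.non_target_preserved >= min_score:
        return "pass"
    return "fail"
```

Keeping the verdict as a small pure function over the judge's structured output makes it easy to tighten or relax thresholds per workflow without re-running image generation.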