DataEvolver is an autonomous synthetic data construction framework in which goal-driven loop agents orchestrate the full pipeline, from text-to-image generation through 3D reconstruction to scene-aware rendering. The agents iteratively refine output quality by reading VLM feedback, diagnosing visual failures, and adjusting rendering parameters until the data meets production standards.
Naive automated rendering produces artifacts — flat lighting, color shifts, floating objects, missing shadows. Traditional pipelines rely on rigid scoring rules that lack semantic understanding. Manual tuning doesn't scale. What's needed are goal-driven agents that can perceive, diagnose, and act — closing the loop between data generation and quality control.
Auto-rendered 3D objects often exhibit flat lighting, implausible shadows, and color mismatches with the scene environment.
Rigid numeric thresholds can't diagnose "this lighting feels flat" or "the object appears to float." Goal-driven agents with VLM perception can.
Human artists spend minutes per object adjusting Blender parameters. At 50+ objects with 8+ viewpoints, this becomes infeasible.
From a natural language seed concept to quality-verified rendered pairs — fully automated by goal-driven loop agents, no human intervention required between stages.
The heart of DataEvolver: not a scripted parameter scan, but a goal-driven loop agent that perceives rendered outputs via VLM review, diagnoses semantic issues like "flat lighting" or "weak grounding," selects targeted rendering adjustments from a structured action space, and repeats until quality goals are met.
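A minimal sketch of this control loop, assuming placeholder function names (render_views, vlm_review, choose_action, apply_action) and illustrative defaults rather than DataEvolver's actual API:

# Illustrative sketch of the goal-driven refinement loop; function names,
# max_rounds, and target_score are assumptions, not the repo's real interface.
def refine(scene_params, max_rounds=6, target_score=0.8):
    for round_idx in range(max_rounds):
        renders = render_views(scene_params)        # Blender renders for every viewpoint
        review = vlm_review(renders)                # free-form critique plus a hybrid quality score
        if review["hybrid_score"] >= target_score:
            break                                   # quality goal met: stop iterating
        action = choose_action(review["text"])      # agent picks one bounded action from the action space
        scene_params = apply_action(scene_params, action, round_idx)
    return scene_params, renders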
Sign-flip tracking (freeze after 3 flips), dead-zone detection, and step-scale scheduling (Round 0: 100%, Round 1: 70%, Round 2: 50%, Round 3+: 40%, with score-adaptive ×1.2 boost when hybrid_score < 0.65) prevent infinite loops and parameter thrashing.
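Roughly, the scheduling and freeze logic can be pictured like this (a sketch built from the numbers above; the variable and function names are assumptions):

# Sketch of the step-scale schedule and sign-flip freeze described above.
STEP_SCALE = {0: 1.0, 1: 0.7, 2: 0.5}            # Round 3+ falls back to 0.4

def step_scale(round_idx, hybrid_score):
    scale = STEP_SCALE.get(round_idx, 0.4)
    if hybrid_score < 0.65:                       # score-adaptive boost for clearly failing renders
        scale *= 1.2
    return scale

def update_flip_state(state, delta):
    # Track direction reversals for one parameter; freeze it after 3 sign flips.
    # state starts as {"last_sign": 0, "flips": 0, "frozen": False}.
    if state["last_sign"] and delta * state["last_sign"] < 0:
        state["flips"] += 1
    if delta:
        state["last_sign"] = 1 if delta > 0 else -1
    state["frozen"] = state["flips"] >= 3
    return state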
Objects placed in real Blender scenes with HDRI environments. Raycast ground detection ensures physical plausibility. Original scene lighting preserved — no artificial studio setups.
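A minimal sketch of raycast ground snapping via Blender's Python API, assuming the scene.ray_cast signature of Blender 2.91+; the exact logic in DataEvolver may differ:

# Sketch: rest the object on the scene floor by casting a ray straight down
# from just below its bounding box (illustrative, not DataEvolver's code).
import bpy
from mathutils import Vector

def snap_to_ground(obj, margin=0.001):
    depsgraph = bpy.context.evaluated_depsgraph_get()
    # Lowest point of the object's world-space bounding box
    lowest = min((obj.matrix_world @ Vector(c)).z for c in obj.bound_box)
    # Start the ray slightly below the object so it cannot hit the object itself
    origin = Vector((obj.location.x, obj.location.y, lowest - margin))
    hit, location, *_ = bpy.context.scene.ray_cast(depsgraph, origin, Vector((0.0, 0.0, -1.0)))
    if hit:
        obj.location.z -= lowest - location.z     # drop the lowest point onto the hit surface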
Structured action space organized by parameter group: lighting (key intensity & yaw), object (elevation, yaw, scale), scene (env rotation, env intensity, contact shadow), and material (saturation, value, hue, roughness, specular/sheen).
A benchmark dataset for rotation-conditioned image editing. Each sample pairs a canonical front-view image with a target view specified in natural language.
import json
from pathlib import Path
from PIL import Image

root = Path("dataset_scene_v7_full50_rotation8_trainready_front2others_splitobj_seed42_final_20260410")

rows = []
with (root / "pairs/train_pairs.jsonl").open("r") as f:
    for line in f:
        rows.append(json.loads(line))

row = rows[0]
source = Image.open(root / row["source_image"]).convert("RGB")
target = Image.open(root / row["target_image"]).convert("RGB")
instruction = row["instruction"]
Best VLM-gated renders across 7 diverse Blender scenes. Each object is automatically placed, lit, and iteratively refined by goal-driven loop agents until quality standards are met.
The AI agent selects from a discrete, structured action space to address VLM-identified issues. Each action targets a specific rendering parameter.
Key light intensity: ×1.2 / ×0.8 multiplicative, bounded [0.5, 2.0]
Key light yaw: ±15° step, bounded [-90°, 90°]
Env intensity: ×1.2 / ×0.8 multiplicative, bounded [0.5, 2.0]
Env rotation: ±30° step, bounded [-180°, 180°]
Object elevation: ±0.02 step, bounded [-0.1, 0.1]
Roughness: ±0.08 step, bounded [-0.3, 0.6]
+ 18 more actions covering object yaw, object scale, contact shadow, color saturation, value/brightness, hue offset, specular/sheen, and more — see scene_action_space.json
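To make the bounded-action format concrete, here is a minimal sketch of how such entries could be represented and applied in Python; the names and schema below are illustrative assumptions, and the authoritative definition lives in scene_action_space.json.

# Illustrative representation of bounded actions; the real schema in
# scene_action_space.json may differ from this sketch.
ACTIONS = {
    "key_light_intensity_up": {"param": "key_light_intensity", "op": "mul", "step": 1.2,   "bounds": (0.5, 2.0)},
    "key_light_yaw_left":     {"param": "key_light_yaw",       "op": "add", "step": -15.0, "bounds": (-90.0, 90.0)},
    "env_rotation_cw":        {"param": "env_rotation",        "op": "add", "step": 30.0,  "bounds": (-180.0, 180.0)},
}

def apply_action(params, name, scale=1.0):
    spec = ACTIONS[name]
    value = params[spec["param"]]
    if spec["op"] == "add":
        value += spec["step"] * scale            # additive steps shrink with the round schedule
    else:
        value *= spec["step"]                    # multiplicative steps are left unscaled in this sketch
    lo, hi = spec["bounds"]
    params[spec["param"]] = max(lo, min(hi, value))   # clamp to the safety bounds
    return params

Clamping every update to its declared bounds is what keeps the agent's autonomy safe even when the VLM critique is noisy.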
What sets DataEvolver apart from other synthetic data pipelines and why goal-driven loop agents produce better training data.
Free-form natural language feedback provides semantic diagnosis that numeric scores cannot. The reviewer identifies why a render fails, not just that it fails.
The agent reads raw review text and reasons about which action to take — no scripted rule engine, no score-threshold mapping. Full decision autonomy with bounded safety constraints.
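As a rough illustration of that decision step (the prompt wording and the llm_complete call below are assumptions, not DataEvolver's actual interface):

# Sketch of the decision step: the agent sees the raw review text plus the list
# of available actions and names exactly one. llm_complete is a placeholder for
# whatever LLM/VLM call the pipeline uses.
def choose_action(review_text, available_actions):
    prompt = (
        "You are refining a Blender render. Reviewer feedback:\n"
        f"{review_text}\n\n"
        "Pick exactly one action from this list that best addresses the feedback:\n"
        + "\n".join(f"- {a}" for a in available_actions)
    )
    reply = llm_complete(prompt).strip()
    return reply if reply in available_actions else None   # reject out-of-space choices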
LoRA fine-tuning on DataEvolver-Rotate demonstrably improves Qwen Image Edit 2511 on PSNR, SSIM, and LPIPS vs. the base model, confirming real training utility.
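For reference, the three metrics can be computed along these lines, assuming scikit-image and the lpips package rather than the repo's own eval tools:

# Sketch: scoring edited images with PSNR / SSIM / LPIPS (assumes scikit-image
# and the `lpips` package; DataEvolver's eval scripts may differ).
import numpy as np
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net="alex")               # perceptual distance model

def to_tensor(img: np.ndarray) -> torch.Tensor:
    # HWC uint8 -> 1x3xHxW float in [-1, 1], the range LPIPS expects
    return (torch.from_numpy(img).permute(2, 0, 1).float() / 127.5 - 1.0).unsqueeze(0)

def evaluate(pred: np.ndarray, gt: np.ndarray) -> dict:
    psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=255)
    with torch.no_grad():
        lp = lpips_fn(to_tensor(pred), to_tensor(gt)).item()
    return {"psnr": psnr, "ssim": ssim, "lpips": lp}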
Beyond RGB pairs: mask, depth, normal maps, and geometry metadata — enabling research on multi-modal conditioning for image editing models.
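For illustration only, loading those extra modalities mirrors the RGB loading shown above; the field names used here ("mask_image", "depth_image", "normal_image") are hypothetical placeholders, not the dataset's actual keys.

# Hypothetical example of loading auxiliary modalities for one sample; check
# the JSONL metadata for the real field names before relying on these keys.
import json
from pathlib import Path
from PIL import Image

root = Path("path/to/DataEvolver-Rotate")        # placeholder path
with (root / "pairs/train_pairs.jsonl").open("r") as f:
    row = json.loads(f.readline())

mask = Image.open(root / row["mask_image"])      # per-object mask
depth = Image.open(root / row["depth_image"])    # depth map
normal = Image.open(root / row["normal_image"])  # surface normal map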
The models, frameworks, and infrastructure powering the DataEvolver pipeline.
DataEvolver is designed to run on a Linux server with GPU access and Blender. Clone the repo and start building self-improving data pipelines.
git clone https://github.com/Kamisato520/DataEvolver.git
cd DataEvolver

# Explore the pipeline
ls pipeline/   # 6-stage data synthesis (stage1 → stage5)
ls configs/    # Action space, scene templates, dataset profiles
ls scripts/    # Agent monitor, dataset builders, eval tools

# Read the full documentation
cat CLAUDE.md  # Comprehensive project guide
If you use DataEvolver or DataEvolver-Rotate in your research, please cite our work.
@misc{dataevolver2026,
title = {{DataEvolver}: Let Your Data Build and Improve
Itself via Goal-Driven Loop Agents},
author = {Zhang, Qisong and Wu, Wenzhuo},
year = {2026},
howpublished = {\url{https://github.com/Kamisato520/DataEvolver}},
note = {Technical Report}
}