Stable-Layers: Fine-Tuning Image Layer Decomposition Models with VLM-Scored Reinforcement Learning

May 28, 2026 · Ciara Rowles, Reshinth Adithyan, Nikhil Pinnaparaju, Vikram Voleti, Mark Boss
Abstract
We present Stable-Layers, a reinforcement learning framework that eliminates the need for paired supervision by fine-tuning a pretrained layer decomposition model using only feedback from a vision-language model (VLM). Starting from Qwen-Image-Layered, we apply Flow-GRPO with LoRA adaptation, sampling multiple candidate decompositions per image, scoring them with a VLM, and optimising the policy from group-relative advantages. Trained entirely on unlabelled images, Stable-Layers produces decompositions with stronger layer separation, fewer blank or artifact-heavy layers, and lower per-layer reconstruction error on the Crello dataset compared to the base model.
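The "group-relative advantages" step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly used GRPO normalisation, in which each candidate's score is centred on the group mean and divided by the group standard deviation; the abstract does not state which variant Stable-Layers uses. The function name, the `eps` constant, the use of the population standard deviation, and the example scores are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(scores, eps=1e-6):
    """Centre and scale the scores of one image's candidate group.

    Assumption: standard GRPO-style normalisation (zero mean, unit
    standard deviation within the group). `eps` avoids division by
    zero when every candidate receives the same score.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    return [(s - mu) / (sigma + eps) for s in scores]


# Hypothetical VLM scores for four candidate decompositions of one image.
vlm_scores = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(vlm_scores)

# Index of the candidate that the policy update would favour most.
best = max(range(len(advantages)), key=advantages.__getitem__)
```

Candidates scored above the group mean receive positive advantages and those below receive negative ones, so the update depends only on how candidates for the same image compare with each other, not on the absolute scale of the VLM score.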
Publication
arXiv preprint
Authors
Mark Boss
Co-Head of 3D & Image
I’m a research lead in 3D at Stability AI, with research interests at the intersection of machine learning and computer graphics.