Stable-Layers: Fine-Tuning Image Layer Decomposition Models with VLM-Scored Reinforcement Learning
An RL framework that fine-tunes image layer decomposition models using VLM-as-judge rewards, eliminating paired supervision.

A rotation-preconditioned KV cache codec that jointly quantizes coordinate triplets via an octahedral map, achieving state-of-the-art compression across text, video, and audio …
A unified feed-forward pipeline for relightable 3D reconstruction from sparse views in under one second.
Distribution matching is central to many vision and graphics tasks, where the widely used Wasserstein distance is too costly to compute for high-dimensional distributions. The …
We present Stable Video Materials 3D (SViM3D), a framework to predict multi-view consistent physically based rendering (PBR) materials, given a single image. Recently, video …
Editing materials of objects in images based on exemplar images is an active area of research in computer vision and graphics. We propose MARBLE, a method for performing material …
We present Stable Virtual Camera (Seva), a generalist diffusion model that creates novel views of a scene, given any number of input views and target cameras. Existing works …
We study the problem of single-image 3D object reconstruction. Recent works have diverged into two directions: regression-based modeling and generative modeling. Regression methods …
We present SF3D, a novel method for rapid and high-quality textured mesh reconstruction from a single image. Utilizing the Large Reconstruction Model (LRM) as its foundation, SF3D …