Explore In-Context Segmentation via Latent Diffusion Models

¹Peking University  ²S-Lab, Nanyang Technological University  ³Skywork AI  ⁴The University of California, Merced  ⁵Zhejiang University

We propose Ref LDM-Seg, a minimalist LDM framework for in-context segmentation.

Abstract

In-context segmentation has drawn increasing attention with the introduction of vision foundation models. Most existing approaches adopt metric learning or masked image modeling to build the correlation between visual prompts and input image queries. In this work, we explore the problem from a new perspective, using a representative generative model: the latent diffusion model (LDM). We observe a task gap between generation and segmentation in diffusion models, yet the LDM remains an effective minimalist framework for in-context segmentation. In particular, we propose two meta-architectures and correspondingly design several output alignment and optimization strategies. We conduct comprehensive ablation studies and empirically find that segmentation quality depends on output alignment and in-context instructions. Moreover, we build a new and fair in-context segmentation benchmark that includes both image and video datasets. Experiments validate the effectiveness of our approach, demonstrating comparable or even stronger results than previous specialist models or vision foundation models. Our study shows that LDMs can achieve sufficiently good results on challenging in-context segmentation tasks.

Ref LDM-Seg Framework

Ref LDM-Seg operates as a minimalist framework and generates segmentation masks under the guidance of in-context instructions. The two variants differ in input formulation, denoising time steps, and optimization target.
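To make the idea concrete, the sketch below shows one plausible way such a pipeline could be wired up: encode the query image and the visual prompt (prompt image masked by its annotation) into latents, concatenate them, and let a denoiser predict a mask latent that a decoder maps back to pixel space. All class names here (`TinyEncoder`, `TinyDenoiser`, `TinyDecoder`, `in_context_segment`) and the channel-wise concatenation are illustrative assumptions, not the authors' exact architecture, which builds on a pretrained LDM.

```python
# Minimal sketch (assumption) of an in-context segmentation pass with an
# LDM-style encoder / denoiser / decoder. The modules below are tiny
# placeholders; Ref LDM-Seg itself uses a pretrained VAE and UNet.
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in for a VAE encoder: image -> latent (4 channels, 1/8 resolution)."""
    def __init__(self, in_ch=3, latent_ch=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, latent_ch, 3, stride=2, padding=1),
        )
    def forward(self, x):
        return self.net(x)

class TinyDenoiser(nn.Module):
    """Stand-in for the UNet: predicts a mask latent from the concatenated
    query latent and in-context instruction latent."""
    def __init__(self, in_ch=8, out_ch=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, out_ch, 3, padding=1),
        )
    def forward(self, z):
        return self.net(z)

class TinyDecoder(nn.Module):
    """Stand-in for a VAE decoder: mask latent -> full-resolution mask logits."""
    def __init__(self, latent_ch=4, out_ch=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.SiLU(),
            nn.ConvTranspose2d(32, out_ch, 4, stride=2, padding=1),
        )
    def forward(self, z):
        return self.net(z)

def in_context_segment(query_img, prompt_img, prompt_mask,
                       encoder, denoiser, decoder):
    """Single-step variant (assumption): fuse query and prompt latents by
    concatenation, predict a mask latent, decode to mask logits."""
    z_query = encoder(query_img)                   # (B, 4, H/8, W/8)
    z_prompt = encoder(prompt_img * prompt_mask)   # in-context instruction
    z_in = torch.cat([z_query, z_prompt], dim=1)   # (B, 8, H/8, W/8)
    z_mask = denoiser(z_in)                        # predicted mask latent
    return decoder(z_mask)                         # (B, 1, H, W) logits

if __name__ == "__main__":
    enc, den, dec = TinyEncoder(), TinyDenoiser(), TinyDecoder()
    q = torch.randn(1, 3, 256, 256)                # query image
    p = torch.randn(1, 3, 256, 256)                # prompt image
    m = torch.ones(1, 1, 256, 256)                 # prompt annotation mask
    logits = in_context_segment(q, p, m, enc, den, dec)
    print(logits.shape)                            # torch.Size([1, 1, 256, 256])
```

The multi-step variant described in the paper would instead run the denoiser over several time steps starting from noise; the sketch collapses this to a single prediction purely for readability.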

Visual Results

The output of Ref LDM-Seg varies based on the in-context instructions.

Visualizations at different time steps.

BibTeX

@article{RefLDMSeg,
  title={Explore In-Context Segmentation via Latent Diffusion Models},
  author={Wang, Chaoyang and Li, Xiangtai and Ding, Henghui and Qi, Lu and Zhang, Jiangning and Tong, Yunhai and Loy, Chen Change and Yan, Shuicheng},
  journal={arXiv preprint arXiv:2403.09616},
  year={2024}
}