SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow

¹Peking University  ²University of California, Merced  ³Fudan University

We propose SemFlow, a unified framework that binds semantic segmentation and image synthesis via rectified flow.

Abstract

Semantic segmentation and semantic image synthesis are two representative tasks in visual perception and generation. While existing methods treat them as two distinct tasks, we propose a unified diffusion-based framework (SemFlow) that models them as a pair of reverse problems. Specifically, motivated by rectified flow theory, we train an ordinary differential equation (ODE) model to transport between the distributions of real images and semantic masks. Because the training objective is symmetric, samples from the two distributions, images and semantic masks, can be transferred in either direction simply by reversing the ODE. For semantic segmentation, our approach resolves the contradiction between the randomness of diffusion outputs and the uniqueness of segmentation results. For image synthesis, we propose a finite perturbation approach that enhances the diversity of generated results without changing the semantic categories. Experiments show that SemFlow achieves competitive results on semantic segmentation and semantic image synthesis tasks.
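The bidirectional transport described above can be sketched with a toy example. The snippet below is a minimal illustration, not the paper's implementation: `euler_transport` is a hypothetical helper, the 2-D vectors stand in for image and mask latents, and the "learned" velocity is replaced by the exact rectified-flow velocity for a single pair, which is the constant v = z1 − z0 along the straight-line path z_t = (1 − t)·z0 + t·z1. Integrating the same ODE from t = 0 to 1 plays the role of segmentation, and from t = 1 to 0 the role of synthesis.

```python
import numpy as np

def euler_transport(z_start, velocity, t0, t1, steps=10):
    """Integrate dz/dt = velocity(z, t) with fixed-step Euler from t0 to t1."""
    z = np.array(z_start, dtype=float)
    dt = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        z = z + dt * velocity(z, t)
        t += dt
    return z

# Toy stand-ins for latents of a real image (z0) and its semantic mask (z1).
z0 = np.array([0.0, 0.0])
z1 = np.array([1.0, 2.0])

# Exact rectified-flow velocity for this pair: constant along the straight path.
v = lambda z, t: z1 - z0

z_seg = euler_transport(z0, v, 0.0, 1.0)  # image -> mask ("segmentation")
z_syn = euler_transport(z1, v, 1.0, 0.0)  # mask -> image ("synthesis")
```

Because the velocity field is shared, the two tasks differ only in the direction of integration; in this linear toy case `z_seg` recovers `z1` and `z_syn` recovers `z0` exactly.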

Visual Results

Visualizations for semantic segmentation.

Visualizations for semantic image synthesis.

Trajectory from z_0 to z_1 (semantic segmentation).

Trajectory from z_1 to z_0 (semantic image synthesis).

BibTeX

@article{wang2024semflow,
  author  = {Wang, Chaoyang and Li, Xiangtai and Qi, Lu and Ding, Henghui and Tong, Yunhai and Yang, Ming-Hsuan},
  title   = {SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow},
  journal = {arXiv preprint arXiv:2405.20282},
  year    = {2024}
}