Conditional Panoramic Image Generation via Masked Autoregressive Modeling

1 Peking University, 2 Insta360 Research, 3 National University of Singapore, 4 Jilin University

Abstract

Recent progress in panoramic image generation has underscored two critical limitations in existing approaches. First, most methods are built upon diffusion models, which are inherently ill-suited for equirectangular projection (ERP) panoramas because the spherical mapping violates the independent and identically distributed (i.i.d.) Gaussian noise assumption. Second, these methods often treat text-conditioned generation (text-to-panorama) and image-conditioned generation (panorama outpainting) as separate tasks, relying on distinct architectures and task-specific data. In this work, we propose a unified framework, Panoramic AutoRegressive model (PAR), which leverages masked autoregressive modeling to address these challenges. PAR avoids the i.i.d. assumption constraint and integrates text and image conditioning into a cohesive architecture, enabling seamless generation across tasks. To address the inherent discontinuity in existing generative models, we introduce circular padding to enhance spatial coherence and propose a consistency alignment strategy to improve generation quality. Extensive experiments demonstrate competitive performance in text-to-image generation and panorama outpainting tasks while showcasing promising scalability and generalization capabilities.
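As a rough illustration of the circular padding idea mentioned in the abstract, the sketch below wraps a feature map along its width so that the left and right borders of an ERP panorama see each other's content. It is a minimal NumPy example under stated assumptions, not PAR's actual implementation: the function name `circular_pad_width`, the `(H, W, C)` layout, and the pad size are all hypothetical, and the abstract does not say at which layers the padding is applied.

```python
import numpy as np


def circular_pad_width(feature_map: np.ndarray, pad: int) -> np.ndarray:
    """Wrap-pad an (H, W, C) feature map along the width (longitude) axis.

    Hypothetical helper: the layout and name are assumptions for illustration.
    """
    # mode="wrap" fills the left margin with the rightmost columns and the
    # right margin with the leftmost columns, so a convolution applied
    # afterwards treats the panorama seam as continuous.
    return np.pad(feature_map, ((0, 0), (pad, pad), (0, 0)), mode="wrap")


# Toy 2 x 4 single-channel map; rows are [0, 1, 2, 3] and [4, 5, 6, 7].
feature_map = np.arange(8, dtype=np.float32).reshape(2, 4, 1)
padded = circular_pad_width(feature_map, pad=1)
print(padded[0, :, 0])  # first row after padding: 3, 0, 1, 2, 3, 0
```

Only the width axis is wrapped here because the ERP seam lies between the left and right image borders; the top and bottom rows correspond to the poles and are left unpadded in this sketch.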

PAR Framework

PAR uses a transformer to predict the masked regions, then uses these predictions as conditions for an MLP that generates continuous tokens. The original and augmented triples are both processed by the same model and then aligned through a consistency loss.
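The consistency alignment described above can be sketched as follows. This page does not specify the augmentation or the exact form of the loss, so the example assumes a horizontal circular shift as the augmentation and a mean squared error as the consistency loss. The names `horizontal_roll` and `consistency_loss` and the toy models are hypothetical and stand in for the real network.

```python
import numpy as np


def horizontal_roll(x: np.ndarray, shift: int) -> np.ndarray:
    """Circularly shift an (H, W, C) array along the width axis."""
    return np.roll(x, shift, axis=1)


def consistency_loss(model, x: np.ndarray, shift: int) -> float:
    """MSE between the output on x and the re-aligned output on shifted x.

    Assumed formulation for illustration only.
    """
    out_original = model(x)
    out_augmented = model(horizontal_roll(x, shift))
    # Undo the shift so both outputs are compared in the same coordinates.
    out_realigned = horizontal_roll(out_augmented, -shift)
    return float(np.mean((out_original - out_realigned) ** 2))


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 3))


def equivariant_model(t: np.ndarray) -> np.ndarray:
    # A per-pixel operation commutes with circular shifts.
    return 2.0 * t


def position_dependent_model(t: np.ndarray) -> np.ndarray:
    # Scaling each column by its index breaks shift equivariance.
    return t * np.arange(t.shape[1])[None, :, None]


loss_equivariant = consistency_loss(equivariant_model, x, shift=3)
loss_position_dependent = consistency_loss(position_dependent_model, x, shift=3)
```

Under these assumptions, a model whose output moves with the panorama when the input is rotated incurs zero loss, while a model that depends on absolute column position is penalized.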

Visual Results

Visual results on several tasks

Visualizations for text-to-panorama generation

Visualizations for panoramic image editing

BibTeX

@article{wang2025conditional,
      title={Conditional Panoramic Image Generation via Masked Autoregressive Modeling},
      author={Wang, Chaoyang and Li, Xiangtai and Qi, Lu and Lin, Xiaofan and Bai, Jinbin and Zhou, Qianyu and Tong, Yunhai},
      journal={arXiv preprint arXiv:2505.16862},
      year={2025}
    }