CAT-SAM: Conditional Tuning Network for Few-Shot Adaptation of Segmentation Anything Model
Aoran Xiao*1
Weihao Xuan*2
Heli Qi3
Yun Xing1
Ruijie Ren4
Xiaoqin Zhang5
Ling Shao6
Shijian Lu1
Nanyang Technological University1
The University of Tokyo2
Nara Institute of Science and Technology3
Waseda University4
Wenzhou University5
UCAS-Terminus AI Lab, UCAS6
*Equally contributing first authors
[arXiv]
[code]
[video]

Abstract

The Segment Anything Model (SAM) has demonstrated remarkable zero-shot capability and flexible geometric prompting in general image segmentation. However, it often struggles in domains that are either sparsely represented or lie outside its training distribution, such as aerial, medical, and non-RGB images. Recent efforts have predominantly focused on adapting SAM to these domains using fully supervised methods, which necessitate large amounts of annotated training data and pose practical challenges in data collection. This paper presents CAT-SAM, a ConditionAl Tuning network that explores few-shot adaptation of SAM toward various challenging downstream domains in a data-efficient manner. The core design is a prompt bridge structure that enables decoder-conditioned joint tuning of the heavyweight image encoder and the lightweight mask decoder. The bridge maps the domain-specific features of the mask decoder to the image encoder, fostering synergistic adaptation of both components with mutual benefits, using only few-shot target samples and ultimately leading to superior segmentation across various downstream tasks. We develop two CAT-SAM variants that adopt different tuning strategies for the image encoder: one injects learnable prompt tokens in the input space, while the other inserts lightweight adapter networks. Extensive experiments over 11 downstream tasks show that CAT-SAM achieves superior segmentation consistently, even under the very challenging one-shot adaptation setup.
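To make the core design more concrete, below is a minimal PyTorch sketch of the prompt-bridge idea as described in the abstract: a lightweight module maps a mask-decoder feature to prompt tokens that are injected into a frozen encoder block (the prompt-token variant). All module and parameter names (PromptBridge, PromptedEncoderBlock, num_prompts, etc.) are illustrative assumptions, not the authors' actual implementation; see the released code for the real architecture.

	# Hypothetical sketch of decoder-conditioned prompt bridging, NOT the
	# official CAT-SAM code. Dimensions and module names are assumptions.
	import torch
	import torch.nn as nn


	class PromptBridge(nn.Module):
	    """Maps domain-specific mask-decoder features to prompt tokens for the
	    image encoder, enabling decoder-conditioned joint tuning."""

	    def __init__(self, decoder_dim: int, encoder_dim: int, num_prompts: int):
	        super().__init__()
	        self.num_prompts = num_prompts
	        # Lightweight projection from decoder feature space to encoder prompts.
	        self.proj = nn.Linear(decoder_dim, encoder_dim * num_prompts)

	    def forward(self, decoder_feat: torch.Tensor) -> torch.Tensor:
	        # decoder_feat: (B, decoder_dim) pooled feature from the mask decoder.
	        b = decoder_feat.shape[0]
	        return self.proj(decoder_feat).view(b, self.num_prompts, -1)


	class PromptedEncoderBlock(nn.Module):
	    """A frozen transformer block whose input is augmented with learnable
	    prompt tokens plus bridged tokens from the decoder."""

	    def __init__(self, encoder_dim: int, num_prompts: int):
	        super().__init__()
	        self.block = nn.TransformerEncoderLayer(
	            d_model=encoder_dim, nhead=8, batch_first=True
	        )
	        for p in self.block.parameters():  # keep the SAM encoder frozen
	            p.requires_grad = False
	        # Learnable prompt tokens injected in the input space.
	        self.prompts = nn.Parameter(torch.zeros(1, num_prompts, encoder_dim))

	    def forward(self, x: torch.Tensor, bridged: torch.Tensor) -> torch.Tensor:
	        # x: (B, N, D) patch tokens; bridged: (B, P, D) from PromptBridge.
	        b = x.shape[0]
	        tokens = torch.cat([self.prompts.expand(b, -1, -1) + bridged, x], dim=1)
	        out = self.block(tokens)
	        return out[:, self.prompts.shape[1]:]  # drop prompt tokens afterwards


	if __name__ == "__main__":
	    bridge = PromptBridge(decoder_dim=256, encoder_dim=768, num_prompts=4)
	    layer = PromptedEncoderBlock(encoder_dim=768, num_prompts=4)
	    patches = torch.randn(2, 196, 768)  # dummy image patch tokens
	    dec_feat = torch.randn(2, 256)      # dummy mask-decoder feature
	    print(layer(patches, bridge(dec_feat)).shape)  # torch.Size([2, 196, 768])

In this reading, only the bridge, the prompt tokens, and the mask decoder carry trainable parameters, which is what keeps the adaptation data-efficient enough for the few-shot and one-shot setups. The adapter-based CAT-SAM variant would instead insert small trainable networks inside the encoder blocks rather than prepending tokens.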


Citation

		@article{xiao2024cat,
		  title={CAT-SAM: Conditional Tuning Network for Few-Shot Adaptation of Segmentation Anything Model},
		  author={Xiao, Aoran and Xuan, Weihao and Qi, Heli and Xing, Yun and Ren, Ruijie and Zhang, Xiaoqin and Shao, Ling and Lu, Shijian},
		  journal={arXiv preprint arXiv:2402.03631},
		  year={2024}
		}