Shaoteng Liu

I am a Research Scientist at Adobe Research. I received my Ph.D. from CUHK and was a research assistant at BAIR, UC Berkeley. My research interests lie in VLMs, agents, and AIGC, with applications such as image and video generation, editing, and manipulation.

We are hiring self-motivated and creative interns. If you are interested in an Adobe internship or a university collaboration, please feel free to contact me.

Selected Research (Full List)

	Generative Video Propagation Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang, Qing Liu, Zhifei Zhang, Joon-Young Lee, Yijun Li, Bei Yu, Zhe Lin, Soo Ye Kim, Jiaya Jia CVPR, 2025. arXiv / Project Page / Video / Data / Adobe Firefly / Adobe News / Twitter / 机器之心 We demonstrate that, with a carefully designed generative video propagation framework, various video tasks can be addressed in a unified way by leveraging the generative power of video generation models.
	EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning Xuan Ju, Tianyu Wang, Yuqian Zhou, He Zhang, Qing Liu, Nanxuan Zhao, Zhifei Zhang, Yijun Li, Yuanhao Cai, Shaoteng Liu, Daniil Pakhomov, Zhe Lin, Soo Ye Kim, Qiang Xu Preprint, 2025. arXiv / Project Page / Code EditVerse unifies a diverse range of generation and editing tasks for both images and videos within a single, powerful model.
	Training-Free Efficient Video Generation via Dynamic Token Carving Yuechen Zhang, Jinbo Xing, Bin Xia, Shaoteng Liu, Bohao Peng, Xin Tao, Pengfei Wan, Eric Lo, Jiaya Jia NeurIPS, 2025. arXiv / Project Page / Code Jenga, the proposed training-free method, accelerates HunyuanVideo by 4.68–10.35× through dynamic attention carving and progressive resolution generation.
	Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Yanwei Li, Yuechen Zhang, Chengyao Wang, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia Preprint*, 2024. arXiv / Project Page / 机器之心 / Demo / Model / Data / Code Mining the potential of open-source VLMs! Mini-Gemini is a novel framework spanning VLMs from 2B to 34B parameters for high-resolution image understanding. It has impressive OCR capability and can generate high-quality images powered by its multi-modal reasoning ability.
	RL-GPT: Integrating Reinforcement Learning and Code-as-policy Shaoteng Liu, Haoqi Yuan, Minda Hu, Yanwei Li, Yukang Chen, Shu Liu, Zongqing Lu, Jiaya Jia NeurIPS, 2024 (Oral). arXiv / Project Page RL-GPT splits a task between two agents: the slow agent decomposes the task and determines "which actions" to learn, while the fast agent writes code and RL configurations for low-level execution.
	Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu ICLR, 2024. arXiv / Project Page / Video / Data / Code A rethinking of the inversion process that improves diffusion-based editing with only 3 lines of code.
	Video-P2P: Video Editing with Cross-attention Control Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia CVPR, 2024. Most Influential CVPR Papers (Paper Digest). arXiv / Project Page / Twitter / Code Add a 'Lego' attribute to the child in the prompt, and the correspondingly edited video is generated. Video-P2P is powered by a novel video inversion process and cross-attention control; we also find that a Decoupled-Guidance strategy is essential for video editing.
	Tent: Fully Test-Time Adaptation by Entropy Minimization Dequan Wang, Evan Shelhamer, Shaoteng Liu, Bruno Olshausen, Trevor Darrell ICLR, 2021 (Spotlight). arXiv / Code Tent equips a model to adapt itself to new and different data during testing by minimizing the entropy of its predictions.
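As a rough illustration of the entropy-minimization idea named in Tent's title, here is a minimal, dependency-free Python sketch under stated assumptions. A toy frozen linear classifier (`weights`) stands in for a trained network, features are standardized with test-batch statistics, and only a per-feature scale and shift (`gamma`, `beta`) are updated to lower the mean prediction entropy. This mirrors, in miniature, Tent adapting the channel-wise affine parameters of normalization layers. All function names are hypothetical, and gradients are estimated with central finite differences purely to avoid a deep-learning framework (Tent itself uses backpropagation). This is not the released Tent code.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one vector of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: List[float]) -> float:
    """Shannon entropy H(p) = -sum_c p_c log p_c."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def normalize(batch: Matrix, eps: float = 1e-5) -> Matrix:
    """Standardize each feature using statistics of the *test* batch."""
    n, d = len(batch), len(batch[0])
    means = [sum(x[j] for x in batch) / n for j in range(d)]
    variances = [sum((x[j] - means[j]) ** 2 for x in batch) / n for j in range(d)]
    return [
        [(x[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(d)]
        for x in batch
    ]


def mean_entropy(xn: Matrix, gamma: List[float], beta: List[float],
                 weights: Matrix) -> float:
    """Average prediction entropy after the affine transform gamma * x + beta."""
    total = 0.0
    for x in xn:
        h = [g * xj + b for g, xj, b in zip(gamma, x, beta)]
        logits = [sum(wj * hj for wj, hj in zip(w, h)) for w in weights]
        total += entropy(softmax(logits))
    return total / len(xn)


def tent_adapt(batch: Matrix, weights: Matrix, steps: int = 10, lr: float = 0.1,
               fd_eps: float = 1e-4) -> Tuple[List[float], List[float]]:
    """Update only gamma and beta to reduce mean entropy; weights stay frozen."""
    d = len(batch[0])
    params = [1.0] * d + [0.0] * d  # gamma (scale) followed by beta (shift)
    xn = normalize(batch)

    def loss(p: List[float]) -> float:
        return mean_entropy(xn, p[:d], p[d:], weights)

    for _ in range(steps):
        grads = []
        for i in range(len(params)):
            # Central finite difference: a stand-in for backprop in this toy.
            plus, minus = params.copy(), params.copy()
            plus[i] += fd_eps
            minus[i] -= fd_eps
            grads.append((loss(plus) - loss(minus)) / (2 * fd_eps))
        # Plain gradient-descent step on the affine parameters only.
        params = [p - lr * g for p, g in zip(params, grads)]
    return params[:d], params[d:]


if __name__ == "__main__":
    # Hypothetical unlabeled test batch (2 features) and frozen 2-class weights.
    test_batch = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.5], [3.0, 0.5]]
    frozen_w = [[1.0, -1.0], [-1.0, 1.0]]
    xn = normalize(test_batch)
    print("entropy before:", mean_entropy(xn, [1.0, 1.0], [0.0, 0.0], frozen_w))
    g, b = tent_adapt(test_batch, frozen_w)
    print("entropy after: ", mean_entropy(xn, g, b, frozen_w))
```

No labels are used at any point: the only training signal is the model's own prediction confidence on the test batch. Restricting the update to the affine parameters keeps adaptation cheap and leaves the rest of the model untouched.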