Shaoteng Liu

I am an Incoming Research Scientist at Adobe Research. I completed my Ph.D. at CUHK and was a research assistant at BAIR, UC Berkeley. My research interests lie in VLMs, agents, and AIGC, including applications such as image and video generation, editing, and manipulation.

We are hiring self-motivated and creative interns. If you are interested in an Adobe internship or a university collaboration, please feel free to contact me.

Selected Research
Generative Video Propagation
Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang, Qing Liu, Zhifei Zhang, Joon-Young Lee, Yijun Li, Bei Yu, Zhe Lin, Soo Ye Kim, Jiaya Jia
CVPR, 2025.
arXiv / Project Page / Video / Data / Adobe Firefly
Adobe News / Twitter / 机器之心

We show that, with a carefully designed generative video propagation framework, a wide range of video tasks can be addressed in a unified way by leveraging the generative power of video models.

Training-Free Efficient Video Generation via Dynamic Token Carving
Yuechen Zhang, Jinbo Xing, Bin Xia, Shaoteng Liu, Bohao Peng, Xin Tao, Pengfei Wan, Eric Lo, Jiaya Jia
Preprint, 2025.
arXiv / Project Page / Code

Jenga accelerates HunyuanVideo by 4.68-10.35× through dynamic attention carving and progressive resolution generation.

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Yanwei Li*, Yuechen Zhang*, Chengyao Wang*, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia
Preprint, 2024.
arXiv / Project Page / 机器之心 / Demo / Model / Data / Code

Mining the potential of open-source VLMs! Mini-Gemini is a novel framework of VLMs ranging from 2B to 34B for high-resolution image understanding. It has impressive OCR capability and can generate high-quality images powered by its multi-modal reasoning ability.

RL-GPT: Integrating Reinforcement Learning and Code-as-policy
Shaoteng Liu, Haoqi Yuan, Minda Hu, Yanwei Li, Yukang Chen, Shu Liu, Zongqing Lu, Jiaya Jia
NeurIPS, 2024. Oral
arXiv / Project Page

The slow agent decomposes the task and determines "which actions" to learn. The fast agent writes code and RL configurations for low-level execution.

Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code
Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu
ICLR, 2024.
arXiv / Project Page / Video / Data / Code

Rethinks the diffusion inversion process, boosting diffusion-based editing with just three lines of code.

Video-P2P: Video Editing with Cross-attention Control
Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia
CVPR, 2024. Most Influential CVPR Papers (Paper Digest)
arXiv / Project Page / Twitter / Code

Adding a 'Lego' attribute to the child generates an edited video, powered by a novel video inversion process and cross-attention control. We also find that a decoupled-guidance strategy is essential for video editing.

Tent: Fully test-time adaptation by entropy minimization
Dequan Wang*, Evan Shelhamer*, Shaoteng Liu, Bruno Olshausen, Trevor Darrell
ICLR, 2021. Spotlight
arXiv / Code

Tent equips a model to adapt itself to new and different data during testing.

Selected Awards
  • Excellent Teaching Assistantship, CUHK, 2023

  • Hong Kong PhD Fellowship Scheme (HKPFS), 2021

  • Vice-Chancellor’s Scholarship, CUHK, 2021

  • Scientist Scholarship of China (top 1%), 2019

  • Top 10 Undergraduate of XJTU (top 0.1%), 2019

  • National Scholarship of China, 2018

Teaching
ENGG5104 | Image Processing and Computer Vision | 2023 Spring
ENGG2780A | Probability for Engineers | 2022 Spring
CSCI1540 | Computer Principles and C++ Programming | 2021 Fall

Last updated: July 2025
Web page design credit to Jon Barron