Shaoteng Liu

I am a Research Scientist at Adobe Research. My research focuses on the fundamental performance of visual generation models and on enhancing AI creativity. Recently, I have been exploring Interactive Video Generation and World Models.

I completed my Ph.D. at CUHK, advised by Prof. Jiaya Jia, where I worked on visual generation and multimodal LLMs. Before that, I was a research assistant at BAIR, UC Berkeley, where I worked on early-stage Test-Time Training (Tent).

💡 We are hiring self-motivated and creative interns. If you are interested in an Adobe internship or a university collaboration, please feel free to contact me.

Selected Research
Visual Generation
Rolling Sink arXiv 2026
Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion
Haodong Li, Shaoteng Liu, Zhe Lin, Manmohan Chandraker
arXiv, 2026.

Rolling Sink effectively scales autoregressive video synthesis to ultra-long durations (5-30 minutes) at test time, with consistent subjects, stable colors, and smooth motions.

EditVerse ICLR 2026 Oral
EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning
Xuan Ju, Tianyu Wang, Yuqian Zhou, He Zhang, Qing Liu, Nanxuan Zhao, Zhifei Zhang, Yijun Li, Yuanhao Cai, Shaoteng Liu, Daniil Pakhomov, Zhe Lin, Soo Ye Kim, Qiang Xu
ICLR, 2026. Oral

EditVerse unifies a diverse range of generation and editing tasks for both images and videos within a single, powerful model.

GenProp CVPR 2025
Generative Video Propagation
Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang, Qing Liu, Zhifei Zhang, Joon-Young Lee, Yijun Li, Bei Yu, Zhe Lin, Soo Ye Kim, Jiaya Jia
CVPR, 2025.

We show that a carefully designed generative video propagation framework can address a variety of video tasks in a unified way by leveraging the generative power of video generation models.

Jenga NeurIPS 2025
Training-Free Efficient Video Generation via Dynamic Token Carving
Yuechen Zhang, Jinbo Xing, Bin Xia, Shaoteng Liu, Bohao Peng, Xin Tao, Pengfei Wan, Eric Lo, Jiaya Jia
NeurIPS, 2025.

Jenga accelerates HunyuanVideo by 4.68-10.35x through dynamic attention carving and progressive resolution generation.

Video-P2P CVPR 2024
Video-P2P: Video Editing with Cross-attention Control
Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia
CVPR, 2024. Most Influential CVPR Papers (Paper Digest)

Adding a 'Lego' attribute to the child in the source video produces an edited video, powered by a novel video inversion process and cross-attention control. We also find that a Decoupled-Guidance strategy is essential for video editing.

PnPInversion ICLR 2024
Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code
Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu
ICLR, 2024.

Direct Inversion rethinks the inversion process, boosting diffusion-based editing with 3 lines of code.

Multimodal LLMs
PS-VAE arXiv 2026
Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing
Shilong Zhang, He Zhang, Zhifei Zhang, Chongjian Ge, Shuchen Xue, Shaoteng Liu, Mengwei Ren, Soo Ye Kim, Yuqian Zhou, Qing Liu, et al.
arXiv, 2026.

PS-VAE introduces a semantic-pixel reconstruction objective to regularize the latent space, enabling compression of both semantic information and fine-grained details into a compact representation for state-of-the-art text-to-image generation and editing.

HBridge CVPR 2026
HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation
Xiang Wang, Zhifei Zhang, He Zhang, Zhe Lin, Yuqian Zhou, Qing Liu, Shiwei Zhang, Yijun Li, Shaoteng Liu, Haitian Zheng, Jason Kuen, Yuehuan Wang, Changxin Gao, Nong Sang
CVPR, 2026.

HBridge introduces an asymmetric H-shaped architecture that bridges heterogeneous experts through mid-layer semantic connections, achieving superior unified multimodal understanding and generation with lower training cost.

RL-GPT NeurIPS 2024 Oral
RL-GPT: Integrating Reinforcement Learning and Code-as-policy
Shaoteng Liu, Haoqi Yuan, Minda Hu, Yanwei Li, Yukang Chen, Shu Liu, Zongqing Lu, Jiaya Jia
NeurIPS, 2024. Oral

The slow agent decomposes the task and determines "which actions" to learn. The fast agent writes code and RL configurations for low-level execution.

Mini-Gemini TPAMI 2025
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Yanwei Li*, Yuechen Zhang*, Chengyao Wang*, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia
TPAMI, 2025.

Mining the potential of open-source VLMs! Mini-Gemini is a novel framework, with VLMs ranging from 2B to 34B, for high-resolution image understanding. It has impressive OCR capability and can generate high-quality images powered by its multi-modal reasoning ability.

Industry Products
Project Frame Forward

Project Frame Forward applies changes across entire videos based on one annotated frame and a simple text prompt, bringing the precision of photo editing to video.

Firefly I2V
Image-to-Video in Firefly

Adobe Firefly Image-to-Video turns static images into animated video clips with AI-powered motion, depth, and cinematic flair.

Academic Service
  • Conference Reviewer: CVPR, ICCV, ECCV, NeurIPS, ICLR, ICML, AAAI, SIGGRAPH, SIGGRAPH Asia

  • Journal Reviewer: TPAMI, IJCV

  • Organizer: HiGen Workshop (CVPR 2026, ICCV 2025)

Honors & Achievements
  • Adobe MAX 2025 Sneaks, Adobe, 2025

  • Doctoral Consortium, ICCV, 2025

  • Most Influential CVPR Papers, Paper Digest, 2024

  • Excellent Teaching Assistantship, CUHK, 2023

  • Hong Kong PhD Fellowship Scheme (HKPFS), 2021

  • Vice-Chancellor's Scholarship, CUHK, 2021

  • Scientist Scholarship of China (top 1%), 2019

  • Top 10 Undergraduate of XJTU (top 0.1%), 2019

  • National Scholarship of China, 2018, 2019

Teaching
ENGG5104 | Image Processing and Computer Vision | 2023 Spring
ENGG2780A | Probability for Engineers | 2022 Spring
CSCI1540 | Computer Principles and C++ Programming | 2021 Fall