Shaoteng Liu

I am a third-year PhD student at CUHK, advised by Prof. Jiaya Jia.

I hold a B.Eng. from XJTU and previously worked as a research assistant at the BAIR Lab with Dequan Wang.

My research interests are in LLMs, VLMs, and AIGC, including applications such as image/video editing and manipulation.

Email / Google Scholar / GitHub / Twitter

Work Experience

	Adobe Research Research Scientist Intern, 2024.5- Advisors: Soo Ye Kim and Zhe Lin
	The Chinese University of Hong Kong PhD Candidate, 2021.7- Advisor: Jiaya Jia
	Berkeley Artificial Intelligence Research (BAIR) Research Assistant, 2019-2020 Advisor: Dequan Wang

Selected Research

	Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Yanwei Li, Yuechen Zhang, Chengyao Wang, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia Preprint*, 2024 arXiv / Project Page / 机器之心 / Demo / Model / Data / Code Mining the potential of open-source VLMs! Mini-Gemini is a framework for VLMs ranging from 2B to 34B, built for high-resolution image understanding. It has impressive OCR capability and can generate high-quality images powered by its multi-modal reasoning ability.
	RL-GPT: Integrating Reinforcement Learning and Code-as-policy Shaoteng Liu, Haoqi Yuan, Minda Hu, Yanwei Li, Yukang Chen, Shu Liu, Zongqing Lu, Jiaya Jia NeurIPS (Oral), 2024 arXiv / Project Page The slow agent decomposes the task and determines "which actions" to learn. The fast agent writes code and RL configurations for low-level execution.
	Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu ICLR, 2024 arXiv / Project Page / Video / Data / Code Rethinking the inversion process. Boosting Diffusion-based Editing with 3 Lines of Code.
	Video-P2P: Video Editing with Cross-attention Control Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia CVPR, 2024 arXiv / Project Page / Twitter / Code Adding a 'Lego' attribute to the child produces an edited video. Powered by a novel video inversion process and cross-attention control. We also find that a Decoupled-Guidance strategy is essential for video editing.
	Self-supervised Learning by View Synthesis Shaoteng Liu, Xiangyu Zhang, Tao Hu, Jiaya Jia Preprint, 2022 arXiv A simple transformer is already able to synthesize high-quality images.
	On-target Adaptation Dequan Wang, Shaoteng Liu, Sayna Ebrahimi, Evan Shelhamer, Trevor Darrell Preprint, 2022 arXiv Target accuracy is the goal, so we argue for optimizing as much as possible on the target data.
	Tent: Fully test-time adaptation by entropy minimization Dequan Wang, Evan Shelhamer, Shaoteng Liu, Bruno Olshausen, Trevor Darrell ICLR (Spotlight), 2021 arXiv / Code Tent equips a model to adapt itself to new and different data during testing.
	Hyperbolic Visual Embedding Learning for Zero-Shot Recognition Shaoteng Liu, Jingjing Chen, Liangming Pan, Chong-Wah Ngo, Tat-Seng Chua, Yu-Gang Jiang CVPR, 2020 Paper / Code Learning visual representations in hyperbolic space for zero-shot recognition.

Large Models & Software

Morph Studio
Core Team Member

Video Generation Foundation Model, 2023
Twitter / App / TechCrunch / 机器之心

Morph Studio, which has its own text-to-video model, just introduced an AI filmmaking platform. The eponymous tool takes the form of a storyboard, where users can create and edit shots by entering text prompts for different scenes and combine them into a cohesive narrative. It has a partnership with Stability AI.

Selected Awards

Excellent Teaching Assistantship, CUHK, 2023
Hong Kong PhD Fellowship Scheme (HKPFS), 2021
Vice-Chancellor’s Scholarship, CUHK, 2021
Scientist Scholarship of China (top 1%), 2019
Top 10 Undergraduate of XJTU (top 0.1%), 2019
National Scholarship of China, 2018

Teaching

Last updated: Jun 2024
Web page design credit to Jon Barron and Julian