Shaoteng Liu

I am a third-year PhD student at CUHK, advised by Prof. Jiaya Jia.

I received my B.Eng. degree from XJTU. Previously, I was a research assistant at the Berkeley Artificial Intelligence Research (BAIR) Lab, working with Dequan Wang.

My current research focuses on image and video generation and editing.

Email / Google Scholar / GitHub / Twitter

Work Experience
Research Scientist Intern, 2024.5-
Advisor: Soo Ye Kim and Zhe Lin
PhD Candidate, 2021.7-
Advisor: Jiaya Jia
Research Assistant, 2019-2020
Advisor: Dequan Wang
Selected Research
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Yanwei Li*, Yuechen Zhang*, Chengyao Wang*, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia

Preprint, 2024
arXiv / Project Page / 机器之心 (Synced) / Demo / Model / Data / Code

Mining the potential of open-source VLMs! Mini-Gemini is a novel framework with VLMs ranging from 2B to 34B parameters for high-resolution image understanding. It has impressive OCR capability and can generate high-quality images, powered by its multi-modal reasoning ability.

RL-GPT: Integrating Reinforcement Learning and Code-as-policy
Shaoteng Liu, Haoqi Yuan, Minda Hu, Yanwei Li, Yukang Chen, Shu Liu, Zongqing Lu, Jiaya Jia

Preprint, 2024
arXiv / Project Page

A slow agent decomposes the task and determines which actions to learn, while a fast agent writes the code and RL configurations for low-level execution.

Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code
Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu

ICLR, 2024
arXiv / Project Page / Video / Data / Code

Rethinking the inversion process to boost diffusion-based editing with just three lines of code.

Video-P2P: Video Editing with Cross-attention Control
Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia

CVPR, 2024
arXiv / Project Page / Code

Add a 'Lego' attribute to the child, and an edited video is generated. Video-P2P is powered by a novel video inversion process and cross-attention control. We also find that a decoupled-guidance strategy is essential for video editing.

Self-supervised Learning by View Synthesis
Shaoteng Liu, Xiangyu Zhang, Tao Hu, Jiaya Jia

Preprint, 2022

A simple transformer is already able to synthesize high-quality images.

On-target Adaptation
Dequan Wang, Shaoteng Liu, Sayna Ebrahimi, Evan Shelhamer, Trevor Darrell

Preprint, 2022

Target accuracy is the goal, so we argue for optimizing as much as possible on the target data.

Tent: Fully test-time adaptation by entropy minimization
Dequan Wang*, Evan Shelhamer*, Shaoteng Liu, Bruno Olshausen, Trevor Darrell

ICLR (Spotlight), 2021
arXiv / Code

Tent equips a model to adapt itself to new and different data during testing.
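As the title says, Tent adapts by minimizing the entropy of the model's own predictions on test batches (in the paper, only the affine parameters of the normalization layers are updated). The minimal sketch below computes just that test-time objective, the mean Shannon entropy of softmax predictions over a batch; the helper names (`softmax`, `prediction_entropy`, `mean_entropy`) are hypothetical and the gradient update itself is left out.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_entropy(logits: Sequence[float]) -> float:
    # Shannon entropy H = -sum_c p_c * log(p_c) of one softmax prediction.
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(batch_logits: Sequence[Sequence[float]]) -> float:
    # Entropy-minimization objective: average prediction entropy over a test batch.
    return sum(prediction_entropy(l) for l in batch_logits) / len(batch_logits)


if __name__ == "__main__":
    confident = [[8.0, 0.0, 0.0]]
    uncertain = [[0.0, 0.0, 0.0]]  # uniform prediction, entropy = log(3)
    print(mean_entropy(confident) < mean_entropy(uncertain))  # True
```

Lower entropy means more confident predictions, so a gradient step on this quantity pushes the model toward confident outputs on the new test distribution.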

Hyperbolic Visual Embedding Learning for Zero-Shot Recognition
Shaoteng Liu, Jingjing Chen, Liangming Pan, Chong-Wah Ngo, Tat-Seng Chua, Yu-Gang Jiang

CVPR, 2020
Paper / Code

Learning visual representations in hyperbolic space for zero-shot recognition.

Large Models & Software
Morph Studio
Core Team Member

Video Generation Foundation Model, 2023
Twitter / App / TechCrunch / 机器之心 (Synced)

Morph Studio, which has its own text-to-video model, just introduced an AI filmmaking platform. The eponymous tool takes the form of a storyboard, where users can create and edit shots by entering text prompts for different scenes and combine them into a cohesive narrative. It has a partnership with Stability AI.

Selected Awards
  • Excellent Teaching Assistantship, CUHK, 2023

  • Hong Kong PhD Fellowship Scheme (HKPFS), 2021

  • Vice-Chancellor’s Scholarship, CUHK, 2021

  • Scientist Scholarship of China (top 1%), 2019

  • Top 10 Undergraduate of XJTU (top 0.1%), 2019

  • National Scholarship of China, 2018

Teaching
ENGG5104 | Image Processing and Computer Vision | 2023 Spring
ENGG2780A | Probability for Engineers | 2022 Spring
CSCI1540 | Computer Principles and C++ Programming | 2021 Fall

Last updated: Jun 2024
Web page design credit to Jon Barron and Julian