Shaoteng Liu

I am a third-year PhD student at CUHK, advised by Prof. Jiaya Jia.

I received my B.Eng. degree from XJTU. I was a research assistant at the Berkeley Artificial Intelligence Research (BAIR) Lab, working with Dequan Wang.

My research currently focuses on image and video generation and editing.

Email  /  Google Scholar  /  Github

Selected Research
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Yanwei Li*, Yuechen Zhang*, Chengyao Wang*, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia

Preprint, 2024
arXiv / Project Page / 机器之心 / Demo / Model / Data / Code

Mining the potential of open-source VLMs! Mini-Gemini is a novel framework spanning 2B to 34B VLMs for high-resolution image understanding. It has impressive OCR capability and can generate high-quality images powered by its multi-modal reasoning ability.

RL-GPT: Integrating Reinforcement Learning and Code-as-policy
Shaoteng Liu, Haoqi Yuan, Minda Hu, Yanwei Li, Yukang Chen, Shu Liu, Zongqing Lu, Jiaya Jia

Preprint, 2024
arXiv / Project Page

The slow agent decomposes the task and determines which actions to learn; the fast agent writes code and RL configurations for low-level execution.

Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code
Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu

ICLR, 2024
arXiv / Project Page / Video / Data / Code

Rethinking the inversion process: boosting diffusion-based editing with just three lines of code.

Video-P2P: Video Editing with Cross-attention Control
Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia

CVPR, 2024
arXiv / Project Page / Code

Adding a 'Lego' attribute to the child generates an edited video, powered by a novel video inversion process and cross-attention control. We also find that a decoupled-guidance strategy is essential for video editing.

MR-NeuS: Learning Neural Implicit Surfaces with Multiple Radiance Fields
Shaoteng Liu, Sida Peng, Tao Hu, Xiangyu Zhang, Jiaya Jia

Preprint, 2022

With an extra radiance field added, the man sculpture is reconstructed more clearly.

Self-supervised Learning by View Synthesis
Shaoteng Liu, Xiangyu Zhang, Tao Hu, Jiaya Jia

Preprint, 2022
arXiv

A simple transformer is already able to synthesize high-quality images.

On-target Adaptation
Dequan Wang, Shaoteng Liu, Sayna Ebrahimi, Evan Shelhamer, Trevor Darrell

Preprint, 2022
arXiv

Target accuracy is the goal, so we argue for optimizing as much as possible on the target data.

Tent: Fully test-time adaptation by entropy minimization
Dequan Wang*, Evan Shelhamer*, Shaoteng Liu, Bruno Olshausen, Trevor Darrell

ICLR (Spotlight), 2021
arXiv / Code

Tent equips a model to adapt itself to new and different data during testing.

Hyperbolic Visual Embedding Learning for Zero-Shot Recognition
Shaoteng Liu, Jingjing Chen, Liangming Pan, Chong-Wah Ngo, Tat-Seng Chua, Yu-Gang Jiang

CVPR, 2020
Paper / Code

Learning visual representations in hyperbolic space for zero-shot recognition.

Selected Awards
  • Hong Kong PhD Fellowship Scheme (HKPFS), 2021

  • Vice-Chancellor’s Scholarship, CUHK, 2021

  • Scientist Scholarship of China (top 1%), 2019

  • Top 10 Undergraduate of XJTU (top 0.1%), 2019

  • National Scholarship of China, 2018

Teaching
ENGG5104 | Image Processing and Computer Vision | 2023 Spring | Excellent Teaching Assistantship
ENGG2780A | Probability for Engineers | 2022 Spring
CSCI1540 | Computer Principles and C++ Programming | 2021 Fall

Last updated: Mar 2023
Web page design credit to Jon Barron and Julian