Shaoteng Liu
I am a third-year PhD student at CUHK, advised by Prof. Jiaya Jia.
I hold a B.Eng. from XJTU and previously worked as a research assistant at the BAIR Lab with Dequan Wang.
My research interests are in LLMs, VLMs, and AIGC, including applications such as image/video editing and manipulation.
Email /
Google Scholar /
Github /
Twitter
Adobe Research
Research Scientist Intern, 2024.5-
Advisor: Soo Ye Kim and Zhe Lin
The Chinese University of Hong Kong
PhD Candidate, 2021.7-
Advisor: Jiaya Jia
Berkeley Artificial Intelligence Research (BAIR)
Research Assistant, 2019-2020
Advisor: Dequan Wang
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Yanwei Li*, Yuechen Zhang*, Chengyao Wang*, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia
Preprint, 2024
arXiv /
Project Page /
机器之心 /
Demo /
Model /
Data /
Code
Mining the potential of open-source VLMs! Mini-Gemini is a novel framework spanning 2B to 34B VLMs for high-resolution image understanding. It shows impressive OCR capability and can generate high-quality images powered by its multi-modal reasoning ability.
RL-GPT: Integrating Reinforcement Learning and Code-as-policy
Shaoteng Liu, Haoqi Yuan, Minda Hu, Yanwei Li, Yukang Chen, Shu Liu, Zongqing Lu, Jiaya Jia
NeurIPS (Oral), 2024
arXiv /
Project Page
The slow agent decomposes the task and determines "which actions" to learn. The fast agent writes code and RL configurations for low-level execution.
Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code
Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu
ICLR, 2024
arXiv /
Project Page /
Video /
Data /
Code
Rethinking the inversion process. Boosting Diffusion-based Editing with 3 Lines of Code.
Video-P2P: Video Editing with Cross-attention Control
Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia
CVPR, 2024
arXiv /
Project Page /
Twitter /
Code
Adding a 'Lego' attribute to the child produces an edited video, powered by a novel video inversion process and cross-attention control. We also find that a decoupled-guidance strategy is essential for video editing.
Self-supervised Learning by View Synthesis
Shaoteng Liu, Xiangyu Zhang, Tao Hu, Jiaya Jia
Preprint, 2022
arXiv
A simple transformer is already able to synthesize high-quality images.
On-target Adaptation
Dequan Wang, Shaoteng Liu, Sayna Ebrahimi, Evan Shelhamer, Trevor Darrell
Preprint, 2022
arXiv
Target accuracy is the goal, so we argue for optimizing as much as possible on the target data.
Tent: Fully test-time adaptation by entropy minimization
Dequan Wang*, Evan Shelhamer*, Shaoteng Liu, Bruno Olshausen, Trevor Darrell
ICLR (Spotlight), 2021
arXiv /
Code
Tent equips a model to adapt itself to new and different data during testing.
Hyperbolic Visual Embedding Learning for Zero-Shot Recognition
Shaoteng Liu, Jingjing Chen, Liangming Pan, Chong-Wah Ngo, Tat-Seng Chua, Yu-Gang Jiang
CVPR, 2020
Paper /
Code
Learning visual representations in hyperbolic space for zero-shot recognition.
Morph Studio
Core Team Member
Video Generation Foundation Model, 2023
Twitter /
App /
TechCrunch /
机器之心
Morph Studio, which has its own text-to-video model, introduced an AI filmmaking platform. The eponymous tool takes the form of a storyboard, where users create and edit shots by entering text prompts for different scenes and combine them into a cohesive narrative. It has a partnership with Stability AI.
Excellent Teaching Assistantship, CUHK, 2023
Hong Kong PhD Fellowship Scheme (HKPFS), 2021
Vice-Chancellor’s Scholarship, CUHK, 2021
Scientist Scholarship of China (top 1%), 2019
Top 10 Undergraduate of XJTU (top 0.1%), 2019
National Scholarship of China, 2018
ENGG5104 | Image Processing and Computer Vision | 2023 Spring
ENGG2780A | Probability for Engineers | 2022 Spring
CSCI1540 | Computer Principles and C++ Programming | 2021 Fall