My research focuses on post-training of multimodal foundation models and agents.

I received my M.S. from Nanjing University, advised by Prof. Tong Lu. Previously, I was at Tencent Youtu Lab, where I collaborated with Dr. Xiaobin Hu and Prof. Ying Tai.

News

  • 2026.01: Two papers accepted to ICLR 2026 (MedLesionVQA, Human-MME).
  • 2025.12: Champion in CURE-Bench @ NeurIPS 2025 (1st out of 322 teams).
  • 2025.03: Four papers accepted to CVPR / ACM MM 2025 (Sonic, GroundingFace, HunyuanPortrait, DICE-Talk).
  • 2024.07: Three papers accepted to ECCV / IJCAI 2024 (DiffuMatting, UniM-OV3D).
  • 2022.03: Two papers accepted to CVPR / ECCV 2022 (ColorFormer, SGPN).
  • 2021.09: Three papers accepted to NeurIPS 2021 / AAAI 2021 / ECCV 2020 (S2K, FCA, AE-TextSpotter).
  • 2020.06: Champion in NTIRE 2020 Real-World SR Challenge @ CVPR.

Publications

Selected Publications

CURE-Bench @ NeurIPS 2025

CureFlow: An AI Engine for Complex Medication Decision-Making

Xiaozhong Ji, Jinghao Lin, Yuhang Wu, Zihan Wang, Boyuan Jiang, Chao Gao

NeurIPS 2025 CURE-Bench Challenge | Poster | Leaderboard

TechRxiv 2025

Good Teachers, Better Students: A Survey of Reward Models for LLM

Linhao Wang, Zihan Wang, Jinghao Lin, …, Xiaozhong Ji, Xiaobin Hu, et al.

arXiv 2025

Align and Surpass Human Camouflaged Perception: Visual Refocus Reinforcement Fine-Tuning

Ruolin Shen, Xiaozhong Ji, Kai Wu, Jiangning Zhang, Yijun He, HaiHua Yang, Xiaobin Hu, Xiaoyu Sun

CVPR 2025

Sonic: Shifting Focus to Global Audio Perception in Portrait Animation

Xiaozhong Ji, Xiaobin Hu, Zhihong Xu, Junwei Zhu, Chuming Lin, Qingdong He, Jiangning Zhang, Donghao Luo, Yi Chen, Qin Lin, Qinglin Lu, Chengjie Wang

CVPR 2025 | GitHub

CVPR 2025

HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation

Zunnan Xu, Zhentao Yu, Zixiang Zhou, Jun Zhou, Xiaoyu Jin, Fa-Ting Hong, Xiaozhong Ji, Junwei Zhu, Chengfei Cai, Shiyu Tang, Qin Lin, Xiu Li, Qinglin Lu

CVPR 2025 | GitHub

CVPR 2025

GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model

Yue Han, Jiangning Zhang, Junwei Zhu, Runze Hou, Xiaozhong Ji, Chuming Lin, Xiaobin Hu, Zhucun Xue, Yong Liu

ECCV 2024

DiffuMatting: Synthesizing Arbitrary Objects with Matting-level Annotation

Xiaobin Hu, Xu Peng, Donghao Luo, Xiaozhong Ji, Jinlong Peng, Zhengkai Jiang, Jiangning Zhang, Taisong Jin, Chengjie Wang, Rongrong Ji

ECCV 2024 | GitHub

RWSR Challenge @ CVPR 2020

Real-World Super-Resolution via Kernel Estimation and Noise Injection

Xiaozhong Ji, Yun Cao, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang

CVPR 2020 Workshop | GitHub | GitHub (Tencent) | Citations

Full Publication List

TechRxiv 2025

The Landscape of Medical Agents: A Survey

Xiaobin Hu, Yunhang Qian, Jiaquan Yu, …, Xiaozhong Ji, …, Shuicheng Yan, et al.

arXiv 2025

Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models

Yuansen Liu, Haiming Tang, Jinlong Peng, Jiangning Zhang, Xiaozhong Ji, Qingdong He, Wenbin Wu, Donghao Luo, Zhenye Gan, Junwei Zhu, et al.

arXiv 2025

Med-CMR: A Fine-Grained Benchmark Integrating Visual Evidence and Clinical Logic for Medical Complex Multimodal Reasoning

Haozhen Gong, Xiaozhong Ji, Yuansen Liu, Wenbin Wu, Xiaoxiao Yan, Jingjing Liu, Kai Wu, Jiazhen Pan, Bailiang Jian, Jiangning Zhang, et al.

IJCAI 2024

UniM-OV3D: Uni-Modality Open-Vocabulary 3D Scene Understanding with Fine-Grained Feature Representation

Qingdong He, Jinlong Peng, Zhengkai Jiang, Kai Wu, Xiaozhong Ji, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Mingang Chen, Yunsheng Wu

IJCAI 2024 | GitHub

ACM MM 2025

DICE-Talk: Disentangle Identity, Cooperate Emotion for Correlation-Aware Emotional Talking Portrait Generation

Weipeng Tan, Chuming Lin, Chengming Xu, FeiFan Xu, Xiaobin Hu, Xiaozhong Ji, Junwei Zhu, Chengjie Wang, Yanwei Fu

ACM MM 2025 | GitHub | Project

arXiv 2024

MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning

Yue Han, Junwei Zhu, Yuxiang Feng, Xiaozhong Ji, Keke He, Xiangtai Li, Zhucun Xue, Yong Liu

arXiv 2024

RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network

Xiaozhong Ji, Chuming Lin, Zhonggan Ding, Ying Tai, Junwei Zhu, Xiaobin Hu, Donghao Luo, Yanhao Ge, Chengjie Wang

ECCV 2022

ColorFormer: Image Colorization via Color Memory Assisted Hybrid-Attention Transformer

Xiaozhong Ji, Boyuan Jiang, Donghao Luo, Guangpin Tao, Wenqing Chu, Zhifeng Xie, Chengjie Wang, Ying Tai

ECCV 2022 | GitHub | GitHub (Tencent)

CVPR 2022

Blind Face Restoration via Integrating Face Shape and Generative Priors

Feida Zhu, Junwei Zhu, Wenqing Chu, Xinyi Zhang, Xiaozhong Ji, Chengjie Wang, Ying Tai

NeurIPS 2021

Spectrum-to-Kernel Translation for Accurate Blind Image Super-Resolution

Guangpin Tao, Xiaozhong Ji, Wenzhuo Wang, Shuo Chen, Chuming Lin, Yun Cao, Tong Lu, Donghao Luo, Ying Tai

AAAI 2021

Frequency Consistent Adaptation for Real World Super Resolution

Xiaozhong Ji, Guangpin Tao, Yun Cao, Ying Tai, Tong Lu, Chengjie Wang, Jilin Li, Feiyue Huang

ECCV 2020

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

Wenhai Wang, Xuebo Liu, Xiaozhong Ji, Enze Xie, Ding Liang, ZhiBo Yang, Tong Lu, Chunhua Shen, Ping Luo

Awards & Honors

  • Champion, CURE-Bench @ NeurIPS 2025 (November 2025). 1st place out of 322 teams in both the internal reasoning and agentic reasoning tracks. Competition Website | NeurIPS Page

  • Outstanding Master’s Thesis Award, Nanjing University (2021). Thesis: “Key Techniques for Super-Resolution of Blurred Data in Real-World Scenarios”

  • Champion, NTIRE 2020 Real-World Super-Resolution Challenge @ CVPR (June 2020). 1st place in both Track 1 and Track 2. Competition Website | Code