About Me

Hi there! I'm an incoming PhD student at Purdue University, Department of Computer Science, advised by Dr. Ruqi Zhang. I obtained my B.S. degree at the School of Mathematics, Tianjin University. Previously, I worked as a research assistant in the MLDM Lab's Multimodal Vision Processing (MVP) Group, under the guidance of Dr. Bing Cao.

My research interests lie in developing reliable machine learning algorithms and frameworks for real-world applications, with a particular focus on the alignment of Large Foundation Models (LLMs and VLMs) and the generalization of multimodal learning algorithms.

Research Interests

  • Multimodal Learning: Multimodal Fusion, Imbalanced Multimodal Learning.
  • Alignment of Foundation Models: LLMs, VLMs.
  • Trustworthy AI: Safety, Uncertainty, etc.

I am very excited about potential collaboration opportunities! If you share similar research interests and find my work interesting, I warmly welcome you to contact me via email!

Latest News

May 2025

🎉 Yi will give a talk about VLM safety at Shenlan School!

Apr 2025

🎉 Yi serves as a reviewer for NeurIPS 2025!

Jan 2025

🎉 Our paper, dataset, and models on VLM Multi-Image Safety (MIS) are now released!

Jan 2025

🎉 Our paper on MLLM safety alignment is accepted at ICLR 2025. Congratulations to all collaborators!

Sep 2024

🎉 Yi serves as a reviewer for ICLR 2025!

Sep 2024

🎉 Our paper on test-time dynamic image fusion without additional training is accepted at NeurIPS 2024! Congratulations to all collaborators!

Jul 2024

🎉 Yi will present a poster at ICML 2024 on Tue, 23 Jul, 1:30-3:00 p.m., Hall C 4-9 #2817, Vienna, Austria!

May 2024

🎉 Our paper on multimodal fusion is accepted at ICML 2024!

Publications

* indicates equal contribution.
Technical Report
SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law

Shanghai Artificial Intelligence Laboratory, ..., Yi Ding, [and 100+ authors]

TL;DR: We introduce SafeWork-R1, a cutting-edge multimodal reasoning model that demonstrates the coevolution of capabilities and safety.

Technical Report, 2025

Preprint
Sherlock: Self-Correcting Reasoning in Vision-Language Models

Yi Ding, Ruqi Zhang

TL;DR: We present Sherlock, a self-correction and self-improvement training framework that enhances VLM reasoning ability with minimal annotated data.

Preprint, arXiv 2025

Preprint
Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection

Ziqi Miao*, Yi Ding*, Lijun Li, Jing Shao

TL;DR: We present VisCo-Attack, which jailbreaks MLLMs in a visual-centric setting with fabricated visual context.

Preprint, arXiv 2025

Preprint
Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models

Yi Ding*, Lijun Li*, Bing Cao, Jing Shao

TL;DR: We introduce the first multi-image safety (MIS) dataset, which includes both training and test splits. VLMs fine-tuned with the MIRage method on the MIS training set improve in both safety and general performance.

Preprint, arXiv 2025

ICLR 2025
ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time

Yi Ding, Bolian Li, Ruqi Zhang

TL;DR: We establish a multimodal safety mechanism for VLMs that enhances the harmlessness and helpfulness of responses without additional training.

International Conference on Learning Representations (ICLR), 2025

NeurIPS 2024
Test-Time Dynamic Image Fusion

Bing Cao, Yinan Xia*, Yi Ding*, Changqing Zhang, Qinghua Hu

TL;DR: We improve the quality of fused images for almost every backbone without additional training by setting dynamic fusion weights at test time.

Neural Information Processing Systems (NeurIPS), 2024

ICML 2024
Predictive Dynamic Fusion

Bing Cao, Yinan Xia*, Yi Ding*, Changqing Zhang, Qinghua Hu

TL;DR: We show that the key to dynamic fusion lies in the correlation between the fusion weights and the loss, providing a generalization theory for decision-level fusion.

International Conference on Machine Learning (ICML), 2024

Education

2025.08 (Expected) - Present

Ph.D. in Computer Science, Purdue University

Advisor: Dr. Ruqi Zhang

2021.08 - 2025.06

B.S., School of Mathematics, Tianjin University

Advisor: Dr. Bing Cao

Services

Reviewer: ICLR 2025, NeurIPS 2025, ARR May 2025

Contact

ding432@purdue.edu

Beijing, China
Indiana, USA