About Me

Hi there! I'm an incoming PhD student at Purdue University, Department of Computer Science, advised by Dr. Ruqi Zhang. I obtained my B.S. degree at the School of Mathematics, Tianjin University. Previously, I worked as a research assistant in the MLDM Lab's Multimodal Vision Processing (MVP) Group, under the guidance of Dr. Bing Cao.

My research interests lie in developing reliable machine learning algorithms and frameworks for real-world applications, with a particular focus on the alignment of Large Foundation Models (LLMs and VLMs) and the generalization of multimodal learning algorithms.

Research Interests

  • Multimodal Learning: Multimodal Fusion, Imbalanced Multimodal Learning.
  • Alignment of Foundation Models: LLMs, VLMs.
  • Trustworthy AI: Safety, Uncertainty, etc.

I am very excited about potential collaboration opportunities! If you share similar research interests and find my work interesting, I warmly welcome you to contact me via email!

Latest News

May 2025

🎉 Yi will give a talk about VLM safety at Shenlan School!

Apr 2025

🎉 Yi serves as a reviewer for NeurIPS 2025!

Jan 2025

🎉 Our paper, dataset, and models on VLM Multi-Image Safety (MIS) are now released!

Jan 2025

🎉 Our paper on MLLM safety alignment is accepted at ICLR 2025. Congratulations to all collaborators!

Sep 2024

🎉 Yi serves as a reviewer for ICLR 2025!

Sep 2024

🎉 Our paper on test-time dynamic image fusion without additional training is accepted at NeurIPS 2024! Congratulations to all collaborators!

Jul 2024

🎉 Yi will present a poster at ICML 2024 on Tue, 23 Jul, 1:30-3:00 p.m., Hall C 4-9 #2817, Vienna, Austria!

May 2024

🎉 Our paper on multimodal fusion is accepted at ICML 2024!

Publications

* indicates equal contribution.
Technical Report
SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law

Shanghai Artificial Intelligence Laboratory, ..., Yi Ding, [and 100+ authors]

TL;DR: We introduce SafeWork-R1, a cutting-edge multimodal reasoning model that demonstrates the coevolution of capabilities and safety.

Technical Report, 2025

Preprint
Sherlock: Self-Correcting Reasoning in Vision-Language Models

Yi Ding, Ruqi Zhang

TL;DR: We present Sherlock, a self-correction and self-improvement training framework that enhances VLM reasoning ability with minimal annotated data.

Preprint, arXiv 2025

Preprint
Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection

Ziqi Miao*, Yi Ding*, Lijun Li, Jing Shao

TL;DR: We present VisCo-Attack, which jailbreaks MLLMs in a visual-centric setting with fabricated visual context.

Preprint, arXiv 2025

Preprint
Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models

Yi Ding*, Lijun Li*, Bing Cao, Jing Shao

TL;DR: We introduce the first multi-image safety (MIS) dataset, which includes both training and test splits. VLMs fine-tuned with the MIRage method on the MIS training set improve in both safety and general performance.

Preprint, arXiv 2025

ICLR 2025
ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time

Yi Ding, Bolian Li, Ruqi Zhang

TL;DR: We establish a multimodal safety mechanism for VLMs that enhances the harmlessness and helpfulness of responses without additional training.

International Conference on Learning Representations (ICLR), 2025

NeurIPS 2024
Test-Time Dynamic Image Fusion

Bing Cao, Yinan Xia*, Yi Ding*, Changqing Zhang, Qinghua Hu

TL;DR: We improve the quality of fused images for almost every backbone without additional training by setting dynamic fusion weights at test time.

Neural Information Processing Systems (NeurIPS), 2024

ICML 2024
Predictive Dynamic Fusion

Bing Cao, Yinan Xia*, Yi Ding*, Changqing Zhang, Qinghua Hu

TL;DR: We show that the key to dynamic fusion lies in the correlation between the fusion weights and the loss, providing a generalization theory for decision-level fusion.

International Conference on Machine Learning (ICML), 2024

Education

2025.08 (Expected) - Present

Ph.D. in Computer Science, Purdue University

Advisor: Dr. Ruqi Zhang

2021.08 - 2025.06

B.S., School of Mathematics, Tianjin University

Advisor: Dr. Bing Cao

Services

Reviewer: ICLR 2025, NeurIPS 2025, ARR May 2025

Contact

ding432@purdue.edu

Beijing, China
Indiana, USA