About Me
Hi there! I'm an incoming PhD student at Purdue University, Department of Computer Science, advised by Dr. Ruqi Zhang. I am also in my final year as an undergraduate student at the School of Mathematics, Tianjin University. Previously, I worked as a research assistant in the MLDM Lab's Multimodal Vision Processing (MVP) Group, under the guidance of Dr. Bing Cao.
My research interests lie in developing reliable machine learning algorithms and frameworks for real-world applications, with a particular focus on the alignment of Large Foundation Models (LLMs and VLMs) and the generalization of multimodal learning algorithms.
Research Interests
- Multimodal Learning: Multimodal Fusion, Imbalanced Multimodal Learning.
- Alignment of Foundation Models: LLMs, VLMs.
- Trustworthy AI: Safety, Uncertainty, etc.
I am very excited about potential collaboration opportunities! You can find my CV here. If you share similar research interests and find my work interesting, you are warmly welcome to add me on WeChat for further discussion!
News
- [Jan. 2025]: Our paper, dataset, and models on MLLM Multi-Image Safety (MIS) are now released!
- [Jan. 2025]: Our paper on MLLM safety alignment is accepted at ICLR 2025. Congratulations to all collaborators!
- [Sep. 2024]: Yi is serving as a reviewer for ICLR 2025!
- [Sep. 2024]: Our paper on dynamic image fusion without additional training is accepted at NeurIPS 2024! Congratulations to all collaborators!
- [Jul. 2024]: Yi will give a poster presentation on Tue 23 Jul, 1:30 p.m. to 3 p.m., at ICML, Hall C 4-9 #2817, Vienna, Austria!
- [May 2024]: Our paper on multimodal fusion is accepted at ICML 2024!
Publications & Preprints
* indicates equal contribution.

Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models
Yi Ding*, Lijun Li*, Bing Cao, Jing Shao
TL;DR: Introducing the first multi-image safety (MIS) dataset, which includes both training and test splits. Fine-tuning VLMs with the MIRage method on the MIS training set improves both the safety and the general performance of the models.
Preprint, arXiv 2025

ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time
Yi Ding, Bolian Li, Ruqi Zhang
TL;DR: Establishing a multimodal safety mechanism for VLMs that enhances the harmlessness and helpfulness of responses without additional training.
International Conference on Learning Representations (ICLR), 2025


Predictive Dynamic Fusion
Bing Cao, Yinan Xia*, Yi Ding*, Changqing Zhang, Qinghua Hu
TL;DR: The key to dynamic fusion lies in the correlation between the weights and the loss, providing generalization theory for decision-level fusion.
International Conference on Machine Learning (ICML), 2024
Education
- 2024.05 - Present, Research Intern, Computer Science, Purdue University
- 2021.08 - Present, B.S., School of Mathematics, Tianjin University