👀 About Me

Hi there! I’m an incoming PhD student at Purdue University, Department of Computer Science, advised by Dr. Ruqi Zhang. I am also in my final year as an undergraduate student at the School of Mathematics, Tianjin University. Previously, I worked as a research assistant in the MLDM Lab’s Multimodal Vision Processing (MVP) Group, under the guidance of Dr. Bing Cao.

My research interests lie in developing reliable machine learning algorithms and frameworks for real-world applications, with a particular focus on the alignment of Large Foundation Models (LLMs and VLMs) and the generalization of multimodal learning algorithms.

🧩 Research Interests

  • Multimodal Learning: Multimodal Fusion, Imbalanced Multimodal Learning.
  • Alignment of Foundation Models: LLMs, VLMs.
  • Trustworthy AI: Safety, Uncertainty, etc.

I am very excited about potential collaboration opportunities! You can find my CV here. If you share similar research interests and find my work interesting, you are warmly welcome to add me on WeChat for further discussion!

🔥 News

  • [Jan. 2025]: 🎉 Our paper, dataset, and models on MLLM Multi-Image Safety (MIS) are now released!
  • [Jan. 2025]: 🎉 Our paper on MLLM safety alignment has been accepted at ICLR 2025. Congratulations to all collaborators!
  • [Sep. 2024]: 🎉 Yi is serving as a reviewer for ICLR 2025!
  • [Sep. 2024]: 🎉 Our paper on Dynamic Image Fusion without additional training has been accepted at NeurIPS 2024! Congratulations to all collaborators!
  • [Jul. 2024]: 🎉 Yi will give a poster presentation on Tue 23 Jul, 1:30 p.m. to 3 p.m., at ICML, Hall C 4-9 #2817, Vienna, Austria!
  • [May 2024]: 🎉 Our paper on Multimodal Fusion has been accepted at ICML 2024!

📝 Publications & Preprints

* indicates equal contribution.

Preprint

Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models

Yi Ding*, Lijun Li*, Bing Cao, Jing Shao

TL;DR: Introducing the first multi-image safety (MIS) dataset, which includes both training and test splits. Fine-tuning VLMs with the MIRage method on the MIS training set improves both the safety and the general performance of the models.

Preprint, arXiv 2025

[PDF] [Project Page] [CODE]

ICLR 2025

ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time

Yi Ding, Bolian Li, Ruqi Zhang

TL;DR: Establishing a multimodal safety mechanism for VLMs and enhancing the harmlessness and helpfulness of responses without additional training.

International Conference on Learning Representations (ICLR), 2025

[PDF] [Project Page] [CODE]

NeurIPS 2024

Test-Time Dynamic Image Fusion

Bing Cao, Yinan Xia*, Yi Ding*, Changqing Zhang, Qinghua Hu

TL;DR: Improving the quality of fused images for almost any backbone, without additional training, by setting dynamic weights at test time.

Neural Information Processing Systems (NeurIPS), 2024

[PDF] [CODE]

ICML 2024

Predictive Dynamic Fusion

Bing Cao, Yinan Xia*, Yi Ding*, Changqing Zhang, Qinghua Hu

TL;DR: The key to dynamic fusion lies in the correlation between the fusion weights and the loss; the paper provides a generalization theory for decision-level fusion.

International Conference on Machine Learning (ICML), 2024

[PDF] [CODE]

📖 Education

  • 2024.05 - Present, Research Intern, Computer Science, Purdue University
  • 2021.08 - Present, B.S., School of Mathematics, Tianjin University