Test-Time Dynamic Image Fusion

1College of Intelligence and Computing, Tianjin University, Tianjin, China
2Tianjin Key Lab of Machine Learning, Tianjin, China

We reveal the generalized form of image fusion and derive a new test-time dynamic image fusion paradigm, which provably reduces the upper bound of generalization error. Specifically, we decompose the fused image into multiple components corresponding to its source data. The decomposed components represent the effective information from the source data, so the gap between them reflects the Relative Dominability (RD) of each uni-source input in constructing the fused image. Theoretically, we prove that the key to reducing generalization error is a negative correlation between the RD-based fusion weight and the uni-source reconstruction loss. Intuitively, RD dynamically highlights the dominant regions of each source and can be naturally converted into the corresponding fusion weight, yielding robust results.
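The weighting principle above can be sketched in a few lines: per-source reconstruction losses are mapped to pixel-wise fusion weights that are negatively correlated with the loss, so the better-reconstructed (dominant) source gets the larger weight. This is a minimal illustrative sketch, not the paper's implementation; the softmax-over-negative-loss mapping, the `temperature` parameter, and the function names are assumptions for demonstration.

```python
import numpy as np

def rd_fusion_weights(recon_losses, temperature=1.0):
    """Map per-source, per-pixel reconstruction losses to fusion weights.

    Weights are negatively correlated with the loss: the lower a source's
    reconstruction loss at a pixel, the more it dominates that pixel.
    (Hypothetical sketch; the actual RD computation may differ.)
    """
    losses = np.stack(recon_losses, axis=0)        # shape (S, H, W)
    weights = np.exp(-losses / temperature)        # negative correlation with loss
    weights /= weights.sum(axis=0, keepdims=True)  # normalize over sources
    return weights

def fuse(sources, weights):
    """Weighted combination of uni-source images into one fused image."""
    imgs = np.stack(sources, axis=0)               # shape (S, H, W)
    return (weights * imgs).sum(axis=0)
```

For example, with two sources where source 0 has uniformly lower loss, `rd_fusion_weights` assigns it the larger weight everywhere, and `fuse` returns a convex combination of the inputs.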

📌 Introduction



👓 Visualization

Visualization on VIF (visible-infrared fusion) [1].

Visualization on MIF (medical image fusion) [2].

Visualization on MFF (multi-focus fusion) and MEF (multi-exposure fusion) [3].

Visualization of Ablation 1 [4].

Visualization of Ablation 2 [5].

Relative Dominability (RD) on VIF [1].

RD on MIF [2].

RD on MFF and MEF [3].

🖨 Results

VIF [1].

MEF and MFF [2].

MIF 1 [3].

MIF 2 [4].

MIF 3 [5].