MEDUSA: Motion Elimination in Diffusion Using Spectral Attack

Yu, Hongwei; Zha, Daoqing; Ding, Xinlong; Li, Jiawei; Zhuo, Junbao; Liu, Qiankun; Ma, Huimin; Chen, Jiansheng


Hongwei Yu^*, Daoqing Zha^*, Xinlong Ding, Jiawei Li, Junbao Zhuo, Qiankun Liu, Huimin Ma, Jiansheng Chen

University of Science and Technology Beijing

^* Equal Contribution · Corresponding Author


Overview of the MEDUSA spectral attack pipeline. MEDUSA formulates motion elimination as a spectral attack on video diffusion models. Starting from a clean reference image, it optimizes an imperceptible adversarial perturbation that minimizes the nuclear norm of the temporal attention matrix, suppressing its trailing singular values and inducing temporal rank collapse. In the resulting rank-1 attention pattern, every frame attends to nearly the same temporal content, which freezes motion while preserving the scene semantics.
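To make the spectral quantities in this caption concrete, here is a minimal NumPy sketch (not the authors' code) comparing two hypothetical 8 x 8 temporal attention matrices: an identity pattern in which each frame attends only to itself, standing in for diverse temporal attention, and a rank-1 pattern 1·π^T in which every frame uses the same attention row π. The helper `attention_spectrum` and the `top_share` statistic (largest singular value divided by the nuclear norm) are illustrative assumptions, not quantities defined on this page.

```python
import numpy as np


def attention_spectrum(attn: np.ndarray, tol: float = 1e-8):
    """Singular values, nuclear norm, numerical rank and top-share of an F x F matrix."""
    s = np.linalg.svd(attn, compute_uv=False)   # sorted in descending order
    nuclear = float(s.sum())                    # ||A||_* = sum of singular values
    rank = int((s > tol * s[0]).sum())          # numerical rank relative to s[0]
    top_share = float(s[0] / nuclear)           # 1.0 means one direction holds all mass
    return s, nuclear, rank, top_share


F = 8
rng = np.random.default_rng(0)

# Hypothetical "motion-rich" pattern: every frame attends only to itself.
diverse = np.eye(F)

# Hypothetical "collapsed" pattern: every frame uses the same attention row pi.
pi = rng.dirichlet(np.ones(F))                  # a valid softmax-like row (sums to 1)
collapsed = np.ones((F, 1)) @ pi[None, :]       # rank-1 matrix 1 * pi^T

for name, A in [("diverse", diverse), ("collapsed", collapsed)]:
    _, nuc, r, share = attention_spectrum(A)
    print(f"{name:9s} nuclear={nuc:.3f} rank={r} top_share={share:.3f}")
```

For the identity pattern the nuclear norm equals F = 8 and the spectral mass is spread evenly across all directions; for the rank-1 pattern a single singular value, sqrt(F) * ||pi||, survives. Because every softmax attention row sums to 1, the nuclear norm of such a matrix is never below 1, and a rank-1 matrix with a uniform row attains exactly that bound.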

Abstract

With the widespread adoption of Video Diffusion Models (VDMs), video synthesis now exhibits remarkably rich temporal dynamics. Image-to-Video (I2V) generation lets users supply a reference image as the condition, which also enables attackers to inject adversarial noise into that condition. Because VDMs carry strong spatio-temporal priors, conventional frame-level attacks merely introduce superficial artifacts and struggle to suppress the synthesis of motion semantics. In this work, we approach the problem by examining the mechanism underlying temporal dynamics. We reveal that a static video manifests as temporal rank collapse, a degenerate state in which the temporal attention matrix becomes rank-1. Guided by this insight, we propose Motion Elimination in Diffusion Using Spectral Attack (MEDUSA) to freeze the generated video. MEDUSA minimizes the nuclear norm of the temporal attention matrix to induce temporal rank collapse. This objective circumvents the vanishing-gradient problem encountered when a rigid temporal mapping is imposed directly on the attention matrix. We further provide a mathematical analysis of the collapse phenomenon and of the gradient vanishing that arises during optimization. Experiments confirm that MEDUSA achieves excellent performance and validate the effectiveness of spectral constraints.
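The core objective can be sketched on a toy single-head temporal attention layer. This is a minimal NumPy illustration under heavy assumptions, not the authors' implementation: the perturbation is added directly to F hypothetical frame tokens `H` (rather than to the reference image and propagated through an encoder and denoiser), `Wq`/`Wk` are random projections, and the L-inf budget `eps`, step size `alpha`, step count and the helper names `nuclear_loss_and_grad` and `medusa_style_pgd` are hypothetical. The gradient is back-propagated by hand using a general fact: for a nonsingular A = U Σ V^T, the gradient of ||A||_* is U V^T, whose component along every singular direction u_i v_i^T equals 1 regardless of σ_i, whereas the gradient of ||A||_F^2 is 2A and fades along directions whose σ_i is already small. Whether this is the mechanism the paper's own analysis relies on is not stated here.

```python
import numpy as np


def softmax_rows(x):
    """Numerically stable row-wise softmax."""
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)


def nuclear_loss_and_grad(H, Wq, Wk):
    """Nuclear norm of the toy attention A = softmax(Q K^T / sqrt(dk)) over
    F frame tokens H (F x d), and its gradient w.r.t. H.

    Hand-written backprop; valid where A is nonsingular (the generic case).
    """
    scale = 1.0 / np.sqrt(Wq.shape[1])
    Q, K = H @ Wq, H @ Wk
    A = softmax_rows(scale * (Q @ K.T))          # F x F, each row sums to 1
    U, s, Vt = np.linalg.svd(A)
    loss = float(s.sum())                         # ||A||_*
    G_A = U @ Vt                                  # d||A||_* / dA (polar factor)
    # Backprop through the row-wise softmax: p * (g - <p, g>) for each row.
    G_S = A * (G_A - (A * G_A).sum(axis=1, keepdims=True))
    G_Q = scale * (G_S @ K)
    G_K = scale * (G_S.T @ Q)
    G_H = G_Q @ Wq.T + G_K @ Wk.T                 # both Q and K depend on H
    return loss, G_H


def medusa_style_pgd(H, Wq, Wk, eps=0.1, alpha=0.01, steps=50):
    """Hypothetical sign-gradient PGD that lowers ||A||_* under an L-inf budget."""
    delta = np.zeros_like(H)
    history = []
    for _ in range(steps):
        loss, grad = nuclear_loss_and_grad(H + delta, Wq, Wk)
        history.append(loss)
        # Descend on the nuclear norm, then project back into the eps-box.
        delta = np.clip(delta - alpha * np.sign(grad), -eps, eps)
    history.append(nuclear_loss_and_grad(H + delta, Wq, Wk)[0])
    return delta, history


rng = np.random.default_rng(0)
F, d = 8, 16                                      # frames, token width
H = rng.normal(size=(F, d))                       # stand-in frame tokens
Wq = rng.normal(size=(d, d)) / np.sqrt(d)
Wk = rng.normal(size=(d, d)) / np.sqrt(d)
delta, history = medusa_style_pgd(H, Wq, Wk)
print(f"nuclear norm: {history[0]:.3f} -> {history[-1]:.3f}")
```

Sign-gradient steps with a box projection are a common choice for L-inf attacks; a real attack would instead differentiate through the actual video diffusion model with an autodiff framework, which this toy deliberately avoids.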

Analysis

Loss curves comparing MEDUSA optimization with baseline objectives. **Optimization stability.** The MEDUSA objective follows a smoother descent path than objectives that impose hard attention targets, avoiding early gradient saturation.
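The saturation mechanism behind hard targets can be illustrated on a single softmax attention row. In this hypothetical setup (not taken from the paper), a "hard target" is a fixed one-hot row that forces a frame to attend to frame 1 under a squared-error loss, while the row currently puts a logit gap on frame 0. The helper `hard_target_grad_norm` is illustrative; the sketch shows only the standard softmax-Jacobian effect and does not reproduce the paper's loss curves or say anything by itself about how MEDUSA's objective behaves.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax of a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def hard_target_grad_norm(gap, F=4, target=1):
    """Norm of d/dz ||softmax(z) - e_target||^2 when z = [gap, 0, ..., 0]."""
    z = np.zeros(F)
    z[0] = gap                        # the row is already peaked on frame 0
    p = softmax(z)
    t = np.zeros(F)
    t[target] = 1.0                   # rigid one-hot target on another frame
    g_p = 2.0 * (p - t)               # dL/dp for the squared error
    g_z = p * (g_p - p @ g_p)         # softmax Jacobian-vector product
    return float(np.linalg.norm(g_z))


for gap in [2, 4, 8, 16]:
    print(gap, hard_target_grad_norm(gap))
```

As the row becomes more peaked, the softmax Jacobian diag(p) - p p^T approaches zero, so the gradient toward the one-hot target shrinks roughly like exp(-gap) even though the loss is still close to its maximum.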

Imperceptibility analysis of the MEDUSA adversarial perturbation. **Imperceptibility.** The perturbation remains visually subtle while still removing temporal motion from the generated video.
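Imperceptibility in adversarial attacks is commonly enforced with an L-inf pixel budget and reported with metrics such as PSNR. This page does not state MEDUSA's budget or metrics, so the sketch below assumes a hypothetical budget of 8/255 on images scaled to [0, 1]; `project_linf` and `psnr` are illustrative helpers, and the random array merely stands in for a reference image.

```python
import numpy as np


def project_linf(x_clean, x_adv, eps):
    """Project x_adv into the L-inf ball of radius eps around x_clean, then into [0, 1]."""
    x_adv = np.clip(x_adv, x_clean - eps, x_clean + eps)
    return np.clip(x_adv, 0.0, 1.0)


def psnr(x_clean, x_adv, data_range=1.0):
    """Peak signal-to-noise ratio in dB (assumes the images differ, i.e. mse > 0)."""
    mse = np.mean((x_clean - x_adv) ** 2)
    return float(10.0 * np.log10(data_range ** 2 / mse))


rng = np.random.default_rng(0)
x = rng.uniform(0.1, 0.9, size=(64, 64, 3))          # stand-in reference image
eps = 8 / 255                                         # hypothetical budget
noise = rng.choice([-1.0, 1.0], size=x.shape) * 0.5   # deliberately overshoots eps
x_adv = project_linf(x, x + noise, eps)
print(np.abs(x_adv - x).max(), psnr(x, x_adv))
```

When every pixel sits exactly at the budget edge, the mean squared error equals eps^2, so PSNR is 20 * log10(255 / 8), about 30 dB; an optimized perturbation that leaves many pixels inside the ball would score higher.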

Results

Video comparisons (six examples), each pairing a clean generation with the corresponding MEDUSA-attacked generation.

BibTeX

@inproceedings{yu2026medusa,
  title     = {MEDUSA: Motion Elimination in Diffusion Using Spectral Attack},
  author    = {Yu, Hongwei and Zha, Daoqing and Ding, Xinlong and Li, Jiawei and Zhuo, Junbao and Liu, Qiankun and Ma, Huimin and Chen, Jiansheng},
  booktitle = {Proceedings of the International Conference on Machine Learning},
  year      = {2026}
}