Adversarial vulnerability is hypothesized to arise primarily during the transitions to late-stage 'complex' saddle points, so an ensemble of intermediate saddle-point checkpoints should provide greater robustness than the final converged model.
Motivation
The source paper argues that networks learn simple functions first, progressing through a sequence of saddle points of increasing complexity. The adversarial-robustness literature suggests that simple functions tend to be more robust, while complex functions rely on high-frequency, non-robust features. Together, these observations imply that the saddle-to-saddle trajectory is also a robust-to-fragile trajectory: exploiting the intermediate saddle states, rather than only the final minimum, could offer a natural defense against adversarial attacks without expensive adversarial training.
Proposed Method
Train a deep network and save checkpoints specifically when the gradient norm is small but the Hessian has at least one negative eigenvalue (the signature of a saddle point). At inference time, ensemble the predictions of the last 3 distinct saddle checkpoints. Evaluate this ensemble under PGD and FGSM attacks against two baselines: the final converged model and standard early stopping.
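A minimal PyTorch sketch of the three ingredients above: estimating the smallest Hessian eigenvalue from Hessian-vector products (to detect negative curvature without materializing the Hessian), the saddle-checkpoint criterion, a softmax-averaging ensemble, and a single-step FGSM attack for evaluation. The function names (`min_eigenvalue`, `is_saddle`, `ensemble_predict`, `fgsm`), thresholds, and the two-round power-iteration scheme are illustrative assumptions, not the paper's method.

```python
import torch

def min_eigenvalue(loss_fn, params, iters=100):
    """Estimate the smallest Hessian eigenvalue with two rounds of power
    iteration on Hessian-vector products (no explicit Hessian is formed)."""
    n = sum(p.numel() for p in params)

    def hvp(v):
        # H @ v via double backprop
        loss = loss_fn()
        grads = torch.autograd.grad(loss, params, create_graph=True)
        flat = torch.cat([g.reshape(-1) for g in grads])
        hv = torch.autograd.grad(flat @ v, params)
        return torch.cat([h.reshape(-1) for h in hv]).detach()

    def power_iter(matvec):
        v = torch.randn(n)
        v = v / v.norm()
        for _ in range(iters):
            w = matvec(v)
            norm = w.norm()
            if norm < 1e-12:  # degenerate direction; keep current v
                break
            v = w / norm
        return torch.dot(v, matvec(v)).item()

    lam_dom = power_iter(hvp)  # largest-magnitude eigenvalue of H
    shift = abs(lam_dom)
    # shift*I - H has nonnegative spectrum with largest eigenvalue
    # shift - lam_min(H), so plain power iteration recovers lam_min.
    mu = power_iter(lambda v: shift * v - hvp(v))
    return shift - mu

def is_saddle(grad_norm, lam_min, grad_tol=1e-3, curv_tol=-1e-4):
    """Checkpoint criterion: near-zero gradient with negative curvature."""
    return grad_norm < grad_tol and lam_min < curv_tol

def ensemble_predict(models, x):
    """Average the softmax outputs of the saved saddle checkpoints."""
    with torch.no_grad():
        probs = torch.stack([torch.softmax(m(x), dim=-1) for m in models])
    return probs.mean(dim=0)

def fgsm(model, x, y, eps):
    """One-step FGSM perturbation for robustness evaluation."""
    x = x.clone().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()
```

For networks of realistic size, the power iteration would typically be replaced by a Lanczos-based estimator on mini-batch Hessian-vector products, and the checkpoints would be saved as `state_dict`s and reloaded into separate model instances before ensembling.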
Expected Contribution
A novel, training-efficient defense mechanism against adversarial attacks that leverages the inherent spectral dynamics of the loss landscape.
Required Resources
GPU compute for training and attack simulation; adversarial robustness benchmark libraries (e.g., RobustBench).
Source Paper
Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Neural Network Architectures