
Network weights that consistently remain orthogonal to the negative curvature directions of traversed saddle points are redundant, enabling a 'Dynamic Saddle Pruning' strategy that is more effective than magnitude-based pruning.

Feasibility: 7 · Novelty: 8

Motivation

If learning proceeds by escaping saddle points to reach progressively more complex solutions, then the 'active' weights are those that participate in these escape trajectories. Current pruning methods, such as magnitude pruning, ignore this dynamic history of the optimization trajectory through the loss landscape. Identifying weights that never align with the escape directions could reveal the true 'skeleton' of the solution earlier in training.

Proposed Method

Detect saddle points during training (e.g., steps where the gradient norm is small but the Hessian still has negative eigenvalues), and at each one track the Hessian eigenvectors corresponding to the most negative eigenvalues. Accumulate a per-weight score from the magnitude of each weight's projection onto these escape directions. At the end of training, prune the weights with the lowest accumulated projection scores, and compare the sparsity-accuracy trade-off against magnitude pruning and Lottery Ticket Hypothesis baselines. A minimal sketch of the scoring loop is given below.
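To make the projection-scoring step concrete, here is a minimal PyTorch sketch. It is not a reference implementation: the gradient-norm saddle detector, the spectral-shift power iteration for the most negative eigenvector, the squared-projection score, and the 50% sparsity target are all illustrative assumptions; a practical run would likely use PyHessian or a Lanczos routine for the curvature estimate and a validation-based sparsity schedule.

```python
# Sketch of the Dynamic Saddle Pruning scoring loop (illustrative only).
# Assumptions not fixed by the idea above: saddles are detected by a small
# gradient-norm heuristic; the escape direction is estimated with a
# spectral-shift power iteration using Hessian-vector products; the 50%
# sparsity target, thresholds, and helper names are placeholders.
import torch
import torch.nn as nn


def hvp(loss, params, vec):
    """Hessian-vector product H @ vec via double backprop (Pearlmutter trick)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    prod = torch.autograd.grad(flat @ vec, params, retain_graph=True)
    return torch.cat([p.reshape(-1) for p in prod]).detach()


def escape_direction(loss, params, iters=20):
    """Approximate the Hessian eigenvector with the most negative eigenvalue.
    First estimate the largest-magnitude eigenvalue lambda_top, then run power
    iteration on the shifted matrix (|lambda_top| * I - H), whose top eigenvector
    is the most-negative-curvature direction of H."""
    n = sum(p.numel() for p in params)
    v = torch.randn(n, device=params[0].device)
    v /= v.norm()
    lam = 0.0
    for _ in range(iters):
        h = hvp(loss, params, v)
        lam = torch.dot(v, h).item()
        v = h / (h.norm() + 1e-12)
    shift = abs(lam)
    w = torch.randn(n, device=params[0].device)
    w /= w.norm()
    for _ in range(iters):
        w = shift * w - hvp(loss, params, w)
        w /= (w.norm() + 1e-12)
    return w  # unit vector over all flattened parameters


# Toy model and training loop showing where the scoring hooks in.
model = nn.Sequential(nn.Linear(20, 64), nn.Tanh(), nn.Linear(64, 1))
criterion = nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
params = [p for p in model.parameters() if p.requires_grad]
scores = torch.zeros(sum(p.numel() for p in params))

for step in range(200):
    x, y = torch.randn(128, 20), torch.randn(128, 1)
    loss = criterion(model(x), y)
    opt.zero_grad()
    loss.backward()
    grad_norm = torch.cat([p.grad.reshape(-1) for p in params]).norm()

    # Heuristic saddle detection (assumed, not specified in the idea):
    # a near-zero gradient while training has not yet converged.
    if grad_norm < 1e-2:
        loss_for_hvp = criterion(model(x), y)  # fresh graph for double backprop
        direction = escape_direction(loss_for_hvp, params)
        scores += direction.pow(2)  # squared projection of each weight onto the escape vector

    opt.step()

# Prune the weights with the lowest accumulated projection scores (50% here).
k = int(0.5 * scores.numel())
mask = torch.ones_like(scores)
mask[torch.argsort(scores)[:k]] = 0.0
with torch.no_grad():
    offset = 0
    for p in params:
        p.mul_(mask[offset:offset + p.numel()].view_as(p))
        offset += p.numel()
```

The key design choice in this sketch is that the score accumulates over every detected saddle, so a weight is retained only if it repeatedly carries a component of the escape direction; this accumulated score is the quantity that would then be compared against the magnitude-pruning and Lottery Ticket baselines.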

Expected Contribution

A theoretically grounded pruning metric that ties network compression directly to the optimization trajectory, potentially allowing redundant weights to be identified and removed earlier in training and thereby reducing compute requirements.

Required Resources

Standard GPU compute; libraries for Hessian analysis (e.g., PyHessian); pre-trained model architectures (ResNet, VGG).

Source Paper

Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Neural Network Architectures
