Injecting gradient noise aligned with the negative-curvature directions of the current saddle point explicitly controls the rate of complexity acquisition, enabling a 'tunable' simplicity bias.
Motivation
The source paper establishes that learning dynamics traverse a sequence of saddle points, acquiring features from simple to complex. However, standard SGD relies on isotropic noise, which is inefficient for navigating this specific geometry. By actively aligning noise with the unstable directions of the saddle (those associated with negative Hessian eigenvalues), we could in principle accelerate the learning of complex features or, conversely, halt the process at a chosen complexity level to maximize generalization.
Proposed Method
Implement an optimizer that periodically estimates the eigenvectors corresponding to the k most negative Hessian eigenvalues, using Lanczos iteration or the power method, during training plateaus (saddle points). Create two experimental conditions: one in which noise is injected specifically along these eigenvectors (to accelerate escape), and one in which the gradient components along these directions are suppressed (to stabilize the current complexity level). Compare convergence speed and generalization gaps against standard SGD and Adam on CIFAR-10 and ImageNet subsets.
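As a rough illustration of the core mechanism, the sketch below shows, in PyTorch, negative-curvature estimation via Hessian-vector products and shifted power iteration (k = 1 for brevity), plus the two update rules described above. The function names (`hvp`, `power_iteration`, `most_negative_direction`, `saddle_aware_step`) and all hyperparameter values are illustrative assumptions, not taken from the source paper; a full implementation would use Lanczos with deflation for k > 1 and a plateau-detection criterion to decide when to re-estimate the directions.

```python
import torch
from torch.nn.utils import parameters_to_vector, vector_to_parameters


def hvp(loss_fn, params, vec):
    """Hessian-vector product via double backprop; no explicit Hessian is formed.
    Note: each call re-evaluates loss_fn() on its minibatch."""
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    hv = torch.autograd.grad((flat_grad * vec).sum(), params)
    return torch.cat([h.reshape(-1) for h in hv])


def power_iteration(matvec, dim, device, iters=20):
    """Largest-magnitude eigenpair of the linear map `matvec` by power iteration."""
    v = torch.randn(dim, device=device)
    v = v / v.norm()
    lam = 0.0
    for _ in range(iters):
        mv = matvec(v)
        lam = float(v @ mv)            # Rayleigh quotient estimate
        v = mv / (mv.norm() + 1e-12)
    return lam, v


def most_negative_direction(loss_fn, params, iters=20):
    """Shifted power iteration: approximates the eigenvector of the most negative
    Hessian eigenvalue (k = 1 case; repeat with deflation for larger k)."""
    dim = sum(p.numel() for p in params)
    device = params[0].device
    # 1) dominant (largest-magnitude) eigenvalue of H
    lam_dom, _ = power_iteration(lambda u: hvp(loss_fn, params, u), dim, device, iters)
    shift = abs(lam_dom)
    # 2) the dominant eigenvector of (shift*I - H) is the eigenvector of H's
    #    most negative eigenvalue, since shift - lambda_min is now the largest value
    lam_s, v = power_iteration(lambda u: shift * u - hvp(loss_fn, params, u),
                               dim, device, iters)
    return shift - lam_s, v.detach()   # (approx. lambda_min, its eigenvector)


def saddle_aware_step(params, grads, v, lr=0.05, mode="accelerate", noise_scale=1e-2):
    """One update in either experimental condition:
    'accelerate' injects noise along v; 'stabilize' removes the gradient component along v."""
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    if mode == "accelerate":
        flat_grad = flat_grad + noise_scale * torch.randn((), device=v.device) * v
    elif mode == "stabilize":
        flat_grad = flat_grad - (flat_grad @ v) * v
    new_flat = parameters_to_vector(params) - lr * flat_grad
    vector_to_parameters(new_flat, params)
```

In practice, `most_negative_direction` would be invoked only when a plateau is detected (e.g., stalled training loss), since each Hessian-vector product costs roughly one extra forward/backward pass; the resulting direction can then be reused for many `saddle_aware_step` updates until the next plateau.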
Expected Contribution
A new class of 'Saddle-Aware' optimizers that decouple convergence speed from generalization performance, providing a direct test of the hypothesized causal link between saddle-escape dynamics and feature complexity.
Required Resources
High-performance GPU cluster (H100s/A100s) for Hessian-vector product calculations; standard computer vision datasets; PyTorch/JAX expertise.
Source Paper
Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Neural Network Architectures