The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon
This paper was accepted to the “Has it Trained Yet?” (HITY) workshop at NeurIPS 2022. The grokking phenomenon, as reported by Power et al., refers to a regime in which a long period of overfitting is followed by a seemingly sudden transition to perfect generalization. In this paper, we attempt to reveal the underpinnings of grokking …