Careful With That Scalpel: Improving Gradient Surgery With an EMA
Beyond minimizing a single training loss, many deep learning estimation pipelines rely on an auxiliary objective to quantify and encourage desirable properties of the model (e.g. performance on another dataset, robustness, agreement with a prior). Although the simplest approach to incorporating an auxiliary loss is to sum it with the training loss as a regularizer, …
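Here is a minimal sketch of the summed-regularizer baseline described above. It is not the gradient-surgery method the post goes on to propose. It assumes two hypothetical one-parameter quadratic losses and a weighting `lam` (λ), none of which come from the post. Summing the losses means each gradient step follows g_train + λ·g_aux. The result is a compromise solution whose position depends on λ.

```python
# Hypothetical toy setup (not from the post): one scalar parameter w,
# training loss (w - 3)^2 and auxiliary loss (w - 1)^2, the latter standing
# in for e.g. "agreement with a prior" centred at w = 1.

def train_grad(w: float) -> float:
    """Gradient of the hypothetical training loss (w - 3)^2."""
    return 2.0 * (w - 3.0)


def aux_grad(w: float) -> float:
    """Gradient of the hypothetical auxiliary loss (w - 1)^2."""
    return 2.0 * (w - 1.0)


def summed_gradient(w: float, lam: float) -> float:
    """Gradient of L_train + lam * L_aux: just the weighted sum of the two gradients."""
    return train_grad(w) + lam * aux_grad(w)


def fit(lam: float, lr: float = 0.1, steps: int = 200, w0: float = 0.0) -> float:
    """Plain gradient descent on the summed objective."""
    w = w0
    for _ in range(steps):
        w -= lr * summed_gradient(w, lam)
    return w


if __name__ == "__main__":
    # With lam = 1 the summed objective is minimised halfway between 3 and 1.
    print(round(fit(lam=1.0), 4))  # 2.0
```

For these quadratics the minimizer is (3 + λ)/(1 + λ). λ = 0 recovers the training-only optimum at 3, and larger λ pulls w toward the auxiliary target at 1. This single fixed trade-off is what the summed approach commits to.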