Analysis of Gradient Descent on Wide Two-Layer ReLU Neural Networks
In this talk, we propose an analysis of gradient descent on wide two-layer ReLU neural networks that leads to sharp characterizations of the learned predictor. The main idea is to study the training dynamics as the width of the hidden layer goes to infinity; in this limit, the dynamics become a Wasserstein gradient flow. Although these dynamics evolve on a non-convex landscape, we show that their limit is a global minimizer, provided the initialization is chosen properly. We also study the "implicit bias" of this algorithm when the objective is the unregularized logistic loss. Finally, we discuss what these results tell us about generalization performance. This is based on joint work with Francis Bach.
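For readers unfamiliar with the setting, the following is a minimal NumPy sketch of the kind of model the abstract refers to: a two-layer ReLU network with a 1/m output scaling, trained by full-batch gradient descent on the unregularized logistic loss. It is an illustration under stated assumptions, not the speaker's code. The function names, the toy data, the width m = 512, the step size, and the choice to multiply the step by m (so that each neuron moves at a rate independent of the width) are all assumptions made here for the example.

```python
import numpy as np

def predict(a, b, X):
    """Two-layer ReLU network with 1/m scaling: f(x) = (1/m) * sum_j b_j * relu(a_j . x)."""
    return np.maximum(X @ a.T, 0.0) @ b / a.shape[0]

def logistic_loss(a, b, X, y):
    """Unregularized logistic loss for labels y in {-1, +1}."""
    return np.mean(np.logaddexp(0.0, -y * predict(a, b, X)))

def gradient_step(a, b, X, y, lr):
    """One full-batch gradient descent step on the logistic loss."""
    m, n = a.shape[0], X.shape[0]
    pre = X @ a.T                      # pre-activations, shape (n, m)
    act = np.maximum(pre, 0.0)         # ReLU activations, shape (n, m)
    f = act @ b / m                    # network outputs, shape (n,)
    # Derivative of the mean logistic loss w.r.t. each output f_i:
    # -y_i * sigmoid(-y_i * f_i) / n, written with tanh for numerical stability.
    g = -y * 0.5 * (1.0 - np.tanh(0.5 * y * f)) / n
    grad_b = act.T @ g / m                                    # shape (m,)
    grad_a = (g[:, None] * (pre > 0) * b[None, :]).T @ X / m  # shape (m, d)
    # Assumption: the step is multiplied by m so that per-neuron updates
    # do not vanish as the width m grows.
    return a - lr * m * grad_a, b - lr * m * grad_b

# Toy, linearly separable data (synthetic, for illustration only).
rng = np.random.default_rng(0)
n, d, m = 200, 2, 512
X = rng.standard_normal((n, d))
y = np.where(X[:, 0] >= 0.0, 1.0, -1.0)

# Random initialization of hidden weights a (m, d) and output weights b (m,).
a = rng.standard_normal((m, d))
b = rng.standard_normal(m)

loss_start = logistic_loss(a, b, X, y)
for _ in range(200):
    a, b = gradient_step(a, b, X, y, lr=0.1)
loss_end = logistic_loss(a, b, X, y)
print(loss_start, loss_end)
```

Each hidden neuron (a_j, b_j) can be read as a particle, and the network output as an average over particles; increasing m in this sketch is the finite-width counterpart of the infinite-width limit discussed in the talk.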
Lénaïc Chizat is a CNRS researcher at Université Paris-Saclay. He obtained his PhD in applied mathematics at Université Paris-Dauphine in 2017. He is working on the mathematical analysis of data-driven algorithms. His current research interests include the theory of optimal transport and the theory of artificial neural networks.
2020-11-17 at 3:00 pm