Using Second-Order Information in Training Deep Neural Networks
In this dissertation, we are concerned with the advancement of optimization algorithms for training deep learning models, and in particular with practical second-order methods that take into account the structure of deep neural networks (DNNs). Although first-order methods such as stochastic gradient descent have long been the predominant optimization algorithms used in deep learning, second-order methods are of interest because of their ability to use curvature information to accelerate the optimization process. After the presentation of some background information in Chapter 1, Chapters 2 and 3 focus on the development of practical quasi-Newton methods for training DNNs. We analyze the Kronecker-factored structure of the Hessian matrices of multi-layer perceptrons and convolutional neural networks and, on that basis, propose block-diagonal Kronecker-factored quasi-Newton methods named...
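To illustrate why Kronecker-factored block-diagonal approximations are computationally attractive, the following is a minimal NumPy sketch of the standard vec/Kronecker identity that such methods rely on. It assumes a layer's curvature block is approximated as A ⊗ G, where A and G are illustrative stand-ins for input-side and output-side statistics (as in KFAC-style methods); the dissertation's specific factor definitions may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: m outputs, n inputs.
m, n = 4, 6

def spd(k):
    """Random symmetric positive-definite k x k matrix."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

A = spd(n)  # input-side Kronecker factor (n x n), illustrative
G = spd(m)  # output-side Kronecker factor (m x m), illustrative

V = rng.standard_normal((m, n))   # e.g. a layer's gradient matrix
vecV = V.reshape(-1, order="F")   # column-stacking vec(V)

# Naive: solve with the full (mn x mn) Kronecker block, O(m^3 n^3).
full = np.linalg.solve(np.kron(A, G), vecV)

# Factored: (A kron G)^{-1} vec(V) = vec(G^{-1} V A^{-1}) for symmetric A,
# costing only O(m^3 + n^3) via two small solves.
X = np.linalg.solve(G, V) @ np.linalg.inv(A)
factored = X.reshape(-1, order="F")

assert np.allclose(full, factored)
```

The key point of the sketch is that a preconditioning step with the Kronecker-factored block never needs to form or invert the full mn x mn matrix: two small factor inverses suffice, which is what makes block-diagonal Kronecker-factored second-order methods practical at DNN scale.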