Annealed Gradient Descent for Deep Learning

We present a theoretical analysis of the convergence properties and learning speed of annealed gradient descent (AGD), and use visualization methods to illustrate its advantages. The proposed AGD algorithm is applied to learn both deep neural networks (DNNs) and convolutional neural networks (CNNs) for a variety of tasks, including image recognition and speech recognition. Experimental results on several widely used databases, such as Switchboard, CIFAR-10, and Pascal VOC 2012, show that AGD yields better classification accuracy than stochastic gradient descent (SGD) and markedly accelerates the training of DNNs and CNNs.
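The abstract names AGD only by acronym and does not spell out its update rule, so the following is a purely hypothetical sketch rather than the authors' method: one common reading of "annealing" in optimization is a step size (or noise level) that decays over iterations. The toy Python snippet below contrasts a constant step size with an assumed decaying schedule on a simple quadratic loss; the names descend, grad, and lr_schedule and the 1/(1 + 0.1t) decay are all illustrative assumptions, not from the paper.

import numpy as np

# Hypothetical illustration only: the paper's AGD update rule is not given in
# the abstract. This toy contrasts plain gradient descent with an annealed
# (decaying) step size on the quadratic loss f(w) = 0.5 * ||w||^2.

def grad(w):
    # Gradient of the toy quadratic loss f(w) = 0.5 * ||w||^2.
    return w

def descend(w0, steps, lr_schedule):
    # Run gradient descent with a per-iteration step size lr_schedule(t).
    w = w0.copy()
    for t in range(steps):
        w -= lr_schedule(t) * grad(w)
    return w

w0 = np.array([5.0, -3.0])

# Baseline: constant step size, as in plain (S)GD.
w_const = descend(w0, steps=100, lr_schedule=lambda t: 0.1)

# Annealed variant: step size decays as training proceeds (assumed schedule).
w_annealed = descend(w0, steps=100, lr_schedule=lambda t: 0.5 / (1.0 + 0.1 * t))

print("constant-step result: ", w_const)
print("annealed-step result: ", w_annealed)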
Source: Neurocomputing