Escaping Saddle Points for Successive Convex Approximation

Optimizing non-convex functions is of primary importance in modern pattern recognition because it underlies the training of deep networks and nonlinear dimensionality reduction. First-order algorithms, under suitable randomized perturbations or step-size rules, have been shown to be effective in such settings because their limit points can be guaranteed to be local minima rather than saddle points. However, it is well known that the practical convergence of first-order methods is slower than that of methods which exploit additional structure; in particular, successive convex approximation (SCA) empirically converges faster than first-order methods. To date, though, SCA in general non-convex settings is only guaranteed to converge to first-order stationary points, which may be either local minima or saddle points, the latter typically yielding inferior performance. To mitigate this issue, we propose a calibrated randomized perturbation of SCA that exhibits an improved convergence rate compared with its gradient-descent counterpart. Our main technical contributions are to characterize the non-asymptotic performance of the SCA algorithm and to establish that its perturbed variant converges to an approximate second-order stationary point. Experiments on multi-dimensional scaling, a machine learning problem whose training objective is non-convex, substantiate the performance gains associated with employing random perturbations.
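The abstract does not include an algorithm listing; the following is a minimal, hypothetical Python sketch of the idea it describes: an SCA-style loop in which each iterate minimizes a convex surrogate of the objective, and a small random perturbation is injected whenever the gradient is nearly zero so that the iterates can escape strict saddle points. The names (`perturbed_sca`, `surrogate_min`), the perturbation rule, and all parameter values are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def perturbed_sca(grad, surrogate_min, x0, n_iters=200,
                  grad_tol=1e-3, perturb_radius=1e-2, seed=0):
    """Illustrative perturbed-SCA loop (hypothetical interface).

    surrogate_min(x) should return the minimizer of a convex surrogate of the
    objective constructed around the current iterate x (e.g., a majorizing
    quadratic); grad(x) returns the gradient of the true objective.  When the
    gradient norm is small -- a possible saddle region -- a small random
    perturbation is injected before the next surrogate step.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        if np.linalg.norm(grad(x)) < grad_tol:
            # Inject an isotropic Gaussian perturbation of small radius
            # (an assumed calibration, for illustration only).
            x = x + perturb_radius * rng.standard_normal(x.shape)
        x = surrogate_min(x)  # minimize the convex surrogate built at x
    return x


# Toy example: (x1^2 - 1)^2 + x2^2 has a strict saddle at the origin and
# minima at (+/-1, 0).  Starting from the origin, the perturbation lets the
# iterates leave the saddle and converge to a minimum.
L = 10.0
f_grad = lambda x: np.array([4 * x[0]**3 - 4 * x[0], 2 * x[1]])
x_star = perturbed_sca(f_grad, lambda x: x - f_grad(x) / L, x0=np.zeros(2))
```

In this toy example the surrogate is the standard quadratic majorizer built from the gradient and a Lipschitz constant L, whose minimizer reduces to a gradient step of length 1/L; the surrogates and the calibration of the perturbation used in the paper may differ.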
Source: IEEE Transactions on Signal Processing