mlpack
3.0.4

Introduction
The field of optimization is vast and complex. In optimization problems, we have to find solutions which are optimal or nearoptimal with respect to some goals. Usually, we are not able to solve problems in one step, but we follow some process which guides us through problemsolving. mlpack
implements multiple strategies for optimizing different objective functions and provides different strategies for selecting and customizing the appropriate optimization algorithm. This tutorial discusses how to use each of the techniques that mlpack
implements.
This visualization allows us to see how many popular optimizers perform on different optimization problems. Select a problem to optimize, then select an optimizer and tune its parameters, and see the steps that the optimizer takes plotted in pink. Note that you can zoom in and rotate the drawing. Also, you can compare how different optimizers perform on a given problem in the second graph. As you try a given problem with more optimizers, the objective function vs. the number of iterations is plotted for each optimizer you have tried.
Stepsize: 0.01
Iterations: 4000
The defaults here are not necessarily good for the given problem, so it is suggested that the values used be tailored to the task at hand. (Use the mouse to zoom/drag the function.)
As intuition says, system has higher probability of staying in the states with a smaller stepsize. As the stepsize goes up, imbalance becomes stronger. When the stepsize is close to zero, the system stays in the state(s) with the highest cost.
Evaluations: Dynamic
A plot of the cost reveals distinct properties for each optimizer with its own style of convergence.
Optimizer
Optimizer
AdaDelta  AdaGrad  Adam  AdaGrad  Adam  AdaMax  AMSGrad  Nadam  CNE  IQN  L_BFGS  RMSProp  SMORMS3  SPALeRASGD  SVRG  SVRG (BarzilaiBorwein)  SARAH  SARAH+  Katyusha  CMAES
FunctionType API
In order to facilitate consistent implementations, we have defined a FunctionType
API that describes all the methods that an objective function may implement. mlpack
offers a few variations of this API to cover different function characteristics. This leads to several different APIs for different function types:
 FunctionType: a normal, differentiable objective function.
 DecomposableFunctionType: a differentiable objective function that can be decomposed into the sum of many objective functions (for SGDlike optimizers).
 SparseFunctionType: a decomposable, differentiable objective function with a sparse gradient.
 NonDifferentiableFunctionType: a nondifferentiable objective function that can only be evaluated.
 ConstrainedFunctionType: an objective function with constraints on the allowable inputs.
 NonDifferentiableDecomposableFunctionType: a decomposable nondifferentiable objective function that can only be evaluated.
 ResolvableFunctionType: a differentiable objective function where calculating the gradient with respect to only one parameter is possible.
Each of these types of objective functions require slightly different methods to be implemented. In some cases, methods will be automatically deduced by the optimizers using template metaprogramming and this allows the user to not need to implement every method for a given type of objective function. Each type described above is detailed in the following sections.
The FunctionType API
A function satisfying the FunctionType
API is a general differentiable objective function. It must implement an Evaluate()
and a Gradient()
method. The interface used for that can be the following two methods:
However, there are some objective functions (like logistic regression) for which it is computationally convenient to calculate both the objective and the gradient at the same time. Therefore, optionally, the following function can be implemented:
It is not a problem to implement all three of these methods, but it is not obligatory to. EvaluateWithGradient()
will automatically be inferred if it is not written from the Evaluate()
and Gradient()
functions; similarly, the Evaluate()
and Gradient()
functions will be inferred from EvaluateWithGradient()
if they are not available. However, any automatically inferred method may be slightly slower.
The following optimizers use the FunctionType
API:
The DecomposableFunctionType API
A function satisfying the DecomposableFunctionType
API is a differentiable objective function that can be decomposed into a number of separable objective functions. Examples of these types of objective functions include those that are a sum of loss on individual data points; so, common machine learning tasks like logistic regression or training a neural network can be expressed as optimizing a decomposable objective function.
Any function implementing the DecomposableFunctionAPI
must implement the following four methods:
Note that the decomposable objective functions should support batch computation—this can allow significant speedups. The Shuffle()
method shuffles the ordering of the functions. For some optimizers, randomness is an important component, so it is important that it is possible to shuffle the ordering of the decomposable functions.
As with the regular FunctionType
API, it is optional to implement an EvaluateWithGradient()
method in place of, or in addition to, the Evaluate()
and Gradient()
methods. The interface used for that should be the following:
The following mlpack optimizers require functions implementing the DecomposableFunctionType
API:
 StandardSGD
 MomentumSGD
 AdaDelta
 AdaGrad
 Adam
 AdaMax
 AMSGrad
 Nadam
 NadaMax
 IQN
 Katyusha
 KatyushaProximal
 RMSProp
 SARAH
 SARAH+
 SGDR
 SMORMS3
 SPALeRA SGD
 SVRG
 SVRGBB
The SparseFunctionType API
A function satisfying the SparseFunctionType
API is a decomposable differentiable objective function with the condition that a single individual gradient is sparse. The API is slightly different but similar to the DecomposableFunctionType
API; the following methods are necessary:
The Shuffle()
method is not needed for the SparseFunctionType
API.
Note that it is possible to write a Gradient()
method that accepts a template parameter GradType
, which may be arma::mat
(dense Armadillo matrix) or arma::sp_mat
(sparse Armadillo matrix). This allows support for both the DecomposableFunctionType
API and the SparseFunctionType
API, as below:
The following mlpack optimizers require an objective function satisfying the SparseFunctionType
API:
The NonDifferentiableFunctionType API
A function satisfying the NonDifferentiableFunctionType
API is a general nondifferentiable objective function. Only an Evaluate()
method must be implemented, with the following signature:
The following mlpack optimizers require an objective function satisfying the NonDifferentiableFunctionType
API:
The ConstrainedFunctionType API
A function satisfying the ConstrainedFunctionType
API is a general differentiable objective function that has differentiable constraints for the parameters. This API is more complex than the others and requires five methods to be implemented:
The following mlpack optimizers require a ConstrainedFunctionType:
The NonDifferentiableDecomposableFunctionType API
A function satisfying the NonDifferentiableDecomposableFunctionType
API is a decomposable nondifferentiable objective function. Only an Evaluate()
and a NumFunctions()
method must be implemented, with the following signatures:
The following mlpack optimizers require a NonDifferentiableDecomposableFunctionType:
The ResolvableFunctionType API
A function satisfying the ResolvableFunctionType
API is a partially differentiable objective function. For this API, three methods must be implemented, with the following signatures:
It is not required to templatize so that both sparse and dense gradients can be used, but it can be helpful.
The following mlpack optimizers require a ResolvableFunctionType:
Optimizer API
An optimizer must implement only the method:
The Optimize()
method optimizes the given function function
, and stores the best set of parameters in the matrix parameters
and returns the best objective value.
If the optimizer requires a given API from above, the following functions from src/mlpack/optimizers/function/static_checks.hpp
can be helpful:
 mlpack::optimization::traits::CheckFunctionTypeAPI()
 mlpack::optimization::traits::CheckDecomposableFunctionTypeAPI()
 mlpack::optimization::traits::CheckSparseFunctionTypeAPI()
 mlpack::optimization::traits::CheckNonDifferentiableFunctionTypeAPI()
 mlpack::optimization::traits::CheckConstrainedFunctionTypeAPI()
 mlpack::optimization::traits::CheckNonDifferentiableDecomposableFunctionTypeAPI()
 mlpack::optimization::traits::CheckResolvableFunctionTypeAPI()
Simple Function and Optimizer example
The example below constructs a simple function, where each dimension has a parabola with a distinct minimum. Note, in this example we maintain an ordering with the vector order
; in other situations, such as training neural networks, we could simply shuffle the columns of the data matrix in Shuffle()
.
For the optimization of the defined ObjectiveFunction
and other mlpack
objective functions, we must implement only an Optimize() method, and a constructor to set some parameters. The code is given below.
Note for the sake of simplicity we omitted checks on the batch size (bs
). This optimizer assumes that function.NumFunctions()
is a multiple of the batch size, we also omitted other typical parts of real implementations, a more detailed example may be found in the complete API documentation". Still, SimpleOptimizer
works with any mlpack objective function which implements Evaluate that can compute a partial objective function, and Gradient
that can compute a part of the gradient starting with a separable function.
Finding the minimum of ObjectiveFunction
or any other mlpack
function can be done as shown below.
The final value of the objective function should be close to the optimal value, which is the sum of values at the vertices of the parabolas.
Further documentation
Further documentation for each Optimizer
may be found in the complete API documentation. In addition, more information on the testing functions may be found in its complete API documentation.
Generated by 1.8.13