[mlpack] [GSOC '16] : Student interested in Low rank/sparse optimization using Frank-Wolfe

Ryan Curtin ryan at ratml.org
Tue Mar 8 10:00:41 EST 2016


On Tue, Mar 08, 2016 at 05:56:02AM +0000, Kwon, Jin Kyoung wrote:
> Hello everyone,
> 
> My name is Jin Kyoung Kwon and I am a 3rd year Computer Science major
> at Georgia Institute of Technology, interested in contributing to this
> project through GSOC. I am currently enrolled in Dr. Isbell's Machine
> Learning course at GT and have enjoyed the course so much that I am
> aspiring to spend my summer learning more about ML and doing related
> work. I am specifically interested in the low rank/sparse optimization
> using Frank-Wolfe, as I am interested in optimization problems as a CS
> major concentrating in modeling/simulation. I am also familiar with
> non-convex optimization problems through learning about randomized
> optimization topics as well as a bit of the math through my coursework
> in differential equations and linear algebra. I am willing to learn
> more advanced math topics as well as gain skills in C++ as I believe
> the project will be a good fit for me.
> 
> I have built the mlpack library on my machine, learned how the library
> works through running simple execs, and read Dr. Martin Jaggi's paper
> about the topic. I wanted to gain a better sense of how the
> implementation of the framework would fit into the existing one. I saw
> that we already have methods for core optimizers (core/optimizers) as
> well as problems that variants of Frank-Wolfe algorithms can be
> applied to, such as Lasso (methods/lasso), matrix completion
> (methods/matrix_completion), and classic learning tasks (boosting,
> svm, etc). I think the new framework would go under
> core/optimizers, but I am not sure. I wanted to ask: how are you
> envisioning the Frank-Wolfe algorithms to fit in with the current
> architecture?
> 
> I am incredibly excited to continue learning about the library, and I
> am looking forward to hearing from you!

Hi Jin Kyoung,

Thanks for getting in touch.

I'm glad you're enjoying Charles' class; I think he's a great lecturer.
I hope that he has mentioned mlpack in his class and that is what
brought you here... :)

The current optimizer architecture is unfortunately not documented in
its own wiki page or doxygen page, but I can explain it to you a little
bit here and point you towards relevant code:

Optimizers in mlpack are stored in src/mlpack/core/optimizers/; right
now, we have implementations of SGD, mini-batch SGD, L-BFGS, simulated
annealing, RMSprop, the augmented Lagrangian method, and a low-rank
algorithm for semidefinite programming (LRSDP).  I think more will be
added there soon, since the artificial neural network code now uses
optimizers that match the API of the rest of the optimizers.

Each optimizer takes (at least) one template parameter, which is the
function type to be optimized:

template<typename FunctionType>
class Optimizer;

The function to be optimized must provide at least two functions:

  double Evaluate(const arma::mat& coordinates);
  void Gradient(const arma::mat& coordinates, arma::mat& gradient);

and those functions, given some input coordinates, will calculate the
objective function or the gradient.  This should be documented fairly
well in the comments in each file, and the code should be fairly easy to
read to figure out what is going on (I'd suggest taking a look at SGD
first since it's so simple).
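
For a concrete (if toy) picture of how this fits together, here is a
minimal sketch of a FunctionType and how it might be handed to an
existing optimizer like L-BFGS.  This is untested, and the header path
and class names are from memory, so double-check them against the tree:

  #include <iostream>
  #include <mlpack/core.hpp>
  #include <mlpack/core/optimizers/lbfgs/lbfgs.hpp>

  // Toy objective: f(x) = sum((x - 1)^2), minimized at x = 1.
  class ExampleQuadratic
  {
   public:
    double Evaluate(const arma::mat& coordinates)
    {
      return arma::accu(arma::square(coordinates - 1.0));
    }

    void Gradient(const arma::mat& coordinates, arma::mat& gradient)
    {
      gradient = 2.0 * (coordinates - 1.0);
    }
  };

  int main()
  {
    ExampleQuadratic f;
    mlpack::optimization::L_BFGS<ExampleQuadratic> lbfgs(f);

    // Start from the origin; Optimize() modifies 'coordinates' in place
    // and returns the final objective value.
    arma::mat coordinates = arma::zeros<arma::mat>(3, 1);
    const double objective = lbfgs.Optimize(coordinates);

    std::cout << "final objective: " << objective << std::endl;
  }

Any optimizer that implements the same interface (including a new
Frank-Wolfe optimizer) could be dropped in where L_BFGS is used above.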

My hope is that a project implementing the Frank-Wolfe algorithm would
produce an optimizer of this type in src/mlpack/core/optimizers/, and
then possibly modify the various programs like
mlpack_softmax_regression, mlpack_logistic_regression, and others so
that they can use the Frank-Wolfe optimizer.
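
Just to sketch the pattern (the FrankWolfe class and SomeMethod below
are made-up names, not existing code), the new optimizer would follow
the same template convention, so a method that is already templatized
on its optimizer type could simply be instantiated with it:

  // Hypothetical Frank-Wolfe optimizer following the existing convention.
  template<typename FunctionType>
  class FrankWolfe
  {
   public:
    FrankWolfe(FunctionType& function) : function(function) { }

    // Repeatedly solve a linear subproblem over the constraint set and
    // take a convex-combination step; return the final objective value.
    double Optimize(arma::mat& coordinates);

   private:
    FunctionType& function;
  };

  // A method taking its optimizer as a template template parameter
  // could then be built directly with the new optimizer, e.g.:
  //
  //   SomeMethod<FrankWolfe> method(data, labels);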

LARS (our LASSO solver) actually does not support other optimizers; it
has a hard-coded version of (I think) Newton's method in there
somewhere.  It would be
great to refactor it, pull Newton's method out, and allow an arbitrary
optimizer, but I don't think anyone's ever had the time (or motivation)
to do that.

I remember writing more about this in another email recently, but I
can't seem to find the link.  If you search the archives you may be
able to find something more:
https://mailman.cc.gatech.edu/pipermail/mlpack/

I hope this information is helpful; let me know if I can clarify
anything.

Thanks,

Ryan

-- 
Ryan Curtin    | "Avoid the planet Earth at all costs."
ryan at ratml.org |   - The President


