mlpack_sparse_coding - sparse coding


mlpack_sparse_coding [-k int] [-i string] [-m unknown] [-l double] [-L double] [-n int] [-w double] [-N bool] [-o double] [-s int] [-T string] [-t string] [-V bool] [-c string] [-d string] [-M unknown] [-h -v]


An implementation of Sparse Coding with Dictionary Learning, which achieves sparsity via an l1-norm regularizer on the codes (LASSO) or an (l1+l2)-norm regularizer on the codes (the Elastic Net). Given a dense data matrix X with d dimensions and n points, sparse coding seeks to find a dense dictionary matrix D with k atoms in d dimensions, and a sparse coding matrix Z with n points in k dimensions.
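A minimal plain-Python sketch of the quantity being minimized. It assumes the usual elastic-net form, with a factor of 0.5 on the squared reconstruction error and on the l2 term; the exact constants mlpack uses are an assumption here. It also treats each point as a row of X, so that X is n x d, Z is n x k and D is k x d. The helper names `matmul` and `objective` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain-Python product of A (m x p) and B (p x q)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def objective(X: Matrix, Z: Matrix, D: Matrix, lambda1: float, lambda2: float) -> float:
    """Assumed objective: 0.5*||X - Z D||_F^2 + lambda1*sum|Z| + 0.5*lambda2*sum Z^2.

    X is n x d (one point per row), Z is n x k (codes), D is k x d (one atom per row).
    lambda2 == 0 is the LASSO case; lambda2 > 0 is the Elastic Net.
    """
    R = matmul(Z, D)  # current reconstruction of X
    fit = sum((x - r) ** 2 for xrow, rrow in zip(X, R) for x, r in zip(xrow, rrow))
    l1 = sum(abs(z) for row in Z for z in row)
    l2 = sum(z * z for row in Z for z in row)
    return 0.5 * fit + lambda1 * l1 + 0.5 * lambda2 * l2


X = [[1.0, 2.0]]
D = [[1.0, 0.0], [0.0, 1.0]]
print(objective(X, [[1.0, 2.0]], D, lambda1=0.1, lambda2=0.0))  # exact fit, LASSO penalty only
print(objective(X, [[1.0, 2.0]], D, lambda1=0.1, lambda2=0.2))  # same codes, Elastic Net penalty
```

Setting lambda2 to 0 drops the l2 term, which is how a single objective covers both the LASSO and the Elastic Net.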

The original data matrix X can then be approximately reconstructed as Z * D (treating each point as a row, so that Z is n x k and D is k x d). Therefore, this program finds a representation of each point in X as a sparse linear combination of the atoms in the dictionary D.
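A minimal sketch of that reconstruction. It uses the row convention, where X is n x d, Z is n x k and D is k x d, so each reconstructed point is a weighted sum of the atoms (rows of D) that its code uses. The dictionary and codes are made-up values, and `reconstruct` is a hypothetical helper.

```python
from typing import List

Matrix = List[List[float]]


def reconstruct(Z: Matrix, D: Matrix) -> Matrix:
    """Return Z * D: row i is sum_j Z[i][j] * (atom j), a linear combination of atoms."""
    d = len(D[0])
    out = []
    for codes in Z:
        point = [0.0] * d
        for coef, atom in zip(codes, D):
            if coef != 0.0:  # zero coefficients (the sparse part) contribute nothing
                for t in range(d):
                    point[t] += coef * atom[t]
        out.append(point)
    return out


# Three hypothetical atoms in d = 2 dimensions (an overcomplete dictionary).
D = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
# Each point uses a single atom, so its code has only one nonzero entry.
Z = [[0.0, 0.0, 5.0], [2.0, 0.0, 0.0]]
print(reconstruct(Z, D))  # → [[3.0, 4.0], [2.0, 0.0]]
```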

The sparse coding is found with an algorithm that alternates between a dictionary step, which updates the dictionary D, and a sparse coding step, which updates the sparse coding matrix Z.
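A toy sketch of the alternation in plain Python. It is not mlpack's solver. Here the coding step is cyclic coordinate descent, and the dictionary step is a single projected-gradient step that keeps every atom's l2 norm at most 1; the norm bound is an assumption. The program's --newton_tolerance option indicates that mlpack itself uses a Newton method somewhere in its optimization, and this sketch does not. The parameter names lambda1, lambda2, max_iterations and objective_tolerance mirror the program's options. The stopping rule (stop once the objective drops by less than the tolerance) and the random initial dictionary are assumptions, and all function names are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def objective(X: Matrix, D: Matrix, Z: Matrix, lambda1: float, lambda2: float) -> float:
    """Assumed objective: 0.5*||X - Z D||^2 + lambda1*sum|Z| + 0.5*lambda2*sum Z^2."""
    fit = sum((x[t] - sum(z[j] * D[j][t] for j in range(len(D)))) ** 2
              for x, z in zip(X, Z) for t in range(len(x)))
    l1 = sum(abs(v) for row in Z for v in row)
    l2 = sum(v * v for row in Z for v in row)
    return 0.5 * fit + lambda1 * l1 + 0.5 * lambda2 * l2


def code_step(X: Matrix, D: Matrix, Z: Matrix, lambda1: float, lambda2: float,
              sweeps: int = 20) -> None:
    """Sparse coding step (D fixed): cyclic coordinate descent on each row of Z, in place."""
    k, d = len(D), len(D[0])
    sq = [sum(a * a for a in atom) for atom in D]
    for x, z in zip(X, Z):
        r = [x[t] - sum(z[j] * D[j][t] for j in range(k)) for t in range(d)]  # residual
        for _ in range(sweeps):
            for j in range(k):
                denom = sq[j] + lambda2
                rho = sum(D[j][t] * r[t] for t in range(d)) + sq[j] * z[j]
                shrunk = max(abs(rho) - lambda1, 0.0)  # soft-thresholding by lambda1
                new = (shrunk if rho >= 0 else -shrunk) / denom if denom > 0.0 else 0.0
                delta = new - z[j]
                for t in range(d):
                    r[t] -= delta * D[j][t]
                z[j] = new


def dictionary_step(X: Matrix, D: Matrix, Z: Matrix) -> None:
    """Dictionary step (Z fixed): one projected-gradient step on 0.5*||X - Z D||^2, in place."""
    n, k, d = len(X), len(D), len(D[0])
    E = [[sum(Z[i][j] * D[j][t] for j in range(k)) - X[i][t] for t in range(d)]
         for i in range(n)]  # E = Z D - X
    L = sum(v * v for row in Z for v in row)  # ||Z||_F^2 bounds the gradient's Lipschitz constant
    if L == 0.0:
        return
    for j in range(k):
        grad = [sum(Z[i][j] * E[i][t] for i in range(n)) for t in range(d)]
        D[j] = [D[j][t] - grad[t] / L for t in range(d)]
        norm = sum(a * a for a in D[j]) ** 0.5
        if norm > 1.0:  # project the atom back onto the unit ball
            D[j] = [a / norm for a in D[j]]


def train(X: Matrix, atoms: int, lambda1: float = 0.0, lambda2: float = 0.0,
          max_iterations: int = 0, objective_tolerance: float = 0.01,
          seed: int = 1) -> Tuple[Matrix, Matrix, List[float]]:
    """Alternate the two steps; max_iterations == 0 means no iteration limit."""
    rng = random.Random(seed)
    D = []
    for _ in range(atoms):  # initial atoms: random training points scaled to unit norm
        p = rng.choice(X)
        norm = sum(a * a for a in p) ** 0.5 or 1.0
        D.append([a / norm for a in p])
    Z = [[0.0] * atoms for _ in X]
    history = [objective(X, D, Z, lambda1, lambda2)]
    while max_iterations == 0 or len(history) <= max_iterations:
        code_step(X, D, Z, lambda1, lambda2)
        dictionary_step(X, D, Z)
        history.append(objective(X, D, Z, lambda1, lambda2))
        if history[-2] - history[-1] < objective_tolerance:
            break
    return D, Z, history


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.1]]
D, Z, history = train(X, atoms=2, lambda1=0.1, max_iterations=50)
print(len(history) - 1, "iterations, final objective", round(history[-1], 4))
```

Each step either solves its subproblem exactly (one coordinate at a time) or takes a projected-gradient step whose size is at most the reciprocal of the gradient's Lipschitz constant. As a result, the objective never increases from one iteration to the next, which is what makes an objective-based tolerance a sensible stopping test.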

Once a dictionary D is found, the sparse coding model may be saved for future use and used to encode other matrices.
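Encoding new data means running only the sparse coding step with D held fixed. In general that requires an iterative solver. In the special case, assumed here purely for illustration, where the atoms are orthonormal, each coefficient has a closed form: soft-threshold the projection onto the atom by lambda1, then divide by 1 + lambda2. This assumes the same 0.5-scaled objective as before, and `encode_orthonormal` is a hypothetical helper. The example shows the l1 term setting a small coefficient to exactly zero, while the Elastic Net's l2 term shrinks the remaining ones further.

```python
from typing import List

Matrix = List[List[float]]


def encode_orthonormal(D: Matrix, x: List[float], lambda1: float, lambda2: float) -> List[float]:
    """Codes for one point x under a FIXED dictionary D whose rows (atoms) are orthonormal.

    For such D the objective separates per coefficient, giving
    z_j = soft(<d_j, x>, lambda1) / (1 + lambda2).
    """
    codes = []
    for atom in D:
        rho = sum(a * b for a, b in zip(atom, x))  # projection of x onto this atom
        z = max(abs(rho) - lambda1, 0.0) / (1.0 + lambda2)
        codes.append(0.0 if z == 0.0 else (z if rho > 0 else -z))
    return codes


D = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # hypothetical orthonormal atoms
x = [2.0, 0.05, -0.5]                                     # hypothetical new point
print(encode_orthonormal(D, x, lambda1=0.1, lambda2=0.0))  # LASSO codes
print(encode_orthonormal(D, x, lambda1=0.1, lambda2=1.0))  # Elastic Net codes
```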

To run this program, either an input matrix or an already-saved sparse coding model must be specified. An input matrix may be specified with the ’--training_file (-t)’ option, along with the number of atoms in the dictionary (specified with the ’--atoms (-k)’ parameter). It is also possible to specify an initial dictionary for the optimization, with the ’--initial_dictionary_file (-i)’ parameter. An input model may be specified with the ’--input_model_file (-m)’ parameter.

As an example, to build a sparse coding model on the dataset ’data.csv’ using 200 atoms and an l1-regularization parameter of 0.1, saving the model into ’model.bin’, use:

$ mlpack_sparse_coding --training_file data.csv --atoms 200 --lambda1 0.1 --output_model_file model.bin

Then, this model could be used to encode a new matrix, ’otherdata.csv’, and save the output codes to ’codes.csv’:

$ mlpack_sparse_coding --input_model_file model.bin --test_file otherdata.csv --codes_file codes.csv


--atoms (-k) [int]

Number of atoms in the dictionary. Default value 15.

--help (-h) [bool]

Default help info.

--info [string]

Get help on a specific module or option. Default value ’’.

--initial_dictionary_file (-i) [string]

Optional initial dictionary matrix. Default value ’’.

--input_model_file (-m) [unknown]

File containing input sparse coding model. Default value ’’.

--lambda1 (-l) [double]

Sparse coding l1-norm regularization parameter. Default value 0.

--lambda2 (-L) [double]

Sparse coding l2-norm regularization parameter. Default value 0.

--max_iterations (-n) [int]

Maximum number of iterations for sparse coding (0 indicates no limit). Default value 0.

--newton_tolerance (-w) [double]

Tolerance for convergence of the Newton method. Default value 1e-06.

--normalize (-N) [bool]

If set, the input data matrix will be normalized before coding.

--objective_tolerance (-o) [double]

Tolerance for convergence of the objective function. Default value 0.01.

--seed (-s) [int]

Random seed. If 0, ’std::time(NULL)’ is used. Default value 0.

--test_file (-T) [string]

Optional matrix to be encoded by trained model. Default value ’’.

--training_file (-t) [string]

Matrix of training data (X). Default value ’’.

--verbose (-v) [bool]

Display informational messages and the full list of parameters and timers at the end of execution.

--version (-V) [bool]

Display the version of mlpack.
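Two of the options above can be sketched directly. The --seed rule (0 means seed from the current time) follows the option's description. The --normalize behaviour shown, scaling each point to unit l2 norm, is an assumption about what "normalized" means here. Both helper names are hypothetical, and leaving all-zero points unchanged is a choice made by this sketch.

```python
import math
import random
import time
from typing import List


def normalize_points(X: List[List[float]]) -> List[List[float]]:
    """Assumed --normalize behaviour: scale each point (row) to unit l2 norm."""
    out = []
    for point in X:
        norm = math.sqrt(sum(v * v for v in point))
        # This sketch leaves all-zero points unchanged rather than dividing by zero.
        out.append([v / norm for v in point] if norm > 0.0 else list(point))
    return out


def make_rng(seed: int) -> random.Random:
    """--seed rule: 0 means seed from the current time, otherwise use the given value."""
    return random.Random(int(time.time()) if seed == 0 else seed)


print(normalize_points([[3.0, 4.0], [0.0, 0.0]]))  # → [[0.6, 0.8], [0.0, 0.0]]
print(make_rng(42).random() == make_rng(42).random())  # a fixed nonzero seed is reproducible
```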


--codes_file (-c) [string]

Matrix to save the output sparse codes of the test matrix (--test_file) to. Default value ’’.

--dictionary_file (-d) [string]

Matrix to save the output dictionary to. Default value ’’.

--output_model_file (-M) [unknown]

File to save trained sparse coding model to. Default value ’’.


For further information, including relevant papers, citations, and theory, consult the documentation found at or included with your distribution of mlpack.