## History and news

Apr. 13th, 2017 -- mlpack 2.2.1

- Compilation fix for mlpack_nca and mlpack_test on older Armadillo versions (#984).

- Bugfix for mlpack_knn program (#816).
- Add decision tree implementation in methods/decision_tree/. This is very similar to a C4.5 tree learner.
- Add DBSCAN implementation in methods/dbscan/.
- Add support for multidimensional discrete distributions (#810, #830).
- Better output for Log::Debug/Log::Info/Log::Warn/Log::Fatal for Armadillo objects (#895, #928).
- Refactor categorical CSV loading with boost::spirit for faster loading (#681).

- HMMs now use random initialization; this should fix some convergence issues (#828).
- HMMs now initialize emissions according to the distribution of observations (#833).
- Minor fix for formatted output (#814).
- Fix DecisionStump to properly work with any input type.

- Fixed CoverTree to properly handle single-point datasets.
- Fixed a bug in CosineTree (and thus QUIC-SVD) that caused split failures for some datasets (#717).
- Added mlpack_preprocess_describe program, which can be used to print statistics on a given dataset (#742).
- Fix prioritized recursion for k-furthest-neighbor search (mlpack_kfn and the KFN class), leading to orders-of-magnitude speedups in some cases.
- Bump minimum required version of Armadillo to 4.200.0.
- Added simple Gradient Descent optimizer, found in src/mlpack/core/optimizers/gradient_descent/ (#792).
- Added approximate furthest neighbor search algorithms QDAFN and DrusillaSelect in src/mlpack/methods/approx_kfn/, with command-line program mlpack_approx_kfn.

- Standardize some parameter names for programs (old names are kept for backward compatibility, but warnings will now be issued).
- RectangleTree optimizations (#721).
- Fix memory leak in NeighborSearch (#731).
- Documentation fix for k-means tutorial (#730).
- Fix TreeTraits for BallTree (#727).
- Fix incorrect parameter checks for some command-line programs.
- Fix error in HMM training with probabilities for each point (#636).

- Added the function LSHSearch::Projections(), which returns an arma::cube with each projection table in a slice (#663). Instead of Projection(i), you should now use Projections().slice(i).
- A new constructor has been added to LSHSearch that creates objects using projection tables provided in an arma::cube (#663).
- LSHSearch projection tables refactored for speed (#675).
- Handle zero-variance dimensions in DET (#515).
- Add MiniBatchSGD optimizer (src/mlpack/core/optimizers/minibatch_sgd/) and allow its use in mlpack_logistic_regression and mlpack_nca programs.
- Add better backtrace support from Grzegorz Krajewski for Log::Fatal messages when compiled with debugging and profiling symbols. This requires libbfd and libdl to be present during compilation.
- CosineTree test fix from Mikhail Lozhnikov (#358).
- Fixed HMM initial state estimation (#600).
- Changed versioning macros __MLPACK_VERSION_MAJOR, __MLPACK_VERSION_MINOR, and __MLPACK_VERSION_PATCH to MLPACK_VERSION_MAJOR, MLPACK_VERSION_MINOR, and MLPACK_VERSION_PATCH. The old names will remain in place until mlpack 3.0.0.
- Renamed mlpack_allknn, mlpack_allkfn, and mlpack_allkrann to mlpack_knn, mlpack_kfn, and mlpack_krann. The mlpack_allknn, mlpack_allkfn, and mlpack_allkrann programs will remain as copies until mlpack 3.0.0.
- Add --random_initialization option to mlpack_hmm_train, for use when no labels are provided.
- Add --kill_empty_clusters option to mlpack_kmeans and KillEmptyClusters policy for the KMeans class (#595, #596).

- Fix CMake to properly detect when MKL is being used with Armadillo.
- Minor parameter handling fixes to mlpack_logistic_regression (#504, #505).
- Properly install arma_config.hpp.
- Memory handling fixes for Hoeffding tree code.
- Add functions that allow changing training-time parameters to HoeffdingTree class.
- Fix infinite loop in sparse coding test.
- Documentation spelling fixes (#501).
- Properly handle covariances for Gaussians with large condition number (#496), preventing GMMs from filling with NaNs during training (and also HMMs that use GMMs).
- CMake fixes for finding LAPACK and BLAS as Armadillo dependencies when ATLAS is used.
- CMake fix for projects using mlpack's CMake configuration from elsewhere (#512).

- Removed overclustering support from k-means because it is not well-tested, may be buggy, and is (I think) unused. If you were using this feature, open a bug or get in touch with us; it would not be hard for us to reimplement it.
- Refactored KMeans to allow different types of Lloyd iterations.
- Added implementations of k-means: Elkan's algorithm, Hamerly's algorithm, Pelleg-Moore's algorithm, and the DTNN (dual-tree nearest neighbor) algorithm.
- Significant acceleration of LRSDP via the use of accu(a % b) instead of trace(a * b).
- Added MatrixCompletion class (matrix_completion), which performs nuclear norm minimization to fill unknown values of an input matrix.
- No more dependence on Boost.Random; now we use C++11 STL random support.
- Add softmax regression, contributed by Siddharth Agrawal and QiaoAn Chen.
- Changed NeighborSearch, RangeSearch, FastMKS, LSH, and RASearch API; these classes now take the query sets in the Search() method, instead of in the constructor.
- Use OpenMP, if available. For now OpenMP support is only available in the DET training code.
- Add support for predicting new test point values to LARS and the command-line 'lars' program.
- Add serialization support for Perceptron and LogisticRegression.
- Refactor SoftmaxRegression to predict into an arma::Row object, and add a softmax_regression program.
- Refactor LSH to allow loading and saving of models.
- ToString() is removed entirely (#487).
- Add --input_model_file and --output_model_file options to appropriate machine learning algorithms.
- Rename all executables to start with an "mlpack" prefix (#229).

- Switch to 3-clause BSD license.

- Proper handling of dimension calculation in PCA.
- Load parameter vectors properly for LinearRegression models.
- Linker fixes for AugLagrangian specializations under Visual Studio.
- Add support for observation weights to LinearRegression.
- MahalanobisDistance<> now takes root of the distance by default and therefore satisfies the triangle inequality (TakeRoot now defaults to true).
- Better handling of optional Armadillo HDF5 dependency.
- Fixes for numerous intermittent test failures.
- math::RandomSeed() now sets the seed for recent (>= 3.930) Armadillo versions.
- Handle Newton method convergence better for SparseCoding::OptimizeDictionary() and make maximum iterations a parameter.
- Known bug: CosineTree construction may fail in some cases on i386 systems (#358).
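
To see why taking the root (the MahalanobisDistance<> change listed above) yields the triangle inequality, here is a short worked example using the generic form of the Mahalanobis distance for a positive definite matrix Q; how mlpack's class stores or interprets Q is not shown here. Writing Q = L^T L (for example, its Cholesky factorization) turns the rooted distance into a Euclidean norm, which satisfies the triangle inequality; without the root, even Q = I fails on three points of a line.

```math
\begin{aligned}
d_Q(x, y) &= \sqrt{(x - y)^\top Q \,(x - y)} = \lVert L\,(x - y) \rVert_2, \quad Q = L^\top L \\
\text{without root } (Q = I,\ x = 0,\ y = 1,\ z = 2)&: \quad (x - z)^2 = 4 > (x - y)^2 + (y - z)^2 = 2 \\
\text{with root}&: \quad |x - z| = 2 \le |x - y| + |y - z| = 2
\end{aligned}
```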

- Bugfix for NeighborSearch regression which caused very slow allknn/allkfn. Speeds are now restored to approximately 1.0.8 speeds, with significant improvement for the cover tree (#347).
- Detect dependencies correctly when ARMA_USE_WRAPPER is not defined (i.e., libarmadillo.so does not exist).
- Bugfix for compilation under Visual Studio (#348).

- GMM initialization is now safer and provides a working GMM when constructed with only the dimensionality and number of Gaussians (#301).
- Check for division by 0 in Forward-Backward algorithm in HMMs (#301).
- Fix MaxVarianceNewCluster (used when re-initializing clusters for k-means) (#301).
- Fixed implementation of Viterbi algorithm in HMM::Predict() (#303).
- Significant speedups for dual-tree algorithms using the cover tree (#235, #314) including a faster implementation of FastMKS.
- Fix for LRSDP optimizer so that it compiles and can be used (#312).
- CF (collaborative filtering) now expects users and items to be zero-indexed, not one-indexed (#311).
- CF::GetRecommendations() API change: now requires the number of recommendations as the first parameter. The number of users in the local neighborhood should be specified with CF::NumUsersForSimilarity().
- Removed incorrect PeriodicHRectBound (#58).
- Refactor LRSDP into LRSDP class and standalone function to be optimized (#305).
- Fix for centering in kernel PCA (#337).
- Added simulated annealing (SA) optimizer, contributed by Zhihao Lou.
- HMMs now support initial state probabilities; these can be set in the constructor, trained, or set manually with HMM::Initial() (#302).
- Added Nyström method for kernel matrix approximation by Marcus Edel.
- Kernel PCA now supports using the Nyström method for approximation.
- Ball trees now work with dual-tree algorithms, via the BallBound<> bound structure (#307); fixed by Yash Vadalia.
- The NMF class is now AMF<> and supports far more types of factorizations; contributed by Sumedh Ghaisas.
- A QUIC-SVD implementation has returned, written by Siddharth Agrawal and based on older code from Mudit Gupta.
- Added perceptron and decision stump by Udit Saxena (these are weak learners for an eventual AdaBoost class).
- Sparse autoencoder added by Siddharth Agrawal.

- Memory leak in NeighborSearch index-mapping code fixed (#298).
- GMMs can be trained using the existing model as a starting point by specifying an additional boolean parameter to GMM::Estimate() (#296).
- Logistic regression implementation added in methods/logistic_regression (see also #293).
- Version information is now obtainable via mlpack::util::GetVersion() or the __MLPACK_VERSION_MAJOR, __MLPACK_VERSION_MINOR, and __MLPACK_VERSION_PATCH macros (#297).
- Fix typos in allkfn and allkrann output.

- Cover tree support for range search (range_search), rank-approximate nearest neighbors (allkrann), minimum spanning tree calculation (emst), and FastMKS (fastmks).
- Dual-tree FastMKS implementation added and tested.
- Added collaborative filtering package (cf) that can provide recommendations when given users and items.
- Fix for correctness of Kernel PCA (kernel_pca) (#270).
- Speedups for PCA and Kernel PCA (#198).
- Fix for correctness of Neighborhood Components Analysis (NCA) (#279).
- Minor speedups for dual-tree algorithms.
- Fix for Naive Bayes Classifier (nbc) (#269).
- Added a ridge regression option to LinearRegression (linear_regression) (#286).
- Gaussian Mixture Models (gmm::GMM<>) now support arbitrary covariance matrix constraints (#283).
- MVU (mvu) removed because it is known to not work (#183).
- Minor updates and fixes for kernels (in mlpack::kernel).

- Minor bugfix so that FastMKS gets built.

- Speedups of cover tree traversers (#235).
- Addition of rank-approximate nearest neighbors (RANN), found in src/mlpack/methods/rann/.
- Addition of fast exact max-kernel search (FastMKS), found in src/mlpack/methods/fastmks/.
- Fix for EM covariance estimation; this should improve GMM training time.
- More parameters for GMM estimation.
- Force GMM and GaussianDistribution covariance matrices to be positive definite, so that training converges much more often.
- Add parameter for the tolerance of the Baum-Welch algorithm for HMM training.
- Fix for compilation with clang compiler.
- Fix for k-furthest-neighbor search.

- Force minimum Armadillo version to 2.4.2.
- Better output of class types to streams; a class with a ToString() method implemented can be sent to a stream with operator<<.
- Change return type of GMM::Estimate() to double (#257).
- Style fixes for k-means and RADICAL.
- Handle size_t support correctly with Armadillo 3.6.2 (#258).
- Add locality-sensitive hashing (LSH), found in src/mlpack/methods/lsh/.
- Better tests for SGD (stochastic gradient descent) and NCA (neighborhood components analysis).

- Remove internal sparse matrix support because Armadillo 3.4.0 now includes it. When using Armadillo versions older than 3.4.0, sparse matrix support is not available.
- NCA (neighborhood components analysis) now supports an arbitrary optimizer (#245), including stochastic gradient descent (#249).

- Added density estimation trees, found in src/mlpack/methods/det/.
- Added non-negative matrix factorization, found in src/mlpack/methods/nmf/.
- Added experimental cover tree implementation, found in src/mlpack/core/tree/cover_tree/ (#157).
- Better reporting of boost::program_options errors (#225).
- Fix for timers on Windows (#212, #211).
- Fix for allknn and allkfn output (#204).
- Sparse coding dictionary initialization is now a template parameter (#220).

- Added kernel principal components analysis (kernel PCA), found in src/mlpack/methods/kernel_pca/ (#74).
- Fix for Lovasz-Theta AugLagrangian tests (#182).
- Fixes for allknn output (#185, #186).
- Added range search executable (#192).
- Adapted citations in documentation to BibTeX; no citations in -h output (#195).
- Stop use of 'const char*' and prefer 'std::string' (#177).
- Support seeds for random numbers (#176).

- Initial release. See any resolved tickets numbered less than #196 or execute this query.