mlpack IRC logs, 2018-07-21

Logs for the day 2018-07-21 (starts at 0:00 UTC) are shown below.

--- Log opened Sat Jul 21 00:00:55 2018
01:57 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Read error: Connection reset by peer]
02:00 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
02:03 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Read error: Connection reset by peer]
02:06 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
02:12 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Ping timeout: 240 seconds]
05:52 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has joined #mlpack
06:24 < Atharva> zoq: https://github.com/mlpack/mlpack/blob/3b7bbf0f14172cdb00fd16cbf12918b07c888b96/src/mlpack/methods/ann/layer/sequential_impl.hpp#L75
06:25 < Atharva> What is the reason for setting reset to true here?
06:27 < Atharva> It causes a problem in the case where, in the sequential layer, the first layer is linear and the second is convolutional. This sets the height and width of the convolutional layer to 0.
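
For reference, the configuration being described looks roughly like the sketch below. This is only an illustration, assuming the mlpack 3.x ANN API (FFN, Sequential, Linear, Convolution); the layer sizes (784 -> 196, fed to the convolution as a 14x14 image) are invented for the example.

    #include <mlpack/core.hpp>
    #include <mlpack/methods/ann/ffn.hpp>
    #include <mlpack/methods/ann/layer/layer.hpp>

    using namespace mlpack::ann;

    int main()
    {
      FFN<> model;

      // A Sequential block whose first layer is Linear and whose second is
      // Convolution.  The Convolution layer is constructed with an input
      // width/height of 14x14, matching the 196 outputs of the Linear layer;
      // the question above is why these end up as 0 when the Sequential
      // layer forces reset = true.
      Sequential<>* block = new Sequential<>();
      block->Add<Linear<> >(784, 196);
      block->Add<Convolution<> >(1, 8,   // input maps, output maps
                                 3, 3,   // kernel width, height
                                 1, 1,   // stride
                                 0, 0,   // padding
                                 14, 14  // input width, height
                                 );

      model.Add(block);
      model.Add<LogSoftMax<> >();
    }
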
06:40 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
07:24 -!- vivekp [~vivek@unaffiliated/vivekp] has quit [Read error: Connection reset by peer]
07:26 -!- vivekp [~vivek@unaffiliated/vivekp] has joined #mlpack
12:54 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has quit [Read error: Connection reset by peer]
14:15 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has joined #mlpack
14:18 -!- travis-ci [~travis-ci@ec2-54-197-172-151.compute-1.amazonaws.com] has joined #mlpack
14:18 < travis-ci> manish7294/mlpack#73 (impBounds - b7e25ab : Manish): The build is still failing.
14:18 < travis-ci> Change view : https://github.com/manish7294/mlpack/compare/74236a6bd37b...b7e25aba5e6d
14:18 < travis-ci> Build details : https://travis-ci.com/manish7294/mlpack/builds/79640360
14:18 -!- travis-ci [~travis-ci@ec2-54-197-172-151.compute-1.amazonaws.com] has left #mlpack []
14:19 -!- travis-ci [~travis-ci@ec2-54-91-242-195.compute-1.amazonaws.com] has joined #mlpack
14:19 < travis-ci> manish7294/mlpack#8 (impBounds - b7e25ab : Manish): The build is still failing.
14:19 < travis-ci> Change view : https://github.com/manish7294/mlpack/compare/74236a6bd37b...b7e25aba5e6d
14:19 < travis-ci> Build details : https://travis-ci.org/manish7294/mlpack/builds/406585242
14:19 -!- travis-ci [~travis-ci@ec2-54-91-242-195.compute-1.amazonaws.com] has left #mlpack []
14:22 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has quit [Read error: Connection reset by peer]
14:30 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has joined #mlpack
14:59 < ShikharJ> zoq: Kris had mentioned that with SSRBM on the digits dataset, the accuracy he obtained was about 82%, so we're good on that number. But with BinaryRBM he had mentioned an accuracy of 86% (we are at about 70% now). I'm unsure how he obtained a number that high, I'll probably look for the comments in the PR as well.
16:17 -!- travis-ci [~travis-ci@ec2-54-226-1-75.compute-1.amazonaws.com] has joined #mlpack
16:17 < travis-ci> manish7294/mlpack#74 (Patch - 23fd8b1 : Manish): The build was fixed.
16:17 < travis-ci> Change view : https://github.com/manish7294/mlpack/compare/3bd4acfdf3c3...23fd8b1d7de5
16:17 < travis-ci> Build details : https://travis-ci.com/manish7294/mlpack/builds/79642747
16:17 -!- travis-ci [~travis-ci@ec2-54-226-1-75.compute-1.amazonaws.com] has left #mlpack []
16:17 -!- travis-ci [~travis-ci@ec2-54-80-247-144.compute-1.amazonaws.com] has joined #mlpack
16:17 < travis-ci> manish7294/mlpack#10 (Patch - dfff872 : Manish): The build was fixed.
16:17 < travis-ci> Change view : https://github.com/manish7294/mlpack/compare/23fd8b1d7de5...dfff872a421a
16:17 < travis-ci> Build details : https://travis-ci.org/manish7294/mlpack/builds/406613209
16:17 -!- travis-ci [~travis-ci@ec2-54-80-247-144.compute-1.amazonaws.com] has left #mlpack []
16:40 < ShikharJ> zoq: Ah, okay this is embarrassing, I just had to reduce the stepSize a bit, and we're hitting ~80% accuracy on BinaryRBM as well. It could be because we're taking mini-batches, and a larger stepSize would suit a single input batch better. Please review whenever free.
17:05 -!- travis-ci [~travis-ci@ec2-54-162-168-233.compute-1.amazonaws.com] has joined #mlpack
17:05 < travis-ci> manish7294/mlpack#75 (Patch - dfff872 : Manish): The build was fixed.
17:05 < travis-ci> Change view : https://github.com/manish7294/mlpack/compare/23fd8b1d7de5...dfff872a421a
17:05 < travis-ci> Build details : https://travis-ci.com/manish7294/mlpack/builds/79642787
17:05 -!- travis-ci [~travis-ci@ec2-54-162-168-233.compute-1.amazonaws.com] has left #mlpack []
18:57 -!- navdeep [49a5fcc8@gateway/web/freenode/ip.73.165.252.200] has joined #mlpack
19:01 < navdeep> Hi, I trained a model using sklearn random_forest and got accuracy of ~80% on test data. Using the same num_of_trees and minimum_leaf_size, when I trained on the same data with mlpack random_forest I got 68% accuracy on the same test data. Any reason why that'd happen?
19:12 < ShikharJ> navdeep: I can't really say why you'd be getting a lower score, but it'd really be helpful if you could provide the scripts you used for the sklearn and mlpack random_forest code. That way, it is easier for us to ascertain the cause.
19:13 < navdeep> tuned_parameters = {'min_samples_leaf': range(2,16,2), 'n_estimators' : range(50,250,50), 'min_samples_split': range(2,16,2) }
19:13 < navdeep> rf_model= GridSearchCV(model_rf, tuned_parameters, cv=5, scoring='accuracy', n_jobs= -1)
19:14 < navdeep> rf_model.fit(X_train.values, y_train['label'])
19:14 < navdeep> model_rf = RandomForestClassifier()
19:15 < navdeep> this is what I get as best parameters: min_samples_leaf=2, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=150
19:16 < navdeep> So I use mlpack like this:
19:16 < navdeep> mlpack.random_forest(training=X_train, labels=y_train['label'], print_training_accuracy=True, num_trees=150, minimum_leaf_size=2, verbose=True )
21:02 < rcurtin> navdeep: note that the accuracy depends on the threshold that you use; did you try making an ROC curve to compare the models or anything?
21:03 < rcurtin> also I'd expect minimum_leaf_size == 1 to give the best performance
21:09 < navdeep> I haven't drawn an ROC curve yet
21:10 < navdeep> What do you mean by threshold?
21:28 < navdeep> the only available input parameters are:
21:28 < navdeep> copy_all_inputs (bool), input_model (RandomForestModelType), labels, minimum_leaf_size (int), num_trees (int), print_training_accuracy (bool), test (matrix), test_labels (row vector), training (matrix), verbose (bool)
21:29 < navdeep> rcurtin: I was reading this article http://lists.mlpack.org/pipermail/mlpack/2018-May/003752.html which seems to be written by you, but how do you set the threshold in the API?
21:29 < navdeep> @rcurtin
22:23 -!- ImQ009 [~ImQ009@unaffiliated/imq009] has quit [Quit: Leaving]
22:37 < rcurtin> navdeep: sorry, I stepped out and can give a better response later
22:37 < rcurtin> but in essence use the Predict() overload that returns class probabilities then classify based on that
22:51 -!- navdeep [49a5fcc8@gateway/web/freenode/ip.73.165.252.200] has quit [Ping timeout: 252 seconds]
23:12 -!- navdeep [324d5171@gateway/web/freenode/ip.50.77.81.113] has joined #mlpack
23:13 < navdeep> rcurtin: I am using the probability overload only. My question is still, though, why the same algorithm returns different results for sklearn vs. mlpack
23:39 < rcurtin> navdeep: there are a couple of things
23:39 < rcurtin> first, like I said, the accuracy depends on the threshold, so to compare these correctly you should look at ROC curves
23:39 < rcurtin> second, there are minor implementation differences that could affect the results
23:40 < rcurtin> I see that in scikit, they take max_features = sqrt(dimensions)
23:40 < rcurtin> I see that mlpack's implementation uses a default of 3 dimensions, which is not easy to change unless you write C++
23:41 < rcurtin> so for sure an option should be added for that, and I will try to do that this week (Monday perhaps)
23:41 < rcurtin> but that may or may not be making the difference here. An ROC curve would show more
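
To make the suggestion above concrete, here is a rough C++ sketch, assuming mlpack 3.x, of training a random forest with the same settings navdeep used and then classifying from the per-class probabilities with an explicit threshold (in the RandomForest class the probability-returning overload is Classify()). The random placeholder data and the 0.5 threshold are invented for illustration; sweeping that threshold over the test-set probabilities is what produces an ROC curve for comparison against sklearn.

    #include <mlpack/core.hpp>
    #include <mlpack/methods/random_forest/random_forest.hpp>

    using namespace mlpack;
    using namespace mlpack::tree;

    int main()
    {
      // Placeholder data: 10-dimensional points, two classes.
      arma::mat trainData(10, 1000, arma::fill::randu);
      arma::Row<size_t> trainLabels(1000);
      for (size_t i = 0; i < trainLabels.n_elem; ++i)
        trainLabels[i] = i % 2;
      arma::mat testData(10, 200, arma::fill::randu);

      // Same settings as the Python call above: 150 trees, minimum leaf size 2.
      RandomForest<> rf(trainData, trainLabels, 2 /* numClasses */,
                        150 /* numTrees */, 2 /* minimumLeafSize */);

      // The overload that also returns per-class probabilities: one row per
      // class, one column per test point.
      arma::Row<size_t> predictions;
      arma::mat probabilities;
      rf.Classify(testData, predictions, probabilities);

      // Classify from the probabilities with an explicit threshold instead of
      // the default argmax; varying this threshold traces out the ROC curve.
      const double threshold = 0.5;
      arma::Row<size_t> thresholded(testData.n_cols);
      for (size_t i = 0; i < testData.n_cols; ++i)
        thresholded[i] = (probabilities(1, i) >= threshold) ? 1 : 0;
    }
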
--- Log closed Sun Jul 22 00:00:57 2018