Python - Deeper Insights into Machine Learning
Author:
Pub Year:
Source:
Read: 2018-04-08
Last Update: 2018-04-08

Five Sentence Abstract:

This is a straightforward, no-nonsense book, or rather three, that will take you from zero understanding of machine learning to a fairly advanced level, assuming you put in the required time and effort. The code provided is detailed and extremely well explained, often line by line, and gives you a solid intuitive basis that makes the journey into the more advanced areas more concrete. The book starts slowly and then moves forward at a steady but more than manageable pace, building on the earlier, often simplified examples using sklearn to reach more modern and advanced techniques using libraries such as Theano and Keras that leverage your GPU. It lacks some of the detail you might want when it comes to mathematical proofs, but it does include plenty of math to whet your appetite; following it up with an appropriate linear algebra or algorithms book would probably be ideal. Lastly, at the end of each chapter in the third module, numerous resources are provided for further study of the more advanced topics.

Thoughts:

Overall a great book. Would recommend to anyone interested in learning about machine learning.

It is basically 3 books in 1, separated, unsurprisingly, into beginner, intermediate, and advanced modules.

The detailed table of contents wins points with me every time.

The first book holds your hand and offers a very nice, slow place to start. Compared to the other pair of beginner books I've read on the subject, this one was far superior.

All of the code works, which I can't say about the other books, and there is often a line-by-line explanation following each snippet.

While there isn't much in the way of real mathematical proofs, there is still plenty of math and plenty of explanation of what the underlying algorithms look like and do. Any understanding helps over simply using sklearn blindly. That said, once you have some intuitive understanding of what to expect from the various algorithms and libraries, you will be better equipped to tackle a more detailed and abstract exploration of the underlying mathematics.

The second book goes back over the same ideas as the first, but in more detail and with added depth. Great reinforcement of the learning. Practice, practice, practice!

Book three is considerably more advanced. It assumes you have the material from books one and two down pat. It uses the Theano and Keras libraries, which I didn't have installed, so I mostly didn't play with the code. The little bit of the code I did experiment with had numerous errors. This module included more advanced topics than I was ready for, but it was interesting nonetheless and expanded my vocabulary for now.

Finally, there are tons of references to follow up and expand your understanding of whatever topic may suit your fancy. Again, all in all a great place to start if combined with another more mathematically oriented book.

Other

I would like a more detailed explanation of what is going on here but, since it is matplotlib that I am not understanding, the lack of explanation can be forgiven. The snippet draws the colored decision regions of the plots; I'm not sure what the meshgrid function is doing.

# plot the decision surface
x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                       np.arange(x2_min, x2_max, resolution))
Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
Z = Z.reshape(xx1.shape)
plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)
plt.xlim(xx1.min(), xx1.max())
plt.ylim(xx2.min(), xx2.max())
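
The meshgrid question can be explored in isolation. Below is a minimal sketch, assuming a tiny 3-by-2 grid and a made-up threshold rule (toy_predict, a hypothetical stand-in for classifier.predict; neither is from the book). np.meshgrid takes two 1-D coordinate axes and returns two 2-D arrays that together hold the coordinates of every grid point; ravel and the transpose turn those into one (x1, x2) row per point, and reshape puts the predictions back on the grid so contourf can color it.

```python
import numpy as np

# Two small 1-D coordinate axes (illustrative values, not the book's data).
x1_axis = np.arange(0, 3, 1)     # [0, 1, 2]
x2_axis = np.arange(10, 12, 1)   # [10, 11]

# meshgrid repeats each axis so that (xx1[i, j], xx2[i, j]) is one grid point.
xx1, xx2 = np.meshgrid(x1_axis, x2_axis)
# xx1 is [[0, 1, 2], [0, 1, 2]] and xx2 is [[10, 10, 10], [11, 11, 11]]

# Flatten both grids and pair them up: one row per grid point.
grid_points = np.array([xx1.ravel(), xx2.ravel()]).T


def toy_predict(points):
    """Hypothetical stand-in for classifier.predict: class 1 if x1 + x2 > 11."""
    return (points[:, 0] + points[:, 1] > 11).astype(int)


# Predict a class for every grid point, then restore the 2-D grid shape
# so the result lines up with xx1 and xx2 for plt.contourf(xx1, xx2, Z).
Z = toy_predict(grid_points).reshape(xx1.shape)
```

In the book's snippet the same thing happens on a much finer grid (step size resolution), which is why the colored regions look continuous.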

Links

Code Examples

01_perceptron.py
02_adaline.py
03_adalineSGD.py
04_sklearn_perceptron.py
05_sigmoid.py
06_logistic_regression.py
07_svm.py
08_kernal_svm.py
09_decision_tree.py
10_knn.py
11_missing_data.py
12_ordinal_nominal_features.py
13_feature_engineering.py
14_dimensionality_reduction.py
15_PCA.py
16_LDA.py
17_radial_bias_func_kernel_PCA.py
18_RBF_kernel_PCA_2.py
19_pipeline.py
20_diagnosing_bias_and_variance.py
21_grid_search.py
22_nested_cross_validation.py
23_confusion_matrix.py
24_performance_metrics.py
25_ensemble.py
26_bagging.py
27_boosting.py
28_imdb_to_csv.py
29_bag_of_words.py
30_imdb_processing.py
31_minibatch.py
32_housing.py
33_RANSAC_outliers.py
34_evaluating_linear_regression.py
35_polynomial_features.py
36_polynomial_features_housing.py
37_random_tree.py
38_k_means.py
39_clusters_heirarchical_tree.py
40_DBSCAN.py
41_Multi-layer_perceptron.py
41_mlp.pkl
42_logistic_recap.py
43_plot_sigmoids.py
44_part_2_demos.py
45_loading_image_data.py
46_part_2_batch_gradiant_descent.py
47_one_versus.py
48_imputer.py
49_adding_polynomial_complexity.py
50_PCA.py
51_ensembles.py
52_voting_ensemble.py
53_extra_tree_faces.py
54_adaboosting.py
55_gradient_tree_boosting.py
56_performance_estimation.py
57_grid_search.py
58_learning_curve.py
59_recommendation_system.py
60_PCA.py
61_k_clustering.py
62_collinearity_eigenvalues.py
63_bagging.py
64_randomForest.py
65_jitterTest.py

Further Reading

Exceptional Excerpts:

This [The expression of emotion in 20th century books (Acerbi et al, 2013)] study is interesting for several reasons. Firstly, it is an example of data-driven science, where previously considered soft sciences, such as sociology and anthropology, are given a solid empirical footing.

For data to become information, it requires some meaningful structure.

fill in the missing values through a process of imputation. For classification, we can simply use the statistics of the mean, median, and mode over the observed features to impute the missing values.
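
As a rough illustration of the imputation idea in this excerpt, here is a minimal NumPy sketch. The toy matrix and the helper name impute_columns are my own assumptions, not the book's code; it simply replaces each missing entry with the mean or median of the observed values in the same column.

```python
import numpy as np

# Toy feature matrix with two missing entries (illustrative values).
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, 6.0, np.nan, 8.0],
              [10.0, 11.0, 12.0, np.nan]])


def impute_columns(X, statistic=np.nanmean):
    """Replace each NaN with a statistic of the observed values in its column."""
    X_filled = X.copy()
    col_stats = statistic(X_filled, axis=0)       # NaN-aware, one value per column
    rows, cols = np.where(np.isnan(X_filled))     # positions of the missing entries
    X_filled[rows, cols] = col_stats[cols]
    return X_filled


X_mean = impute_columns(X, np.nanmean)
X_median = impute_columns(X, np.nanmedian)
```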

Many machine learning algorithms require that features are standardized. This means that they will work best when the individual features look more or less like normally distributed data with near-zero mean and unit variance. The easiest way to do this is by subtracting the mean value from each feature and scaling it by dividing by the standard deviation. This can be achieved by the scale() function or the StandardScaler() class in the sklearn.preprocessing module. Although these functions will accept sparse data, they probably should not be used in such situations because centering sparse data would likely destroy its structure. It is recommended to use the MaxAbsScaler() or maxabs_scale() function in these cases. The former scales and translates each feature individually by its maximum absolute value. The latter scales each feature individually to a range of [-1,1]. Another specific case is when we have outliers in the data. In these cases using the robust_scale() or RobustScaler() function is recommended.
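
To make the three scaling ideas in this excerpt concrete, here is a minimal NumPy sketch of the arithmetic as I understand it. The toy matrix is an assumption, and the formulas are written by hand for illustration rather than taken from the library's source.

```python
import numpy as np

# Toy data; the last value in the second column acts as an outlier.
X = np.array([[1.0, -2.0],
              [2.0, 0.0],
              [3.0, 4.0],
              [4.0, 100.0]])

# Standardization: subtract the column mean, divide by the column standard deviation.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Max-abs scaling: divide by the largest absolute value per column.
# No centering happens, so zeros stay zeros and values land in [-1, 1].
X_maxabs = X / np.abs(X).max(axis=0)

# Robust scaling: center on the median and scale by the interquartile range,
# so a single extreme value has little influence on the scale.
median = np.median(X, axis=0)
q1, q3 = np.percentile(X, [25, 75], axis=0)
X_robust = (X - median) / (q3 - q1)
```

Comparing the second column of X_std and X_robust shows how much the outlier squeezes the other values together under plain standardization.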

Bagging is primarily a variance reduction technique and boosting is primarily a bias reduction technique.

GoogLeNet was designed to tackle computer vision challenges involving Internet-quality image data, that is, images that have been captured in real contexts where the pose, lighting, occlusion, and clutter of images vary significantly. GoogLeNet was applied to the 2014 ImageNet challenge with noteworthy success, achieving only 6.7% error rate on the test dataset. ImageNet images are small, high-granularity images taken from many, varied classes.

Notes:

Table of Contents

Module 1: Python Machine Learning

01: Giving Computers the Ability to Learn from Data
02: Training Machine Learning Algorithms for Classification
03: A Tour of Machine Learning Classifiers Using Scikit-learn
04: Building Good Training Sets – Data Preprocessing
05: Compressing Data via Dimensionality Reduction
06: Learning Best Practices for Model Evaluation and Hyperparameter Tuning
07: Combining Different Models for Ensemble Learning
08: Applying Machine Learning to Sentiment Analysis
09: Embedding a Machine Learning Model into a Web Application
10: Predicting Continuous Target Variables with Regression Analysis
11: Working with Unlabeled Data – Clustering Analysis
12: Training Artificial Neural Networks for Image Recognition
13: Parallelizing Neural Network Training with Theano

Module 2: Designing Machine Learning Systems with Python

01: Thinking in Machine Learning
02: Tools and Techniques
03: Turning Data into Information
04: Models – Learning from Information
05: Linear Models
06: Neural Networks
07: Features – How Algorithms See the World
08: Learning with Ensembles
09: Design Strategies and Case Studies

Module 3: Advanced Machine Learning with Python

01: Unsupervised Machine Learning
02: Deep Belief Networks
03: Stacked Denoising Autoencoders
04: Convolutional Neural Networks
05: Semi-Supervised Learning
06: Text Feature Engineering
07: Feature Engineering Part II
08: Ensemble Methods
09: Additional Python Machine Learning Tools

Module 1: Python Machine Learning

01: Giving Computers the Ability to Learn from Data

page 004:

page 010:

page 012:

page 14:

02: Training Machine Learning Algorithms for Classification

page 24:
page 25:
page 026:

page 035:

page 36:
page 44:
page 45:

03: A Tour of Machine Learning Classifiers Using Scikit-learn

page 51:
page 52:
1. Selection of features.
2. Choosing a performance metric.
3. Choosing a classifier and optimization algorithm.
4. Evaluating the performance of the model.
5. Tuning the algorithm.
page 55:
If the misclassification error is 0.089, then the accuracy is 1 - misclassification error = 0.911, or 91.1 percent.
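
The arithmetic can be checked with a few lines. This is a minimal sketch; the counts (45 test samples, 4 of them misclassified) are an assumption chosen because they reproduce the 0.089 figure.

```python
import numpy as np

# Assumed counts: 45 test samples, the first 4 predicted wrongly.
y_true = np.zeros(45, dtype=int)
y_pred = y_true.copy()
y_pred[:4] = 1

misclassified = int((y_true != y_pred).sum())
error = misclassified / len(y_true)   # misclassification error
accuracy = 1 - error                  # accuracy is simply the complement

print(f"error={error:.3f}, accuracy={accuracy:.3f}")
```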
page 60:

page 61:
page 68:
page 70:
page 71:

page 074:

page 76:
>>> from sklearn.linear_model import SGDClassifier
>>> ppn = SGDClassifier(loss='perceptron')
>>> # loss='log' in scikit-learn before 1.1; later releases call it 'log_loss'
>>> lr = SGDClassifier(loss='log_loss')
>>> svm = SGDClassifier(loss='hinge')
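
The three loss names select different penalties on the same linear model. As a minimal sketch (hand-written textbook formulas, not the library's code), each loss can be written as a function of the margin y * (w.x + b) with labels y in {-1, +1}; the function names and sample margins are my own.

```python
import numpy as np


def perceptron_loss(margin):
    """Zero for any correctly classified sample, linear penalty otherwise."""
    return np.maximum(0.0, -margin)


def logistic_loss(margin):
    """Smooth penalty that never reaches exactly zero (logistic regression)."""
    return np.log1p(np.exp(-margin))


def hinge_loss(margin):
    """Zero only when the sample is correct with a margin of at least 1 (SVM)."""
    return np.maximum(0.0, 1.0 - margin)


# Negative margin = misclassified, positive margin = correctly classified.
margins = np.array([-2.0, 0.0, 0.5, 2.0])
for name, loss in [("perceptron", perceptron_loss),
                   ("log", logistic_loss),
                   ("hinge", hinge_loss)]:
    print(name, np.round(loss(margins), 3))
```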
page 078:

page 79:

One of the most widely used kernels is the Radial Basis Function kernel (RBF kernel) or Gaussian kernel.
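
For reference, a minimal sketch of the RBF kernel as it is usually written, k(x, y) = exp(-gamma * ||x - y||^2). The function name and the sample points are assumptions for illustration; gamma is the free parameter that controls how quickly similarity falls off with distance.

```python
import numpy as np


def rbf_kernel(x, y, gamma=1.0):
    """k(x, y) = exp(-gamma * ||x - y||^2): 1 for identical points, near 0 for distant ones."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))


same = rbf_kernel([1.0, 2.0], [1.0, 2.0])   # identical points
near = rbf_kernel([1.0, 2.0], [1.5, 2.0])   # squared distance 0.25
far = rbf_kernel([1.0, 2.0], [4.0, 6.0])    # squared distance 25
```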

page 84:
page 85:
page 86:
page 94:
page 95:

page 96:
page 98:

04: Building Good Training Sets – Data Preprocessing

page 103:
>>> df.values
array([[ 1.,  2.,  3.,  4.],
       [ 5.,  6., nan,  8.],
       [10., 11., 12., nan]])

>>> df.dropna()
   A  B  C  D
0  1  2  3  4

>>> df.dropna(axis=1)
    A   B
0   1   2
1   5   6
2  10  11

# only drop rows where all columns are NaN
>>> df.dropna(how='all')
# drop rows that have fewer than 4 non-NaN values
>>> df.dropna(thresh=4)
# only drop rows where NaN appear in specific columns (here: 'C')
>>> df.dropna(subset=['C'])
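
The notes above do not show how df is built, so here is a self-contained sketch that recreates a frame with the same values from an inline CSV string (the construction is my assumption) and runs each dropna variant. The result names are made up for readability.

```python
import io

import pandas as pd

# Inline CSV with the same values as df.values above; empty fields become NaN.
csv_data = """A,B,C,D
1.0,2.0,3.0,4.0
5.0,6.0,,8.0
10.0,11.0,12.0,"""

df = pd.read_csv(io.StringIO(csv_data))

missing_per_column = df.isnull().sum()        # count of NaN values in each column

rows_without_nan = df.dropna()                # drop any row containing a NaN
cols_without_nan = df.dropna(axis=1)          # drop any column containing a NaN
all_nan_rows_dropped = df.dropna(how='all')   # drop rows where every value is NaN
at_least_4_values = df.dropna(thresh=4)       # keep rows with 4 or more real values
c_must_be_present = df.dropna(subset=['C'])   # only look for NaN in column 'C'
```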
page 104:
page 105:

page 106:
page 112:

page 113:

page 120:
page 128:

05: Compressing Data via Dimensionality Reduction

page 130:

page 131:

page 133:
page 140:
page 141:
page 151:

page 153:
page 154:

page 169:

06: Learning Best Practices for Model Evaluation and Hyperparameter Tuning

page 174:

page 176:

page 177:
page 178:

page 179:
page 182:

page 183:
page 187:
page 188:
page 189:
page 190:

page 192:

page 193:

page 194:

page 195:
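>>> from sklearn.metrics import make_scorer, f1_score
>>> from sklearn.model_selection import GridSearchCV
>>> scorer = make_scorer(f1_score, pos_label=0)
>>> # pipe_svc and param_grid are assumed to be defined earlier (not shown in these notes)
>>> gs = GridSearchCV(estimator=pipe_svc,
...                   param_grid=param_grid,
...                   scoring=scorer,
...                   cv=10)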
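
The pos_label argument matters because F1 is computed for one class treated as "positive". Here is a minimal pure-Python sketch of that dependence; the helper f1_for_label and the label lists are hypothetical, written only to show the arithmetic.

```python
def f1_for_label(y_true, y_pred, pos_label):
    """F1 score when pos_label is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == pos_label and p == pos_label for t, p in pairs)
    fp = sum(t != pos_label and p == pos_label for t, p in pairs)
    fn = sum(t == pos_label and p != pos_label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: class 0 is the minority class.
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]

f1_pos0 = f1_for_label(y_true, y_pred, pos_label=0)
f1_pos1 = f1_for_label(y_true, y_pred, pos_label=1)
```

The same predictions score differently depending on which class is called positive, which is why the scorer above is built with an explicit pos_label.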
page 199:

07: Combining Different Models for Ensemble Learning

page 202:

page 203:

page 207:

page 212:
page 221:

page 222:

page 226:
1. Draw a random subset of training samples d1 without replacement from the training set D to train a weak learner C1.
2. Draw a second random training subset d2 without replacement from the training set and add 50 percent of the samples that were previously misclassified to train a weak learner C2.
3. Find the training samples d3 in the training set D on which C1 and C2 disagree to train a third weak learner C3.
4. Combine the weak learners C1, C2, and C3 via majority voting.
page 227:

08: Applying Machine Learning to Sentiment Analysis

page 238:
1. We create a vocabulary of unique tokens (for example, words) from the entire set of documents.
2. We construct a feature vector from each document that contains the counts of how often each word occurs in the particular document.
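
The two steps above can be sketched in plain Python. This is a minimal illustration, assuming three short example sentences and naive whitespace tokenization; the helper name to_vector is my own.

```python
from collections import Counter

docs = ["The sun is shining",
        "The weather is sweet",
        "The sun is shining and the weather is sweet"]

# Step 1: build a vocabulary of unique tokens over all documents,
# mapping each token to a column index (alphabetical order here).
tokenized = [doc.lower().split() for doc in docs]
unique_tokens = sorted({token for tokens in tokenized for token in tokens})
vocabulary = {token: index for index, token in enumerate(unique_tokens)}


# Step 2: turn each document into a vector of per-token counts.
def to_vector(tokens):
    counts = Counter(tokens)
    return [counts.get(token, 0) for token in vocabulary]


bag = [to_vector(tokens) for tokens in tokenized]
```

Each row of bag has one entry per vocabulary word, so most entries are zero for short documents; that is why these feature vectors are usually stored as sparse matrices.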
page 239:
page 240:

page 245:
page 249:

09: Embedding a Machine Learning Model into a Web Application

10: Predicting Continuous Target Variables with Regression Analysis

page 279:
page 280:

page 287:
page 293:
1. Select a random number of samples to be inliers and fit the model.
2. Test all other data points against the fitted model and add those points that fall within a user-given tolerance to the inliers.
3. Refit the model using all inliers.
4. Estimate the error of the fitted model versus the inliers.
5. Terminate the algorithm if the performance meets a certain user-defined threshold or if a fixed number of iterations has been reached; go back to step 1 otherwise.
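
The five steps above can be sketched for a straight-line model as follows. This is a rough illustration under my own assumptions, not the book's or scikit-learn's implementation: the line is fit with np.polyfit, "performance" in step 5 is taken to be the fraction of points counted as inliers, and the data is synthetic (a known line with a small wiggle plus five planted outliers).

```python
import numpy as np


def ransac_line(x, y, n_start=2, tolerance=1.0, stop_inlier_fraction=0.8,
                max_iter=100, seed=0):
    """Sketch of the five RANSAC steps for fitting y = slope * x + intercept."""
    rng = np.random.default_rng(seed)
    best = None  # (inlier count, slope, intercept, inlier mask, error)
    for _ in range(max_iter):
        # Step 1: pick a few random samples as initial inliers and fit the model.
        start = rng.choice(len(x), size=n_start, replace=False)
        slope, intercept = np.polyfit(x[start], y[start], deg=1)
        # Step 2: add every point that falls within the tolerance of that fit.
        inliers = np.abs(y - (slope * x + intercept)) < tolerance
        # Step 3: refit the model using all inliers.
        slope, intercept = np.polyfit(x[inliers], y[inliers], deg=1)
        # Step 4: estimate the error of the refitted model on the inliers.
        error = np.mean(np.abs(y[inliers] - (slope * x[inliers] + intercept)))
        if best is None or inliers.sum() > best[0]:
            best = (inliers.sum(), slope, intercept, inliers, error)
        # Step 5: stop once enough points are inliers; otherwise go back to step 1.
        if inliers.mean() >= stop_inlier_fraction:
            break
    return best[1:]


# Synthetic data: the line y = 2x + 1 with a small wiggle and five large outliers.
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + 0.2 * np.sin(7 * x)
outlier_idx = [5, 15, 25, 35, 45]
y[outlier_idx] += 30.0

slope, intercept, inlier_mask, error = ransac_line(x, y)
```

An ordinary least-squares fit on the same data would be pulled upward by the outliers, while the fit returned here is based only on the points flagged in inlier_mask.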
page 300:
>>> from sklearn.linear_model import Ridge
>>> ridge = Ridge(alpha=1.0)

>>> from sklearn.linear_model import Lasso
>>> lasso = Lasso(alpha=1.0)

>>> from sklearn.linear_model import ElasticNet
>>> elanet = ElasticNet(alpha=1.0, l1_ratio=0.5)

page 306:
page 307:

11: Working with Unlabeled Data – Clustering Analysis

page 314:
page 328:

page 331:
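>>> from scipy.cluster.hierarchy import linkage
>>> from scipy.spatial.distance import pdist
>>> # If row_dist is a square-form distance matrix, this call is a mistake:
>>> # linkage would treat each row of distances as a sample.
>>> row_clusters = linkage(row_dist,
                           method='complete',
                           metric='euclidean')

>>> # Pass the condensed distance matrix from pdist ...
>>> row_clusters = linkage(pdist(df, metric='euclidean'),
                           method='complete')

>>> # ... or pass the raw samples and let linkage compute the distances.
>>> row_clusters = linkage(df.values,
                           method='complete',
                           metric='euclidean')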
page 336:
page 337:

12: Training Artificial Neural Networks for Image Recognition

page 347:

page 349:

page 351:
page 353:
page 373:

page 375:
page 383:
page 385:
page 386:
Pylearn2 (http://deeplearning.net/software/pylearn2/)
Lasagne (https://lasagne.readthedocs.org/en/latest/)
Keras (http://keras.io)

13: Parallelizing Neural Network Training with Theano

page 392:
page 393:
page 394:

// Could not get Theano to install: SciPy was installed via apt, and uninstalling it would also uninstall a number of other programs. So I skimmed the rest of this chapter; it seems straightforward enough.

page 403:
page 404:

page 406:

page 407:

page 408:
page 410:

page 415:
page 417:

Module 2: Designing Machine Learning Systems with Python

01: Thinking in Machine Learning

page 424:

page 429:

page 446:

page 448:
page 451:

02: Tools and Techniques

page 458:
page 475:
page 477:
page 478:
page 479:

03: Turning Data into Information

page 483:

page 484:
page 485:
page 487:

page 489:

page 490:

page 491:

page 492:

page 501:

page 503:

04: Models – Learning from Information

05: Linear Models

page 529:
page 536:
w = (X^T X)^{-1} X^T y
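
This closed-form least-squares solution can be checked numerically. Below is a minimal NumPy sketch on synthetic, noise-free data; the weights, sizes, and variable names are assumptions for illustration. It shows the literal translation of the formula and the equivalent np.linalg.solve call, which avoids forming the inverse explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
X_raw = rng.uniform(-1, 1, size=(100, 2))
true_w = np.array([0.5, 2.0, -3.0])     # intercept, then one weight per feature

# Prepend a column of ones so the intercept is learned as an ordinary weight.
X = np.column_stack([np.ones(len(X_raw)), X_raw])
y = X @ true_w                           # noise-free targets

# Literal translation of w = (X^T X)^{-1} X^T y.
w_inverse = np.linalg.inv(X.T @ X) @ X.T @ y

# Same solution, obtained by solving (X^T X) w = X^T y without an explicit inverse.
w_solve = np.linalg.solve(X.T @ X, X.T @ y)
```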
page 538:
page 544:
page 545:
page 546:

page 547:

06: Neural Networks

page 549:
page 551:

07: Features – How Algorithms See the World

page 569:
page 570:

page 582:

08: Learning with Ensembles

page 589:

page 601:
page 602:

09: Design Strategies and Case Studies

page 605:
page 606:
page 612:

Module 3: Advanced Machine Learning with Python

01: Unsupervised Machine Learning

page 631:
page 634:
page 641:

02: Deep Belief Networks

page 658:
page 660:

page 661:

page 664:

page 679:

03: Stacked Denoising Autoencoders

page 688:
page 689:

page 691:

04: Convolutional Neural Networks

page 708:
page 709:
page 712:

page 720:

05: Semi-Supervised Learning

06: Text Feature Engineering

page 759:
page 769:

page 770:

07: Feature Engineering Part II

page 798:
page 805:

08: Ensemble Methods

page 862:

09: Additional Python Machine Learning Tools

page 866:
page 869:









