In this tutorial I demonstrate key elements and design approaches that go into building a well-performing machine learning pipeline. The topics I’ll cover include:
- Exploratory Data Analysis and Feature Engineering.
- Data Pre-Processing including cleaning and feature standardization.
- Dimensionality Reduction with Principal Component Analysis and Recursive Feature Elimination.
- Classifier Optimization via hyperparameter tuning and Validation Curves.
- Building a more powerful classifier through Ensemble Voting and Stacking.
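To preview how these pieces fit together, here is a minimal sketch (not the tutorial's exact code) that chains standardization, PCA, hyperparameter tuning, and a voting ensemble with scikit-learn. The synthetic dataset below is only a stand-in for the Titanic data we'll use later.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data standing in for the real dataset
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-processing + dimensionality reduction + classifier in one pipeline
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=5)),
    ("clf", LogisticRegression()),
])

# Hyperparameter tuning over both the PCA and classifier settings
grid = GridSearchCV(
    pipe,
    {"pca__n_components": [3, 5], "clf__C": [0.1, 1.0]},
    cv=3,
)
grid.fit(X_train, y_train)

# A simple ensemble: majority vote between the tuned pipeline and a forest
vote = VotingClassifier([
    ("tuned", grid.best_estimator_),
    ("forest", RandomForestClassifier(random_state=0)),
])
vote.fit(X_train, y_train)
accuracy = vote.score(X_test, y_test)
```

Each of these steps is unpacked in detail in the sections that follow.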
Along the way we’ll be using several important Python libraries, including scikit-learn and pandas, as well as seaborn for data visualization.
Our task in this tutorial is a binary classification problem inspired by Kaggle’s “Getting Started” competition, Titanic: Machine Learning from Disaster. The goal is to accurately predict whether a passenger survived or perished during the Titanic’s sinking, based on data such as passenger age, class, and sex. The training and test datasets are provided here.
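To make the task framing concrete, here is a hedged sketch using a few illustrative toy rows (not the real passenger records): features such as `Age`, `Pclass`, and `Sex` map to a binary `Survived` target, which is what our classifiers will learn to predict.

```python
import pandas as pd

# Toy rows for illustration only; the real training set has many more
# passengers and additional columns.
toy = pd.DataFrame({
    "Age": [22.0, 38.0, 4.0],
    "Pclass": [3, 1, 2],
    "Sex": ["male", "female", "female"],
    "Survived": [0, 1, 1],  # binary target: 0 = perished, 1 = survived
})

# Separate features from the target, as we'll do with the real training set
X = toy.drop(columns="Survived")
y = toy["Survived"]
```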
I have chosen here to focus on the fundamentals that should be a part of every data scientist’s toolkit. The topics covered should provide a solid foundation for launching into more advanced machine learning approaches, such as Deep Learning. For an intro to Deep Learning, see my notebook on building a Convolutional Neural Network with Google’s TensorFlow API.
Notes:
- This IPython notebook is best viewed using Google Chrome; some images and hyperlinks may not work in Mozilla Firefox.
- To download the source code, which you can edit and execute yourself, save this link (.ipynb file extension).
Check out some of my past projects!
I’ve worked on technical projects in a variety of fields. Here are some highlights: