A Tour of Machine Learning in Python

How to perform exploratory data analysis and build a machine learning pipeline.

In this tutorial I demonstrate key elements and design approaches that go into building a well-performing machine learning pipeline. The topics I’ll cover include:

  1. Exploratory Data Analysis and Feature Engineering.
  2. Data Pre-Processing, including cleaning and feature standardization.
  3. Dimensionality Reduction with Principal Component Analysis and Recursive Feature Elimination.
  4. Classifier Optimization via hyperparameter tuning and Validation Curves.
  5. Building a more powerful classifier through Ensemble Voting and Stacking.
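The steps above can be sketched roughly in scikit-learn as follows. This is a minimal illustration under stated assumptions, not the tutorial's own code. Synthetic data from `make_classification` stands in for the Titanic features, and the chosen models (logistic regression, random forest, SVC), grid values, and the 5-feature RFE target are arbitrary placeholders for illustration.

```python
# Minimal sketch: standardization, RFE, PCA, validation curve, grid search,
# and voting/stacking ensembles. Synthetic data is a stand-in for the real features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split, validation_curve
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Pre-processing + dimensionality reduction + classifier, chained so the scaler
# and PCA are refit on the training folds only during cross-validation.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Recursive Feature Elimination: repeatedly drop the feature with the smallest
# |coefficient| until 5 remain (5 is an arbitrary choice here).
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
rfe.fit(StandardScaler().fit_transform(X_train), y_train)
print("features kept by RFE:", np.flatnonzero(rfe.support_))

# Validation curve: cross-validated accuracy as a function of one hyperparameter.
C_range = np.logspace(-3, 2, 6)
train_scores, val_scores = validation_curve(
    pipe, X_train, y_train, param_name="clf__C", param_range=C_range, cv=5
)
print("mean validation accuracy per C:", val_scores.mean(axis=1).round(3))

# Grid search over the number of principal components and regularization strength.
param_grid = {"pca__n_components": [3, 5, 8], "clf__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
print("best params:", search.best_params_)

# Base models for the ensembles: the tuned pipeline plus two other model families.
base_models = [
    ("lr", search.best_estimator_),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("svc", Pipeline([("scale", StandardScaler()),
                      ("svc", SVC(probability=True, random_state=0))])),
]
# Soft voting averages the base models' predicted class probabilities.
voting = VotingClassifier(estimators=base_models, voting="soft").fit(X_train, y_train)
# Stacking trains a meta-learner on the base models' cross-validated predictions.
stacking = StackingClassifier(
    estimators=base_models, final_estimator=LogisticRegression(), cv=5
).fit(X_train, y_train)

for name, model in [("tuned pipeline", search), ("voting", voting), ("stacking", stacking)]:
    print(f"{name} test accuracy: {model.score(X_test, y_test):.3f}")
```

Wrapping the scaler and PCA inside the `Pipeline` is a deliberate design choice. During cross-validation they are refit on each training fold, so no statistics leak from the validation folds into the preprocessing.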

Along the way we’ll be using several important Python libraries, including scikit-learn and pandas, as well as seaborn for data visualization.

Our task in this tutorial is a binary classification problem inspired by Kaggle’s “Getting Started” competition, Titanic: Machine Learning from Disaster. The goal is to accurately predict whether a passenger survived or perished during the Titanic’s sinking, based on data such as passenger age, class, and sex. The training and test datasets are provided here.
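To make the task concrete, here is a small sketch of how passenger features like class, sex, and age might be explored and turned into model inputs. It assumes the column names `Pclass`, `Sex`, `Age`, and `Survived` from Kaggle's `train.csv`. The rows below are made-up toy values for illustration only, not real passenger records. Median imputation and one-hot encoding are illustrative choices, not necessarily the ones the tutorial makes later.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up toy rows for illustration only (not real passenger records).
# Column names are assumed to match Kaggle's Titanic train.csv.
df = pd.DataFrame({
    "Pclass":   [1, 3, 2, 3, 1, 3, 2, 3],
    "Sex":      ["female", "male", "female", "male", "male", "female", "male", "male"],
    "Age":      [38.0, 22.0, None, 35.0, 54.0, 4.0, 27.0, None],
    "Survived": [1, 0, 1, 0, 0, 1, 0, 0],
})

# Quick exploratory check: survival rate by sex and by passenger class.
print(df.groupby("Sex")["Survived"].mean())
print(df.groupby("Pclass")["Survived"].mean())

X = df[["Pclass", "Sex", "Age"]]
y = df["Survived"]

preprocess = ColumnTransformer([
    # Age: fill missing values with the median, then standardize.
    ("age", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), ["Age"]),
    # Pclass and Sex: treat as categorical and one-hot encode.
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Pclass", "Sex"]),
])

model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])
model.fit(X, y)

# Predicted survival probability for a hypothetical new passenger.
new_passenger = pd.DataFrame({"Pclass": [2], "Sex": ["female"], "Age": [30.0]})
print(model.predict_proba(new_passenger)[0, 1])
```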

Here I have chosen to focus on the fundamentals that should be part of every data scientist’s toolkit. The topics covered should provide a solid foundation for launching into more advanced machine learning approaches, such as Deep Learning. For an intro to Deep Learning, see my notebook on building a Convolutional Neural Network with Google’s TensorFlow API.

Notes:

  • This IPython notebook is best viewed using Google Chrome; some images and hyperlinks may not work in Mozilla Firefox.
  • To download the source code, which you can edit and execute yourself, save this link (.ipynb file extension).
