Installing TensorFlow on Your New Deep-Learning Rig

So, you’ve just built a beautiful new computing rig, equipped it with Linux and a high-performance GPU, and you’re eager to start training some neural nets. And after a few quick driver installations you’ll be up and running in no time, right?

If only it were so easy. In reality, installing the drivers you need to get started with TensorFlow and making everything work properly can be a long-winded quest, especially if you are new to Linux.  But that’s why I’ve written this walkthrough: to help save you the time and pain of having to jump through several flaming hoops yourself.  Below is a step-by-step account of how I set up TensorFlow on my own deep learning rig.

My deep-learning rig, featuring the NVIDIA GeForce GTX 1060.  Built on a budget, but effective!

Here are the software versions used in this guide:

  • Ubuntu 16.04 LTS
  • NVIDIA driver 390.25
  • CUDA 9.0
  • cuDNN 7.0.4
  • TensorFlow 1.6.0

By the time you’re reading this, one or more of these will have been superseded by a new update. But beware: the latest version of TensorFlow will not necessarily be compatible with the latest versions of CUDA and cuDNN. Irrespective of the version, if you’re having trouble with compatibility or installations, this walkthrough will help guide you to the promised land.  
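One way to stay on top of this is to write down the combination you know works and compare it against what is actually installed. The sketch below is only an illustration: the single known-good entry is the combination tested in this guide (not an official compatibility matrix), and the names KNOWN_GOOD, major_minor and check_stack are made up for the example.

```python
# Hypothetical helper: compare installed versions against a known-good combination.
# The single entry below is the set of versions used in this guide; it is not
# an official compatibility matrix from NVIDIA or Google.
KNOWN_GOOD = {
    "1.6.0": {"cuda": "9.0", "cudnn": "7.0"},
}


def major_minor(version):
    """Reduce a version string such as '7.0.4' to its 'major.minor' prefix."""
    return ".".join(version.split(".")[:2])


def check_stack(tensorflow, cuda, cudnn):
    """Return a list of human-readable mismatches (empty if everything matches)."""
    expected = KNOWN_GOOD.get(tensorflow)
    if expected is None:
        return ["no known-good entry recorded for TensorFlow " + tensorflow]
    problems = []
    if major_minor(cuda) != expected["cuda"]:
        problems.append("CUDA %s found, %s expected" % (cuda, expected["cuda"]))
    if major_minor(cudnn) != expected["cudnn"]:
        problems.append("cuDNN %s found, %s expected" % (cudnn, expected["cudnn"]))
    return problems


if __name__ == "__main__":
    print(check_stack("1.6.0", "9.0", "7.0.4"))  # the combination from this guide
    print(check_stack("1.6.0", "9.1", "7.1.1"))  # the combination I had to downgrade from
```

An empty list means the versions line up with the recorded entry; anything else tells you which component to look at first.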

What is TensorFlow?

TensorFlow is a deep learning API built by Google that allows you to train neural networks on your powerful GPU, without needing to get your hands dirty with nitty-gritty C++. It provides quick and convenient tools for defining neural networks in Python and executing parallel computations using computational graphs, which represent operations on multi-dimensional arrays. It gives you access to powerful and time-saving community-developed tools, helps you make the most of your hardware, and will speed up your development process.
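To make the idea of a computational graph a little more concrete, here is a toy sketch in plain Python. It is not TensorFlow's API (the classes Constant, Add and MatMul are invented for illustration); it only mimics the define-first, run-later pattern, where wiring up the graph performs no arithmetic until you ask for a result.

```python
# Toy illustration of a computational graph; none of these classes exist in TensorFlow.

class Constant:
    """A leaf node that holds a fixed matrix (a list of rows)."""
    def __init__(self, value):
        self.value = value

    def run(self):
        return self.value


class Add:
    """Element-wise sum of two nodes that produce equally shaped matrices."""
    def __init__(self, a, b):
        self.a, self.b = a, b

    def run(self):
        left, right = self.a.run(), self.b.run()
        return [[x + y for x, y in zip(row_l, row_r)]
                for row_l, row_r in zip(left, right)]


class MatMul:
    """Matrix product of two nodes."""
    def __init__(self, a, b):
        self.a, self.b = a, b

    def run(self):
        left, right = self.a.run(), self.b.run()
        columns = list(zip(*right))
        return [[sum(x * y for x, y in zip(row, col)) for col in columns]
                for row in left]


# Building the graph only wires nodes together; nothing is computed yet.
x = Constant([[1, 2], [3, 4]])
w = Constant([[0, 1], [1, 0]])
b = Constant([[10, 10], [10, 10]])
graph = Add(MatMul(x, w), b)

# Computation happens only when the graph is run.
result = graph.run()
print(result)
```

The real library follows the same pattern at a much larger scale, and can hand the run step over to the GPU.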

Before You Begin:

I’m assuming you’re starting with a fresh Ubuntu install. As a prerequisite, we’ll need to install some basic packages, as well as Python.

  1. Start with the following commands from the terminal:
    $  sudo apt-get update
    $  sudo apt-get upgrade
    $  sudo apt-get install -y build-essential g++ gfortran git libfreetype6-dev libxft-dev libncurses-dev libopenblas-dev libblas-dev liblapack-dev libatlas-base-dev linux-headers-generic linux-image-extra-virtual zlib1g-dev libcurl3-dev
  2. For a lightweight Python installation, where you can add packages as you go:
    $  wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
    $  bash Miniconda3-latest-Linux-x86_64.sh
    $  source ~/.bashrc

    Let’s also install some key Python data science packages:
    $  conda install numpy scipy matplotlib pandas scikit-learn
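If you want a quick confirmation that the packages landed in the Python you are actually running, something like the following sketch can help. The function name missing_packages is my own; note that the list uses import names, which can differ from the conda package names.

```python
# Minimal sketch: report which of the packages installed above can be imported.
import importlib.util

# Import names, which can differ from the conda package names
# (the scikit-learn package is imported as "sklearn").
PACKAGES = ["numpy", "scipy", "matplotlib", "pandas", "sklearn"]


def missing_packages(names):
    """Return the names from `names` that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(PACKAGES)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All packages found.")
```

If anything is reported as missing, check that the python on your PATH is the Miniconda one (which python should point into your miniconda3 directory).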

Installing NVIDIA GPU Drivers (v. 390.25) on Ubuntu 16.04

Sounds straightforward, right? In reality, quite a few problems can crop up, thanks to interference from nouveau, the open-source graphics driver that Ubuntu loads by default. Here’s a step-by-step walkthrough that will navigate you safely to a successful installation.

  1. Ensure any prior NVIDIA driver installations are removed to avoid conflict. If you’re starting with a clean Ubuntu install, you can skip this step. Otherwise, issue the following command from the terminal (the quotes stop the shell from expanding the wildcard before apt-get sees it):
    $  sudo apt-get purge 'nvidia*'

  2. Add this repository to access NVIDIA drivers that the community has built for compatibility with your Linux distribution (Ubuntu 16.04), but do not attempt an installation yet:
    $  sudo add-apt-repository ppa:graphics-drivers/ppa

    Also check the NVIDIA website and take note of the latest compatible driver version number for your GPU model (for me this was 390.25).

  3. Next check your existing video drivers:
    $  lspci -v | less

    On Ubuntu 16.04, you should find that your computer is running the nouveau kernel graphics driver by default. To install the NVIDIA drivers, we’ll need to blacklist nouveau and reboot into text mode (runlevel 3) to prevent the GUI from loading, as outlined in the following steps.

    Note: while the GUI is disabled, if at any point you find yourself at a blank screen with no visible terminal input, you can press Ctrl-Alt-F1 to access the terminal via teletype.

  4. To stop nouveau from interfering, we need to blacklist it. Issue the following command from the terminal to create and open a blacklist configuration file in Vim:
    $  sudo vi /etc/modprobe.d/blacklist-nouveau.conf

    In Vim, enter “insert mode” by pressing the “i” key, and add the following two lines:
    blacklist nouveau
    options nouveau modeset=0

    Then press escape to get back into command mode, type “:w” and press enter to save the file, then type “:q” and press enter to quit Vim.

  5. To reboot into runlevel 3 (text mode), issue the following terminal commands:
    $  sudo systemctl set-default multi-user.target
    $  sudo reboot

    When Ubuntu reloads, you’ll need to press Ctrl-Alt-F3 to access teletype. Enter your username (e.g. “rpm”) and login password, and you’ll be at the terminal again.

  6. Now in runlevel 3 with nouveau blacklisted, issue the following terminal commands to install the NVIDIA drivers (substituting the appropriate version number):
    $  sudo apt update
    $  sudo apt install nvidia-390

  7. Check your graphics drivers again to see whether you are now running the NVIDIA driver, or if it’s still nouveau:
    $  lspci -v | less

    You should at least see nvidia drivers listed. If it still shows nouveau as the driver in use, backtrack and ensure you didn’t make a mistake in the instructions for blacklisting it.

  8. Next, and very importantly, we need to update the “initial RAM file system” (initramfs), so that the nouveau blacklist is also applied during early boot:
    $  sudo update-initramfs -u

  9. Reboot back into GUI mode:
    $  sudo systemctl set-default graphical.target
    $  sudo reboot

  10. Check that the NVIDIA driver is in use:
    $  lspci -v | less

    If the installation was successful, you should also be able to execute the following command to display information about your GPU and its status:
    $  nvidia-smi
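Paging through lspci output by eye gets tedious if you have to repeat these steps, so here is a rough sketch of how the two key checks could be automated: that the blacklist file contains both required lines, and which kernel driver lspci reports as in use. The function names are hypothetical, and the script assumes the blacklist file path used in step 4.

```python
# Minimal sketch: sanity-check the nouveau blacklist and the active GPU driver.
# Function names are hypothetical helpers, not part of any NVIDIA or Ubuntu tool.
import os
import shutil
import subprocess

BLACKLIST_FILE = "/etc/modprobe.d/blacklist-nouveau.conf"
REQUIRED_LINES = ["blacklist nouveau", "options nouveau modeset=0"]


def missing_blacklist_lines(conf_text):
    """Return the required lines that are absent from the blacklist file text."""
    present = {line.strip() for line in conf_text.splitlines()}
    return [line for line in REQUIRED_LINES if line not in present]


def drivers_in_use(lspci_text):
    """Collect the value of every 'Kernel driver in use:' line in lspci -v output."""
    marker = "Kernel driver in use:"
    return [line.split(marker, 1)[1].strip()
            for line in lspci_text.splitlines() if marker in line]


if __name__ == "__main__":
    if os.path.exists(BLACKLIST_FILE):
        with open(BLACKLIST_FILE) as handle:
            print("Missing blacklist lines:", missing_blacklist_lines(handle.read()))
    else:
        print("Blacklist file not found:", BLACKLIST_FILE)

    if shutil.which("lspci"):
        output = subprocess.run(["lspci", "-v"], stdout=subprocess.PIPE,
                                universal_newlines=True).stdout
        drivers = drivers_in_use(output)
        print("nouveau in use:", "nouveau" in drivers)
        print("nvidia in use:", "nvidia" in drivers)
```

On a healthy install you would hope to see no missing blacklist lines, nouveau not in use, and nvidia in use.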

Installing CUDA 9.0:

In general, you should follow the official CUDA installation instructions supplied on the NVIDIA website, because these may change with updated versions. The following method worked for installing CUDA 9.0 on top of the NVIDIA 390.25 driver on Ubuntu 16.04 LTS.

  1. Before proceeding, double-check the CUDA compatibility of the latest version of TensorFlow. In my case, I first installed CUDA 9.1 only to later discover that I had to downgrade to CUDA 9.0 for full compatibility with TensorFlow (1.6.0).

  2. From the NVIDIA website, download the latest runfile for the CUDA version you wish to install. Do not use the DEB distribution, as that may try to overwrite the NVIDIA drivers we just installed.

  3. As with the NVIDIA driver installation, we need to reboot into runlevel 3 (text mode) and have nouveau blacklisted as described earlier. To boot into runlevel 3, issue the terminal commands:
    $  sudo systemctl set-default multi-user.target
    $  sudo reboot

    When Ubuntu reloads, press Ctrl-Alt-F3 and enter your login information to retrieve the terminal.

  4. Now you can execute the runfile you downloaded in step 2. Because the file is passed to sh as an argument, it does not need execute permission, so there is no need to chmod it first:
    $ sudo sh cuda_*

    Accept the license, choose NO for driver installation, YES to toolkit installation, use DEFAULT location, YES to symbolic link creation, and you can say NO to the examples.

    In my case, the toolkit “successful installation” notice was followed by a scary-sounding message that at first caused some concern: “*WARNING: Installation incomplete! You must have [an NVIDIA driver of version X or higher…]”. However, this warning shows up merely because we chose to skip the runfile’s built-in NVIDIA driver installation option. Thus it’s just a reminder, not an error, and you can disregard it.

  5. The post-installation messages also give us instructions for adding the necessary PATH variables. In our case, this is achieved via (substituting the appropriate CUDA version number):
    $ export CUDA_HOME=/usr/local/cuda-9.0
    $ export PATH=${CUDA_HOME}/bin:${PATH}
    $ export LD_LIBRARY_PATH=${CUDA_HOME}/lib64

    Importantly, these exports only last for the current terminal session. To make them permanent, you need to add them to your .bashrc file.

    I opened the file using the nano text editor as follows:
    $  nano ~/.bashrc

    Then I appended the following lines to the file:
    export PATH=/usr/local/cuda-9.0/bin:${PATH}
    export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64

    Note that in this case, I’m creating LD_LIBRARY_PATH from scratch. If you wish to add directories to it later, do so like this:
    $  export LD_LIBRARY_PATH=/usr/local/new-library-directory:${LD_LIBRARY_PATH}

  6. Next reboot into GUI mode using:
    $  sudo systemctl set-default graphical.target
    $  sudo reboot

  7. NVIDIA has several post-installation instructions, the first of which is “check that the device files /dev/nvidia* exist and have the correct (0666) file permissions”. Note that ls -l displays this permission as “crw-rw-rw-”:
    $  cd /dev/
    $  ls -l | less

    On my system, this lists the files nvidia0, nvidiactl, nvidia-modeset, and nvidia-uvm, all with “crw-rw-rw-” permissions. The leading “c” marks a character device, as opposed to “d” for a directory or “b” for a block device. If the files do not exist, the online NVIDIA CUDA installation guide provides a bash script you can use to create them.

    Other MANDATORY actions include updating the PATH and LD_LIBRARY_PATH variables, which we already did in an earlier step.

    For CUDA 9.0 and later, NVIDIA also lists a “POWER9” setup. It turns out that this only applies to machines built around POWER9, a high-end server CPU made by IBM. So if you don’t know what it is, chances are you don’t use it, and you can skip all the steps pertaining to it.

  8. Finally, for future reference, you can check which version of CUDA you have installed using:
    $  cat /usr/local/cuda/version.txt
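The device-file check from step 7 is easy to script as well. The sketch below uses Python's standard stat module to render a file mode the same way ls -l does, and to test for "character device with permission bits 0666". The helper names describe_mode and is_ok are my own.

```python
# Minimal sketch: check that the NVIDIA device files are character devices with mode 0666.
import glob
import os
import stat


def describe_mode(st_mode):
    """Render a raw st_mode value the way `ls -l` does, e.g. 'crw-rw-rw-'."""
    return stat.filemode(st_mode)


def is_ok(st_mode):
    """True if the mode is a character device with permission bits 0666."""
    return stat.S_ISCHR(st_mode) and stat.S_IMODE(st_mode) == 0o666


if __name__ == "__main__":
    paths = sorted(glob.glob("/dev/nvidia*"))
    if not paths:
        print("No /dev/nvidia* files found.")
    for path in paths:
        mode = os.stat(path).st_mode
        print(path, describe_mode(mode), "OK" if is_ok(mode) else "CHECK")
```

Any line marked CHECK is worth comparing against the NVIDIA post-installation instructions before moving on.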

Installing cuDNN 7.0.4:

  1. Before proceeding, again check the compatibility requirements for the version of TensorFlow you intend to install. I originally installed cuDNN version 7.1.1, and encountered issues when working with convolutional neural networks which required me to downgrade to 7.0.4.

  2. Next, you must register an account at https://developer.nvidia.com/cudnn. Then you can go to https://developer.nvidia.com/rdp/cudnn-download to get both the Runtime Library and the Developer Library, plus related documentation. Note that, apparently, the installers for Ubuntu 14.04 work just fine for Ubuntu 16.04.

  3. Fortunately this next part is quite easy. To install these, simply issue the following terminal commands (in this order) from the directory you’ve saved your files to:
    $  sudo dpkg -i libcudnn7_7.0.4*.deb
    $  sudo dpkg -i libcudnn7-dev*.deb
    $  sudo dpkg -i libcudnn7-doc*.deb

  4. Next, NVIDIA recommends we test cuDNN by running one of their examples. Execute the following in terminal:
    $  cp -r /usr/src/cudnn_samples_v7/ $HOME
    $  cd $HOME/cudnn_samples_v7/mnistCUDNN
    $  make clean && make
    $  ./mnistCUDNN

    You should see output that ends with “Test passed!”

    If it fails, saying it cannot find the correct library file (specifically, libcudart.so.9), double-check that you have indeed added the following lines to your .bashrc file:
    export PATH=/usr/local/cuda-9.0/bin:${PATH}
    export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64

    Then everything should work!
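If you are not sure whether your .bashrc changes actually took effect, open a new terminal and inspect the environment. Here is a minimal sketch of that check in Python, assuming the default CUDA 9.0 install location used in this guide; the function name missing_entries is hypothetical.

```python
# Minimal sketch: check that the CUDA directories are on PATH and LD_LIBRARY_PATH.
# CUDA_HOME below is the default install location used in this guide.
import os

CUDA_HOME = "/usr/local/cuda-9.0"


def missing_entries(env, cuda_home=CUDA_HOME):
    """Return 'VARIABLE: directory' strings for each expected entry that is absent."""
    expected = {
        "PATH": cuda_home + "/bin",
        "LD_LIBRARY_PATH": cuda_home + "/lib64",
    }
    problems = []
    for variable, directory in expected.items():
        # Both variables are colon-separated lists of directories.
        entries = [entry.rstrip("/") for entry in env.get(variable, "").split(":")]
        if directory not in entries:
            problems.append(variable + ": " + directory)
    return problems


if __name__ == "__main__":
    problems = missing_entries(os.environ)
    print("Missing:", problems if problems else "nothing")
```

If LD_LIBRARY_PATH shows up as missing, that is the most likely reason libcudart.so.9 cannot be found.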

Installing TensorFlow 1.6.0:

Get ready for this.

  1. From terminal:
    $ pip install tensorflow-gpu

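That’s it! One-line installation! Of course, this assumes you’ve followed all of the previous instructions, and have both pip and Python installed. Note that this command installs the latest release; to get exactly the version used in this guide, add a version pin (pip install tensorflow-gpu==1.6.0).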

If you want to check that TensorFlow is working, launch Python:
    $  python3

And type:
>>> import tensorflow as tf
>>> tf.__version__
>>> tf.test.gpu_device_name()

The second command should print the TensorFlow version, and the third should print the name of your GPU device. An empty string from the third command means TensorFlow could not find a GPU.
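The same check can be wrapped in a small script, which is handy to re-run after driver or CUDA updates. This is just a sketch: the function name tensorflow_report is my own, and tf.test.is_built_with_cuda() is an extra call that I did not use in the interactive session. A failed import is reported rather than raised, since a missing CUDA library typically shows up as an ImportError.

```python
# Minimal sketch: collect basic facts about the TensorFlow install in one place.
def tensorflow_report():
    """Return a dict describing the TensorFlow install, or why it is unavailable."""
    try:
        import tensorflow as tf
    except ImportError as error:
        return {"installed": False, "error": str(error)}
    return {
        "installed": True,
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # An empty string means TensorFlow could not see a GPU.
        "gpu_device": tf.test.gpu_device_name(),
    }


if __name__ == "__main__":
    for key, value in tensorflow_report().items():
        print(key + ":", value)
```

If the report shows an import error mentioning libcudart or libcudnn, revisit the LD_LIBRARY_PATH setup from the CUDA section.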

Bonus: Installing PyTorch:

Again, this is a single-line installation, assuming you’ve followed all the previous steps:
    $  conda install pytorch torchvision cuda90 -c pytorch
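To confirm that PyTorch can see the GPU, a short sketch along these lines should do. The function name pytorch_report is hypothetical; the torch calls it relies on are torch.cuda.is_available() and torch.cuda.get_device_name().

```python
# Minimal sketch: check whether PyTorch is installed and can use the GPU.
def pytorch_report():
    """Return a dict describing the PyTorch install, or why it is unavailable."""
    try:
        import torch
    except ImportError as error:
        return {"installed": False, "error": str(error)}
    cuda_ok = torch.cuda.is_available()
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda_available": cuda_ok,
        # Only query the device name when CUDA is actually usable.
        "gpu_name": torch.cuda.get_device_name(0) if cuda_ok else None,
    }


if __name__ == "__main__":
    for key, value in pytorch_report().items():
        print(key + ":", value)
```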

And there you have it! Welcome, brave adventurer, you are now ready to take your deep learning journey to the next level!