Machine Learning – Setup

I am constantly reading and experimenting to learn things which I may apply to work projects. A few years ago I decided to spend time, on and off, learning machine learning (ML). With that purpose in mind, I got a number of books on different subjects (e.g., Deep Learning, Python and Statistics) which seemed useful for achieving my goal.

On the platform side, I started experimenting on Windows. Most things work fine but some do not. For example, I had to wait to use TensorFlow because it ran on Linux but not on Windows. Today it seems to work on both. The same seems to hold true for Docker.

In a future post, I will touch on some dedicated hardware that I purchased in order to extend my understanding and abilities in what is generally referred to as “Artificial Intelligence”, AI for short.

Like I said, I have purchased and read some books. I also read several articles a day on AI and, when available, I watch webinars from the ACM. Becoming a member of the ACM is something that all individuals who have studied and work on different aspects of software development (e.g., architecture, coding, computer science, or design) should consider.

In this post I will go over some notes that I took, and continue to take, while learning and experimenting with AI. For context, I start with the hardware and platform.

A couple of years ago I purchased a DELL computer which I will use for most of the posts regarding AI and Machine Learning (ML). At the time I was interested in getting familiar with CentOS, which is a Linux distribution. It happens that I know the owner of Advanced Productivity (AP), a company in the Twin Cities of Minneapolis and St. Paul that, among other items and services, sells refurbished DELL equipment. The company also supports and maintains what they sell. Not that I am promoting DELL, but at home I have about a dozen computers, some of which I have been using for a decade or so. So far I have not had a hardware failure.

If I have time, I do not mind performing hardware updates (e.g., increase memory capacity, add or replace disks) on the equipment. In this specific case, my machine was initially running an early version of CentOS 6. I updated the Linux version several times. Before starting on this personal project, I wanted to have CentOS 7 and some additional memory installed. I asked AP to update the Linux software and the memory.

A simple way to verify the amount of memory available in your Linux system is to open a console and enter the following:

$ free
              total        used        free      shared  buff/cache   available
Mem:       24518532     1443992    21828776       31980     1245764    22594072
Swap:       4063228           0     4063228

The total (and in my case the maximum for this desktop) amount of memory is 24 GB.
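The same figure can be read from Python. By default free reports kibibytes, so the total shown above works out to roughly 23.4 GiB of usable memory on a machine with 24 GB installed. The following is a minimal sketch, assuming a Linux system that exposes /proc/meminfo; the function names are made up for illustration.

```python
# Sketch: report total memory by parsing /proc/meminfo (Linux only).
# The helper names below are hypothetical, chosen for this example.
import os


def parse_mem_total_kib(meminfo_text):
    """Return the MemTotal value in KiB from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line looks like: "MemTotal:       24518532 kB"
            return int(line.split()[1])
    raise ValueError("MemTotal line not found")


def kib_to_gib(kib):
    """Convert KiB to GiB (1 GiB = 1024 * 1024 KiB)."""
    return kib / (1024.0 * 1024.0)


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        total = parse_mem_total_kib(f.read())
    print("MemTotal: %d KiB (about %.1f GiB)" % (total, kib_to_gib(total)))
```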

To verify the version of Linux you may try:

$ uname -a
Linux localhost.localdomain 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

The issue with this command is that it does not show the flavor of Linux. For that you could use the following:

$ cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core)

At this point in time my machine is running CentOS 7.4.
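If you prefer to check the release from a script, the release file can be parsed in a few lines. This is a sketch only, assuming the file has the "NAME release VERSION" shape shown above; parse_release is a hypothetical helper, not part of any library.

```python
# Sketch: extract the distribution name and version from a release file.
# Assumes text shaped like "CentOS Linux release 7.4.1708 (Core)".
import os
import re


def parse_release(text):
    """Return (name, version) or None if the text has an unexpected shape."""
    match = re.match(r"^(.*?) release ([\d.]+)", text.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__" and os.path.exists("/etc/centos-release"):
    with open("/etc/centos-release") as f:
        print(parse_release(f.read()))
```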

As I mentioned earlier, I will be using Python in this set of posts. I needed to determine if Python came with the Linux distribution (distro for short). For this I used the following command:

[johncanessa@localhost Documents]$ which python
/usr/bin/python

Apparently the distro has a version of Python. On a side note, you can tell that the original default prompt on the console is:

[johncanessa@localhost Documents]$

For my taste the default prompt is too long. I will address it shortly in this post.

So far we know that Python came with the Linux distro, but we do not know which version. To get the version we could use the following command:

[johncanessa@localhost Documents]$ python -V
Python 2.7.5

There are two main versions of Python in circulation; one is 2.x and the more recent is 3.x. There have been several enhancements in the 3.x version. So why not update to version 3.x and be done? There are many publicly available scripts / programs that were written for 2.x versions. Because of this, you may want to maintain several versions of Python on your machine. In my case I will keep 2.x and add 3.x later in this post.
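When several versions live side by side, a script can report which one is running it. The sketch below should run under both 2.x and 3.x; describe_version is a hypothetical helper name.

```python
# Sketch: report which Python line (2.x or 3.x) is running this script.
import sys


def describe_version(version_info):
    """Return a short description for a (major, minor, ...) version tuple."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        line = "2.x"
    elif major == 3:
        line = "3.x"
    else:
        line = "unknown"
    return "Python %d.%d (%s line)" % (major, minor, line)


if __name__ == "__main__":
    print(describe_version(sys.version_info))
```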

You can run Python interactively. The idea is that you can try different commands without the need to write scripts. To start an interactive session, enter the following command:

[johncanessa@localhost Documents]$ python
Python 2.7.5 (default, Aug  4 2017, 00:39:18) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

Before we continue, I decided to switch the Linux shell. The reason is that at work I use the Korn shell (ksh). To determine which shell is in use (on a fresh login this is the configured default), one can use the following command:

[johncanessa@localhost ~]$ echo $0
bash

It seems like bash is the default shell. I have been working with different flavors of UNIX (e.g., BSD, IRIX, Minix and Solaris among others) and Linux for a few decades. During that time different shells became popular and with time faded. Among the top ones I can recall are: Bash, Bourne, C, and Korn shells.

To determine which shells are installed in our Linux distro one can use the following command:

[johncanessa@localhost ~]$ cat /etc/shells
/bin/sh
/bin/bash
/sbin/nologin
/usr/bin/sh
/usr/bin/bash
/usr/sbin/nologin
/bin/tcsh
/bin/csh

It seems that ksh is not included as an option. We will have to download, install, and then figure out how to make it the default shell.
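The same check can be scripted. The following is a small sketch that looks for a shell by name in /etc/shells-style text; the helper names are hypothetical, and the print call assumes Python 3.

```python
# Sketch: check whether a shell (e.g. ksh) is listed in /etc/shells.
import os


def listed_shells(shells_text):
    """Return shell paths from /etc/shells-style text, skipping blanks and comments."""
    shells = []
    for line in shells_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            shells.append(line)
    return shells


def has_shell(shells_text, name):
    """True if any listed path ends in exactly the given shell name."""
    return any(os.path.basename(p) == name for p in listed_shells(shells_text))


if __name__ == "__main__" and os.path.exists("/etc/shells"):
    with open("/etc/shells") as f:
        print("ksh listed: %s" % has_shell(f.read(), "ksh"))
```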

To install ksh we can use the following:

[johncanessa@localhost ~]$ sudo yum install ksh
[sudo] password for johncanessa:

I used the sudo command to run the yum installer as root. I could have switched users to root, but that is typically not recommended. The reason is that you may forget that you are running with privileges and may accidentally run a command from a wrong location that may affect other users or the actual root account.

After installing ksh we can verify that it is available by issuing the following command:

[johncanessa@localhost ~]$ cat /etc/shells
/bin/sh
/bin/bash
/sbin/nologin
/usr/bin/sh
/usr/bin/bash
/usr/sbin/nologin
/bin/tcsh
/bin/csh
/bin/ksh	<==== Korn shell now available
/bin/rksh

As the labeled entry shows, the newly installed ksh is now available for our picking.

To change the shell one could edit some system level files or issue the following:

[johncanessa@localhost ~]$ su
Password: 
[root@localhost johncanessa]# chsh -s /bin/ksh johncanessa
Changing shell for johncanessa.
Shell changed.

Note that in this case I used the su command. As you can tell the prompt changed from a $ to a # character. To exit root just enter the following command:

[root@localhost johncanessa]# exit
exit
$

The exit command lets you return to the previous user. In this case it is me running at user level. Note how the prompt switched back to the $ symbol.
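Keep in mind that echo $0 reports the shell that is currently running, while chsh changes the login shell recorded in the seventh field of the account's /etc/passwd entry. A sketch to read that field follows; the sample line in the comment is hypothetical and login_shell is a made-up helper name.

```python
# Sketch: look up the login shell recorded for a user in /etc/passwd.
# A passwd line has seven colon-separated fields, for example (hypothetical):
#   johncanessa:x:1000:1000:John:/home/johncanessa:/bin/ksh
import os


def login_shell(passwd_text, user):
    """Return the login shell for user, or None if the user is not found."""
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) == 7 and fields[0] == user:
            return fields[6]
    return None


if __name__ == "__main__" and os.path.exists("/etc/passwd"):
    with open("/etc/passwd") as f:
        # USER is normally set by the login session; empty if it is not.
        print(login_shell(f.read(), os.environ.get("USER", "")))
```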

On Linux there are several different editors available. Over the years I have seen a few come and go. The two workhorses in Linux / UNIX are Emacs and vi. The original vi has been enhanced with vim and gvim. You can still find and use vi but I prefer to use gvim.

You can set many options for gvim using the ~/.gvimrc configuration file. By default, gvim displays tabs 8 spaces wide. Given that modern software may use many indentation levels, it is typical to set the indentation to only 4 spaces. This can be accomplished by appending the following line to your existing (or a new) ~/.gvimrc file:

set tabstop=4
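To get a feel for what the tab stop width does to deeply indented code, Python's built-in str.expandtabs can render the same tab-indented text at both widths. This is only an illustration of the display effect, not of how gvim is implemented; render is a hypothetical helper.

```python
# Sketch: show how the same tab-indented text looks with 8 and 4 column tab stops.


def render(text, tabstop):
    """Replace each tab with spaces up to the next multiple of tabstop."""
    return text.expandtabs(tabstop)


if __name__ == "__main__":
    code = "for x in data:\n\tif x:\n\t\tprint(x)"
    print(render(code, 8))
    print(render(code, 4))
```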

I like to use the arrow keys on the keyboard to scroll up and down through commands that I have issued. To get this done, issue the following commands:

$ cd
$ gvim .kshrc

Then append the following lines to the .kshrc file:

# **** ENABLE arrows on console ****
set -o emacs

I always like to enter comments so I can remember what I have done and to be able to share work with colleagues. You can log out and back in, or you can source your new file. To source it enter the following:

$ . ~/.kshrc

Anaconda is a popular Python distribution for machine learning and for Python work in general. The nice thing about it is that it comes with a very nice selection of useful commands / utilities. To download Anaconda use the following commands:

$ cd
$ wget https://repo.continuum.io/archive/Anaconda3-4.3.0-Linux-x86_64.sh

To install Anaconda, use the following command:

$ bash Anaconda3-4.3.0-Linux-x86_64.sh
:::: :::: ::::
Do you wish the installer to prepend the Anaconda3 install location
to PATH in your /home/johncanessa/.bashrc ? [yes|no]
[no] >>> yes

Prepending PATH=/home/johncanessa/anaconda3/bin to PATH in /home/johncanessa/.bashrc
A backup will be made to: /home/johncanessa/.bashrc-anaconda3.bak

For this change to become active, you have to open a new terminal.

Thank you for installing Anaconda3!
:::: :::: ::::

Notice that I used the bash shell to install Anaconda on my CentOS machine. I could have used a different shell (e.g., ksh) but I did not wish to spend time figuring out how to modify the installation script in case something went wrong.

Like I mentioned, I like to use ksh. For it to find the Anaconda software, I had to prepend /home/johncanessa/anaconda3/bin to PATH in ksh. Let’s take a look and see how busy PATH is:

$ echo $PATH
/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin

In order to make the PATH change permanent, it has to be included in the .kshrc file residing in your home directory. To get this done:

$ cd
$ gvim .kshrc

In the .kshrc file I appended the following:

:::: :::: :::: <=== represents other lines in the .kshrc file
# **** PREPEND Anaconda3 to PATH ****

export PATH=/home/johncanessa/anaconda3/bin:$PATH
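The reason for prepending rather than appending is that the shell searches the PATH directories from left to right, so the Anaconda python is found before /usr/bin/python. The sketch below mimics the edit and then asks shutil.which, which searches PATH the same way, which python would be picked up. It assumes Python 3; prepend_to_path is a hypothetical helper.

```python
# Sketch: build a PATH value with a directory moved or added to the front.
import shutil


def prepend_to_path(path_value, new_dir, sep=":"):
    """Return path_value with new_dir first and any duplicate of it removed."""
    parts = [p for p in path_value.split(sep) if p and p != new_dir]
    return sep.join([new_dir] + parts)


if __name__ == "__main__":
    old = "/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin"
    print(prepend_to_path(old, "/home/johncanessa/anaconda3/bin"))
    # shutil.which walks the current PATH left to right and returns the
    # first executable named "python", or None if there is none.
    print(shutil.which("python"))
```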

It is time to check if Python is available. This can be done by opening a console and entering the following:

$ python
Python 3.6.0 |Anaconda 4.3.0 (64-bit)| (default, Dec 23 2016, 12:22:00) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

Note that Anaconda has been installed and the Python interpreter is version 3.6. Life is good; at least so far.

When you work with ML or data in Python, you need to make sure some basic libraries are available. Let’s check that.

>>> import numpy
>>> import scipy
>>> exit()
$ 

If the numpy or scipy libraries were not available, then Python would not have been happy as shown:

>>> import cake
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'cake'
>>>

Apparently there is no cake library in Python.

In order to update Python packages, one may use the pip command. To update pip itself:

$ pip install --upgrade pip
Requirement already up-to-date: pip in /home/johncanessa/anaconda3/lib/python3.6/site-packages

It seems like pip was already up to date.

A tool to have available when working with Python is the Jupyter notebook. I will discuss it further later in this post. For now let’s attempt to install it:

$ pip install --upgrade jupyter
Requirement already up-to-date: jupyter in ./anaconda3/lib/python3.6/site-packages

No surprise. This tool is part of the Anaconda distribution.

Anaconda comes with its own package manager named conda. Conda is equivalent to pip or perhaps is pip on steroids. To get a flavor, from a console enter the following command:

$ conda
usage: conda [-h] [-V] command ...

conda is a tool for managing and deploying applications, environments and packages.

Options:

positional arguments:
  command
    info         Display information about current conda install.
    help         Displays a list of available conda commands and their help
                 strings.
    list         List linked packages in a conda environment.
    search       Search for packages and display their information. The input
                 is a Python regular expression. To perform a search with a
                 search string that starts with a -, separate the search from
                 the options with --, like 'conda search -- -h'. A * in the
                 results means that package is installed in the current
                 environment. A . means that package is not installed but is
                 cached in the pkgs directory.
    create       Create a new conda environment from a list of specified
                 packages.
    install      Installs a list of packages into a specified conda
                 environment.
    update       Updates conda packages to the latest compatible version. This
                 command accepts a list of package names and updates them to
                 the latest versions that are compatible with all other
                 packages in the environment. Conda attempts to install the
                 newest versions of the requested packages. To accomplish
                 this, it may update some packages that are already installed,
                 or install additional packages. To prevent existing packages
                 from updating, use the --no-update-deps option. This may
                 force conda to install older versions of the requested
                 packages, and it does not prevent additional dependency
                 packages from being installed. If you wish to skip dependency
                 checking altogether, use the '--force' option. This may
                 result in an environment with incompatible packages, so this
                 option must be used with great caution.
    upgrade      Alias for conda update. See conda update --help.
    remove       Remove a list of packages from a specified conda environment.
    uninstall    Alias for conda remove. See conda remove --help.
    config       Modify configuration values in .condarc. This is modeled
                 after the git config command. Writes to the user .condarc
                 file (/home/johncanessa/.condarc) by default.
    clean        Remove unused packages and caches.
    package      Low-level conda package utility. (EXPERIMENTAL)

optional arguments:
  -h, --help     Show this help message and exit.
  -V, --version  Show the conda version number and exit.

other commands, such as "conda build", are available when additional conda
packages (e.g. conda-build) are installed

Let’s take a look at the version numbers of Python and IPython by entering the following in a console:

$ conda list | grep python
ipython                   5.1.0                    py36_0  
ipython_genutils          0.1.0                    py36_0  
python                    3.6.0                         0  
python-dateutil           2.6.0                    py36_0  

At this point we know we are running Python 3.6.0 (see the earlier check). Let’s see if there are updates as follows:

$ conda update python

We can now check the python version:

$ python
Python 3.6.2 |Anaconda custom (64-bit)| (default, Jul 20 2017, 13:51:32) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
$

As we can tell, the version has been updated from 3.6.0 to 3.6.2.

Now let’s take a look at the current version of ipython and try to update it to the latest one available in the repository:

$ conda list | grep ipython
ipython                   5.1.0                    py36_0  
ipython_genutils          0.1.0                    py36_0  

$ conda update ipython

Similarly we can check the current version of Jupyter and update it to the newest available version:

$ conda list jupyter
# packages in environment at /home/johncanessa/anaconda3:
#
jupyter                   1.0.0                    py36_3  
jupyter_client            4.4.0                    py36_0  
jupyter_console           5.0.0                    py36_0  
jupyter_core              4.2.1                    py36_0  

$ conda update jupyter

To check if the necessary packages / libraries are available, from the python prompt we should be able to enter the following and we should not encounter errors:

>>> import matplotlib
>>> import numpy
>>> import pandas
>>> import scipy
>>> import sklearn
>>> import jupyter
>>>

If one of the packages is not available, we would encounter an error like:

>>> import cake
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'cake'
>>>
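Instead of typing the imports one at a time, the check can be put in a short script. This is a minimal sketch using importlib.util.find_spec from the Python 3 standard library, which locates a module without importing it; the REQUIRED list mirrors the imports above and missing_modules is a hypothetical helper.

```python
# Sketch: report which of the expected packages cannot be found (Python 3).
import importlib.util

REQUIRED = ["matplotlib", "numpy", "pandas", "scipy", "sklearn", "jupyter"]


def missing_modules(names):
    """Return the top-level module names that are not on the import path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing: %s" % ", ".join(missing))
    else:
        print("All required packages are available.")
```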

Now let’s try to open a Jupyter notebook to verify all is well with that package:

$ cd
$ cd ml
$ jupyter notebook
[I 08:12:12.852 NotebookApp] Serving notebooks from local directory: /home/johncanessa/ml/housing
[I 08:12:12.852 NotebookApp] 0 active kernels 
[I 08:12:12.852 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/?token=3b67cdc232d9be160eb92837ae24e7ed1c02bb285676e849
[I 08:12:12.852 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 08:12:12.853 NotebookApp] 
    
    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:8888/?token=3b67cdc232d9be160eb92837ae24e7ed1c02bb285676e849
[I 08:12:13.348 NotebookApp] Accepting one-time-token-authenticated connection from ::1
[12915:12958:0113/081213.535139:ERROR:browser_gpu_channel_host_factory.cc(107)] Failed to launch GPU process.
Created new window in existing browser session.

^C[I 08:12:26.156 NotebookApp] interrupted <====
Serving notebooks from local directory: /home/johncanessa/ml/housing
0 active kernels 
The Jupyter Notebook is running at: http://localhost:8888/?token=3b67cdc232d9be160eb92837ae24e7ed1c02bb285676e849
Shutdown this notebook server (y/[n])? y <====
[C 08:12:29.484 NotebookApp] Shutdown confirmed
[I 08:12:29.485 NotebookApp] Shutting down kernels
$

The notebook was started. I then entered <ctrl-c> to kill the process. The software issued a prompt to confirm my intentions. I entered ‘y’ to exit the process.

So far we have the following in our home directory:

$ pwd
/home/johncanessa
$ ls -l
total 8
drwxr-xr-x. 21 johncanessa johncanessa 4096 Jan  7 08:06 anaconda3
drwxr-xr-x.  2 johncanessa johncanessa    6 Nov 10 15:17 Desktop
drwxr-xr-x.  2 johncanessa johncanessa   59 Jan  7 07:48 Documents
drwxr-xr-x.  2 johncanessa johncanessa    6 Nov 10 15:17 Downloads
drwxr-xr-x.  2 johncanessa johncanessa    6 Nov 10 15:17 Music
drwxr-xr-x.  2 johncanessa johncanessa    6 Nov 10 15:17 Pictures
drwxr-xr-x.  2 johncanessa johncanessa    6 Nov 10 15:17 Public
drwxr-xr-x.  2 johncanessa johncanessa    6 Nov 10 15:17 Templates
-rw-r--r--.  1 johncanessa johncanessa   72 Jan  7 08:23 Untitled.ipynb <====
drwxr-xr-x.  2 johncanessa johncanessa    6 Nov 10 15:17 Videos

The Untitled.ipynb file is the Jupyter notebook we created to verify that the software was properly installed and available.

A Jupyter notebook:

1) Is represented by a notebook file (e.g., Untitled.ipynb).

2) Starts a Jupyter Python kernel to send Python commands to be executed.

3) For access, opens a notebook tab on your web browser.
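On the first point, a notebook file is a plain JSON document, so it can be inspected with the standard json module. The sketch below builds an empty notebook in the version 4 format and counts its cells; it does not write any file, and the helper names are hypothetical.

```python
# Sketch: a notebook (.ipynb) file is JSON; build a minimal one and inspect it.
import json


def minimal_notebook():
    """Return the top-level structure of an empty version 4 notebook."""
    return {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 2}


def count_cells(notebook_text):
    """Return the number of cells in notebook JSON text."""
    return len(json.loads(notebook_text).get("cells", []))


if __name__ == "__main__":
    text = json.dumps(minimal_notebook(), indent=1)
    print(text)
    print("cells: %d" % count_cells(text))
```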

To get somewhat familiar with Jupyter you should take a tour. You can do that from the notebook in your web browser by selecting:

Help -> User Interface Tour

One more thing that I like to do on GNOME is to reduce the number of workspaces to 1. You can set the number of workspaces using:

Applications -> Utilities -> Tweak Tool -> Workspaces

Number of Workspaces: 1

Sorry about the lack of continuity in this post. I started a few weeks ago and collected my notes on a Linux machine. I then sent the text file via Gmail (yes, I could have opened a window on my Windows machine) to myself. Back on the Windows machine I created a Word document and entered the text for this post. That took me several sessions. During that time I took off on holiday for nine days. I needed a change. The temperature in the Twin Cities of Minneapolis and St. Paul was reaching -15F and on average it was snowing a couple of inches a week. I spent time in the Riviera Maya in Mexico, with my wife, granddaughters and a couple of friends. The sun was up every day and the highs for the day were around 84F. When we got back on Super Bowl Sunday, which was played in the stadium in Minneapolis, I was tanned and had my sunglasses tattooed on my face.

If you have comments or questions, please leave me a note at the bottom of this post.

Remember that to learn you must read and experiment.

John

www.johncanessa.com

Follow me on Twitter:  @john_canessa