I am constantly reading and experimenting to learn things that I may apply to work projects. A few years ago I decided to spend time, on and off, learning machine learning (ML). With that purpose in mind, I got a number of books on different subjects (e.g., Deep Learning, Python and Statistics) that seemed useful to achieve my goal.
On the platform side, I started experimenting on Windows. Most things work fine but some things do not. For example, I had to wait to use TensorFlow because it ran on Linux but not on Windows. Today it seems to work on both. The same seems to hold true for Docker.
In a future post, I will touch on some dedicated hardware that I purchased in order to extend my understanding and abilities in what generally is referred to as “Artificial Intelligence”, AI for short.
As I said, I have purchased and read some books. I also read several articles a day on AI and, when available, I watch webinars from the ACM. Becoming a member of the ACM is something that all individuals who have studied and work on different aspects of software development (e.g., architecture, coding, computer science, or design) should consider.
In this post I will go over some notes that I took, and continue to take, while learning and experimenting with AI. For context, I start with the hardware and platform.
A couple of years ago I purchased a DELL computer which I will use for most of the posts regarding AI and Machine Learning (ML). At the time I was interested in getting familiar with CentOS, which is a Linux distribution. It happens that I know the owner of Advanced Productivity (AP), a company in the Twin Cities of Minneapolis and St. Paul that, among other items and services, sells refurbished DELL equipment. The company also supports and maintains what they sell. Not that I am promoting DELL, but at home I have about a dozen computers, some of which I have been using for a decade or so. So far I have not had a hardware failure.
If I have time, I do not mind performing hardware updates (e.g., increase memory capacity, add or replace disks) on the equipment. In this specific case, my machine was initially running an early version of CentOS 6. I updated the Linux version several times. Before starting on this personal project, I wanted to have CentOS 7 and some additional memory installed. I asked AP to update the Linux software and the memory.
A simple way to verify the amount of memory available in your Linux system is to open a console and enter the following:
$ free
              total        used        free      shared  buff/cache   available
Mem:       24518532     1443992    21828776       31980     1245764    22594072
Swap:       4063228           0     4063228
The total (and in my case the maximum for this desktop) amount of memory is 24 GB.
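By default the free command reports its figures in kibibytes, so a quick conversion helps when reading them. The following is only a rough sketch in Python 3; the helper name kib_to_gib is my own and not part of any tool.

```python
def kib_to_gib(kib):
    """Convert kibibytes (the default unit of `free`) to gibibytes."""
    return kib / (1024 ** 2)


total_kib = 24518532  # "total" column of the Mem: row reported by free
print("{:.2f} GiB".format(kib_to_gib(total_kib)))
```

The result comes out slightly under 24 because part of the installed memory is reserved and not reported by free as usable.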
To verify the version of Linux you may try:
$ uname -a
Linux localhost.localdomain 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
The issue with this command is that it does not show the flavor of Linux. For that you could use the following:
$ cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core)
At this point in time my machine is running CentOS 7.4.
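The same two checks can be made from a script. This is a minimal sketch, assuming the release file lives at /etc/centos-release as it does on my machine; the helper name read_release is hypothetical, and it simply returns None on systems that do not have the file.

```python
import platform
from pathlib import Path


def read_release(path="/etc/centos-release"):
    """Return the first line of the release file, or None if it is missing or empty."""
    release_file = Path(path)
    if not release_file.is_file():
        return None
    lines = release_file.read_text().splitlines()
    return lines[0].strip() if lines else None


# Kernel name and kernel release, similar to what uname -s -r reports.
print(platform.system(), platform.release())
# Distribution flavor, similar to cat /etc/centos-release.
print(read_release())
```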
As I mentioned earlier, I will be using Python in this set of posts. I needed to determine if Python came with the Linux distribution (distro for short). For this I used the following command:
[johncanessa@localhost Documents]$ which python
/usr/bin/python
Apparently the distro has a version of Python. On a side note, you can tell that the original default prompt on the console is:
[johncanessa@localhost Documents]$
For my taste the default prompt is too long. I will address it shortly in this post.
So far we know that Python came with the Linux distro, but we do not know which version. To get the version we could use the following command:
[johncanessa@localhost Documents]$ python -V
Python 2.7.5
There are two main versions of Python in circulation; one is 2.x and the more recent is 3.x. There have been several enhancements in the 3.x version. So why not update to version 3.x and be done? There are many publicly available scripts / programs that were written for 2.x versions. Because of this, you may want to maintain several versions of Python on your machine. In my case I will keep the 2.x version and will add 3.x later in this post.
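When both series live on the same machine, a script can check which interpreter is running it. The following is a small sketch written so that it runs under either 2.x or 3.x; the function name python_series is my own.

```python
import sys


def python_series(version_info=sys.version_info):
    """Return the major series (e.g., '2.x' or '3.x') for a version tuple."""
    return "%d.x" % version_info[0]


if python_series() == "2.x":
    print("Running under Python 2; scripts written for 3.x may fail")
else:
    print("Running under Python " + python_series())
```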
You can run Python interactively. The idea is that you can try different commands without the need to write scripts. To start an interactive session you can do it by entering the following command:
[johncanessa@localhost Documents]$ python
Python 2.7.5 (default, Aug 4 2017, 00:39:18)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
Before we continue, I decided to switch the Linux shell. The reason is that I use the Korn shell (ksh) at work. To determine which shell is currently running, one can use the following command:
[johncanessa@localhost ~]$ echo $0
bash
It seems like bash is the default shell. I have been working with different flavors of UNIX (e.g., BSD, IRIX, Minix and Solaris among others) and Linux for a few decades. During that time different shells became popular and with time faded. Among the top ones I can recall are the Bash, Bourne, C, and Korn shells.
To determine which shells are installed in our Linux distro one can use the following command:
[johncanessa@localhost ~]$ cat /etc/shells
/bin/sh
/bin/bash
/sbin/nologin
/usr/bin/sh
/usr/bin/bash
/usr/sbin/nologin
/bin/tcsh
/bin/csh
It seems that ksh is not included as an option. We will have to download, install, and then figure out how to make it the default shell.
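The same check can be scripted. Below is a minimal sketch that looks for a shell by name in the contents of /etc/shells; the function names are hypothetical and the sample text is the listing from my machine before installing ksh.

```python
def listed_shells(text):
    """Return the shell paths found in the contents of /etc/shells."""
    shells = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if line and not line.startswith("#"):
            shells.append(line)
    return shells


def has_shell(text, name):
    """True if any listed path ends with /<name> (e.g., name='ksh')."""
    return any(path.rsplit("/", 1)[-1] == name for path in listed_shells(text))


sample = """/bin/sh
/bin/bash
/sbin/nologin
/usr/bin/sh
/usr/bin/bash
/usr/sbin/nologin
/bin/tcsh
/bin/csh
"""

print(has_shell(sample, "bash"), has_shell(sample, "ksh"))
```

Comparing the last path component avoids a false match on tcsh or csh when looking for ksh.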
To install ksh we can use the following:
[johncanessa@localhost ~]$ sudo yum install ksh
[sudo] password for johncanessa:
I used the sudo command to run the yum installer as root. I could have switched users to root, but that is typically not recommended. The reason is that you may forget that you are running with privileges and may accidentally run a command from a wrong location that may affect other users or the actual root account.
After installing ksh we can verify that it is available issuing the following command:
[johncanessa@localhost ~]$ cat /etc/shells
/bin/sh
/bin/bash
/sbin/nologin
/usr/bin/sh
/usr/bin/bash
/usr/sbin/nologin
/bin/tcsh
/bin/csh
/bin/ksh <==== Korn shell now available
/bin/rksh
As you can see from the labeled entry, the shell we just installed is now available for our picking.
To change the shell one could edit some system level files or issue the following:
[johncanessa@localhost ~]$ su
Password:
[root@localhost johncanessa]# chsh -s /bin/ksh johncanessa
Changing shell for johncanessa.
Shell changed.
Note that in this case I used the su command. As you can tell the prompt changed from a $ to a # character. To exit root just enter the following command:
[root@localhost johncanessa]# exit
exit
$
The exit command lets you return to the previous user. In this case it is me, running at user level. Note how the prompt switched back to the $ symbol.
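The chsh command records the login shell in the user database. As a rough sketch, the pwd module from the Python standard library can read that entry back for the current user; the function name login_shell is my own, and it returns None if the user has no entry.

```python
import os
import pwd


def login_shell(uid=None):
    """Return the login shell recorded for a user, or None if there is no entry."""
    if uid is None:
        uid = os.getuid()
    try:
        return pwd.getpwuid(uid).pw_shell
    except KeyError:
        # No password database entry for this uid.
        return None


print(login_shell())
```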
On Linux there are several different editors available. During the years I have seen a few coming and going. The two workhorses in Linux / UNIX are Emacs and vi. The original vi has been enhanced with vim and gvim. You can still find and use vi but I prefer to use gvim.
You can set many options for gvim using the ~/.gvimrc configuration file. By default, gvim uses 8 space tabs. Given that modern software may use many indentation levels, it is typical to set the indentation to only 4 spaces. This can be accomplished by appending to your existing (or creating a new) ~/.gvimrc file the following line:
set tabstop=4
I like to use the arrow keys on the keyboard to scroll up and down through the commands that I have issued. To get this done, issue the following commands:
$ cd
$ gvim .kshrc
Then append the following lines to the .kshrc file:
# **** ENABLE arrows on console ****
set -o emacs
I always like to enter comments so I can remember what I have done and to be able to share work with colleagues. You can logout and back in or you can source your new file. To source it enter the following:
$ . ~/.kshrc
Anaconda is a typical distribution used with machine learning and Python in general. The nice thing about it is that it comes with a very nice selection of useful commands / utilities. To download Anaconda use the following commands:
$ cd
$ wget https://repo.continuum.io/archive/Anaconda3-4.3.0-Linux-x86_64.sh
To install Anaconda, I will use the following command:
$ bash Anaconda3-4.3.0-Linux-x86_64.sh
:::: :::: ::::
Do you wish the installer to prepend the Anaconda3 install location to PATH in your /home/johncanessa/.bashrc ? [yes|no]
[no] >>> yes
Prepending PATH=/home/johncanessa/anaconda3/bin to PATH in /home/johncanessa/.bashrc
A backup will be made to: /home/johncanessa/.bashrc-anaconda3.bak
For this change to become active, you have to open a new terminal.
Thank you for installing Anaconda3!
:::: :::: ::::
Notice that I used the bash shell to install Anaconda on my CentOS machine. I could have used a different shell (e.g., ksh) but I did not wish to spend time figuring out how to modify the installation script in case something went wrong.
As I mentioned, I like to use ksh. The installer only updated .bashrc, so for ksh to find the Anaconda software I had to prepend /home/johncanessa/anaconda3/bin to PATH myself. Let’s take a look and see how busy PATH is:
$ echo $PATH
/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin
In order to make a permanent shell change, it has to be included in the .kshrc file residing in your home directory. To get this done:
$ cd
$ gvim .kshrc
In the .kshrc file I appended the following:
:::: :::: :::: <=== represents other lines in the .kshrc file
# **** PREPEND Anaconda3 to PATH ****
export PATH=/home/johncanessa/anaconda3/bin:$PATH
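Prepending matters because the shell searches PATH from left to right, so the first python it finds wins. The following is a minimal sketch of that idea; the helper names are hypothetical and the PATH values are the ones from my machine.

```python
import shutil


def path_entries(path_value, sep=":"):
    """Split a POSIX PATH-style string into directories, dropping empty items."""
    return [entry for entry in path_value.split(sep) if entry]


def first_on_path(path_value, directory):
    """True if `directory` is the first entry, i.e., it was prepended."""
    entries = path_entries(path_value)
    return bool(entries) and entries[0] == directory


anaconda_bin = "/home/johncanessa/anaconda3/bin"
before = "/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin"
after = anaconda_bin + ":" + before

print(first_on_path(before, anaconda_bin), first_on_path(after, anaconda_bin))
# Interpreter that the PATH of the current process resolves to, or None.
print(shutil.which("python"))
```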
It is time to check if Python is available. This can be done by opening a console and entering the following:
$ python
Python 3.6.0 |Anaconda 4.3.0 (64-bit)| (default, Dec 23 2016, 12:22:00)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
Note that Anaconda has been installed and the Python interpreter is version 3.6. Life is good; at least so far.
When you work with ML or data in Python, you need to make sure some basic libraries are available. Let’s check that.
>>> import numpy
>>> import scipy
>>> exit()
$
If the numpy or scipy libraries were not available, then Python would not have been happy as shown:
>>> import cake
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'cake'
>>>
Apparently there is no cake library in Python.
In order to update Python software, one may use the pip command. To update pip itself:
$ pip install --upgrade pip
Requirement already up-to-date: pip in /home/johncanessa/anaconda3/lib/python3.6/site-packages
It seems like pip was already up to date.
A tool to have available when working with Python is the Jupyter notebook. I will discuss it further later in this post. For now let’s attempt to install it:
$ pip install --upgrade jupyter
Requirement already up-to-date: jupyter in ./anaconda3/lib/python3.6/site-packages
No surprise. This tool is part of the Anaconda distribution.
Anaconda comes with its own package manager named conda. Conda is equivalent to pip or perhaps is pip on steroids. To get a flavor, from a console enter the following command:
$ conda
usage: conda [-h] [-V] command ...

conda is a tool for managing and deploying applications, environments and packages.

Options:

positional arguments:
  command
    info         Display information about current conda install.
    help         Displays a list of available conda commands and their help strings.
    list         List linked packages in a conda environment.
    search       Search for packages and display their information. The input is a Python regular expression. To perform a search with a search string that starts with a -, separate the search from the options with --, like 'conda search -- -h'. A * in the results means that package is installed in the current environment. A . means that package is not installed but is cached in the pkgs directory.
    create       Create a new conda environment from a list of specified packages.
    install      Installs a list of packages into a specified conda environment.
    update       Updates conda packages to the latest compatible version. This command accepts a list of package names and updates them to the latest versions that are compatible with all other packages in the environment. Conda attempts to install the newest versions of the requested packages. To accomplish this, it may update some packages that are already installed, or install additional packages. To prevent existing packages from updating, use the --no-update-deps option. This may force conda to install older versions of the requested packages, and it does not prevent additional dependency packages from being installed. If you wish to skip dependency checking altogether, use the '--force' option. This may result in an environment with incompatible packages, so this option must be used with great caution.
    upgrade      Alias for conda update. See conda update --help.
    remove       Remove a list of packages from a specified conda environment.
    uninstall    Alias for conda remove. See conda remove --help.
    config       Modify configuration values in .condarc. This is modeled after the git config command. Writes to the user .condarc file (/home/johncanessa/.condarc) by default.
    clean        Remove unused packages and caches.
    package      Low-level conda package utility. (EXPERIMENTAL)

optional arguments:
  -h, --help     Show this help message and exit.
  -V, --version  Show the conda version number and exit.

other commands, such as "conda build", are available when additional conda packages (e.g. conda-build) are installed
Let’s take a look at the version numbers of python and ipython by entering the following in a console:
$ conda list | grep python
ipython 5.1.0 py36_0
ipython_genutils 0.1.0 py36_0
python 3.6.0 0
python-dateutil 2.6.0 py36_0
At this point we know we are running Python 3.6.0 (see the earlier check). Let’s see if there is an update available as follows:
$ conda update python
We can now check the python version:
$ python
Python 3.6.2 |Anaconda custom (64-bit)| (default, Jul 20 2017, 13:51:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
$
As we can tell, the version has been updated from 3.6.0 to 3.6.2.
Now let’s take a look at the current version of ipython and try to update it to the latest one available in the repository:
$ conda list | grep ipython
ipython 5.1.0 py36_0
ipython_genutils 0.1.0 py36_0
$ conda update ipython
Similarly we can check the current version of Jupyter and update it to the newest available version:
$ conda list jupyter
# packages in environment at /home/johncanessa/anaconda3:
#
jupyter 1.0.0 py36_3
jupyter_client 4.4.0 py36_0
jupyter_console 5.0.0 py36_0
jupyter_core 4.2.1 py36_0
$ conda update jupyter
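If you want to compare versions before and after an update, the output of conda list is easy to parse. This is only a sketch that works on captured text like the listing in this post; the function name parse_conda_list is my own and it assumes the name is the first column and the version the second.

```python
def parse_conda_list(text):
    """Map package name -> version from the text printed by `conda list`."""
    versions = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, comment lines, and anything without a version column.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        versions[fields[0]] = fields[1]
    return versions


sample = """# packages in environment at /home/johncanessa/anaconda3:
#
jupyter                   1.0.0                    py36_3
jupyter_client            4.4.0                    py36_0
jupyter_console           5.0.0                    py36_0
jupyter_core              4.2.1                    py36_0
"""

for name, version in sorted(parse_conda_list(sample).items()):
    print(name, version)
```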
To check if the necessary packages / libraries are available, from the python prompt we should be able to enter the following and we should not encounter errors:
>>> import matplotlib
>>> import numpy
>>> import pandas
>>> import scipy
>>> import sklearn
>>> import jupyter
>>>
If one of the packages is not available, we would encounter an error like:
>>> import cake
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'cake'
>>>
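Instead of typing each import by hand, the check can be done in one pass without actually importing anything. The following is a minimal sketch using importlib.util.find_spec from the standard library; the function name missing_modules is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


wanted = ["matplotlib", "numpy", "pandas", "scipy", "sklearn", "jupyter"]
print(missing_modules(wanted))
```

An empty list means every package was found. Any name that shows up in the list would fail with an import error at the Python prompt.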
Now let’s try to open a Jupyter notebook to verify all is well with that package:
$ cd
$ cd ml
$ jupyter notebook
[I 08:12:12.852 NotebookApp] Serving notebooks from local directory: /home/johncanessa/ml/housing
[I 08:12:12.852 NotebookApp] 0 active kernels
[I 08:12:12.852 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/?token=3b67cdc232d9be160eb92837ae24e7ed1c02bb285676e849
[I 08:12:12.852 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 08:12:12.853 NotebookApp] Copy/paste this URL into your browser when you connect for the first time, to login with a token:
http://localhost:8888/?token=3b67cdc232d9be160eb92837ae24e7ed1c02bb285676e849
[I 08:12:13.348 NotebookApp] Accepting one-time-token-authenticated connection from ::1
[12915:12958:0113/081213.535139:ERROR:browser_gpu_channel_host_factory.cc(107)] Failed to launch GPU process.
Created new window in existing browser session.
^C[I 08:12:26.156 NotebookApp] interrupted <====
Serving notebooks from local directory: /home/johncanessa/ml/housing
0 active kernels
The Jupyter Notebook is running at: http://localhost:8888/?token=3b67cdc232d9be160eb92837ae24e7ed1c02bb285676e849
Shutdown this notebook server (y/[n])? y <====
[C 08:12:29.484 NotebookApp] Shutdown confirmed
[I 08:12:29.485 NotebookApp] Shutting down kernels
$
The notebook was started. I then entered <ctrl-c> to kill the process. The software issued a prompt to confirm my intentions. I entered ‘y’ to exit the process.
So far we have the following in our home directory:
$ pwd
/home/johncanessa
$ ls -l
total 8
drwxr-xr-x. 21 johncanessa johncanessa 4096 Jan 7 08:06 anaconda3
drwxr-xr-x. 2 johncanessa johncanessa 6 Nov 10 15:17 Desktop
drwxr-xr-x. 2 johncanessa johncanessa 59 Jan 7 07:48 Documents
drwxr-xr-x. 2 johncanessa johncanessa 6 Nov 10 15:17 Downloads
drwxr-xr-x. 2 johncanessa johncanessa 6 Nov 10 15:17 Music
drwxr-xr-x. 2 johncanessa johncanessa 6 Nov 10 15:17 Pictures
drwxr-xr-x. 2 johncanessa johncanessa 6 Nov 10 15:17 Public
drwxr-xr-x. 2 johncanessa johncanessa 6 Nov 10 15:17 Templates
-rw-r--r--. 1 johncanessa johncanessa 72 Jan 7 08:23 Untitled.ipynb <====
drwxr-xr-x. 2 johncanessa johncanessa 6 Nov 10 15:17 Videos
The Untitled.ipynb file is the Jupyter notebook we created to verify that the software was properly installed and available.
A Jupyter notebook:
1) Is represented by a notebook file (e.g., Untitled.ipynb).
2) Starts a Jupyter Python kernel, to which Python commands are sent for execution.
3) For access, opens a notebook tab in your web browser.
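A notebook file is plain JSON, so you can peek inside one without starting Jupyter. The sketch below is only an illustration under stated assumptions: the function name summarize_notebook is my own, and the minimal dictionary assumes the version 4 notebook format with its "cells", "metadata", "nbformat" and "nbformat_minor" keys. It writes an empty notebook to a temporary folder and reads it back.

```python
import json
import os
import tempfile


def summarize_notebook(path):
    """Return (nbformat version, number of cells) for a notebook file."""
    with open(path, encoding="utf-8") as handle:
        notebook = json.load(handle)
    return notebook.get("nbformat"), len(notebook.get("cells", []))


# Assumed shape of an empty version 4 notebook (not copied from my file).
minimal = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 2}

with tempfile.TemporaryDirectory() as folder:
    demo = os.path.join(folder, "Untitled.ipynb")
    with open(demo, "w", encoding="utf-8") as handle:
        json.dump(minimal, handle)
    print(summarize_notebook(demo))
```

Pointing summarize_notebook at a real notebook should report how many cells it holds.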
To get somewhat familiar with Jupyter you should take a tour. You can do that from the notebook in your web browser by selecting:
Help -> User Interface Tour
One more thing that I like to do on Gnome is to reduce the number of workspaces to 1. You can set the number of workspaces using:
Applications -> Utilities -> Tweak Tool -> Workspaces
Number of Workspaces: 1
Sorry about the lack of continuity in this post. I started a few weeks ago and collected my notes on a Linux machine. I then sent the text file to myself via Gmail (yes, I could have opened a window on my Windows machine). Back on the Windows machine I created a Word document and entered the text for this post. That took me several sessions. During that time I took off on holiday for nine days. I needed a change. The temperature in the Twin Cities of Minneapolis and St. Paul was reaching -15F and on average it was snowing a couple of inches a week. I spent time in the Riviera Maya in Mexico with my wife, granddaughters and a couple of friends. The sun was up every day and the highs for the day were around 84F. When we got back on Super Bowl Sunday, which was played in the stadium in Minneapolis, I was tanned and had my sunglasses tattooed on my face.
If you have comments or questions, please leave me a note at the bottom of this post.
Remember that to learn you must read and experiment.
John
www.johncanessa.com
Follow me on Twitter: @john_canessa