As I have mentioned in previous posts, I like to purchase and read computer-related technical books. When I receive a book I write my name and the year on the first page. I then locate the date of the last revision and circle it. In 2017 I purchased “Data Science from Scratch” by Joel Grus. I read the first five chapters, which were the ones I was interested in at the time, and moved on to the next book.
I always try to read and practice the concepts. You cannot be sure of the material you are learning until you have had a chance to experiment with it and hopefully use it in an actual project. Chapter 2 “A Crash Course in Python” provides some installation suggestions and then covers some examples using Python 2.x. For some reason Joel is not confident about Python 3.x compatibility for some common libraries used in Data Science. I prefer to use Python 3.x and have not run into issues in the past six years. More on this when we cover the notes I took while experimenting, roughly following the contents and examples in the book.
I am taking some specialization courses in Big Data and Machine Learning and decided to use a Linux computer instead of Windows. I have several computers in my home office. A Linux system running CentOS was almost idle, so I initially decided to go over the full installation. I then found some notes I took when installing Python / Anaconda on that machine several months ago. For completeness and simplicity I just updated the software and started going over the second chapter in the book. I believe I will end up with two or three related posts based on that chapter.
The book suggests the use of the Anaconda distribution. I did a Google search to get current information for Anaconda and found the following description at this link:
“With over six million users worldwide, Anaconda is the industry standard for data science. Anaconda Enterprise empowers organizations to develop, govern, and automate ML/AI pipelines from laptop to production, quickly delivering insights into the hands of business leaders and decision-makers”.
I would rather use the Python bundled with Anaconda because it seems to include most libraries and tools used by data scientists. To recall how to update the installed conda software on my Linux machine, I ran conda without arguments:
$ conda
usage: conda [-h] [-V] command ...

conda is a tool for managing and deploying applications, environments and packages.

Options:

positional arguments:
  command
    clean        Remove unused packages and caches.
    config       Modify configuration values in .condarc. This is modeled
                 after the git config command. Writes to the user .condarc
                 file (/home/johncanessa/.condarc) by default.
    create       Create a new conda environment from a list of specified
                 packages.
    help         Displays a list of available conda commands and their help
                 strings.
    info         Display information about current conda install.
    install      Installs a list of packages into a specified conda
                 environment.
    list         List linked packages in a conda environment.
    package      Low-level conda package utility. (EXPERIMENTAL)
    remove       Remove a list of packages from a specified conda environment.
    uninstall    Alias for conda remove. See conda remove --help.
    search       Search for packages and display associated information. The
                 input is a MatchSpec, a query language for conda packages.
                 See examples below.
    update       Updates conda packages to the latest compatible version. This
                 command accepts a list of package names and updates them to
                 the latest versions that are compatible with all other
                 packages in the environment. Conda attempts to install the
                 newest versions of the requested packages. To accomplish
                 this, it may update some packages that are already installed,
                 or install additional packages. To prevent existing packages
                 from updating, use the --no-update-deps option. This may
                 force conda to install older versions of the requested
                 packages, and it does not prevent additional dependency
                 packages from being installed. If you wish to skip dependency
                 checking altogether, use the '--force' option. This may
                 result in an environment with incompatible packages, so this
                 option must be used with great caution.
    upgrade      Alias for conda update. See conda update --help.

optional arguments:
  -h, --help     Show this help message and exit.
  -V, --version  Show the conda version number and exit.

conda commands available from other packages:
  env
  server
To determine the location of the Anaconda package I used the following:
$ which conda
/home/johncanessa/anaconda3/bin/conda
Using this information I proceeded to update the anaconda package. To do so I used the following command:
$ conda update --prefix /home/johncanessa/anaconda3 anaconda
Solving environment: -
Proceed ([y]/n)? y
Downloading and Extracting Packages
:::: :::: ::::
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
$
The update took about five minutes to complete.
I then decided to get the current / updated Python version using the following:
$ python --version
Python 3.6.5 :: Anaconda, Inc.
$
The last time I worked with Python on this machine I was running version 3.6.0. Seems like there have been several updates since I used Python on this computer.
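The same check can be made from inside Python, which is handy when a script needs to know which interpreter is running it. The following is a minimal sketch using only the standard library; the function name describe_interpreter and the Python 3 guard are my own illustration and do not come from the book:

```python
import platform
import sys


def describe_interpreter():
    """Return the (major, minor, micro) tuple and the dotted version string."""
    info = sys.version_info
    return (info.major, info.minor, info.micro), platform.python_version()


version_tuple, version_text = describe_interpreter()
print("running Python", version_text)

# Hypothetical guard: stop early if the script is started with Python 2.
if version_tuple < (3, 0, 0):
    raise SystemExit("this script expects Python 3")
```

On the machine described above the version string would be expected to match what python --version reports.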
Getting back to the discussion of Python 2.x or 3.x, if you need to use 2.7 for some strange reason, you can easily do that with Anaconda. It is really simple to switch between different Python environments if you use the Conda package manager.
Originally, the Python core developers had scheduled the ‘end of life’ date for Python 2.x for 2015, but in 2014 they announced an extension of five years, to 2020, in part to relieve worries for those users who cannot yet migrate to Python 3. Today (July 2018), there are very few libraries that do not support Python 3.
The book recommends the use of IPython. In Wikipedia I found the following information:
IPython (Interactive Python) is a command shell for interactive computing in multiple programming languages, originally developed for the Python programming language, which offers introspection, rich media, shell syntax, tab completion, and history. IPython provides the following features:
Interactive shells (terminal and Qt-based).
A browser-based notebook interface with support for code, text, mathematical expressions, inline plots and other media.
Support for interactive data visualization and use of GUI toolkits.
Flexible, embeddable interpreters to load into one’s own projects.
Tools for parallel computing.
The following command shows the version of ipython installed:
$ conda list | grep ipython
ipython                   6.4.0                    py36_0
ipython_genutils          0.2.0            py36hb52b0d5_0
$
If I am not mistaken this command was issued before updating the Anaconda distribution.
If you are interested in reading the Zen of Python you could use the following:
In : import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly (how can anyone disagree with this).
Explicit is better than implicit (stop making things obscure).
Simple is better than complex (Junior (has nothing to do with age) developers generate complex code, seniors develop simple and elegant code).
Complex is better than complicated (Hard for most developers to understand the difference).
Flat is better than nested (especially when generating thousands of methods and classes mostly with very few lines of code).
Sparse is better than dense (less is better than more when it comes to software).
Readability counts (something that many coding standards forget).
Special cases aren't special enough to break the rules (you can see why Tim is a senior developer).
Although practicality beats purity (this is learned with experience).
Errors should never pass silently (ripple them up or stop).
Unless explicitly silenced (the end user is always in control; allow her to be in such position).
In the face of ambiguity, refuse the temptation to guess (seen so often with junior developers; even after performing Google searches).
There should be one-- and preferably only one --obvious way to do it (I always use: There are many ways to skin a cat; but only one is the best).
Although that way may not be obvious at first unless you're Dutch (let's not get racial).
Now is better than never (Not only for software development but in life).
Although never is often better than *right* now (can argue with this).
If the implementation is hard to explain, it's a bad idea (if you cannot explain it; you do not know enough).
If the implementation is easy to explain, it may be a good idea (ditto).
Namespaces are one honking great idea -- let's do more of those (too Python oriented)!
Please note that the text inside the parentheses at the end of each line is my own commentary. The import this command was issued from the IPython tool.
Python does not use braces or begin / end keywords to delimit the body of a loop; it relies on the indentation of the code. This tends to cause some trouble for people starting with Python. Following is a test that illustrates how Python uses indentation to mark which statements belong to each loop:
print("<<< starting ...")
for i in [1, 2, 3, 4]:
    print("<<< i:", i)
    for j in [1, 2, 3, 4]:
        print("<<< j:", j, " i + j:", i + j)
print("<<< ending !!!")
Running the script produces the following output:
$ python ./test.py
<<< starting ...
<<< i: 1
<<< j: 1  i + j: 2
<<< j: 2  i + j: 3
<<< j: 3  i + j: 4
<<< j: 4  i + j: 5
<<< i: 2
<<< j: 1  i + j: 3
<<< j: 2  i + j: 4
<<< j: 3  i + j: 5
<<< j: 4  i + j: 6
<<< i: 3
<<< j: 1  i + j: 4
<<< j: 2  i + j: 5
<<< j: 3  i + j: 6
<<< j: 4  i + j: 7
<<< i: 4
<<< j: 1  i + j: 5
<<< j: 2  i + j: 6
<<< j: 3  i + j: 7
<<< j: 4  i + j: 8
<<< ending !!!
$
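To see how much the indentation level matters, here is a small sketch of my own (it is not from the book, and the function names pair_sums_nested and pair_sums_dedented are made up for illustration). The two functions differ only in how far one line is indented, yet they return different results:

```python
def pair_sums_nested(values):
    """Append inside the inner loop: one sum per (i, j) pair."""
    sums = []
    for i in values:
        for j in values:
            sums.append(i + j)   # indented twice: runs for every (i, j)
    return sums


def pair_sums_dedented(values):
    """Append after the inner loop: one sum per i, using the last j."""
    sums = []
    for i in values:
        for j in values:
            pass                 # the inner loop body does nothing
        sums.append(i + j)       # indented once: runs once per i
    return sums


print(pair_sums_nested([1, 2]))    # [2, 3, 3, 4]
print(pair_sums_dedented([1, 2]))  # [3, 4]
```

Moving a statement one level to the left takes it out of the inner loop, which is exactly the kind of change that is easy to make by accident when starting with Python.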
Following are the contents of a Python script that illustrates the use of the backslash ‘\’ character as a line continuation marker.
# **** declaring a couple variables ****
a = 10
b = 20

# **** single line ****
print("a + b:", a + b)

# **** continuation line character ****
print("a + b:",\
      a + b)

# **** no continuation line character ****
print("a + b:",
      a + b)
The output generated by the script follows:
$ python ./cont_line.py
a + b: 30
a + b: 30
a + b: 30
$
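The third print works without a backslash because Python continues a statement implicitly while a parenthesis, bracket or brace is still open. The sketch below is my own illustration of both styles (the variable names are made up); PEP 8 prefers the implicit form over the backslash:

```python
a = 10
b = 20

# Backslash: explicit continuation; nothing may follow the backslash.
total_backslash = a + \
    b

# Parentheses: implicit continuation, the style PEP 8 prefers.
total_parens = (a +
                b)

# Brackets and braces continue implicitly as well.
values = [
    a,
    b,
]

print(total_backslash, total_parens, sum(values))  # 30 30 30
```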
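In Python 2 the / operator applied to two integers performs integer division, so 5 / 2 yields 2. Typically one would expect a fractional result. Python 3 changed this: the / operator always performs true division and returns a floating-point value, while the // operator is available for integer (floor) division. The following example using ipython illustrates the Python 3 behavior:
In : 5 / 2
Out: 2.5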
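For comparison, the following sketch of mine (the helper name divide_examples is hypothetical) shows the three division-related operations side by side in Python 3. Note that floor division rounds toward negative infinity, which matters for negative operands. In Python 2 the same true-division behavior can be enabled with from __future__ import division.

```python
def divide_examples(a, b):
    """Return true division, floor division and divmod for a and b."""
    return {
        "true": a / b,           # always a float in Python 3
        "floor": a // b,         # rounds toward negative infinity
        "divmod": divmod(a, b),  # (quotient, remainder) in one call
    }


print(divide_examples(5, 2))   # {'true': 2.5, 'floor': 2, 'divmod': (2, 1)}
print(divide_examples(-5, 2))  # {'true': -2.5, 'floor': -3, 'divmod': (-3, 1)}
```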
OK, we will stop here for today. Tomorrow we will continue where we left off. Things will hopefully get a little more interesting and perhaps more complex.
Follow me on Twitter: @john_canessa