In a previous post I commented on waking up exactly at 05:00 AM. This morning I woke up around 04:10 AM. I guess there is some variation every day on how much sleep you get based on several factors (i.e., physical exhaustion, mental exhaustion, noise, temperature, food intake, among others).

This post is based on an optional assignment for the last course I completed on Coursera. The original assignment was long and I made modifications and enhancements so it just got bigger. If you are interested in Machine Learning (ML) you will need some refreshing on linear algebra. If you have little experience with Python or Numpy you will need some practice. This post should provide some refreshing and practice.

I will go over the cells in my Jupyter notebook. Now and then I might provide output for a cell. If you are interested, my suggestion is to visit GitHub and pull the notebook and experiment with it. Most of the operations and functions covered will be useful when you start working with logistic regression. That is one of the first topics in ML. With that out of the way, let’s dive into the notebook.

Cell #1 is used to display a line of text.

# **** print a message **** test = "Hello World !!!" print("test: " + test)

Cell #2 is similar to the previous one. In this case we will print a value. That works when not accompanied by some text as illustrated in the third line. Given that we initiate the print() statement with a string, we need to convert the double value in the variable val to a string.

# **** print a variable **** val = 1.2 print(val) print("val: " + str(val))

In the next cell #3 we import the math package because the math.exp() method will be used in the implementation of the basic_sigmoid() function. To learn more about the sigmoid function you may do it here. Sigmoid functions are used in ML due to the fact that the output is always in the range <0.0, 1.0>. We will learn more about it when we deal with logistic regression in a future post.

Once we define the basic_sigmoid() function we test it. You could try a value of 0 and the result should be 0.5.

import math # **** basic sigmoid function definition **** def basic_sigmoid(x): s = 1 / (1 + math.exp(-x)) return s # **** test the sigmoid function **** val = basic_sigmoid(3) #val = basic_sigmoid(0) print("val: " + str(val))

As you know, most algorithms in ML require a considerable amount of data. A typical way of presenting data to ML models is via arrays with one, or more dimensions. Let’s try our function with an array as illustrated in cell #4:

# **** define a vector (list) **** x = [1, 2, 3, 4] #x = np.array([1, 2, 3, 4]) #x = np.array([[1, 2, 3, 4]) print("x: " + str(x)) # **** try the basic_sigmoid function **** val = basic_sigmoid(x)

The output of this cell follows:

x: [1, 2, 3, 4] --------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-4-dcba755c75e9> in <module> 4 5 # **** try the basic_sigmoid function **** ----> 6 val = basic_sigmoid(x) <ipython-input-3-897a4c6e5ae2> in basic_sigmoid(x) 3 # **** basic sigmoid function definition **** 4 def basic_sigmoid(x): ----> 5 s = 1 / (1 + math.exp(-x)) 6 return s 7 TypeError: bad operand type for unary -: 'list'

As you can see things did not go well. We wrote the function to support individual values. We need a new version that will support vectors. For an example on how Numpy handle vectors we can do something like:

# **** now let's try it with Numpy **** import numpy as np # **** define a numpy vector (array) **** x = np.array([1, 2, 3, 4]) #x = np.array([[1, 2, 3, 4]]) print("x: " + str(x)) print("x.shape: " + str(x.shape)) print() # **** compute the exponential of each element in the array **** val = np.exp(x) print("val: " + str(val)) print("val.shape: " + str(val.shape))

We first need to import Numpy. We define an array and we compute an array in which all entries represent the exponential of the first set of values. If you get a chance to run this cell you will be able to see the shapes of the input and output vectors. You can check more about shape() and reshape() in a previous post.

In cell #6 we will add a constant to each element in vector x. We do it with and without saving the result to a new vector. Something to think about is how four entries in the array can add with a single value. This is called broadcasting. You can read more about it here.

# **** add a constant to each element in the array **** print("x + 3: " + str(x + 3)) # **** save the results in an array **** val = x + 3 print("val: " + str(val)) print("val.shape: " + str(val.shape))

If you are at any point interested on the definition or arguments for a numpy call / method, you can go to an empty cell in the notebook, type in the call followed by a question mark. A description will appear. You can expand it to a separate window in your web browser. This is illustrated in cell #7.

# **** get quick access to documentation on a numpy function **** np.exp?

In this case, we are interested in learning more about the np.exp() function / method we used in a previous cell.

In cell #8 we define the sigmoid() function. In this case the function is written to support and use Numpy arrays. After the definition we run a test.

# **** sigmoid function using Numpy definition **** def sigmoid(x): s = 1 / (1 + np.exp(-x)) return s # **** try the sigmoid function **** print("x: " + str(x)) val = sigmoid(x) print("sigmoid(x): " + str(val)) print("val.shape: " + str(val.shape))

Looking into the future, cell #9 defines and tests the sigmoid_derivative() function. When we deal with logistic regression we will make use of this or a very similar function.

# **** sigmoid_derivative function definition **** def sigmoid_derivative(x): s = sigmoid(x) ds = s * (1 - s) return ds # **** try the sigmoid_derivative function **** x = np.array([1, 2, 3, 4]) print("x: " + str(x)) val = sigmoid_derivative(x) print("sigmoid_derivative(x): " + str(val)) print("val.shape: " + str(val.shape))

In cell #10 we are going to perform some vector operations similar to what we do with actual images. Images displayed on a computer device typically use three values to describe each pixel. One value is for RED, the second for GREEN and the last for BLUE. In this case all the RED components are in the first [3, 3] matrix. The GREEN are in the second matrix, and the BLUE are on the third one. In this case the image would take 3 x 3 == 9 pixels. Not even a single character uses 9 pixels. Not with standing vector graphics, a typical matrix of 7 x 9 == 63 pixels would be used to describe a single numerical digit. The idea is to illustrate how we would deal with an actual image. In ML competitions images are typically 64 x 64 x 3 or larger.

# **** define an extremely small RGB image **** #image = np.random.randn(3, 3, 2) image = np.array([[[ 0.67826139, 0.29380381], [ 0.90714982, 0.52835647], [ 0.4215251 , 0.45017551]], [[ 0.92814219, 0.96677647], [ 0.85304703, 0.52351845], [ 0.19981397, 0.27417313]], [[ 0.60659855, 0.00533165], [ 0.10820313, 0.49978937], [ 0.34144279, 0.94630077]]]) print("image:\n" + str(image)) print() print("image.shape: " + str(image.shape)) print("image.shape[0]: " + str(image.shape[0])) print("image.shape[1]: " + str(image.shape[1])) print("image.shape[2]: " + str(image.shape[2]))

In cell #11 we define the image2vector() function. We pass in an image and the function returns a columnar vector holding all the pixels for the RED, GREEN and BLUE component for all pixels. In a future post we will see how this concept is used with actual images in order to create a model that would differentiate pictures containing cats or dogs.

# **** image2vector function definition **** def image2vector(image): v = image.reshape(image.shape[0] * image.shape[1] * image.shape[2], 1) return v # **** try the image2vector function **** x = image2vector(image) print("image2vector(image):\n" + str(x)) print("x.shape: " + str(x.shape))

In cell #12 de define and implement a function used to normalize rows. The reason for this is that when you are working in a ML project, we do not want to have a feature that is given more weight than another based on the units. For example, the weight (no pun intended) of an adult in pounds, depending on gender may be around 160 pounds. The height of an average person may be around 65 inches. In this case weight is about 2.5 larger than the height. To avoid giving more emphasis to weight, we can normalize both.

# **** normalizeRows function definition **** def normalizeRows(x): # **** compute the norm per row **** x_norm = np.linalg.norm(x, ord = 2, axis = 1, keepdims = True) print("x_norm:\n" + str(x_norm)) print("x_norm.shape: " + str(x_norm.shape)) x_norm_0 = np.sqrt((x ** 2).sum(axis = 1, keepdims = True)) print("x_norm_0:\n" + str(x_norm_0)) x = x / x_norm return x # **** define an array to normalize **** x = np.array([[0, 3, 4], [1, 6, 4]]) print("x.shape: " + str(x.shape)) print() val = normalizeRows(x) print() print("normalizeRows(x):\n" + str(val))

Moving on, on cell #13 we define a softmax() function. To read more about it you can find the subject in Wikipedia. You know the routine. We define the function, then we define a matrix for testing and we generate results.

# **** define softmax function **** def softmax(x): # **** apply exp() element-wise to x **** x_exp = np.exp(x) # print("x_exp.shape: " + str(x_exp.shape)) # **** create a vector x_sum that sums each row of x_exp **** x_sum = np.sum(x_exp, axis = 1, keepdims = True) # print("x_sum.shape: " + str(x_sum.shape)) # **** compute the softmax(x) by dividing x_exp by x_sum (broadcasting in action) **** s = x_exp / x_sum return s # **** define an array to compute the softmax **** x = np.array([[9, 2, 5, 0, 0], [7, 5, 0, 0, 0]]) val = softmax(x) print() print("softmax(x):\n" + str(val)) print("val.shape: " + str(val.shape))

Note that I left some print() statements inside the function. I used them to better understand how the function operates. Do not forget how easy it is to get help on Numpy functions (e.g., np.sum?)

Now we will move on to implement some vector operations using loops. Loops in any programming language take time to execute. The idea is to demonstrate how long a set of operations take when using loops and compare to the same operations when implemented with a single method using SIMD.

# **** to time operations **** import time # **** arrays to use for testing vectorization **** #x1 = np.random.randn(15000) #x2 = np.random.randn(15000) x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0] x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0] print("type(x1): " + str(type(x1))) print("len(x1): " + str(len(x1))) #x1 = np.array([9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]) #x2 = np.array([9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]) #print("type(x1): " + str(type(x1))) #print("x1.shape: " + str(x1.shape)) # **** clasic dot product of vectors **** tic = time.process_time_ns() dot = 0 for i in range(len(x1)): dot += x1[i] * x2[i] toc = time.process_time_ns() print(" dot: " + str(dot)) print("time: " + str(toc - tic) + " ns") print() # **** classic outer product of vectors **** tic = time.process_time_ns() #print("len(x1): " + str(len(x1))) #print("len(x2): " + str(len(x2))) outer = np.zeros((len(x1), len(x2))) #print("outer.shape: " + str(outer.shape)) for i in range(len(x1)): for j in range(len(x2)): outer[i, j] = x1[i] * x2[j] toc = time.process_time_ns() print("outer:\n" + str(outer)) print("time: " + str(toc - tic) + " ns") print() # **** classic elementwise multiplication **** tic = time.process_time_ns() mul = np.zeros(len(x1)) for i in range(len(x1)): mul[i] = x1[i] * x2[i] toc = time.process_time_ns() print("mul:\n" + str(mul)) print("time: " + str(toc - tic) + " ns") print() # **** classic general dot product implementation **** W = np.random.randn(3, len(x1)) print("W:\n" + str(W)) tic = time.process_time_ns() gdot = np.zeros(W.shape[0]) for i in range(W.shape[0]): for j in range(len(x1)): gdot[i] += W[i, j] * x1[j] toc = time.process_time_ns() print("gdot:\n" + str(gdot)) print("time: " + str(toc - tic) + " ns") print()

You can see that I tried defining some rather large arrays with random values. The issue is that when you look at some of the results it is quite difficult to check them and to follow. For this reason smaller arrays with specified values are used. The drawback is that timing is useless. If you run the code you will understand what I am saying. For time comparisons you should just use large arrays.

# **** vectorized dot product **** tic = time.process_time_ns() dot = np.dot(x1, x2) toc = time.process_time_ns() print(" dot: " + str(dot)) print("time: " + str(toc - tic) + " ns") print() # **** vectorized outer product of vectors **** tic = time.process_time_ns() outer = np.outer(x1, x2) toc = time.process_time_ns() print("outer:\n" + str(outer)) print("time: " + str(toc - tic) + " ns") print() # **** vectorized elementwise multiplication **** tic = time.process_time_ns() mul = np.multiply(x1, x2) toc = time.process_time_ns() print("mul:\n" + str(mul)) print("time: " + str(toc - tic) + " ns") print() # **** vectorized general dot product **** tic = time.process_time_ns() gdot = np.dot(W, x1) toc = time.process_time_ns() print("gdot:\n" + str(gdot)) print("time: " + str(toc - tic) + " ns") print()

This last cell shows the same operations, this time in vectorized form using SIMD. If you have a chance to run the code you will be able to verify that both produce the same results. To learn more about dot product, outer product and / or vector multiplication please follow one of the links.

In cell #16 we define a loss function and test it with some random values.

# **** define L1 loss function **** def L1(yhat, y): loss = np.sum(np.abs(y - yhat)) return loss # **** test the L1 loss function *** yhat = np.array([.9, 0.2, 0.1, .4, .9]) y = np.array([1, 0, 0, 1, 1]) val = L1(yhat, y) print("L1: " + str(val))

Finally in cell #17 de define and test a second loss function. Loss functions will become clear in a future post.

# **** define the L2 loss function **** def L2(yhat, y): # loss = np.sum(np.square(y - yhat)) loss = np.sum(np.dot(y - yhat, y - yhat)) return loss # *** test the L2 loss function **** yhat = np.array([.9, 0.2, 0.1, .4, .9]) y = np.array([1, 0, 0, 1, 1]) val = L2(yhat, y) print("L2: " + str(val))

Note that the loss can be computed in different ways. In our code we have two approaches. If you have a chance to experiment with the code, you will find out, at least it happened on my machine, that the results are very similar, yet one implementation seems to be more accurate that the other. In addition, from a software engineering point of view, one line of code is easier to understand than the other. Let me know your thoughts.

I have posted this Jupyter notebook on GitHub.

As usual, if you have comments or questions regarding this or any other post in this blog, please leave me a note bellow.

Keep on learning, experimenting and enjoying;

John

Please follow me on Twitter: **@john_canessa**