Logistic Regression with a Neural Network mindset

It is a very snowy day in the Twin Cities of Minneapolis and St. Paul. Schools are closed due to the amount of snow and low visibility. It started snowing earlier this morning and, according to the forecast, it should end around 09:00 PM this evening. We have already surpassed the record snow amount for February, according to records that go back over a century. We will be receiving more snow in the upcoming days. We will see if we set other new records.

In this post I will cover a logistic regression implementation used to determine if pictures contain a cat or not. The code is based on an edited assignment for Coursera Neural Networks and Deep Learning.

The goals of this post are to build the general architecture of a learning algorithm including:

Initializing parameters
Calculating the cost function and its gradient
Using an optimization algorithm (gradient descent)

We will use a Jupyter Notebook for this post. I will cover all cells, one at a time.

# **** imports ****
import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset
import skimage
%matplotlib inline

Cell #1 defines the imports required. NumPy is used for all the computations. The matplotlib library is used to display plots and charts. The h5py package provides a Python interface to the HDF5 binary data format. You can read more about it here. The scipy package is used for scientific computing. You can read more about SciPy here. The PIL package adds support for many image formats. Given that this post is about images with and without cats, this package is used to process images. You can learn more about it here. The scikit-image package (imported as skimage) is also used to process images. You can read more about it here.

The lr_utils module provides the load_dataset() function, which reads the images from local HDF5 files. It is not a standard package. It comes with the course and is used to load the datasets the course requires.

Finally the %matplotlib inline is a magic function in IPython. It is used to render images in the Jupyter notebook.

In the next cell we will use the load_dataset() function to load the data we use. The definition for that function follows:

# **** definition for the load_dataset() function ****
def load_dataset():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:]) # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:]) # your train set labels

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:]) # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:]) # your test set labels

    classes = np.array(test_dataset["list_classes"][:]) # the list of classes

    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

The next step illustrated in cell #2 is to load the data for the project.

# **** loading the data (cat/non-cat) ****
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()

The data consists of:

A training set of m_train images labeled as cat (y=1) or non-cat (y=0)
A test set of m_test images labeled as cat or non-cat
Each image is of shape (num_px, num_px, 3) where 3 is for the 3 channels (RGB)
Each image is square (height = num_px) and (width = num_px)

Cell #3 illustrates how to display an image.

# **** example of an image ****
index = 50
plt.imshow(train_set_x_orig[index])
print("y = index[" + str(index) + "], it's a '" + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") +  "' picture.")

In this last cell we select image 50. Then using plt (common abbreviation for matplotlib) we display the associated image on the notebook. A message is displayed regarding the image. Take the time to understand which data is selected from the dataset to compose the message here displayed. It is quite important to get to know the data you are using in a project.

In cell #4 we get to display information about the dataset.

# **** extract and display information about the dataset ****
m_train = train_set_x_orig.shape[0]
m_test  = test_set_x_orig.shape[0]
num_px  = train_set_x_orig.shape[2]

print ("Number of training examples: m_train = " + str(m_train))
print (" Number of testing examples: m_test = " + str(m_test))
print (" Height/Width of each image: num_px = " + str(num_px))
print ("      Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print ("          train_set_x shape: " + str(train_set_x_orig.shape))
print ("          train_set_y shape: " + str(train_set_y.shape))
print ("           test_set_x shape: " + str(test_set_x_orig.shape))
print ("           test_set_y shape: " + str(test_set_y.shape))

We should see that in this case we are using 209 images to train and 50 to test. The training data should only be used to train the model and the test data to test it. To help us understand the dataset, I will include the output from this cell:

Number of training examples: m_train = 209
 Number of testing examples: m_test = 50
 Height/Width of each image: num_px = 64
      Each image is of size: (64, 64, 3)
          train_set_x shape: (209, 64, 64, 3)
          train_set_y shape: (1, 209)
           test_set_x shape: (50, 64, 64, 3)
           test_set_y shape: (1, 50)

The load_dataset() function in cell #2 returned something called “classes”. In cell #5 we will explore what is in that variable.

# **** explore classes (image label identifying the class of the sample) ****
print("classes.shape: " + str(classes.shape))
print("   classes[0]: " + str(classes[0]))
print("   classes[1]: " + str(classes[1]))
print()

# **** display shapes ****
print("type(train_set_y): " + str(type(train_set_y)))
print("train_set_y.shape: " + str(train_set_y.shape))

It seems that classes is a NumPy array that contains the two possible labels associated with each image. One label is assigned to images with a cat (‘cat’) and the other to images without a cat (‘non-cat’).

Let’s continue to explore the train set and determine how many images have been labeled with and without cats.

# **** how many cats in train set ****
cats = 0
for i in range(m_train):
    x = train_set_y[0][i]
    if x == 1:
        cats += 1

print("in training set")
print("    cats: " + str(cats))
print("non-cats: " + str(train_set_y.shape[1] - cats))

In cell #6 we traverse the train_set_y which contains the labels for each train image. The output shows that 72 images of 209 contain cats and 137 do not. Seems like it is a small sample and should be interesting to get a model and see how it performs with such a reduced set of data.
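The same count can be obtained without an explicit loop. The following is a minimal sketch, assuming the labels are only 0 or 1. The array labels_demo is a made-up stand-in for train_set_y, and cats_demo and non_cats_demo are hypothetical names used only here.

```python
import numpy as np

# Hypothetical stand-in for train_set_y: shape (1, m), 1 = cat, 0 = non-cat.
labels_demo = np.array([[1, 0, 0, 1, 1, 0, 0, 0]])

# Comparing against 1 yields booleans; summing them counts the cats.
cats_demo = int(np.sum(labels_demo == 1))
non_cats_demo = labels_demo.shape[1] - cats_demo

print("    cats: " + str(cats_demo))
print("non-cats: " + str(non_cats_demo))
```

On the real data you would pass train_set_y or test_set_y instead of labels_demo.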

# **** how many cats in the test set ****
cats = 0
for i in range(m_test):
    x = test_set_y[0][i]
    if x == 1:
        cats += 1
print("in test set")
print("    cats: " + str(cats))
print("non-cats: " + str(test_set_y.shape[1] - cats))

In cell #7 we repeat what we did for the training data this time with the test data.

Cell #8 is used to create a data set of only two images with 4 x 4 pixels each. The structure in this case is for images using three planes. One plane contains the RED values of the pixels, the next the GREEN values and the third the values for the BLUE component for each pixel. I did this just to better understand how the data was being presented to the algorithm.

# **** create a first set of two images ****
XX = np.zeros((2, 3, 4, 4))
print("XX.shape: " + str(XX.shape))

# **** fill the first image ****
XX[0][0].fill(1)  # RED plane
XX[0][1].fill(2)  # GREEN plane
XX[0][2].fill(3)  # BLUE plane

# **** fill the second image ****
XX[1][0].fill(4)  # RED plane
XX[1][1].fill(5)  # GREEN plane
XX[1][2].fill(6)  # BLUE plane

# **** display the two images ****
print("XX:\n" + str(XX))
print()

# **** display the shape of the first image ****
print("XX[0].shape: " + str(XX[0].shape))
print("XX[0]:\n" + str(XX[0]))
print()

# **** display the shape of the second image ****
print("XX[1].shape: " + str(XX[1].shape))
print("XX[1]:\n" + str(XX[1]))
print()

# **** reshape images and display results ****
XX_flat = XX.reshape(XX.shape[0], -1).T
print("XX_flat:\n" + str(XX_flat))
print("len(XX_flat): " + str(len(XX_flat)))

I figured that it would be simpler and better for the algorithm to deal with the three color components at the same time. This is shown in cell #9.

# **** create a second set of two images ****
ZZ = np.zeros((2, 4, 4, 3))
print("ZZ.shape: " + str(ZZ.shape))

# **** fill the first image ****
ZZ[0][0][0].fill(1)  # RGB pixel
ZZ[0][0][1].fill(2)  # RGB pixel
ZZ[0][0][2].fill(3)  # RGB pixel
ZZ[0][0][3].fill(4)  # RGB pixel

ZZ[0][1][0].fill(5)  # RGB pixel
ZZ[0][1][1].fill(6)  # RGB pixel
ZZ[0][1][2].fill(7)  # RGB pixel
ZZ[0][1][3].fill(8)  # RGB pixel

ZZ[0][2][0].fill(9)  # RGB pixel
ZZ[0][2][1].fill(10) # RGB pixel
ZZ[0][2][2].fill(11) # RGB pixel
ZZ[0][2][3].fill(12) # RGB pixel

ZZ[0][3][0].fill(13) # RGB pixel
ZZ[0][3][1].fill(14) # RGB pixel
ZZ[0][3][2].fill(15) # RGB pixel
ZZ[0][3][3].fill(16) # RGB pixel

# **** fill the second image ****
ZZ[1][0][0].fill(17) # RGB pixel
ZZ[1][0][1].fill(18) # RGB pixel
ZZ[1][0][2].fill(19) # RGB pixel
ZZ[1][0][3].fill(20) # RGB pixel

ZZ[1][1][0].fill(21) # RGB pixel
ZZ[1][1][1].fill(22) # RGB pixel
ZZ[1][1][2].fill(23) # RGB pixel
ZZ[1][1][3].fill(24) # RGB pixel

ZZ[1][2][0].fill(25) # RGB pixel
ZZ[1][2][1].fill(26) # RGB pixel
ZZ[1][2][2].fill(27) # RGB pixel
ZZ[1][2][3].fill(28) # RGB pixel

ZZ[1][3][0].fill(29) # RGB pixel
ZZ[1][3][1].fill(30) # RGB pixel
ZZ[1][3][2].fill(31) # RGB pixel
ZZ[1][3][3].fill(32) # RGB pixel

# **** display images ****
print("ZZ:\n" + str(ZZ))
print()

# **** display the shape of each image ****
print("ZZ[1].shape: " + str(ZZ[1].shape))
print("ZZ[1]:\n" + str(ZZ[1]))
print()

# **** reshape the images and display results ****
ZZ_flat = ZZ.reshape(ZZ.shape[0], -1).T
print("ZZ_flat:\n" + str(ZZ_flat))
print("len(ZZ_flat): " + str(len(ZZ_flat)))

In this last cell you can see how each component stores a RED, GREEN and BLUE pixel value. The layout of the images is important when reshaping and normalizing. Having the three components next to each other provides locality.
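If your images happen to be stored plane by plane, as in the first experiment, one way to get the interleaved layout is to move the channel axis to the end before flattening. This is only a sketch on made-up data. The names planar, interleaved and flat are hypothetical and are not part of the notebook.

```python
import numpy as np

# Two hypothetical 4 x 4 images stored plane by plane:
# (images, channels, height, width).
planar = np.zeros((2, 3, 4, 4))
planar[0][0].fill(1)  # RED plane of image 0
planar[0][1].fill(2)  # GREEN plane of image 0
planar[0][2].fill(3)  # BLUE plane of image 0
planar[1][0].fill(4)  # RED plane of image 1
planar[1][1].fill(5)  # GREEN plane of image 1
planar[1][2].fill(6)  # BLUE plane of image 1

# Move the channel axis to the end: (images, height, width, channels).
interleaved = np.transpose(planar, (0, 2, 3, 1))
print("interleaved.shape: " + str(interleaved.shape))
print("first pixel of image 0: " + str(interleaved[0, 0, 0]))

# Flattening now keeps the R, G, B values of each pixel next to each other.
flat = interleaved.reshape(interleaved.shape[0], -1).T
print("first six values of image 0: " + str(flat[0:6, 0]))
```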

# **** reshape the training and test examples ****
print("   train_set_x_orig.shape[0]: " + str(train_set_x_orig.shape[0]))
print("   train_set_x_orig.shape[1]: " + str(train_set_x_orig.shape[1]))
print("   train_set_x_orig.shape[2]: " + str(train_set_x_orig.shape[2]))
print("   train_set_x_orig.shape[3]: " + str(train_set_x_orig.shape[3]))

#print(" train_set_x_orig: " + str(train_set_x_orig[0, 0, 0]))
#print(" train_set_x_orig: " + str(train_set_x_orig[0, 0, 1]))

print(" train_set_x_orig:\n" + str(train_set_x_orig[0, 0, 0:2]))
print()

# **** flatten the train set ****
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T

# **** flatten the test set ****
test_set_x_flatten  = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T

print ("   train_set_x_flatten shape: " + str(train_set_x_flatten.shape))
print ("           train_set_y shape: " + str(train_set_y.shape))
print ("    test_set_x_flatten shape: " + str(test_set_x_flatten.shape))
print ("            test_set_y shape: " + str(test_set_y.shape))
print ("sanity check after reshaping: " + str(train_set_x_flatten[0:6,0]))

In cell #10 we take the pixels of each image and build a single column vector per image. We have images of 64 * 64 pixels with 3 values (RGB) per pixel, giving us 64 * 64 * 3 = 12,288 values. Each component uses a single byte, so each image is now a single column vector of shape [12288, 1]. Our training set has shape [12288, 209] while the test set has shape [12288, 50].
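To convince yourself that the reshape followed by a transpose really puts one image in each column, you can try it on random data. This is a sketch, not part of the original notebook. The array images_demo is a hypothetical stand-in for train_set_x_orig, and wrong_demo shows an alternative reshape that gives the same shape but scrambles the images.

```python
import numpy as np

# Hypothetical stand-in for train_set_x_orig: 5 random 64 x 64 RGB images.
rng = np.random.default_rng(0)
images_demo = rng.integers(0, 256, size=(5, 64, 64, 3), dtype=np.uint8)

# One column per image, one row per pixel component.
flat_demo = images_demo.reshape(images_demo.shape[0], -1).T
print("flat_demo.shape: " + str(flat_demo.shape))

# Column i holds image i unrolled row by row, pixel by pixel, R then G then B.
print("column 2 is image 2: " + str(np.array_equal(flat_demo[:, 2], images_demo[2].ravel())))

# A tempting alternative with the same shape, but each column now mixes
# values that belong to different positions and images.
wrong_demo = images_demo.reshape(-1, images_demo.shape[0])
print("wrong_demo.shape: " + str(wrong_demo.shape))
print("same content: " + str(np.array_equal(flat_demo, wrong_demo)))
```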

# **** standardize the datasets ****
train_set_x = train_set_x_flatten / 255.
test_set_x  = test_set_x_flatten / 255.

In cell #11 we standardize the values by scaling them. As previously mentioned, in RGB images each pixel is represented by three bytes. The minimum value of a byte is 0x00, which maps to no color, and the maximum is 0xff, which maps to full color. By dividing each pixel value by 255 (0xff) we end up with values in the range [0.0, 1.0]. This puts every feature on the same scale, which tends to help gradient descent converge.
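A quick way to check what the division does to the data type and range is shown below. The values in pixels_demo are made up, and scaled_demo is a hypothetical name used only for this sketch.

```python
import numpy as np

# Hypothetical flattened pixel data: 4 components for each of 3 images.
pixels_demo = np.array([[0, 64, 255],
                        [128, 32, 0],
                        [255, 255, 16],
                        [8, 200, 100]], dtype=np.uint8)

# Dividing by a float converts the bytes to floating point values.
scaled_demo = pixels_demo / 255.

print("dtype: " + str(scaled_demo.dtype))
print("  min: " + str(scaled_demo.min()))
print("  max: " + str(scaled_demo.max()))
```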

# **** display the length of each sample ****
print("len(train_set_x): " + str(len(train_set_x)))

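This cell displays len(train_set_x), which is the number of rows in the array. After flattening, that is the number of values per image (12,288), not the number of images.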

The key steps for this exercise are:

Initialize the parameters of the model
Learn the parameters for the model by minimizing the cost
Use the learned parameters to make predictions
Analyze the results

To keep some sanity, we will define a set of helper functions and methods that will help us put together the complete model at the end of this post.

# **** sigmoid function ****
def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """

    s = 1 / (1 + np.exp(-z))
    
    return s

In cell #13 we declare a function to compute the sigmoid of a scalar or a NumPy array of any size. You can read more about the sigmoid function here.

# **** test operation of sigmoid function ****
print("sigmoid([-2, 0, 2]): " + str(sigmoid(np.array([-2.0, 0, 2]))))
print("   1 - sigmoid([2]): " + str(1 - sigmoid(np.array([2]))))

In cell #14 we run a simple test on the sigmoid function to verify it. We first compute the sigmoid of three points. The sigmoid function is centered at 0.0, where its value is 0.5. The values for -2 and 2 should be symmetrical around 0.5, that is, sigmoid(-2) should equal 1 - sigmoid(2). That is what we verify with the second computation. If the results do not match, then we have an issue with the code.
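The same check can be automated so you do not have to compare the printed numbers by eye. This is a small sketch that repeats the sigmoid() definition so it runs on its own. The array z_demo is made up for the test.

```python
import numpy as np

def sigmoid(z):
    # Same definition as in the post: 1 / (1 + e^(-z)).
    return 1 / (1 + np.exp(-z))

z_demo = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

# The curve passes through 0.5 at z = 0.
print("sigmoid(0): " + str(sigmoid(0.0)))

# Mirrored inputs give outputs that add up to 1.
print("symmetric: " + str(np.allclose(sigmoid(-z_demo), 1 - sigmoid(z_demo))))
```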

# **** initialize with zeros function ****
def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.
    
    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)
    
    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    
    # **** initialize the variables ****
    w = np.zeros(shape=(dim, 1), dtype=float)  # np.float was removed in NumPy 1.24; the builtin float is equivalent
    b = 0.0

    # ***** check if something went wrong ****
    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))
    
    # **** return the array w and the constant b ****
    return w, b

In cell #15 we define a function to initialize a vector w with the specified dimension and a variable b with values of zero (0.0 because we want to use the numpy float type).

# **** check the operation of initialize_with_zeros ****
dim = 3
w, b = initialize_with_zeros(dim)
print ("w:\n" + str(w))
print ("b: " + str(b))

We now test the operation of the previous function. We specify a dimension of 3 and we should get a columnar vector w with 3 entries and all values set to 0.0 in addition to variable b set to 0.0. If this is not the case, we need to check the implementation of the initialize_with_zeros() function.

# **** define a function returning multiple values ****
def multipleVals(a, b, c):
    
    # **** compute values ****
    aa = np.square(a)
    bb = np.sqrt(b)
    cc = np.abs(c)
    
    # **** return values ****
    return aa, bb, cc

aa = 2
bb = 36
cc = -7
aa, bb, cc = multipleVals(aa, bb, cc)

print("aa: " + str(aa))
print("bb: " + str(bb))
print("cc: " + str(cc))

In the previous cell we define the function multipleVals() which is not part of the logistic regression model. The purpose of the function is to make sure we understand that in Python a function may return multiple values. Another feature in Python is to return a single dictionary with a set of values. We will see such use later on.

The cell defines the function, in which we compute three values using NumPy functions, sets values for the three arguments, calls the function, and then displays the three returned values.

OK, let’s now implement the propagate function. This is illustrated in cell #18.

# **** define the propagate() function ****
def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation.

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b
    """
    
    # **** get the number of examples ****
    m = X.shape[1]
    
    # **** forward propagation (from X TO cost) ****
    A = sigmoid(np.dot(w.T, X) + b)                                               # compute activation
    cost = (-1. / m) * np.sum((Y * np.log(A) + (1 - Y) * np.log(1 - A)), axis=1)  # compute cost
    
    # **** backward propagation (TO FIND GRAD) ****
    dw = (1. / m) * np.dot(X,((A - Y).T))
    db = (1. / m) * np.sum(A - Y, axis=1)

    # **** sanity checks on shapes and types ****
    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())
    
    # **** define a dictionary with the gradients (derivatives) ****
    grads = {"dw": dw,
             "db": db}
    
    # **** return the gradients and the cost ****
    return grads, cost

In this function the forward propagation step computes the activations and the cost, and the backward propagation step computes the gradients dw and db.
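One way to gain confidence in the gradient formulas is to compare them against numerical derivatives of the cost. The sketch below is not part of the original notebook. It repeats the formulas from propagate() in two hypothetical helpers, cost_only() and analytic_grads(), and uses made-up values for w, b, X and Y.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_only(w, b, X, Y):
    # Cross-entropy cost, same formula as in propagate().
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    return float((-1. / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)))

def analytic_grads(w, b, X, Y):
    # Closed-form gradients, same formulas as in propagate().
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    dw = (1. / m) * np.dot(X, (A - Y).T)
    db = float((1. / m) * np.sum(A - Y))
    return dw, db

# Hypothetical small problem: 2 features, 3 examples.
w = np.array([[0.3], [-0.2]])
b = 0.1
X = np.array([[1., 2., -1.], [3., 0.5, -3.2]])
Y = np.array([[1, 0, 1]])

dw, db = analytic_grads(w, b, X, Y)

# Central differences: nudge one parameter at a time by +/- eps.
eps = 1e-6
dw_num = np.zeros_like(w)
for i in range(w.shape[0]):
    step = np.zeros_like(w)
    step[i, 0] = eps
    dw_num[i, 0] = (cost_only(w + step, b, X, Y) - cost_only(w - step, b, X, Y)) / (2 * eps)
db_num = (cost_only(w, b + eps, X, Y) - cost_only(w, b - eps, X, Y)) / (2 * eps)

print("max |dw - dw_num|: " + str(np.max(np.abs(dw - dw_num))))
print("    |db - db_num|: " + str(abs(db - db_num)))
```

Both differences should be tiny. A large difference would point to a mistake in either the cost or the gradient code.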

In cell #19 we run a test to verify the operation of the propagate() function.

# **** test the propagate function (forward and backward propagation) ****
w, b, X, Y = np.array([[1.],[2.]]), 2., np.array([[1., 2., -1.],[3., 4., -3.2]]), np.array([[1, 0, 1]])
print("w:\n" + str(w))
print("b: " + str(b))
print("X:\n" + str(X))
print("Y: " + str(Y))
print()

grads, cost = propagate(w, b, X, Y)
print ("  dw:\n" + str(grads["dw"]))
print ("  db: " + str(grads["db"]))
print ("cost: " + str(cost))

We first define the four arguments, call the propagate() function and then display the returned values. Note that the entire point is to verify that all the values match what is expected. In this case the output should look like:

w:
[[1.]
 [2.]]
b: 2.0
X:
[[ 1.   2.  -1. ]
 [ 3.   4.  -3.2]]
Y: [[1 0 1]]

  dw:
[[0.99845601]
 [2.39507239]]
  db: [0.00145558]
cost: 5.801545319394553

In cell #20 we will define and implement the optimize() function.

# **** optimize() function ****
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    This function optimizes w and b by running a gradient descent algorithm
    
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps
    
    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.
    """
    
    # **** define the array for the costs ****
    costs = []
    
    # **** loop for the specified number of times ****
    for i in range(num_iterations):
        
        # **** compute cost and gradients ****
        grads, cost = propagate(w, b, X, Y)
        
        # **** retrieve derivatives from grads ****
        dw = grads["dw"]
        db = grads["db"]
        
        # **** apply the gradient descent update rule ****
        w = w - learning_rate * dw
        b = b - learning_rate * db
        
        # **** record the cost every 100 iterations ****
        if i % 100 == 0:
            costs.append(cost)
        
        # **** print the cost every 100 iterations ****
        if print_cost and i % 100 == 0:
            print ("cost after iteration %i: %f" %(i, cost))
    
    # **** save the values in a dictionary ****
    params = {"w": w,
              "b": b}
    
    # **** save the gradients in a dictionary ****
    grads = {"dw": dw,
             "db": db}
    
    # **** return values ****
    return params, grads, costs

As you can see things are falling in place to compute gradient descent.
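To watch gradient descent at work without the image data, here is a minimal sketch on a tiny made-up data set with a single feature. The data, the learning rate of 0.1 and the 200 iterations are assumptions picked only for this demo. The loop body mirrors the steps in optimize().

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical 1-feature data set: negative inputs are class 0, positive are class 1.
X = np.array([[-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]])
Y = np.array([[0, 0, 0, 1, 1, 1]])
m = X.shape[1]

w = np.zeros((1, 1))
b = 0.0
learning_rate = 0.1   # assumed value, picked only for this demo
costs_demo = []

for i in range(200):
    # forward pass: activations and cost
    A = sigmoid(np.dot(w.T, X) + b)
    cost = float(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A)))
    # backward pass: gradients
    dw = np.dot(X, (A - Y).T) / m
    db = float(np.sum(A - Y) / m)
    # gradient descent step
    w = w - learning_rate * dw
    b = b - learning_rate * db
    # record the cost every 50 iterations
    if i % 50 == 0:
        costs_demo.append(cost)

print("recorded costs: " + str(costs_demo))
print("w: " + str(w.ravel()) + "  b: " + str(b))
```

The first recorded cost is ln(2), about 0.693, because with w and b at zero every activation is 0.5. That is the same starting cost the cat model reports at iteration 0.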

# **** test the optimize() function ****
params, grads, costs = optimize(w, b, X, Y, num_iterations= 100, learning_rate = 0.009, print_cost = False)

print (" w:\n" + str(params["w"]))
print (" b: "  + str(params["b"]))
print ("dw:\n" + str(grads["dw"]))
print ("db: "  + str(grads["db"]))

As usual we need to test each function to make sure we did not make a simple mistake. In the previous cell we test the optimize() function. The expected results follow:

 w:
[[0.19033591]
 [0.12259159]]
 b: [1.92535983]
dw:
[[0.67752042]
 [1.41625495]]
db: [0.2191945]

Please note that if you run this cell again after running other cells (and the same goes for some other cells in this notebook), the results may vary. The reason is that the cells share global variables such as w, b, X and Y, and later cells reassign some of them (for example, the cell that tests predict()). You can always rerun the entire notebook to clear things up.

# **** predict() function ****
def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)
    
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    
    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    
    # **** get ready to compute the predictions ****
    m = X.shape[1]
    Y_prediction = np.zeros((1,m))
    w = w.reshape(X.shape[0], 1)
    
    # **** compute vector "A" predicting the probabilities of a cat being present in the picture ****
    A = sigmoid(np.dot(w.T, X) + b)
    
    # **** loop once per sample ****
    for i in range(A.shape[1]):
        
        # **** convert probabilities A[0,i] to actual predictions p[0,i] ****
        Y_prediction[0,i] = np.where(A[0,i] > 0.5, 1, 0)

    # **** check the shape of our predictions ****
    assert(Y_prediction.shape == (1, m))
    
    # **** return our predictions ****
    return Y_prediction

In cell #22 we define the predict() function. This function predicts the labels for each sample. As you can recall, we should obtain a 1 for cats and a 0 for non-cats.
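The per-sample loop in predict() can also be written as a single array comparison. The sketch below compares both forms on made-up activations. The names A_demo, loop_pred and vec_pred are hypothetical.

```python
import numpy as np

# Hypothetical activations for four examples, shape (1, m).
A_demo = np.array([[0.9, 0.5, 0.2, 0.51]])

# Loop version, as in predict().
loop_pred = np.zeros((1, A_demo.shape[1]))
for i in range(A_demo.shape[1]):
    loop_pred[0, i] = np.where(A_demo[0, i] > 0.5, 1, 0)

# Vectorized version: compare the whole array at once, then cast to float.
vec_pred = (A_demo > 0.5).astype(float)

print("loop: " + str(loop_pred))
print(" vec: " + str(vec_pred))
```

Note that an activation of exactly 0.5 is predicted as 0 (non-cat) because the comparison is strict.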

# **** test predict() function ****
w = np.array([[0.1124579],[0.23106775]])
b = -0.3
X = np.array([[1., -1.1, -3.2],[1.2, 2., 0.1]])

print("predictions: " + str(predict(w, b, X)))

In cell #23 we test the predictions using some made-up data. The results follow:

predictions: [[1. 1. 0.]]

OK, seems we are ready to define our model. In this function we will use most of the functions we developed in earlier steps.

# **** model function ****
def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    """
    Builds the logistic regression model by calling the function you've implemented previously
    
    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations
    
    Returns:
    d -- dictionary containing information about the model.
    """
    
    # **** initialize parameters with zeros ****
    w, b = initialize_with_zeros(X_train.shape[0])
    
    # **** compute the gradient descent ****
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    
    # **** retrieve parameters w and b from dictionary "parameters" ****
    w = parameters["w"]
    b = parameters["b"]
    
    # **** predict test / train set examples ****
    Y_prediction_test  = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    # **** print train / test accuracies ****
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print(" test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))
    
    # **** dictionary containing information about this model ****
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test, 
         "Y_prediction_train" : Y_prediction_train, 
         "w" : w, 
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}
    
    # **** return dictionary with model information ****
    return d

You can see in cell #24 the steps used to generate our model. We display the accuracy and return all necessary values in a dictionary to later use this model.
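The accuracy expression in model() may look odd at first. Since labels and predictions are both 0 or 1, the absolute difference is 1 for a miss and 0 for a hit, so its mean is the error rate. The sketch below, on made-up values in Y_demo and pred_demo, shows that it matches the more direct "fraction of equal entries" form.

```python
import numpy as np

# Hypothetical labels and predictions, both of shape (1, m) with 0/1 entries.
Y_demo    = np.array([[1, 0, 1, 1, 0]])
pred_demo = np.array([[1., 0., 0., 1., 1.]])

# Form used in model(): |prediction - label| is 1 for a miss and 0 for a hit.
acc_abs = 100 - np.mean(np.abs(pred_demo - Y_demo)) * 100

# Equivalent form: percentage of positions where prediction equals label.
acc_eq = np.mean(pred_demo == Y_demo) * 100

print("accuracy (abs form): " + str(acc_abs))
print("accuracy (== form):  " + str(acc_eq))
```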

# **** train our model ****
d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)

Now let’s train the model. This is illustrated by calling the model() function with the specified set of arguments.

In this case our model returns the following values after training is completed:

cost after iteration 0: 0.693147
cost after iteration 100: 0.584508
cost after iteration 200: 0.466949
cost after iteration 300: 0.376007
cost after iteration 400: 0.331463
cost after iteration 500: 0.303273
cost after iteration 600: 0.279880
cost after iteration 700: 0.260042
cost after iteration 800: 0.242941
cost after iteration 900: 0.228004
cost after iteration 1000: 0.214820
cost after iteration 1100: 0.203078
cost after iteration 1200: 0.192544
cost after iteration 1300: 0.183033
cost after iteration 1400: 0.174399
cost after iteration 1500: 0.166521
cost after iteration 1600: 0.159305
cost after iteration 1700: 0.152667
cost after iteration 1800: 0.146542
cost after iteration 1900: 0.140872
train accuracy: 99.04306220095694 %
 test accuracy: 70.0 %

In cell #26 we loop displaying how our model predicted versus what the label is per test image.

# **** display how images were classified ****
wrong = 0
for index in range(0, test_set_y.shape[1]):
    if test_set_y[0, index] == d["Y_prediction_test"][0,index]:
        print("RIGHT")
    else:
        print("WRONG")
        wrong += 1
    
    print("index: " + str(index) + " y: " + str(test_set_y[0, index]) + " y_prediction: " + str(int(d["Y_prediction_test"][0,index])))

print("wrong: " + str(wrong))

Output of this cell follows:

RIGHT
index: 0 y: 1 y_prediction: 1
RIGHT
index: 1 y: 1 y_prediction: 1
RIGHT
index: 2 y: 1 y_prediction: 1
RIGHT
index: 3 y: 1 y_prediction: 1
RIGHT
index: 4 y: 1 y_prediction: 1
WRONG
index: 5 y: 0 y_prediction: 1
WRONG
index: 6 y: 1 y_prediction: 0
RIGHT
index: 7 y: 1 y_prediction: 1
RIGHT
index: 8 y: 1 y_prediction: 1
RIGHT
index: 9 y: 1 y_prediction: 1
WRONG
index: 10 y: 1 y_prediction: 0
WRONG
index: 11 y: 1 y_prediction: 0
RIGHT
index: 12 y: 1 y_prediction: 1
WRONG
index: 13 y: 0 y_prediction: 1
RIGHT
index: 14 y: 0 y_prediction: 0
RIGHT
index: 15 y: 1 y_prediction: 1
RIGHT
index: 16 y: 0 y_prediction: 0
RIGHT
index: 17 y: 1 y_prediction: 1
WRONG
index: 18 y: 1 y_prediction: 0
WRONG
index: 19 y: 1 y_prediction: 0
RIGHT
index: 20 y: 1 y_prediction: 1
RIGHT
index: 21 y: 0 y_prediction: 0
RIGHT
index: 22 y: 0 y_prediction: 0
RIGHT
index: 23 y: 1 y_prediction: 1
RIGHT
index: 24 y: 1 y_prediction: 1
RIGHT
index: 25 y: 1 y_prediction: 1
RIGHT
index: 26 y: 1 y_prediction: 1
RIGHT
index: 27 y: 0 y_prediction: 0
WRONG
index: 28 y: 1 y_prediction: 0
WRONG
index: 29 y: 0 y_prediction: 1
WRONG
index: 30 y: 1 y_prediction: 0
RIGHT
index: 31 y: 1 y_prediction: 1
RIGHT
index: 32 y: 1 y_prediction: 1
WRONG
index: 33 y: 1 y_prediction: 0
WRONG
index: 34 y: 0 y_prediction: 1
RIGHT
index: 35 y: 0 y_prediction: 0
RIGHT
index: 36 y: 0 y_prediction: 0
RIGHT
index: 37 y: 1 y_prediction: 1
RIGHT
index: 38 y: 0 y_prediction: 0
RIGHT
index: 39 y: 0 y_prediction: 0
RIGHT
index: 40 y: 1 y_prediction: 1
WRONG
index: 41 y: 1 y_prediction: 0
RIGHT
index: 42 y: 1 y_prediction: 1
RIGHT
index: 43 y: 0 y_prediction: 0
WRONG
index: 44 y: 0 y_prediction: 1
RIGHT
index: 45 y: 0 y_prediction: 0
WRONG
index: 46 y: 1 y_prediction: 0
RIGHT
index: 47 y: 1 y_prediction: 1
RIGHT
index: 48 y: 1 y_prediction: 1
RIGHT
index: 49 y: 0 y_prediction: 0
wrong: 15

You can see that on the 50 samples 35 were properly classified. Of the 15 that were incorrectly classified, there are a mixture of cats not classified as cats (index 10) and non-cats classified as cats (index 5). These are FALSE-NEGATIVE and FALSE-POSITIVE classifications. You can read more about this here and here.
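Counting the two kinds of errors can be done with a few boolean masks. The sketch below uses only the labels and predictions for test images 0 to 9, copied from the output above. The names y_true, y_pred, tp, tn, fp and fn are hypothetical.

```python
import numpy as np

# Labels and predictions for test images 0 to 9, copied from the output above.
y_true = np.array([1, 1, 1, 1, 1, 0, 1, 1, 1, 1])
y_pred = np.array([1, 1, 1, 1, 1, 1, 0, 1, 1, 1])

tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # cats classified as cats
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # non-cats classified as non-cats
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # non-cats classified as cats
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # cats classified as non-cats

print("TP: " + str(tp) + "  TN: " + str(tn) + "  FP: " + str(fp) + "  FN: " + str(fn))
print("false positive indices: " + str(np.flatnonzero((y_true == 0) & (y_pred == 1))))
print("false negative indices: " + str(np.flatnonzero((y_true == 1) & (y_pred == 0))))
```

To run this on the full test set you could pass test_set_y[0] and d["Y_prediction_test"][0] instead of the two short arrays.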

# **** example of a picture that was wrongly classified ****
index = 10
plt.imshow(test_set_x[:,index].reshape((num_px, num_px, 3)))
print("y = " + str(test_set_y[0, index]) + ", you predicted that it is a \"" + classes[int(d["Y_prediction_test"][0,index])].decode("utf-8") +  "\" picture.")

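In cell #27 we display one of the images that our model misclassified. As listed, the cell uses index 10, which is the image of a cat that our model failed to classify as one. You could also set index to 5 to check the other kind of error: that image looks like a small frog on a green leaf, yet our model classified it as a cat.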

# **** plot learning curve (with costs) ****
costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate =" + str(d["learning_rate"]))
plt.show()

This cell is used to generate and display a plot of cost versus number of iterations. As you can see the cost gradually reduced as the number of iterations increased.

Let’s now compare different learning rates. In the following cell #29 we will compare three learning rates.

# **** try three different learning rates ****
learning_rates = [0.01, 0.001, 0.0001]
models = {}

# **** populate the three models ****
for i in learning_rates:
    print ("learning rate is: " + str(i))
    models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 1500, learning_rate = i, print_cost = False)
    print ('\n' + "-------------------------------------------------------" + '\n')

# **** display information about each model ****
for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label= str(models[str(i)]["learning_rate"]))

# **** plot the cost versus iterations for the models ****
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')

legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()

We first define the three learning rates. Then we populate the models. We display accuracy information about them and finalize with a plot of cost versus iterations.  The results follow:

learning rate is: 0.01
train accuracy: 99.52153110047847 %
 test accuracy: 68.0 %

-------------------------------------------------------

learning rate is: 0.001
train accuracy: 88.99521531100478 %
 test accuracy: 64.0 %

-------------------------------------------------------

learning rate is: 0.0001
train accuracy: 68.42105263157895 %
 test accuracy: 36.0 %

-------------------------------------------------------
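A likely reason the smallest learning rate lags behind is that each step is so small that 1,500 iterations are not enough to get close to the minimum. The toy sketch below illustrates the effect on a much simpler problem, minimizing f(x) = x**2. It is not the cat model. The function run_gradient_descent() and the dictionary finals are hypothetical names.

```python
# Toy problem (not the cat model): minimize f(x) = x**2, whose gradient is 2 * x.
def run_gradient_descent(learning_rate, steps=1500, x0=1.0):
    x = x0
    for _ in range(steps):
        x = x - learning_rate * 2 * x   # gradient descent step
    return x * x                        # final value of f

# Same three learning rates and iteration count as in the cell above.
finals = {}
for lr in [0.01, 0.001, 0.0001]:
    finals[lr] = run_gradient_descent(lr)
    print("learning rate " + str(lr) + " -> final f(x) = " + str(finals[lr]))
```

With the same number of steps, the smaller the learning rate, the farther the final value stays from the minimum at zero.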

The plot follows:

The final cell #30 in this Jupyter notebook allows us to specify an image and submit it to our model. I tried different images of cats and dogs. All eight images were properly classified.

# **** name of the image file to classify ****
my_image = "dog4.jpg"        # change this to the name of your image file 

# **** preprocess the image to fit your algorithm ****
import skimage.transform     # explicit import; a plain "import skimage" may not load the submodule in older versions
fname = "images/" + my_image
image = np.array(plt.imread(fname))
my_image = skimage.transform.resize(image, output_shape=(num_px,num_px)).reshape((1, num_px*num_px*3)).T

# **** predict the label using the trained parameters ****
my_predicted_image = predict(d["w"], d["b"], my_image)

# **** display the original image ****
plt.imshow(image)

# **** display the prediction ****
print("y = " + str(np.squeeze(my_predicted_image)) + ", your algorithm predicts a \"" + classes[int(np.squeeze(my_predicted_image))].decode("utf-8") +  "\" picture.")

The model predicted:

The image was not of a cat.

Hope I was clear enough in describing how to put together a logistic regression model to separate cat versus non-cat images.

I have pushed the Jupyter notebook and additional data to my GitHub repository.

As usual, if you have comments or questions regarding this or any other post in this blog, or if you need help with some software in any part of the life cycle, please do not hesitate to leave me a message below. Messages only appear after I approve them.

Keep on learning, experimenting and having fun developing software;

John

Follow me on Twitter:  @john_canessa
