Block Views and Pooling Operations

In this post we will continue reading and experimenting with the contents of the PluralSight course “Building Image Processing Applications Using scikit-image” by Janani Ravi.

Please note that the course uses Jupyter notebooks to hold the code and results. In this post we will instead write a modified version of the code as a Python script, using the VSCode IDE with GitHub Copilot. I would like to disclose that I am a Microsoft employee and have been using VSCode and Python for several years.

Let’s start with our setup.

# **** folder of interest ****
C:\Documents\_Image Processing\scikit-image-building-image-processing-applications\02\demos

# **** open file of interest using VSCode ****
(base) C:\Documents\_Image Processing\scikit-image-building-image-processing-applications\02\demos>code BlockViewsOnImageArrays.py

# **** execute python script of interest ****
(base) C:\Documents\_Image Processing\scikit-image-building-image-processing-applications\02\demos>python BlockViewsOnImageArrays.py

We start by changing to the folder of interest. In that folder we use VSCode to create and edit a Python script. As we progress we run the script and display images.

# **** numeric and plotting libraries ****
import numpy as np                              # numpy is the primary library for numeric array (and matrix) processing
from matplotlib import pyplot as plt            # pyplot is the matplotlib submodule used for plotting

# **** scikit-image libraries ****
import skimage.io                               # skimage is the scikit-image library for image processing
from skimage import color                       # skimage.color is the submodule for converting color spaces
from skimage.util import view_as_blocks         # skimage.util is the submodule for generic utilities (like view_as_blocks)


# **** read three_dogs image ****
three_dogs = skimage.io.imread(fname='./images/pexels-3-dogs.jpg')

# **** plot three_dogs RGB image ****
plt.imshow( three_dogs,
            interpolation='nearest')            # plot image, set interpolation to nearest
plt.title('three_dogs - RGB')                   # set image title
plt.show()                                      # show image

We start by importing libraries of interest.

We then read in a JPG color image containing three dogs. The image is then displayed.

# **** convert three_dogs to grayscale ****
three_dogs = color.rgb2gray(three_dogs)

# **** plot three_dogs grayscale image ****
plt.imshow( three_dogs, 
            cmap='gray')                        # plot image, set colormap to gray
plt.title('three_dogs - grayscale')             # set image title
plt.show()                                      # show image

We then convert the RGB image into grayscale. The grayscale image is then displayed.
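For reference, the conversion can be reproduced by hand. The sketch below is not part of the course code. It assumes the luminance weights that scikit-image documents for rgb2gray (0.2125, 0.7154, 0.0721) and uses a tiny made-up float image instead of the dog photo:

```python
import numpy as np
from skimage import color

# **** hypothetical 1 x 3 RGB image (floats in [0, 1]): pure red, green and blue pixels ****
rgb = np.array([[[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]]])

# **** weighted sum of the R, G and B channels (weights assumed from the rgb2gray docs) ****
weights = np.array([0.2125, 0.7154, 0.0721])
manual_gray = rgb @ weights

# **** same conversion done by the library ****
library_gray = color.rgb2gray(rgb)

print(f'manual_gray: {manual_gray}')
print(f'matches rgb2gray: {np.allclose(manual_gray, library_gray)}')
```

The result is a 2D float array, which is why the grayscale image is later plotted with cmap='gray'.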

# **** display shape of three_dogs (2D grayscale) ****
print(f'three_dogs.shape: {three_dogs.shape}')

# **** assign block shape 4 x 4 ****
block_shape = (4, 4)

# **** view three_dogs as blocks ****
three_dogs_blocks = view_as_blocks( three_dogs,
                                    block_shape=block_shape)

# **** display shape of three_dogs_blocks (H/4, W/4, 4, 4) ****
print(f'three_dogs_blocks.shape: {three_dogs_blocks.shape}')


# **** flatten each 4 x 4 block into 16 values: (H/4, W/4, 16) ****
flattened_blocks = three_dogs_blocks.reshape(   three_dogs_blocks.shape[0],
                                                three_dogs_blocks.shape[1],
                                                -1)

# **** print shape of three_dogs_blocks ****
print(f'shape of the blocks image: {three_dogs_blocks.shape}')

# **** print shape of flattened image ****
print(f'shape of the flattened image: {flattened_blocks.shape}')


# **** mean-pooling: find the mean for each block ****
mean_blocks = np.mean(flattened_blocks, axis=2)

# **** plot mean_blocks ****
plt.imshow( mean_blocks,
            interpolation='nearest',            # plot image, set interpolation to nearest
            cmap='gray')                        # plot image, set colormap to gray
plt.title('mean_blocks - grayscale')            # set image title
plt.show()                                      # show image

In this step we split the grayscale image into non-overlapping 4 x 4 blocks and perform a pooling operation on them. In this case we compute the mean of each block, which reduces the 344 x 516 image to 86 x 129 values. The result of the operation is then displayed.
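To see what the block view and the reshape do, it helps to run the same steps on an array small enough to check by hand. This is a minimal sketch with a made-up 4 x 4 array and 2 x 2 blocks. The variable names are mine, not the course's:

```python
import numpy as np
from skimage.util import view_as_blocks

# **** made-up 4 x 4 "image" holding the values 0..15 ****
tiny_image = np.arange(16, dtype=float).reshape(4, 4)

# **** view as 2 x 2 blocks: shape (2, 2, 2, 2) ****
tiny_blocks = view_as_blocks(tiny_image, block_shape=(2, 2))

# **** flatten each block into 4 values: shape (2, 2, 4) ****
tiny_flattened = tiny_blocks.reshape(tiny_blocks.shape[0],
                                     tiny_blocks.shape[1],
                                     -1)

# **** mean-pooling: one value per block ****
tiny_mean = np.mean(tiny_flattened, axis=2)

print(f'tiny_blocks.shape: {tiny_blocks.shape}')
print(f'top-left block:\n{tiny_blocks[0, 0]}')
print(f'tiny_mean:\n{tiny_mean}')
```

The top-left block holds 0, 1, 4 and 5, so its mean is 2.5. The pooled result has one value per block, which is why the output image is smaller than the input.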

# **** max-pooling: find the max for each block
#      (max-pooling keeps the most prominent value in each block) ****

max_blocks = np.max(flattened_blocks, axis=2)

# **** plot max_blocks ****
plt.imshow( max_blocks,
            interpolation='nearest',            # plot image, set interpolation to nearest
            cmap='gray')                        # plot image, set colormap to gray
plt.title('max_blocks - grayscale')             # set image title
plt.show()                                      # show image

In this step we repeat the pooling operation, but this time we obtain the max value of each 4 x 4 block. The resulting image is displayed.
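As a side note, the flattening step is optional. NumPy reductions accept a tuple of axes, so the pooling can be applied directly to the 4D block view. The sketch below uses a made-up 8 x 8 array to check that both routes give the same result:

```python
import numpy as np
from skimage.util import view_as_blocks

# **** made-up 8 x 8 "image" holding the values 0..63 ****
sample = np.arange(64, dtype=float).reshape(8, 8)

# **** view as 4 x 4 blocks: shape (2, 2, 4, 4) ****
sample_blocks = view_as_blocks(sample, block_shape=(4, 4))

# **** route 1: flatten each block, then reduce over the last axis ****
max_via_flatten = sample_blocks.reshape(sample_blocks.shape[0],
                                        sample_blocks.shape[1],
                                        -1).max(axis=2)

# **** route 2: reduce over both block axes at once ****
max_via_axes = sample_blocks.max(axis=(2, 3))

print(f'same result: {np.array_equal(max_via_flatten, max_via_axes)}')
print(f'max_via_axes:\n{max_via_axes}')
```

Either form works. The flattened version used in this post makes the intermediate shape easier to print and inspect.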

# **** median-pooling: find the median for each block ****
median_blocks = np.median(flattened_blocks, axis=2)

# **** plot median_blocks ****
plt.imshow( median_blocks,
            interpolation='nearest',            # plot image, set interpolation to nearest
            cmap='gray')                        # plot image, set colormap to gray
plt.title('median_blocks - grayscale')          # set image title
plt.show()                                      # show image

Finally we repeat the pooling operation one last time. In this case we obtain the median value of each 4 x 4 block. The resulting image is displayed.

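Different pooling operations emphasize different features of the original image: the mean smooths each block, the max keeps its brightest pixel, and the median is less sensitive than the mean to a few outlier pixels.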
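The difference between the three operations is easiest to see on a single block. The values below are made up for illustration: a mostly dark 2 x 2 block with one bright pixel.

```python
import numpy as np

# **** made-up 2 x 2 block: three dark pixels and one bright pixel ****
block = np.array([[0.1, 0.1],
                  [0.2, 0.9]])

# **** the three pooling operations applied to the same block ****
block_mean = np.mean(block)         # pulled up by the bright pixel
block_max = np.max(block)           # equal to the bright pixel
block_median = np.median(block)     # stays close to the dark pixels

print(f'mean: {block_mean:.3f}  max: {block_max:.3f}  median: {block_median:.3f}')
```

The max reports only the bright pixel, the median nearly ignores it, and the mean lands in between.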

The console output of the script (without the images) follows:

(base) C:\Documents\_Image Processing\scikit-image-building-image-processing-applications\02\demos>python BlockViewsOnImageArrays.py
three_dogs.shape: (344, 516)
three_dogs_blocks.shape: (86, 129, 4, 4)
shape of the blocks image: (86, 129, 4, 4)
shape of the flattened image: (86, 129, 16)
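Note that 344 and 516 are both exact multiples of 4, which is why the image splits cleanly into 86 x 129 blocks. view_as_blocks raises a ValueError when a dimension is not a multiple of the block size. One possible workaround, sketched here with a hypothetical helper named crop_to_multiple (my own, not part of scikit-image or the course), is to trim the extra rows and columns first:

```python
import numpy as np

def crop_to_multiple(image, block_shape):
    """Hypothetical helper: trim trailing rows and columns so that each
    dimension of a 2D image is an exact multiple of the block size."""
    rows = image.shape[0] - image.shape[0] % block_shape[0]
    cols = image.shape[1] - image.shape[1] % block_shape[1]
    return image[:rows, :cols]

# **** made-up image whose dimensions are NOT multiples of 4 ****
odd_image = np.zeros((345, 518))

# **** trim 1 row and 2 columns so that 4 x 4 blocks fit exactly ****
cropped = crop_to_multiple(odd_image, (4, 4))

print(f'odd_image.shape: {odd_image.shape}')
print(f'cropped.shape: {cropped.shape}')
```

Cropping discards a few edge pixels. Padding the image instead would keep them, at the cost of introducing artificial values into the edge blocks.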

Hope you learned from this exercise. I know I did.

The complete code for this post is in my GitHub repository.

Enjoy;

John
