Two Dimensional Array

A few days ago a group of software engineers were discussing how the order in which the numbers of rows versus columns in a two dimensional array affect performance. That is; if an array has more rows than columns as opposed to more columns than rows, the time it takes to traverse the array will be affected.

The issue discussed is due to the fact that when all is said and done, the code is being executed by a processor (CPU) which needs to load values from the array represented in memory into a register. The time it takes to load the value depends on the representation of the array in memory and the algorithm used to traverse / process the array.

The discussion reminded me of the port of the well known NoSQL database engine Apache Cassandra written in Java to ScyllaDB written in C++. The company SCYLLA (www.scyladb.com) claims that their port is at least one order of magnitude faster than the original database. They achieved that by using C++ instead of Java and making some changes allowed by the different programming language.

I am not going to get into the actual port, but will show the results of a program written in C#. A screen capture of the C# program follows:

LargeSmall <<< rows: 10000 cols: 1000
main <<< sum: 1000000000 elapsed: 12.225 seconds
SmallLarge <<< cols: 1000 rows: 10000
main <<< sum: 1000000000 elapsed: 21.867 seconds

As one can see, processing a byte array of 10,000 rows by 1,000 columns took 12.27 seconds. This processing was performed by a function named LargeSmall(). Processing consisted on adding up all the entries in the array. The same array was then processed by the SmallLarge() function. The processing was performed by traversing the same array in a different order.

Initially I populated the array with random values. All was well when running the code written in the same programming language. When using different languages the initialization of the pseudo random number generator produced different values. For simplicity I set all values in the array to one (1).

The C# code follows:

// **** ****
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace arrays
{

    class Program
    {

        // **** constants ****
        const int ROWS = 10000;
        const int COLS = 1000;
        const int MAX_VAL = 100;
        const int PASSES = 100;

        // **** ****
        static double LargeSmall(byte[,] arr, int rows, int cols)
        {

            // **** variables ****
            double sum = 0.0;

            // **** inform what is going on ****
            System.Console.WriteLine("LargeSmall <<< rows: {0} cols: {1}", rows, cols);

            // **** loop a few times to average time ****
            for (int pass = 0; pass < PASSES; pass++)
            {

                // **** loop through the elements in the array ****
                for (int row = 0; row < rows; row++)
                {
                    for (int col = 0; col < cols; col++)
                    {
                        // **** add the value of this element to the sum ****
                        sum += arr[row, col];
                    }
                }
            }

            // **** return the sum ****
            return sum;
        }

        // **** ****
        static double SmallLarge(byte[,] arr, int rows, int cols)
        {

            // **** variables ****
            double sum = 0.0;

            // **** inform what is going on ****
            System.Console.WriteLine("SmallLarge <<< cols: {0} rows: {1}", cols, rows);

            // **** loop a few times to average time ****
            for (int pass = 0; pass < PASSES; pass++)
            {

                // **** loop through the elements in the array ****
                for (int col = 0; col < cols; col++)
                {
                    for (int row = 0; row < rows; row++)
                    {
                        // **** add the value of this element to the sum ****
                        sum += arr[row, col];
                    }
                }
            }

            // **** return the sum ****
            return sum;
        }

        // **** ****
        static int Main(string[] args)
        {

            // **** variables ****
            byte[,] arr = new byte[ROWS, COLS];
            double  sum = 0.0;

            // **** populate array with random values ****
            for (int row = 0; row < ROWS; row++)
            {
                for (int col = 0; col < COLS; col++)
                {
                    arr[row, col] = 1;
                }
            }

            // **** loop once per test ****
            for (int i = 0; i < 2; i++)
            {

                // **** get start time ****
                var watch = System.Diagnostics.Stopwatch.StartNew();

                // **** compute the sum ****
                switch (i)
                {
                    case 0:
                        sum = LargeSmall(arr, ROWS, COLS);
                        break;

                    case 1:
                        sum = SmallLarge(arr, ROWS, COLS);
                        break;

                    default:
                        System.Console.WriteLine();
                        return -1;
                        break;
                }

                // **** compute the elapsed time ****
                watch.Stop();
                var elapsedMs = watch.ElapsedMilliseconds;

                // **** display the sum and elapsed time ****
                System.Console.WriteLine("main <<< sum: {0} elapsed: {1} seconds ", sum, elapsedMs / 1000.0);
            }

            // **** inform the caller what went on****
            return 0;
        }
    }
}

I then ported the code to Java. The output of the Java code follows:

LargeSmall <<< rows: 10000 cols: 1000
main <<< sum: 1000000000.00 elapsed: 2.597 seconds
SmallLarge <<< cols: 1000 rows: 10000
main <<< sum: 1000000000.00 elapsed: 19.848 seconds

The Java code follows:

import java.text.DecimalFormat;
import java.util.Random;

public class Solution {

	// **** constants ****
	static final int	ROWS	= 10000;
	static final int	COLS	= 1000;
	static final int	MAX_VAL	= 100;
	static final int	PASSES	= 100;
	
	// **** ****
	static double LargeSmall(char[][] arr, int rows, int cols) {

		// **** variables ****
		double	sum	= 0.0;

		// **** inform what is going on ****
		System.out.println("LargeSmall <<< rows: " + rows + " cols: " + cols);

		// **** loop a few times to average the time ****
		for (int pass = 0; pass < PASSES; pass++) {
			
			// **** loop through the elements in the array ****
			for (int row = 0; row < rows; row++) {
				
				for (int col = 0; col < cols; col++) {
					
					// **** add the value of this element to the sum ***
					sum += arr[row][col]; 
				}
			}
			
		}

		// **** return the sum ****
		return sum;
	}
	
	// **** ****
	static double SmallLarge(char[][] arr, int rows, int cols) {
		
		// **** variables ****
		double	sum	= 0.0;
		
		// **** inform what is going on ****
		System.out.println("SmallLarge <<< cols: " + cols + " rows: " + rows);
		
		// **** loop a few times to average the time ****
		for (int pass = 0; pass < PASSES; pass++) {
			
			// **** loop through the elements in the array ****
			for (int col = 0; col < cols; col++) {
				
				for (int row = 0; row < rows; row++) {
					
					// **** add the value of this element to the sum ***
					sum += arr[row][col]; 
				}
			}
			
		}
		
		// **** return the sum ****
		return sum;
	}
	
	// **** ****
	public static void main(String[] args) {

		// **** variables ****
		double		sum	= 0.0;
		char[][] 	arr = new char[ROWS][COLS];
	
		// **** seed the random number generator ****
//		Random rand = new Random(System.currentTimeMillis());

		// **** populate the array with random values ****
		for (int row = 0; row < ROWS; row++) {
			for (int col = 0; col < COLS; col++) {
//				arr[row][col] = rand.nextInt(MAX_VAL);
				arr[row][col] = 1;
			}
		}
	
		// **** loop once per test ****	
		for (int i = 0; i < 2; i++) {
			
			// **** get the start time ****
			long startTime = System.currentTimeMillis();
			
			// **** compute the sum ****
			switch (i) {
			case 0:
				sum = LargeSmall(arr, ROWS, COLS);
				break;
				
			case 1:
				sum = SmallLarge(arr, ROWS, COLS);
				break;
			
			default:
				System.out.println("main <<< unexpected i: " + i);
				System.exit(-1);
				break;
			}
			
			// **** get the end time ****
			long endTime = System.currentTimeMillis();
			
			// **** display the sum and elapsed time ****
			DecimalFormat formatter = new DecimalFormat("#0.00");
			System.out.println("main <<< sum: " + formatter.format(sum) + " elapsed: " + (endTime - startTime) / 1000.0 + " seconds");
		}

	}

}

Finally I ported the code to C and added a third function using a pointer and counter. The output of the C code follows:

LargeSmall <<< rows: 10000 cols: 1000
AddByteArray <<< sum: 1000000000.00 elapsed 4.90 seconds
SmallLarge <<< cols: 1000 rows: 10000
AddByteArray <<< sum: 1000000000.00 elapsed 12.52 seconds
AnySize <<< size: 10000000
AddByteArray <<< sum: 1000000000.00 elapsed 4.95 seconds

The C code follows:

double __stdcall	LargeSmall	(
								char	arr[ROWS][COLS],
								int		rows,
								int		cols
								)

//***************************************************************@@
// - 
//*****************************************************************

{

	// **** variables ****
	register double	sum		= 0.0;

	register int	col		= 0;
	register int	row		= 0;
	register int	pass	= 0;

	// **** inform what is going on ****
	printf("LargeSmall <<< rows: %d cols: %d\n", rows, cols);

	// **** loop a few times to average time ****
	for (pass = 0; pass < PASSES; pass++)
	{

		// **** loop through the elements in the array ****
		for (row = 0; row < rows; row++)
		{

			for (col = 0; col < cols; col++)
			{

				// **** add the value of this element to the sum ***
				sum += arr[row][col];
			}
		}

	}

	// **** return the array sum ****
	return sum;
}


double __stdcall	SmallLarge	(
								char	arr[ROWS][COLS],
								int		rows,
								int		cols
								)

//***************************************************************@@
// - 
//*****************************************************************

{

	// **** variables ****
	register double	sum		= 0.0;

	register int	col		= 0;
	register int	row		= 0;
	register int	pass	= 0;

	// **** inform what is going on ****
	printf("SmallLarge <<< cols: %d rows: %d\n", cols, rows);

	// **** loop a few times to average the time ****
	for (pass = 0; pass < PASSES; pass++)
	{

		// **** loop through the elements in the array ****
		for (col = 0; col < cols; col++)
		{

			for (row = 0; row < rows; row++)
			{

				// **** add the value of this element to the sum ***
				sum += arr[row][col];
			}
		}

	}

	// **** return the array sum ****
	return sum;
}


double __stdcall	AnySize	(
							char	*arr,
							int		size
							)

//***************************************************************@@
// - 
//*****************************************************************

{

	// **** variables ****
	register char	*bp		= NULL;

	register int	i		= 0;
	register int	pass	= 0;

	register double	sum		= 0.0;

	// **** inform what is going on ****
	printf("AnySize <<< size: %d\n", size);

	// **** loop a few times to average the time ****
	for (pass = 0; pass < PASSES; pass++)
	{

		// **** loop through the elements in the array ****
		for (i = 0, bp = arr; i < size; i++, bp++)
		{

			// **** add the value of this element to the sum ***
			sum += *bp;
		}

	}

	// **** return the array sum ****
	return sum;
}


int	__stdcall	AddByteArray	(
								void
								)

//***************************************************************@@
// - This function tests the speed of an operation when traversing
// an integer array in different orders.
//*****************************************************************

{

	// **** variables ****
	register char	*bp			= NULL;

	register int	i			= 0;

	char			*arr		= NULL;

	int				status		= 0;

	double			sum			= 0;

	longlong		elapsedTime	= 0,
					endTime		= 0,
					frequency	= 0,
					startTime	= 0;

	// **** allocate the array ****
	arr = (char *)calloc((size_t)1, (size_t)(ROWS * COLS));

	// **** seed the random number generator ****
//	srand((unsigned)time(NULL));

	// **** populate the array with random values ****
	for (i = 0, bp = arr; i < (ROWS * COLS); i++, bp++)
	{
//		*bp = rand() % MAX_VAL;
		*bp = 1;
	}

	// **** get the frequency for the timer ****
	QueryPerformanceFrequency((LARGE_INTEGER *)&frequency);

	// **** loop once per test ****	
	for (int i = 0; i < 3; i++)
		{

		// **** get the start time ****
		status = (int)QueryPerformanceCounter((LARGE_INTEGER *)&startTime);

		// **** compute the array sum ****
		switch (i)
			{
			case 0:
				sum = LargeSmall(arr, ROWS, COLS);
			break;

			case 1:
				sum = SmallLarge(arr, ROWS, COLS);
			break;

			case 2:
				sum = AnySize(arr, ROWS * COLS);
			break;

			default:
				printf("AddByteArray <<< unexpected i: %d\n", i);
				return -1;
			break;
			}

		// **** get the end time ****
		status = (int)QueryPerformanceCounter((LARGE_INTEGER *)&endTime);

		// **** compute the elapsed time ****
		elapsedTime = endTime - startTime;

		// **** display the sum and elapsed time ****
		if (frequency != (longlong)0)
			printf("AddByteArray <<< sum: %.2f elapsed %.2f seconds\n", sum, (double)elapsedTime / (double)frequency);
		}

	// **** inform caller what went on ****
	return 0;
}

Regardless of the programming language, it seems that the execution of the first function LargeSmall() tends to be faster than that of the LargeSmall(). The reason for this is that the operations required by the CPU to index the rows is faster than when indexing the columns. The rows just require an increment while the columns require a computation because cells are not contiguous.

I always recommend a test driven approach. Start with some comments in the main and then tackle the additional functions in a progressive way until all is working properly. If during the process a new simpler and faster approach is discovered, go for it. Once all is operational, if needed, spend time optimizing the code.

When optimizing one should make sure that the effort is needed for the resulting gains in performance. It is not worth optimizing a function / method that runs on a background task, not too often called to achieve a few milliseconds. In contrast, if you have a function that is called many times a second, a few milliseconds (or in the worse case seconds) would be mandatory. This is just common sense.

I do not have more time this morning to go into a discussion of when to optimize code. Make sure that the code is peer reviewed and that you listen to comments and suggestions. Also make sure other developers and QA have the time to experiment and test the software before it is released.

All the code in this post was generated in a Windows machine. The C# code was developed using Visual Studio 2017 Enterprise Edition. The Java code was developed using Eclipse Oxygen and the C code was developed using Visual Studio Professional 2013.

If you have comments or questions regarding this or any other post in this blog, please leave me a message or if you prefer send me a message via email. I try to spend a couple hours every day learning, experimenting and posting in this blog.

John

john.canessa@gmail.com

Follow me on Twitter:  @john_canessa

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.