Sieve of Eratosthenes

It seems like my wife has a new doctor’s appointment towards the end of the day today. It has been a very hectic week. Hopefully we will be done with putting out fires soon. I enjoy waking up early morning and spending my time learning and working in two-hour blocks. Typically I get in 5 blocks on workdays and 2 to 3 on weekend days.

A few days ago I was solved a HackerRank challenge which dealt with prime numbers. I wrote a post but did not go into too many details. The approach I used was based on the Sieve of Eratosthenes.

The idea is to create a set of consecutive integers starting with number two (first prime).  We then discard numbers that are multiple of two, following with multiples of three, and multiples of five, and so forth.  As we write our implementation we will optimize the code.

What is a prime number? An integer greater than 1 that cannot be formed by multiplying two smaller numbers is called a prime number. For example 19 is a prime because there are no two smaller integer numbers starting with 2 that can be multiplied to produce 19. We can try by taking the square root of 19 which is approximately equal to 4.35889894354. The only integer numbers we could try are:  2, 3, and 4 because if we try 4 * 4 = 16 and 5 * 5 = 25. To read more about prime numbers check this Wikipedia page.

Let’s take a look at the code that generates the prime numbers written in Java:

	/*
	 * Generate list of prime numbers up to the specified one
	 */
	static int[] primeNumbers(int n) {
		
		// **** declare array of integers ****
		int[] nums = new int[n + 1];
		
		// **** generate list of integers ****
		for (int i = 0; i <= n; i++)
			nums[i] = i;
		
		// **** disregard nums[1] (not a prime) ****
		nums[1] = 0;
		
		// **** display the initial array ****
		displayIntArray(nums);

		// ***** used to set the upper limit for the loop ****
//		int srn = n;
		int srn = (int)Math.sqrt(n);
		
		// ???? ????
		System.out.println("srn: " + srn);
				
		// **** loop marking non-prime numbers ****
		for (int i = 2; i <= srn; i++) {
			
			// **** find the first number greater than i in the list that is non-zero ****
			if (nums[i] == 0)
				continue;
			
			// ???? ????
			System.out.println("i: " + i);
			
			// ***** ****
			for (int j = i * i; j <= n; j += i) {
				
				// ???? ????
				System.out.println("j: " + j);
				
				// **** flag as non-prime ****
				nums[j] = 0;	
			}

			// **** display the array ****
			displayIntArray(nums);
		}
		
		// **** count primes ****
		int count = 0;
		for (int i = 2; i <= n; i++)
			if (nums[i] != 0)
				count++;
		
		// ???? ????
		System.out.println("count: " + count);
		
		// **** instantiate array of primes ****
		int[] primes = new int[count];
		
		// **** populate array of primes ****
		int j = 0;
		for (int i = 2; i <= n; i++) {
			if (nums[i] != 0)
				primes[j++] = nums[i];
		}
			
		// **** return array of primes ****
		return primes;
	}

We start by generating an array of integers staring from 0 and ending on n. Depending on the programming language you are using, the first element in an array may be labeled o or 1 and the last n – 1 or n. In the case of Java arrays are indexed starting with element 0 and ending in element n – 1. For this reason our array of integers is declared of size n + 1.

We then populate the array with numbers in the range 0 to n corresponding to nums[0] and nums[n].

Given that we will use the approach of marking array locations with non-prime numbers with contents of 0, we have the first two entries in our array set to 0 (0 and 1 are not prime numbers).

We then display the array we are using to mark the prime numbers.

Our next step is to set the upper limit for the main loop. Please note that I have srn (should have labeled it upperLimit) to n or to the square root of n. The second one is an optimization to the algorithm as we will shortly see.

In the main loop we start with 2 (first prime number) and skip any array entry that has already been set to 0 indicating that the position in the array represents a non-prime number. In the pseudo code described in the Wikipedia article, the authors used a Boolean array in which non-prime numbers are flagged as false. Given that, as you will shortly see, we are displaying the array constantly to observe progress in the algorithm.

The loop starts with number 2. Given that the contents of num[2] == 2 and not 0, we proceed.

The next loop is used to mark as non-prime all multiples of the prime number. In the case of 2 we mark as non-prime 4, 6, 8, 10, 12, 14, 16, 18 and 20.

The outer loop now searches for the next non-prime number. In this case this number is 3. The inner loop now marks as non-primes all multiples of 3 as follows:  6, 9, 12, 15, and 18. Pretty cool, don’t you think?

After the outer loop reaches the end, we collect the prime numbers in an array and return them to the caller.

The console output for the pass with srn set to n follows:

n: 20
ar: 
        2    3    4    5    6    7    8    9   10 
  11   12   13   14   15   16   17   18   19   20 

srn: 20
i: 2
j: 4
j: 6
j: 8
j: 10
j: 12
j: 14
j: 16
j: 18
j: 20
ar: 
        2    3         5         7         9      
  11        13        15        17        19      

i: 3
j: 9
j: 12
j: 15
j: 18
ar: 
        2    3         5         7                
  11        13                  17        19      

i: 5
ar: 
        2    3         5         7                
  11        13                  17        19      

i: 7
ar: 
        2    3         5         7                
  11        13                  17        19      

i: 11
ar: 
        2    3         5         7                
  11        13                  17        19      

i: 13
ar: 
        2    3         5         7                
  11        13                  17        19      

i: 17
ar: 
        2    3         5         7                
  11        13                  17        19      

i: 19
ar: 
        2    3         5         7                
  11        13                  17        19      

count: 8
primes: 
   2    3    5    7   11   13   17   19 

Note the debug messages with the values for i.

After switching to use the square root the console output follows:

 
n: 20
ar: 
        2    3    4    5    6    7    8    9   10 
  11   12   13   14   15   16   17   18   19   20 

srn: 4
i: 2
j: 4
j: 6
j: 8
j: 10
j: 12
j: 14
j: 16
j: 18
j: 20
ar: 
        2    3         5         7         9      
  11        13        15        17        19      

i: 3
j: 9
j: 12
j: 15
j: 18
ar: 
        2    3         5         7                
  11        13                  17        19      

count: 8
primes: 
   2    3    5    7   11   13   17   19 

The results match. The difference is that we tested less values for i.

Hope you enjoyed this post. It is always important to understand what you are doing in always to produce robust and efficient code. I agree that if your goal is to check or generate a few prime numbers, a binary search on a fixed array might do the trick.

I wanted to note on the primeNumbers() function that it does not seem to follow the S in the SOLID set of design patterns. I wrote a short post on SOLID a couple years ago. You can also read more about it in this Wikipedia article.

The S in solid stands for the Single Responsibility Principle. It boils down to have a function perform one and only one thing on the object. The idea is that a function should only do a single thing; in this case, generate a list of prime numbers. Back in the time UNIX was very popular (LINUX has replaced most UNIX installations), one of the mottos of UNIX was to do one thing and do it well. If you needed more, you could use a pipe and pass data from one command to another in order to achieve your goal. The opposite would be to write complete applications / commands to perform exactly what you need. This is a recipe for disaster. A change of requirements resulting in code modification will probably induce undesired side effects which will just increase the technical debt. I am not considering the testing that needs to be performed to make sure all is well after modifying some part of the code. I will now step off my white horse.

If you are interested in design patterns for software development, I typically go to one of these books:  Agile Software Development Principles, Patterns and Practice by Robert C Martin and Design Patterns by the Gang of Four (Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides). These books did not get damaged by the water leak issue earlier this week.

If you are interested in the full code, you may find it in my GitHub repository.

OK, l got started so let me get back on the horse for a couple sentences. Many years ago people used the Waterfall Model for software development. The main drawbacks are that requirements change and the time it takes from conception to delivery.

Some years went by and different versions of Agile showed up. The main advantage is that it adapts better to changes in requirements and end users / clients are able to see software in less time. That said, the artifacts and tools associated with Agile are there for a reason. They mitigate the complexity and limitations of the methodology.

A few years ago the concept of microservices was created. Today there are platforms (e.g., Docker) that helps software be created with high reuse, and with little if any modification. Dependencies are reduced and overall technical debt is diminished. Please do not be confused, microservices are not a silver bullet but they are the next step from Agile.

Hope you enjoy this post. If you have comments or questions or if you need some help with any phase of the lifecycle in a software product / service, please feel free and leave me a note bellow. Perhaps I can be of assistance. The request for help notes will not become public.

Keep on learning and experimenting. That is a great way to produce better software and have fun while at it.

John

Follow me on Twitter:  @john_canessa

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.