Tag Content Extractor

It is a nice sunny day in the Twin Cities of Minneapolis and St. Paul, or at least it was when I was having breakfast earlier this morning. Better yet, it is Friday!

I spoke with one of my sons. He and his family had scheduled a holiday and were on the road. When they first moved, they built a home. Some years went by, and last fall they decided to buy a new one that they liked. Shortly after, they moved and put their first home on the market. A few months went by, and they finally closed on it yesterday. I am very glad for them. Having two mortgages is not convenient at all.

Earlier this morning I tackled the Tag Content Extractor from HackerRank. It calls for a solution based on regular expressions.

After spending some time watching videos to refresh my memory on the subject, I came up with the following Java code, which I wrote using the Eclipse IDE:

	import java.util.regex.Matcher;
	import java.util.regex.Pattern;

	public static void regexChecker(String theRegex, String str2Check) {
		
		// ???? display the regex and the string to check ????
		System.out.println("theRegex ==>" + theRegex + "<==");
		System.out.println("str2Check ==>" + str2Check + "<== str2Check.length: " + str2Check.length());
		
		// **** compile the regular expression ****
		Pattern pattern = Pattern.compile(theRegex);
		
		// **** create a matcher for the string to check ****
		Matcher regexMatcher = pattern.matcher(str2Check);
		
		// **** look for matches ****
		boolean matchFound = false;
		while (regexMatcher.find()) {
			
			// ???? display the groups and the position of this match ????
			System.out.println("regexMatcher.groupCount: " + regexMatcher.groupCount());
			
			for (int i = 0; i <= regexMatcher.groupCount(); i++)
				System.out.println("group(" + i + "): " + regexMatcher.group(i));
			
			System.out.println("regexMatcher.start: " + regexMatcher.start());
			System.out.println("regexMatcher.end: " + regexMatcher.end());

			// **** display the contents of the tag ****
			System.out.println(regexMatcher.group(2));
			
			// **** flag that a match was found ****
			matchFound = true;
		}
		
		// **** if match not found ****
		if (!matchFound)
			System.out.println("None");
		
		// ???? separate the output of consecutive calls ????
		System.out.println();
	}

We need a pattern that matches text enclosed by a pair of matching opening and closing tags, in the style of HTML. The closing tag must repeat the name of the opening tag exactly, which calls for a back-reference.
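To make the idea concrete, here is a minimal sketch that uses the pattern <(.+)>([^<]+)</\1>, the one that appears in the debug output further down. The class name TagRegexSketch and the helper extract are hypothetical names I made up for this illustration; they are not part of the challenge or of my submission.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagRegexSketch {

    // <(.+)>    group 1: the tag name (one or more characters)
    // ([^<]+)   group 2: the content, which may not contain '<'
    // </\1>     closing tag; \1 must repeat group 1 exactly (case sensitive)
    static final Pattern TAG = Pattern.compile("<(.+)>([^<]+)</\\1>");

    // Hypothetical helper: returns the content of each matching tag pair.
    static List<String> extract(String line) {
        List<String> contents = new ArrayList<>();
        Matcher m = TAG.matcher(line);
        while (m.find()) {
            contents.add(m.group(2));
        }
        return contents;
    }

    public static void main(String[] args) {
        // prints [Nayeem loves counseling]
        System.out.println(extract("<h1>Nayeem loves counseling</h1>"));
        // prints [] because "Amee" and "amee" differ in case
        System.out.println(extract("<Amee>safat codes like a ninja</amee>"));
    }
}
```

Because the back-reference compares the captured text character by character, a closing tag that differs only in case does not match.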

I do not know about you, but at work I am seldom exposed to problems that require regular expressions. I have a copy of the Regular Expression Pocket Reference and use Google to find material to refresh and learn. In this case I consulted both.

Allow me to show you the test examples that I used:

4
<h1>Nayeem loves counseling</h1>
<h1><h1>Sanjay has no watch</h1></h1><par>So wait for a while</par>
<Amee>safat codes like a ninja</amee>
<SA premium>Imtiaz has a secret crush</SA premium>

theRegex ==><(.+)>([^<]+)</\1><== str2Check ==><h1>Nayeem loves counseling</h1><== str2Check.length: 32
regexMatcher.groupCount: 2
group(0): <h1>Nayeem loves counseling</h1>
group(1): h1
group(2): Nayeem loves counseling
regexMatcher.start: 0
regexMatcher.end: 32
Nayeem loves counseling

theRegex ==><(.+)>([^<]+)</\1><== str2Check ==><h1><h1>Sanjay has no watch</h1></h1><par>So wait for a while</par><== str2Check.length: 67
regexMatcher.groupCount: 2
group(0): <h1>Sanjay has no watch</h1>
group(1): h1
group(2): Sanjay has no watch
regexMatcher.start: 4
regexMatcher.end: 32
Sanjay has no watch
regexMatcher.groupCount: 2
group(0): <par>So wait for a while</par>
group(1): par
group(2): So wait for a while
regexMatcher.start: 37
regexMatcher.end: 67
So wait for a while

theRegex ==><(.+)>([^<]+)</\1><== str2Check ==><Amee>safat codes like a ninja</amee><== str2Check.length: 37
None

theRegex ==><(.+)>([^<]+)</\1><== str2Check ==><SA premium>Imtiaz has a secret crush</SA premium><== str2Check.length: 50
regexMatcher.groupCount: 2
group(0): <SA premium>Imtiaz has a secret crush</SA premium>
group(1): SA premium
group(2): Imtiaz has a secret crush
regexMatcher.start: 0
regexMatcher.end: 50
Imtiaz has a secret crush


1
<h1> Hello World  </h1>

theRegex ==><(.+)>([^<]+)</\1><== str2Check ==><h1> Hello World  </h1><== str2Check.length: 23
regexMatcher.groupCount: 2
group(0): <h1> Hello World  </h1>
group(1): h1
group(2):  Hello World  
regexMatcher.start: 0
regexMatcher.end: 23
 Hello World  

 
3
<h1>Hello</h1><h2>World</h2>
<a>Good Day</a> <name>John</name>

<h1>Hi<</h2>
<h2>John; how are you?</h2>


theRegex ==><(.+)>([^<]+)</\1><== str2Check ==><h1>Hello</h1><h2>World</h2><== str2Check.length: 28
regexMatcher.groupCount: 2
group(0): <h1>Hello</h1>
group(1): h1
group(2): Hello
regexMatcher.start: 0
regexMatcher.end: 14
Hello
regexMatcher.groupCount: 2
group(0): <h2>World</h2>
group(1): h2
group(2): World
regexMatcher.start: 14
regexMatcher.end: 28
World

theRegex ==><(.+)>([^<]+)</\1><== str2Check ==><a>Good Day</a> <name>John</name><== str2Check.length: 33
regexMatcher.groupCount: 2
group(0): <a>Good Day</a>
group(1): a
group(2): Good Day
regexMatcher.start: 0
regexMatcher.end: 15
Good Day
regexMatcher.groupCount: 2
group(0): <name>John</name>
group(1): name
group(2): John
regexMatcher.start: 16
regexMatcher.end: 33
John

theRegex ==><(.+)>([^<]+)</\1><== str2Check ==>
<h1>Hi<</h2>
<h2>John; how are you?</h2>

<== str2Check.length: 40
regexMatcher.groupCount: 2
group(0): 
<h2>John; how are you?</h2>

group(1): h2
group(2): John; how are you?
regexMatcher.start: 13
regexMatcher.end: 40
John; how are you?

The first one was provided by the challenge. The other two I just made up.

Note that the screen captures contain debug statements. They were commented out for the submission.
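For reference, a version without the debug statements could look roughly like the sketch below. This is not my actual submission. It assumes the input format used in the examples above (a count on the first line, followed by that many lines), and the class name SubmissionSketch and the helper process are hypothetical names chosen for this illustration.

```java
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubmissionSketch {

    // Same pattern as in the debug output: tag, content, matching closing tag.
    static final Pattern TAG = Pattern.compile("<(.+)>([^<]+)</\\1>");

    // Hypothetical helper: returns the tag contents found in one line,
    // one per output line, or "None" when nothing matches.
    static String process(String line) {
        StringBuilder sb = new StringBuilder();
        Matcher m = TAG.matcher(line);
        while (m.find()) {
            if (sb.length() > 0) {
                sb.append('\n');
            }
            sb.append(m.group(2));
        }
        return sb.length() == 0 ? "None" : sb.toString();
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        if (!in.hasNextLine()) {
            in.close();
            return; // no input at all
        }
        // Assumption: the first line holds the number of lines that follow.
        int n = Integer.parseInt(in.nextLine().trim());
        for (int i = 0; i < n && in.hasNextLine(); i++) {
            System.out.println(process(in.nextLine()));
        }
        in.close();
    }
}
```

Keeping the matching logic in a method that returns a string, instead of printing inside the loop, makes it easy to check individual lines without feeding the program through standard input.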

My entire solution can be found on my GitHub repository.

If you have comments or questions, or if you need my services for a software development project, please leave me a note below. Requests for assistance will not be made public.

Keep on reading and experimenting. It is the only way to learn.

John
Follow me on Twitter: @john_canessa
