Tuesday, June 22, 2010

Groovier Java RegEx Token Determination

In my last blog post, I looked at using a simple Java application to determine which characters would work as desired for splitting a String with String.split. Simple utilities like this one are often a perfect fit for Groovy, and in this blog post I show a Groovy script version of the Java class featured in the previous post. I also demonstrate one Groovy gotcha.

Here is the Groovy script ported from the Java class detailed in the last blog post.


#!/usr/bin/env groovy

import java.util.regex.PatternSyntaxException

/**
* This simple script accepts a String as a potential regular expression token
* and demonstrates how this provided String would work as a token in a
* {@code String.split} invocation.
*/

NEW_LINE = System.getProperty("line.separator")

if (args.length < 1)
{
   println "${NEW_LINE}No argument was provided. A candidate String token must be provided.${NEW_LINE}"
   System.exit(-1)
}

String candidateToken = args[0]
println "Provided token is: ${candidateToken}"

stringWithCandidateToken =
   "Java${candidateToken}has${candidateToken}regular${candidateToken}expression${candidateToken}support${candidateToken}."
println "String with candidate token is: ${stringWithCandidateToken}"

try
{
   splitStrings = stringWithCandidateToken.split(candidateToken)
   splitStrings.each()
   {
      println it
   }
}
catch (PatternSyntaxException badRegExpPatternSyntax)
{
   println "Unable to parse ${stringWithCandidateToken} on token ${candidateToken} using String.split method - ${badRegExpPatternSyntax.toString()}"
}


Although Groovy doesn't require exception handling even for checked exceptions, I intentionally caught the PatternSyntaxException (an unchecked/runtime exception) so that I could print a friendlier error message than the default stack trace.
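For comparison, here is a rough Java sketch of the same catch-and-report idea. The class name SplitTokenCheck, the method name describe, and the shortened sample sentence are my own placeholders for illustration; this is not the class from the previous post.

```java
import java.util.regex.PatternSyntaxException;

// Hypothetical illustration class; not the application from the previous post.
public class SplitTokenCheck
{
   /** Returns a one-line summary of how String.split treats the candidate token. */
   static String describe(String candidateToken)
   {
      // Shortened sample sentence (an assumption made to keep the sketch small).
      String sample = "Java" + candidateToken + "has" + candidateToken + "regex";
      try
      {
         String[] pieces = sample.split(candidateToken);
         return "OK: " + pieces.length + " pieces";
      }
      catch (PatternSyntaxException badSyntax)
      {
         // getDescription() gives the short reason without the full stack trace.
         return "Bad token '" + candidateToken + "': " + badSyntax.getDescription();
      }
   }

   public static void main(String[] args)
   {
      System.out.println(describe(","));   // a comma splits without complaint
      System.out.println(describe("*"));   // a lone asterisk is not a valid regular expression
   }
}
```

Because PatternSyntaxException is unchecked, the try/catch is optional in Java as well; it is only there to swap the stack trace for a one-line message.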

The Groovy script writes out the same results as the simple Java application, so I won't show all the same screen snapshots shown in the previous blog post. However, there is one character that does behave differently with the Groovy script than it did with the Java application. When an asterisk (*) is provided as the candidate regular expression token for the Groovy script, the results are different than when it's provided to the Java application. In the Java application, the asterisk led to the PatternSyntaxException. In the Groovy script, something different happens:



Although the Groovy script version at first glance appears to have worked better than the simple Java application because it did not result in an exception, the whole point of this script and application was to determine an appropriate character to split on. This is a reminder that while Groovy largely IS Java, there are times when Groovy is different from Java.

In the case of the asterisk, Groovy tried to expand the asterisk first, and it so happened that build.xml is the first file listed in the directory in which the script resides. As the screen output above indicates, the Groovy script thought that this file name was the token rather than *. This demonstrates that there are some issues with Groovy's handling of the asterisk on the command line, as documented in "Command line arguments containing * (asterisk) not passed correctly" and in "Issues with the Windows startup batch files."
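To see why the expanded argument is so easy to miss, here is a small Java sketch that feeds the file name, rather than the asterisk, through the same split. The class name ExpandedTokenDemo and the method name splitSample are hypothetical, and "build.xml" simply stands in for whatever file name the wildcard happened to expand to.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; names are placeholders, not from the original script.
public class ExpandedTokenDemo
{
   /** Builds the same sample sentence as the script and splits it on the token. */
   static List<String> splitSample(String token)
   {
      String sample = String.join(token, "Java", "has", "regular", "expression", "support", ".");
      return Arrays.asList(sample.split(token));
   }

   public static void main(String[] args)
   {
      // The token the script actually received after the asterisk was expanded.
      System.out.println(splitSample("build.xml"));
   }
}
```

The split succeeds and returns the six expected pieces, so nothing in the output itself signals that the asterisk was never tested. Only the echoed "Provided token is" line gives it away.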

There are many things I like about Groovy, and there are many ways in which it complements and enhances my Java development experience. This blog post has shown how Groovy can help determine what works best with a Java API, but it also demonstrates that there are some differences between Groovy and Java behavior that can mislead anyone who assumes that Groovy always behaves exactly like Java.
