In some ways parsing the JavaNCSS results is the least interesting part of developing a Hudson plugin, as once I have implemented the parser, it is available for everyone. For that reason I will focus more on:
best practice techniques for parsing results
common gotchas
designing for extension
Getting started
First off, we need to analyse the results file format. In the case of JavaNCSS there are multiple ways that the results file can be generated: from the JavaNCSS program directly, from ANT or from Maven. This leads us onto gotcha #1
Gotcha #1:
Never assume that a build tool generates the same format of output when run from the command line, ANT or Maven.
A case in point for Gotcha #1 is Findbugs which generates one XML format from the command line and ANT, and generates a different format that appears similar at first glance when run from Maven (mail thread). In this case it turns out that Maven 1 used the different format output, and it is feared that some people came to depend on this Maven 1 format, so when the plugin for Maven 2 was developed, they kept the Maven 1 format. In any case, the moral is don’t assume, check!
So we use the sample projects from Part 1 and generate an ANT and a Maven 2 XML report. First off, here is the report from ANT:
<?xml version="1.0"?>
<javancss>
<date>2008-04-12</date>
<time>11:22:30</time>
<packages>
<package>
<name>com.onedash.common</name>
<classes>1</classes>
<functions>3</functions>
<ncss>10</ncss>
<javadocs>3</javadocs>
<javadoc_lines>12</javadoc_lines>
<single_comment_lines>0</single_comment_lines>
<multi_comment_lines>0</multi_comment_lines>
</package>
<package>
...
</package>
...
<total>
<classes>5</classes>
<functions>8</functions>
<ncss>46</ncss>
<javadocs>9</javadocs>
<javadoc_lines>37</javadoc_lines>
<single_comment_lines>0</single_comment_lines>
<multi_comment_lines>0</multi_comment_lines>
</total>
<table>
<tr><td>Packages</td><td>Classes</td><td>Functions</td><td>NCSS</td><td>Javadocs</td><td>per</td></tr>
<tr><td>4.00</td><td>5.00</td><td>8.00</td><td>46.00</td><td>9.00</td><td>Project</td></tr>
<tr><td></td><td>1.25</td><td>2.00</td><td>11.50</td><td>2.25</td><td>Package</td></tr>
<tr><td></td><td></td><td>1.60</td><td>9.20</td><td>1.80</td><td>Class</td></tr>
<tr><td></td><td></td><td></td><td>5.75</td><td>1.13</td><td>Function</td></tr>
</table>
</packages>
<objects>
<object>
<name>com.onedash.common.Factory</name>
<ncss>7</ncss>
<functions>3</functions>
<classes>0</classes>
<javadocs>3</javadocs>
</object>
<object>
...
</object>
...
<averages>
<ncss>6.60</ncss>
<functions>1.60</functions>
<classes>0.00</classes>
<javadocs>1.80</javadocs>
</averages>
<ncss>46.00</ncss>
</objects>
<functions>
<function>
<name>com.onedash.common.Factory.Factory()</name>
<ncss>1</ncss>
<ccn>1</ccn>
<javadocs>1</javadocs>
</function>
<function>
...
</function>
...
<ncss>46.00</ncss>
</functions>
</javancss>
OK, first off, for those following the tutorial exactly, I have cheated a little. I added some more source files into the project to make sure that I have multiple classes in different packages. You can see the source code I built from here. Additionally, I have trimmed the output somewhat to highlight the interesting bits, removing the duplicate entries.
From this report file we can see a basic XML structure:
The root element is <javancss> and has child elements: <date>, <time>, <packages>, <objects>, and <functions>
The <date> and <time> elements are the timestamp when the report was generated, with the date in YYYY-MM-DD format and the time in HH:MM:SS format
The <packages> element has child elements: <package>, <total>, and <table>. There are multiple <package> elements, but only one <total> and one <table> element.
The <package> elements have child elements: <name>, <classes>, <functions>, <ncss>, <javadocs>, <javadoc_lines>, <single_comment_lines> and <multi_comment_lines>. The <name> element contains the name of the package as a String and the other elements contain totals as Integers.
The <total> element has child elements: <classes>, <functions>, <ncss>, <javadocs>, <javadoc_lines>, <single_comment_lines> and <multi_comment_lines>. These elements are the sum of all the corresponding <package> children inside the <packages> parent
The <table> element seems to be an HTML table.
The <objects> element has child elements: <object>, <averages> and <ncss>. There are multiple <object> elements, the <averages> element contains the average results for all the <object> elements, and the <ncss> element provides some form of total or average.
The <functions> element has child elements: <function> and <ncss>. Again there are multiple <function> elements, with the <ncss> element providing some form of total or average (interestingly the result appears to be the same as from <objects>).
Now, let’s take a look at what Maven 2 gives us:
<?xml version="1.0"?>
<javancss>
<date>2008-04-12</date>
<time>11:43:06</time>
<packages>
<package>
<name>com.onedash.common</name>
<classes>1</classes>
<functions>3</functions>
<ncss>10</ncss>
<javadocs>3</javadocs>
<javadoc_lines>12</javadoc_lines>
<single_comment_lines>0</single_comment_lines>
<multi_comment_lines>0</multi_comment_lines>
</package>
<package>
...
</package>
...
<total>
<classes>5</classes>
<functions>8</functions>
<ncss>46</ncss>
<javadocs>10</javadocs>
<javadoc_lines>42</javadoc_lines>
<single_comment_lines>3</single_comment_lines>
<multi_comment_lines>3</multi_comment_lines>
</total>
<table>
<tr><td>Packages</td><td>Classes</td><td>Functions</td><td>NCSS</td><td>Java
<tr><td>4.00</td><td>5.00</td><td>8.00</td><td>46.00</td><td>10.00</td><td>P
<tr><td></td><td>1.25</td><td>2.00</td><td>11.50</td><td>2.50</td><td>Packag
<tr><td></td><td></td><td>1.60</td><td>9.20</td><td>2.00</td><td>Class</td><
<tr><td></td><td></td><td></td><td>5.75</td><td>1.25</td><td>Function</td></
</table>
</packages>
<objects>
<object>
<name>com.onedash.common.api.Namer</name>
<ncss>2</ncss>
<functions>1</functions>
<classes>0</classes>
<javadocs>1</javadocs>
</object>
<object>
...
</object>
...
<averages>
<ncss>6.60</ncss>
<functions>1.60</functions>
<classes>0.00</classes>
<javadocs>2.00</javadocs>
</averages>
<ncss>46.00</ncss>
</objects>
<functions>
<function>
<name>com.onedash.common.api.Namer.newName()</name>
<ncss>1</ncss>
<ccn>1</ccn>
<javadocs>0</javadocs>
</function>
<function>
...
</function>
<ncss>46.00</ncss>
</functions>
</javancss>
Thankfully, this is the same format as for ANT. You will also be relieved to know that this is the format generated by the JavaNCSS program directly. Thus we only have to write one parser, and we do not have to detect what format we are parsing. But before I forget:
Best Practice #1:
When there are multiple formats of a report generated by different tools, make sure that your Hudson plugin can detect the different formats and can handle them appropriately (by either delegating to a different parser implementation, or by handling the differences on the fly).
One of the goals of Hudson is to minimise configuration. So when a plugin can detect a configuration option automatically, it should detect it automatically (possibly providing an “Advanced” option button to let users override the detection if Hudson gets it wrong).
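As a rough illustration of Best Practice #1, here is a minimal sketch of format detection that reads only as far as the root element. It is not part of the JavaNCSS plugin: the class name, the Format enum and the use of the JDK's built-in StAX parser are all assumptions made to keep the example self-contained.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, for illustration only.
public class ReportFormatSniffer {

    /** Hypothetical format labels; a real plugin would have one per supported format. */
    public enum Format { JAVANCSS, UNKNOWN }

    /**
     * Looks at the first start tag only and reports which format the
     * document appears to be. Unparseable input is treated as UNKNOWN.
     */
    public static Format detect(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        // The root element name is enough to tell the formats apart here
                        return "javancss".equals(reader.getLocalName())
                                ? Format.JAVANCSS
                                : Format.UNKNOWN;
                    }
                }
                return Format.UNKNOWN;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            return Format.UNKNOWN;
        }
    }
}
```

A plugin could call something like detect(...) before choosing a parser implementation, and fall back to a user-supplied "Advanced" setting when the answer is UNKNOWN.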
Start small
Looking at the JavaNCSS output, I see that there is a lot of information... and I only have one more Part left in this series! So I am not going to parse everything. I am sure that in the future I will extend the Hudson plugin to parse all of the file, but for now I am just going to concentrate on the <packages> element. This gives users something useful and it’s better than nothing.
But what happens when I do get around to parsing the <objects> and <functions> elements? People may have lots of old builds and they will want to see the trends of the <objects> and <functions> results. I have two choices:
Tell them “Sorry, out of luck”
Save the results with the build, and then the newer parser can extract the results when people want the trend.
Choosing between these two options can be difficult. My preference is to go with option two, as long as the results are not a really big file.
Best Practice #2:
If you are not parsing everything in the results file, and the results file is not too big, and it can be parsed without reference to the source code, copy the results file to Hudson so that future versions of your plugin can read the information you are not currently parsing.
Don’t over parse
The results that we parse are going to be placed into an Action object. This Action object will be serialized. When Hudson starts up, it reads all the results of all the builds. If we place too much information in our Action object, this can have a detrimental effect on Hudson’s performance. When users have 50+ projects each with a couple of hundred builds, they will thank you for keeping your Action objects small.
Gotcha #2:
Don’t store too much in your Action objects.
Don’t under parse
OK, so I have just given out about storing too much in your Action objects. There is a second problem... not storing enough! Most reporting plugins try to present a trend graph to show progress over a number of builds. If we don’t store the information required to generate this trend graph inside our Action objects, then displaying the trend graph will require parsing all the result files for all builds of a project. This can have a detrimental effect on Hudson’s performance. When users have projects with a couple of hundred builds, they will thank you for keeping the information to generate the main trend graph inside your Action objects.
Gotcha #3:
Store the information for generating the Project level trend graph in your Action objects.
A case in point for Gotcha #3 is the cobertura plugin, which at the time of writing, does not store the information for the main trend graph in the Action object. I fully intend to fix this situation once I have finished this series!
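To make the balance between Gotcha #2 and Gotcha #3 concrete, here is a minimal sketch of the kind of small value object that could be kept per build: just the project-level totals the trend graph needs and nothing else. The class name TrendPoint, its fields and the ncssDelta helper are hypothetical and are not the Action class this series will end up with.

```java
import java.io.Serializable;

// Hypothetical per-build summary, for illustration only.
public class TrendPoint implements Serializable {

    private static final long serialVersionUID = 1L;

    // Project-level totals only: a handful of longs per build
    private final long classes;
    private final long functions;
    private final long ncss;
    private final long javadocs;

    public TrendPoint(long classes, long functions, long ncss, long javadocs) {
        this.classes = classes;
        this.functions = functions;
        this.ncss = ncss;
        this.javadocs = javadocs;
    }

    public long getClasses() { return classes; }
    public long getFunctions() { return functions; }
    public long getNcss() { return ncss; }
    public long getJavadocs() { return javadocs; }

    /** Change in NCSS relative to an earlier build, as a trend graph might plot it. */
    public long ncssDelta(TrendPoint previous) {
        return ncss - previous.ncss;
    }
}
```

With the <total> values from the ANT report above, new TrendPoint(5, 8, 46, 9) is all that would need to be held in memory for that build; the per-package detail can stay in the copied results file.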
How to parse
Most of the result files that you will encounter are XML based. We are writing our plugins in Java, so that gives us a range of parsers to choose from, e.g.
SAX
DOM
StAX
Roll your own
Etc.
Given that report files can end up very big for very big projects, we need to be careful how we parse the results:
Gotcha #4:
Don’t parse XML results using DOM, as this will require reading the entire report file into memory.
I am going to stick my neck out and make a recommendation:
Best Practice #3:
Use an XML pull parser to parse XML report files.
They are generally faster, use less memory, and are better suited to a “hit-and-run” style of result extraction.
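To show what “hit-and-run” extraction looks like, here is a minimal sketch that pulls a single number (the <ncss> value inside <total>) out of a report and stops reading as soon as it has it. It uses StAX from the JDK rather than the XmlPull library used later in this post, purely so that the example runs without extra dependencies; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, for illustration only.
public class TotalNcssReader {

    /** Returns the <ncss> value found inside <total>, or -1 if it cannot be read. */
    public static long readTotalNcss(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                boolean inTotal = false;
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        String name = reader.getLocalName();
                        if ("total".equals(name)) {
                            inTotal = true;
                        } else if (inTotal && "ncss".equals(name)) {
                            // Found what we came for: return without reading the rest
                            return Long.parseLong(reader.getElementText().trim());
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT
                            && "total".equals(reader.getLocalName())) {
                        inTotal = false;
                    }
                }
                return -1;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException | NumberFormatException e) {
            return -1;
        }
    }
}
```

Nothing after the value of interest is visited, and no tree of the document is ever built, which is the property that matters for very large report files.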
Be able to aggregate parsing results
You may think that there will only ever be one result file that you need to parse. Maven 2 usually throws a spanner into that model, and everyone has their own ANT build script, so:
Gotcha #5:
Don’t assume you only have to parse one report file for each project.
This gotcha arises from the code coverage plugins (emma, clover, cobertura). Initially, you would think that people are only interested in one code coverage result, i.e. the coverage for the project... so they will only have one result file that we need to parse, right? Wrong! Some tools / build scripts generate a report for each module but only generate a summary report in non-conforming HTML. Some tools / build scripts generate a report for unit tests and integration tests separately. It’s a mess, and don’t get me started on using different tools for different test types...
The parsing engine
Ok, so here is the parsing engine:
package hudson.plugins.javancss.parser;

import hudson.model.AbstractBuild;
import hudson.util.IOException2;
import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;
import org.xmlpull.v1.XmlPullParserFactory;

import java.io.*;
import java.util.*;

public class Statistic implements Serializable {

    private AbstractBuild<?, ?> owner;
    private String name;
    private long classes;
    private long functions;
    private long ncss;
    private long javadocs;
    private long javadocLines;
    private long singleCommentLines;
    private long multiCommentLines;

    public static Collection<Statistic> parse(File inFile)
            throws IOException, XmlPullParserException {
        Collection<Statistic> results = new ArrayList<Statistic>();
        FileInputStream fis = null;
        BufferedInputStream bis = null;
        try {
            fis = new FileInputStream(inFile);
            bis = new BufferedInputStream(fis);
            XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
            // the report format does not use namespaces
            factory.setNamespaceAware(false);
            factory.setValidating(false);
            XmlPullParser parser = factory.newPullParser();
            parser.setInput(bis, null);

            // check that the first tag is <javancss>
            expectNextTag(parser, "javancss");

            // skip until we get to the <packages> tag
            while (parser.getDepth() > 0
                    && (parser.getEventType() != XmlPullParser.START_TAG
                    || !"packages".equals(parser.getName()))) {
                parser.next();
            }
            // skip until we get to the first <package> tag
            while (parser.getDepth() > 0
                    && (parser.getEventType() != XmlPullParser.START_TAG
                    || !"package".equals(parser.getName()))) {
                parser.next();
            }
            // process each <package> element in turn
            while (parser.getDepth() >= 2
                    && parser.getEventType() == XmlPullParser.START_TAG
                    && "package".equals(parser.getName())) {
                Map<String, String> data = new HashMap<String, String>();
                String lastTag = null;
                String lastText = null;
                int depth = parser.getDepth();
                // collect the children of this <package>, in whatever order they appear
                while (parser.getDepth() >= depth) {
                    parser.next();
                    switch (parser.getEventType()) {
                        case XmlPullParser.START_TAG:
                            lastTag = parser.getName();
                            break;
                        case XmlPullParser.TEXT:
                            lastText = parser.getText();
                            break;
                        case XmlPullParser.END_TAG:
                            // depth 4 is a direct child of <package>
                            if (parser.getDepth() == 4
                                    && lastTag != null
                                    && lastText != null) {
                                data.put(lastTag, lastText);
                            }
                            lastTag = null;
                            lastText = null;
                            break;
                    }
                }
                if (data.containsKey("name")) {
                    Statistic s = new Statistic(data.get("name"));
                    s.setClasses(Long.valueOf(data.get("classes")));
                    s.setFunctions(Long.valueOf(data.get("functions")));
                    s.setNcss(Long.valueOf(data.get("ncss")));
                    s.setJavadocs(Long.valueOf(data.get("javadocs")));
                    s.setJavadocLines(Long.valueOf(data.get("javadoc_lines")));
                    s.setSingleCommentLines(Long.valueOf(data.get("single_comment_lines")));
                    s.setMultiCommentLines(Long.valueOf(data.get("multi_comment_lines")));
                    results.add(s);
                }
                parser.next();
            }
        } catch (XmlPullParserException e) {
            throw new IOException2(e);
        } finally {
            if (bis != null) {
                bis.close();
            }
            if (fis != null) {
                fis.close();
            }
        }
        return results;
    }

    private static void skipTag(XmlPullParser parser)
            throws IOException, XmlPullParserException {
        parser.next();
        endElement(parser);
    }

    private static void expectNextTag(XmlPullParser parser, String tag)
            throws IOException, XmlPullParserException {
        while (true) {
            if (parser.getEventType() != XmlPullParser.START_TAG) {
                parser.next();
                continue;
            }
            if (parser.getName().equals(tag)) {
                return;
            }
            throw new IOException("Expecting tag " + tag);
        }
    }

    private static void endElement(XmlPullParser parser)
            throws IOException, XmlPullParserException {
        int depth = parser.getDepth();
        while (parser.getDepth() >= depth) {
            parser.next();
        }
    }

    public Statistic(String name) {
        this.name = name;
    }

    ...
    // Simple getters and setters for all the private fields
    ...
    // equals based on all private fields, hashCode based on
    // name and owner.
    ...
    // toString
    ...
}
Essentially the main work is done in the static parse method. It takes a File and tries to parse it. We get an XML Pull Parser for the stream and ensure that it is neither namespace aware nor validating as the file format does not use namespaces and we will be forgiving on the XML format.
The first tag should be <javancss> and after that we skip until we get a <packages> tag. Once we have found the <packages> tag we skip until we hit the first <package> tag.
We are reverse engineering the JavaNCSS file format, so we will not make any assumptions about the order of the child elements in the <package> element. We put all the child elements into a Map keyed by the element name, and then when we reach the end of the <package> element we pull out the information we were after from the Map and put it into a Statistic object and add that to the collection of results that we will return.
As soon as we hit the end of the <packages> element, we stop parsing.
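The Map-based approach described above can be tried out in isolation. The following is a minimal sketch of the same idea using StAX from the JDK instead of XmlPull, so that it runs without the plugin's dependencies; PackageElementReader and readPackages are hypothetical names, and the sketch returns plain maps rather than Statistic objects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, for illustration only.
public class PackageElementReader {

    /** Returns one map per <package>, keyed by child element name. */
    public static List<Map<String, String>> readPackages(String xml) {
        List<Map<String, String>> packages = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                Map<String, String> current = null;
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        String name = reader.getLocalName();
                        if ("package".equals(name)) {
                            current = new HashMap<>();
                        } else if (current != null) {
                            // Child order does not matter: everything is keyed by name
                            current.put(name, reader.getElementText().trim());
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        String name = reader.getLocalName();
                        if ("package".equals(name) && current != null) {
                            packages.add(current);
                            current = null;
                        } else if ("packages".equals(name)) {
                            break; // stop at the end of <packages>
                        }
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed report", e);
        }
        return packages;
    }
}
```

Because the values are looked up by name afterwards, a report that lists <ncss> before <name> is handled exactly like one that lists them the other way round, and the children of <total> are ignored because they are not inside a <package>.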
Supporting aggregation
In order to support aggregation of multiple results, we'll add some utility methods to the Statistic class. First we need methods that allow us to calculate totals:
package hudson.plugins.javancss.parser;
...
public class Statistic implements Serializable {
    ...
    public static Statistic total(Collection<Statistic>... results) {
        Collection<Statistic> merged = merge(results);
        Statistic total = new Statistic("");
        for (Statistic individual : merged) {
            total.add(individual);
        }
        return total;
    }

    public void add(Statistic r) {
        classes += r.classes;
        functions += r.functions;
        ncss += r.ncss;
        javadocs += r.javadocs;
        javadocLines += r.javadocLines;
        singleCommentLines += r.singleCommentLines;
        multiCommentLines += r.multiCommentLines;
    }
    ...
}
The total method just calculates the total of all the statistics in a collection of statistics. We will also need to be able to merge different result sets. This should aggregate totals for each package separately and return a collection with one total statistic for each package:
package hudson.plugins.javancss.parser;
...
public class Statistic implements Serializable {
    ...
    public static Collection<Statistic> merge(
            Collection<Statistic>... results) {
        if (results.length == 0) {
            return Collections.emptySet();
        } else if (results.length == 1) {
            return results[0];
        } else {
            Map<String, Statistic> merged =
                    new HashMap<String, Statistic>();
            for (Collection<Statistic> result : results) {
                for (Statistic individual : result) {
                    if (!merged.containsKey(individual.name)) {
                        merged.put(individual.name,
                                new Statistic(individual.name));
                    }
                    merged.get(individual.name).add(individual);
                }
            }
            return merged.values();
        }
    }
    ...
}
That is pretty much it for the parser engine.
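To see what merging by package name does in practice, here is a minimal worked sketch. It does not use the Statistic class: each result set is reduced to a hypothetical Map from package name to NCSS, and the class name, method names and module figures are made up for the example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for Statistic.merge and Statistic.total, for illustration only.
public class NcssMergeExample {

    /** Sums the NCSS of packages that appear in more than one result set. */
    @SafeVarargs
    public static Map<String, Long> mergeNcss(Map<String, Long>... results) {
        Map<String, Long> merged = new TreeMap<>();
        for (Map<String, Long> result : results) {
            for (Map.Entry<String, Long> entry : result.entrySet()) {
                // Add to the running total for this package, starting from the first value seen
                merged.merge(entry.getKey(), entry.getValue(), Long::sum);
            }
        }
        return merged;
    }

    /** Grand total across all packages in an already merged result. */
    public static long total(Map<String, Long> merged) {
        long sum = 0;
        for (long value : merged.values()) {
            sum += value;
        }
        return sum;
    }
}
```

If one module reports 10 NCSS for com.onedash.common and 4 for com.onedash.common.api, and a second module reports another 5 for com.onedash.common, the merged result has two entries (15 and 4) and a total of 19.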
The Ghostwriter
Now we need to hook the engine into our publisher. We will need to configure the UI elements and the Actions... all tasks for the final part, but for now, we'll just hook it up. We want to run the parsing on the slave side so we implement Ghostwriter.SlaveGhostwriter.
package hudson.plugins.javancss;

import hudson.FilePath;
import hudson.model.AbstractBuild;
import hudson.model.BuildListener;
import hudson.plugins.helpers.BuildProxy;
import hudson.plugins.helpers.Ghostwriter;
import hudson.plugins.javancss.parser.Statistic;
import org.xmlpull.v1.XmlPullParserException;

import java.io.File;
import java.io.IOException;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class JavaNCSSGhostwriter
        implements Ghostwriter,
        Ghostwriter.SlaveGhostwriter {

    private final String reportFilenamePattern;

    public JavaNCSSGhostwriter(String reportFilenamePattern) {
        this.reportFilenamePattern = reportFilenamePattern;
    }

    public boolean performFromSlave(
            BuildProxy build,
            BuildListener listener)
            throws InterruptedException, IOException {
        FilePath[] paths = build.getExecutionRootDir()
                .list(reportFilenamePattern);
        Collection<Statistic> results = null;
        // remember which files we have parsed so that each is only counted once
        Set<String> parsedFiles = new HashSet<String>();
        for (FilePath path : paths) {
            final String pathStr = path.getRemote();
            if (!parsedFiles.contains(pathStr)) {
                parsedFiles.add(pathStr);
                try {
                    Collection<Statistic> result =
                            Statistic.parse(new File(pathStr));
                    if (results == null) {
                        results = result;
                    } else {
                        results = Statistic.merge(results, result);
                    }
                    // TODO copy the parsed file to the master
                } catch (XmlPullParserException e) {
                    e.printStackTrace(listener.getLogger());
                }
            }
        }
        // TODO add the results into an Action and attach it to the
        // build.
        return true;
    }
}
Basically, we search the supplied wildcard-path for report files and merge all the results together into a collection of results. In the final part of this series we will create our Action to hold the results and wire everything together.