In some ways, parsing the JavaNCSS results is the least interesting part of developing a Hudson plugin: once I have implemented the parser, it is there for everyone to use. For that reason I will focus more on:

  • best practice techniques for parsing results

  • common gotchas

  • designing for extension

Getting started

First off, we need to analyse the results file format. In the case of JavaNCSS there are multiple ways that the results file can be generated: from the JavaNCSS program directly, from ANT, or from Maven. This leads us to Gotcha #1.

Gotcha #1:
Never assume that a build tool generates the same format of output when run from the command line, ANT or Maven.

A case in point for Gotcha #1 is Findbugs, which generates one XML format from the command line and ANT, and a different format that appears similar at first glance when run from Maven (mail thread). It turns out that Maven 1 used the different output format, and it was feared that some people had come to depend on that Maven 1 format, so when the plugin for Maven 2 was developed, the Maven 1 format was kept. In any case, the moral is: don’t assume, check!

So we use the sample projects from Part 1 and generate an ANT and a Maven 2 XML report. First, here is the report from ANT:

<?xml version="1.0"?>
<javancss>
<date>2008-04-12</date>
<time>11:22:30</time>
<packages>
<package>
<name>com.onedash.common</name>
<classes>1</classes>
<functions>3</functions>
<ncss>10</ncss>
<javadocs>3</javadocs>
<javadoc_lines>12</javadoc_lines>
<single_comment_lines>0</single_comment_lines>
<multi_comment_lines>0</multi_comment_lines>
</package>
<package>
...
</package>
...
<total>
<classes>5</classes>
<functions>8</functions>
<ncss>46</ncss>
<javadocs>9</javadocs>
<javadoc_lines>37</javadoc_lines>
<single_comment_lines>0</single_comment_lines>
<multi_comment_lines>0</multi_comment_lines>
</total>
<table>
<tr><td>Packages</td><td>Classes</td><td>Functions</td><td>NCSS</td><td>Javadocs</td><td>per</td></tr>
<tr><td>4.00</td><td>5.00</td><td>8.00</td><td>46.00</td><td>9.00</td><td>Project</td></tr>
<tr><td></td><td>1.25</td><td>2.00</td><td>11.50</td><td>2.25</td><td>Package</td></tr>
<tr><td></td><td></td><td>1.60</td><td>9.20</td><td>1.80</td><td>Class</td></tr>
<tr><td></td><td></td><td></td><td>5.75</td><td>1.13</td><td>Function</td></tr>
</table>
</packages>

<objects>
<object>
<name>com.onedash.common.Factory</name>
<ncss>7</ncss>
<functions>3</functions>
<classes>0</classes>
<javadocs>3</javadocs>
</object>
<object>
...
</object>
...
<averages>
<ncss>6.60</ncss>
<functions>1.60</functions>
<classes>0.00</classes>
<javadocs>1.80</javadocs>
</averages>
<ncss>46.00</ncss>
</objects>

<functions>
<function>
<name>com.onedash.common.Factory.Factory()</name>
<ncss>1</ncss>
<ccn>1</ccn>
<javadocs>1</javadocs>
</function>
<function>
...
</function>
...
<ncss>46.00</ncss>
</functions>
</javancss>

OK, first off, for those following the tutorial exactly, I have cheated a little. I added some more source files to the project to make sure that I have multiple classes in different packages. You can see the source code I built from here. Additionally, I have trimmed the output somewhat to highlight the interesting bits, removing the repetitive entries.

From this report file we can see a basic XML structure:

  • The root element is <javancss> and has child elements: <date>, <time>, <packages>, <objects>, and <functions>.

  • The <date> and <time> elements are the timestamp of when the report was generated, with the date in YYYY-MM-DD format and the time in HH:MM:SS format.

  • The <packages> element has child elements: <package>, <total>, and <table>. There are multiple <package> elements, but only one <total> and one <table> element.

    • The <package> elements have child elements: <name>, <classes>, <functions>, <ncss>, <javadocs>, <javadoc_lines>, <single_comment_lines> and <multi_comment_lines>. The <name> element contains the name of the package as a String, and the other elements contain totals as Integers.

    • The <total> element has child elements: <classes>, <functions>, <ncss>, <javadocs>, <javadoc_lines>, <single_comment_lines> and <multi_comment_lines>. These elements are the sums of the corresponding children of all the <package> elements inside the <packages> parent.

    • The <table> element seems to be an HTML table of summary averages.

  • The <objects> element has child elements: <object>, <averages> and <ncss>. There are multiple <object> elements; the <averages> element contains the average results across all the <object> elements, and the <ncss> element provides some form of total or average.

  • The <functions> element has child elements: <function> and <ncss>. Again there are multiple <function> elements, with the <ncss> element providing some form of total or average (interestingly, the value appears to be the same as in <objects>).

Now, let’s take a look at what Maven 2 gives us:

<?xml version="1.0"?>
<javancss>
<date>2008-04-12</date>
<time>11:43:06</time>
<packages>
<package>
<name>com.onedash.common</name>
<classes>1</classes>
<functions>3</functions>
<ncss>10</ncss>
<javadocs>3</javadocs>
<javadoc_lines>12</javadoc_lines>
<single_comment_lines>0</single_comment_lines>
<multi_comment_lines>0</multi_comment_lines>
</package>
<package>
...
</package>
...
<total>
<classes>5</classes>
<functions>8</functions>
<ncss>46</ncss>
<javadocs>10</javadocs>
<javadoc_lines>42</javadoc_lines>
<single_comment_lines>3</single_comment_lines>
<multi_comment_lines>3</multi_comment_lines>
</total>
<table>
<tr><td>Packages</td><td>Classes</td><td>Functions</td><td>NCSS</td><td>Javadocs</td><td>per</td></tr>
<tr><td>4.00</td><td>5.00</td><td>8.00</td><td>46.00</td><td>10.00</td><td>Project</td></tr>
<tr><td></td><td>1.25</td><td>2.00</td><td>11.50</td><td>2.50</td><td>Package</td></tr>
<tr><td></td><td></td><td>1.60</td><td>9.20</td><td>2.00</td><td>Class</td></tr>
<tr><td></td><td></td><td></td><td>5.75</td><td>1.25</td><td>Function</td></tr>
</table>
</packages>

<objects>
<object>
<name>com.onedash.common.api.Namer</name>
<ncss>2</ncss>
<functions>1</functions>
<classes>0</classes>
<javadocs>1</javadocs>
</object>
<object>
...
</object>
...
<averages>
<ncss>6.60</ncss>
<functions>1.60</functions>
<classes>0.00</classes>
<javadocs>2.00</javadocs>
</averages>
<ncss>46.00</ncss>
</objects>

<functions>
<function>
<name>com.onedash.common.api.Namer.newName()</name>
<ncss>1</ncss>
<ccn>1</ccn>
<javadocs>0</javadocs>
</function>
<function>
...
</function>
<ncss>46.00</ncss>
</functions>
</javancss>

Thankfully, this is the same format as for ANT. You will also be relieved to know that the JavaNCSS program generates the same format when run directly. Thus we only have to write one parser, and we do not have to detect which format we are parsing. But before I forget:

Best Practice #1:
When there are multiple formats of a report generated by different tools, make sure that your Hudson plugin can detect the different formats and can handle them appropriately (by either delegating to a different parser implementation, or by handling the differences on the fly).

One of the goals of Hudson is to minimise configuration, so when a plugin can detect a configuration option automatically, it should (possibly providing an “Advanced” option button to let users override the detection if Hudson gets it wrong).
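To give a feel for what Best Practice #1 looks like in code, here is a minimal sketch of root-element sniffing. It is not part of the actual plugin (we dodged the bullet above): the <javancss-maven1> root element and the two parse delegates are invented purely for illustration, and the sketch assumes the same org.xmlpull.v1 and java.io imports as the Statistic class shown later in this part.

private static Collection<Statistic> parseAnyFormat(File inFile)
        throws IOException, XmlPullParserException {
    BufferedInputStream bis = new BufferedInputStream(new FileInputStream(inFile));
    try {
        XmlPullParser parser = XmlPullParserFactory.newInstance().newPullParser();
        parser.setInput(bis, null);
        // advance to the root element and dispatch on its name
        while (parser.getEventType() != XmlPullParser.START_TAG) {
            parser.next();
        }
        if ("javancss".equals(parser.getName())) {
            return parseStandardFormat(parser);        // hypothetical delegate
        } else if ("javancss-maven1".equals(parser.getName())) {
            return parseLegacyMaven1Format(parser);    // hypothetical delegate
        }
        throw new IOException("Unrecognised report format: <" + parser.getName() + ">");
    } finally {
        bis.close();
    }
}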

Start small

Looking at the JavaNCSS output, I see that there is a lot of information... and I only have one more Part left in this series! So I am not going to parse everything. I am sure that in the future I will extend the Hudson plugin to parse all of the file, but for now I am just going to concentrate on the <packages> element. This gives users something useful, and it’s better than nothing.

But what happens when I do get around to parsing the <objects> and <functions> elements? People may have lots of old builds and they will want to see the trends of the <objects> and <functions> results. I have two choices:

  • Tell them “Sorry, out of luck”

  • Save the results with the build, and then the newer parser can extract the results when people want the trend.

Choosing between these two options can be difficult. My preference is to go with option two, as long as the results file is not really big.

Best Practice #2:
If you are not parsing everything in the results file, and the results file is not too big, and it can be parsed without reference to the source code, copy the results file to Hudson so that future versions of your plugin can read the information you are not currently parsing.
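For illustration, here is a rough sketch of what that copy might look like using Hudson’s FilePath API. The method name and the "javancss-reports" directory are arbitrary choices, and bear in mind that in this plugin the report actually lives on the slave, so the real copy has to travel to the master (the Ghostwriter section below leaves exactly that as a TODO).

private void stashReport(FilePath report, AbstractBuild<?, ?> build)
        throws IOException, InterruptedException {
    // keep the raw report alongside the build so a future parser can re-read it
    FilePath archive = new FilePath(new File(build.getRootDir(), "javancss-reports"));
    archive.mkdirs();
    // keep the original file name so several reports can be stashed side by side
    report.copyTo(archive.child(report.getName()));
}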

Don’t over parse

The results that we parse are going to be placed into an Action object. This Action object will be serialized. When Hudson starts up, it reads all the results of all the builds. If we place too much information in our Action object, this can have a detrimental effect on Hudson’s performance. When users have 50+ projects each with a couple of hundred builds, they will thank you for keeping your Action objects small.

Gotcha #2:
Don’t store too much in your Action objects.

Don’t under parse

OK, so I have just given out about storing too much in your Action objects. There is a second problem... not storing enough! Most reporting plugins try to present a trend graph to show progress over a number of builds. If we don’t store the information required to generate this trend graph inside our Action objects, then displaying the trend graph will require parsing all the result files for all builds of a project. This can have a detrimental effect on Hudson’s performance. When users have projects with a couple of hundred builds, they will thank you for keeping the information to generate the main trend graph inside your Action objects.

Gotcha #3:
Store the information for generating the Project level trend graph in your Action objects.

A case in point for Gotcha #3 is the cobertura plugin, which, at the time of writing, does not store the information for the main trend graph in the Action object. I fully intend to fix this situation once I have finished this series!
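To make the balance between Gotcha #2 and Gotcha #3 concrete, here is a sketch of the kind of slim build Action I have in mind. The class name, icon and URL are placeholders, and the real Action only gets built in the final part of this series.

public class JavaNCSSBuildIndividualReport implements hudson.model.Action {

    // small: one Statistic per package is enough to draw the trend graph
    private final Collection<Statistic> results;

    public JavaNCSSBuildIndividualReport(Collection<Statistic> results) {
        this.results = results;
    }

    public Statistic getTotals() {
        return Statistic.total(results);
    }

    public String getIconFileName() {
        return "graph.gif"; // placeholder icon
    }

    public String getDisplayName() {
        return "JavaNCSS Report";
    }

    public String getUrlName() {
        return "javancss";
    }
}

Because Hudson de-serialises the Actions of every build at start-up, everything held here should be small and cheap to load, yet sufficient to plot the trend graph without re-reading any report files.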

How to parse

Most of the result files that you will encounter are XML based. We are writing our plugins in Java, so that gives us a range of parsers to choose from, e.g.

  • SAX

  • DOM

  • StAX

  • Roll your own

  • Etc.

Given that report files can end up very big for large projects, we need to be careful about how we parse the results:

Gotcha #4:
Don’t parse XML results using DOM, as this will require reading the entire report file into memory.

I am going to stick my neck out and make a recommendation:

Best Practice #3:
Use an XML pull parser to parse XML report files.

They are generally faster, use less memory, and are better suited to a “hit-and-run” style of result extraction.
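For those who have not met the pull-parsing style before, here is a minimal, self-contained illustration using the same org.xmlpull.v1 API as the engine below. It is not the plugin’s parser: reportFile is assumed to be a java.io.File pointing at a report, and picking out every <ncss> element is purely for demonstration.

private static void printEveryNcss(File reportFile)
        throws IOException, XmlPullParserException {
    XmlPullParser parser = XmlPullParserFactory.newInstance().newPullParser();
    FileInputStream in = new FileInputStream(reportFile);
    try {
        parser.setInput(in, null);
        int event;
        // the caller drives the parser forward one event at a time;
        // nothing is buffered beyond the current event
        while ((event = parser.next()) != XmlPullParser.END_DOCUMENT) {
            if (event == XmlPullParser.START_TAG
                    && "ncss".equals(parser.getName())) {
                // nextText() returns the element's text and leaves the
                // parser positioned on the matching end tag
                System.out.println("ncss = " + parser.nextText());
            }
        }
    } finally {
        in.close();
    }
}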

Be able to aggregate parsing results

You may think that there will only ever be one result file that you need to parse. Maven 2 usually throws a spanner into that model, and everyone has their own ANT build script, so:

Gotcha #5:
Don’t assume you only have to parse one report file for each project.

This gotcha arises from the code coverage plugins (emma, clover, cobertura). Initially, you would think that people are only interested in one code coverage result, i.e. the coverage for the project... so there will only be one result file that we need to parse, right? Wrong! Some tools/build scripts generate a report for each module but only generate a summary report in non-conforming HTML. Some tools/build scripts generate separate reports for unit tests and integration tests. It’s a mess, and don’t get me started on using different tools for different test types...

The parsing engine

OK, so here is the parsing engine:

package hudson.plugins.javancss.parser;

import hudson.model.AbstractBuild;
import hudson.util.IOException2;
import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;
import org.xmlpull.v1.XmlPullParserFactory;

import java.io.*;
import java.util.*;

public class Statistic implements Serializable {

    private AbstractBuild<?, ?> owner;
    private String name;
    private long classes;
    private long functions;
    private long ncss;
    private long javadocs;
    private long javadocLines;
    private long singleCommentLines;
    private long multiCommentLines;

    public static Collection<Statistic> parse(File inFile)
            throws IOException, XmlPullParserException {
        Collection<Statistic> results = new ArrayList<Statistic>();
        FileInputStream fis = null;
        BufferedInputStream bis = null;
        try {
            fis = new FileInputStream(inFile);
            bis = new BufferedInputStream(fis);
            XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(false);
            XmlPullParser parser = factory.newPullParser();
            parser.setInput(bis, null);

            // check that the first tag is <javancss>
            expectNextTag(parser, "javancss");

            // skip until we get to the <packages> tag
            while (parser.getDepth() > 0
                    && (parser.getEventType() != XmlPullParser.START_TAG
                        || !"packages".equals(parser.getName()))) {
                parser.next();
            }
            while (parser.getDepth() > 0
                    && (parser.getEventType() != XmlPullParser.START_TAG
                        || !"package".equals(parser.getName()))) {
                parser.next();
            }
            while (parser.getDepth() >= 2
                    && parser.getEventType() == XmlPullParser.START_TAG
                    && "package".equals(parser.getName())) {
                Map<String, String> data = new HashMap<String, String>();
                String lastTag = null;
                String lastText = null;
                int depth = parser.getDepth();
                while (parser.getDepth() >= depth) {
                    parser.next();
                    switch (parser.getEventType()) {
                        case XmlPullParser.START_TAG:
                            lastTag = parser.getName();
                            break;
                        case XmlPullParser.TEXT:
                            lastText = parser.getText();
                            break;
                        case XmlPullParser.END_TAG:
                            if (parser.getDepth() == 4
                                    && lastTag != null
                                    && lastText != null) {
                                data.put(lastTag, lastText);
                            }
                            lastTag = null;
                            lastText = null;
                            break;
                    }
                }
                if (data.containsKey("name")) {
                    Statistic s = new Statistic(data.get("name"));
                    s.setClasses(Long.valueOf(data.get("classes")));
                    s.setFunctions(Long.valueOf(data.get("functions")));
                    s.setNcss(Long.valueOf(data.get("ncss")));
                    s.setJavadocs(Long.valueOf(data.get("javadocs")));
                    s.setJavadocLines(Long.valueOf(data.get("javadoc_lines")));
                    s.setSingleCommentLines(Long.valueOf(data.get("single_comment_lines")));
                    s.setMultiCommentLines(Long.valueOf(data.get("multi_comment_lines")));
                    results.add(s);
                }
                parser.next();
            }
        } catch (XmlPullParserException e) {
            throw new IOException2(e);
        } finally {
            if (bis != null) {
                bis.close();
            }
            if (fis != null) {
                fis.close();
            }
        }
        return results;
    }

    private static void skipTag(XmlPullParser parser)
            throws IOException, XmlPullParserException {
        parser.next();
        endElement(parser);
    }

    private static void expectNextTag(XmlPullParser parser, String tag)
            throws IOException, XmlPullParserException {
        while (true) {
            if (parser.getEventType() != XmlPullParser.START_TAG) {
                parser.next();
                continue;
            }
            if (parser.getName().equals(tag)) {
                return;
            }
            throw new IOException("Expecting tag " + tag);
        }
    }

    private static void endElement(XmlPullParser parser)
            throws IOException, XmlPullParserException {
        int depth = parser.getDepth();
        while (parser.getDepth() >= depth) {
            parser.next();
        }
    }

    public Statistic(String name) {
        this.name = name;
    }

    ...

    // Simple getters and setters for all the private fields

    ...

    // equals based on all private fields, hashCode based on
    // name and owner.

    ...

    // toString

    ...
}

Essentially the main work is done in the static parse method. It takes a File and tries to parse it. We get an XML Pull Parser for the stream, configured to be namespace aware (harmless here, since the file format does not actually use namespaces) but not validating, as we want to be forgiving about the exact XML we receive.

The first tag should be <javancss>, and after that we skip until we get to the <packages> tag. Once we have found the <packages> tag, we skip until we hit the first <package> tag.

We are reverse engineering the JavaNCSS file format, so we will not make any assumptions about the order of the child elements in the <package> element. We put all the child elements into a Map keyed by the element name; when we reach the end of the <package> element, we pull the information we are after out of the Map, put it into a Statistic object, and add that to the collection of results that we will return.

As soon as we hit the end of the <packages> element, we stop parsing.

Supporting aggregation

In order to support aggregation of multiple results, we'll add some utility methods to the Statistic class. First, we need methods that allow us to calculate totals:

package hudson.plugins.javancss.parser;

...

public class Statistic implements Serializable {

...

    public static Statistic total(Collection<Statistic>... results) {
        Collection<Statistic> merged = merge(results);
        Statistic total = new Statistic("");
        for (Statistic individual : merged) {
            total.add(individual);
        }
        return total;
    }

    public void add(Statistic r) {
        classes += r.classes;
        functions += r.functions;
        ncss += r.ncss;
        javadocs += r.javadocs;
        javadocLines += r.javadocLines;
        singleCommentLines += r.singleCommentLines;
        multiCommentLines += r.multiCommentLines;
    }

...

}

The total method just calculates the total of all the statistics in a collection. We will also need to be able to merge different result sets; merging should aggregate the totals for each package separately and return a collection with one combined statistic per package:

package hudson.plugins.javancss.parser;

...

public class Statistic implements Serializable {

...

    public static Collection<Statistic> merge(
            Collection<Statistic>... results) {
        if (results.length == 0) {
            return Collections.emptySet();
        } else if (results.length == 1) {
            return results[0];
        } else {
            Map<String, Statistic> merged = new HashMap<String, Statistic>();
            for (Collection<Statistic> result : results) {
                for (Statistic individual : result) {
                    if (!merged.containsKey(individual.name)) {
                        merged.put(individual.name,
                                new Statistic(individual.name));
                    }
                    merged.get(individual.name).add(individual);
                }
            }
            return merged.values();
        }
    }

...

}

That is pretty much it for the parser engine.
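As a quick sanity check before wiring it into Hudson, the engine can be exercised from a throw-away main method (or better, a unit test) dropped into the Statistic class. The report path here is an assumption and depends on how you generated the report.

public static void main(String[] args) throws Exception {
    // the report path is an assumption; use whatever your build produced
    Collection<Statistic> results =
            Statistic.parse(new File("target/javancss-raw-report.xml"));
    for (Statistic packageStats : results) {
        System.out.println(packageStats);
    }
    System.out.println("Project total: " + Statistic.total(results));
}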

The Ghostwriter

Now we need to hook the engine into our publisher. We will also need to configure the UI elements and the Actions... all tasks for the final part, but for now we'll just hook it up. We want to run the parsing on the slave side, so we implement Ghostwriter.SlaveGhostwriter.

package hudson.plugins.javancss;

import hudson.FilePath;
import hudson.model.AbstractBuild;
import hudson.model.BuildListener;
import hudson.plugins.helpers.BuildProxy;
import hudson.plugins.helpers.Ghostwriter;
import hudson.plugins.javancss.parser.Statistic;
import org.xmlpull.v1.XmlPullParserException;

import java.io.File;
import java.io.IOException;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class JavaNCSSGhostwriter
        implements Ghostwriter, Ghostwriter.SlaveGhostwriter {

    private final String reportFilenamePattern;

    public JavaNCSSGhostwriter(String reportFilenamePattern) {
        this.reportFilenamePattern = reportFilenamePattern;
    }

    public boolean performFromSlave(BuildProxy build, BuildListener listener)
            throws InterruptedException, IOException {
        FilePath[] paths = build.getExecutionRootDir()
                .list(reportFilenamePattern);
        Collection<Statistic> results = null;
        Set<String> parsedFiles = new HashSet<String>();
        for (FilePath path : paths) {
            final String pathStr = path.getRemote();
            if (!parsedFiles.contains(pathStr)) {
                parsedFiles.add(pathStr);
                try {
                    Collection<Statistic> result =
                            Statistic.parse(new File(pathStr));
                    if (results == null) {
                        results = result;
                    } else {
                        results = Statistic.merge(results, result);
                    }

                    // TODO copy the parsed file to the master

                } catch (XmlPullParserException e) {
                    e.printStackTrace(listener.getLogger());
                }
            }
        }

        // TODO add the results into an Action and attach it to the build.

        return true;
    }
}

Basically, we search the supplied wildcard-path for report files and merge all the results together into a collection of results. In the final part of this series we will create our Action to hold the results and wire everything together.

1

View comments

  1. Another interesting tidbit:

    @Benchmark
    public void debug1Arg_date(Blackhole blackhole) {
    LOGGER.debug("Started at %s", new Date(time));
    time++;
    blackhole.consume(time);
    }

    @Benchmark
    public void guardedDebug1Arg_date(Blackhole blackhole) {
    if (LOGGER.isDebugEnabled()) {
    LOGGER.debug("Started at %s", new Date(time));
    }
    time++;
    blackhole.consume(time);
    }

    Gives the following benchmark results:

    Benchmark                            Mode  Cnt          Score         Error  Units
    Log4jBenchmark.debug1Arg_date         thrpt   20  179653191.248 ± 2731044.349  ops/s
    Log4jBenchmark.guardedDebug1Arg_date  thrpt   20  207001790.376 ± 2074020.617  ops/s

    We can also compare guarding with SLF4J over logback

    Benchmark                              Mode  Cnt          Score         Error  Units
    SLF4JBenchmark.debug1Arg_date         thrpt   20  220765608.629 ± 6555782.899  ops/s
    SLF4JBenchmark.guardedDebug1Arg_date  thrpt   20  241286730.504 ± 9532328.812  ops/s

    So where performance is critical, seems to be about 10% faster if you guard that log statement!

    0

    Add a comment

  2. You asked for it.
    Logback:

    @Benchmark
    public void debug1Arg_date(Blackhole blackhole) { LOGGER.debug("Started at {}", new Date(time)); time++; blackhole.consume(time); } @Benchmark
    public void debug2Arg_date(Blackhole blackhole) {
        LOGGER.debug("Started at {}", new Date(time), null);
        time++;
        blackhole.consume(time);
    }
    @Benchmark
    public void debugNArg_date(Blackhole blackhole) { LOGGER.debug("Started at {}", new Date(time), null, null); time++; blackhole.consume(time); } static {
        ch.qos.logback.classic.Logger root = (ch.qos.logback.classic.Logger)LoggerFactory.getLogger(org.slf4j.Logger.ROOT_LOGGER_NAME);
        root.setLevel(ch.qos.logback.classic.Level.INFO);
    }

    And the benchmarks

    Benchmark                     Mode  Cnt          Score         Error  Units 
    LogbackBenchmark.debug1Arg_date  thrpt   20  221056534.656 ± 8549233.826  ops/s LogbackBenchmark.debug2Arg_date  thrpt   20  220576341.742 ± 3778270.898  ops/s LogbackBenchmark.debugNArg_date  thrpt   20  134887126.088 ± 2973182.812  ops/s

    JUL over SLF4J (same benchmark code but different dependency)

    Benchmark                     Mode  Cnt          Score         Error  Units 
    SLF4JOverJULBenchmark.debug1Arg_date  thrpt   20  213779286.286 ± 4819012.495  ops/s SLF4JOverJULBenchmark.debug2Arg_date  thrpt   20  213707271.979 ± 2675083.826  ops/s SLF4JOverJULBenchmark.debugNArg_date  thrpt   20  152839334.058 ± 2122611.858  ops/s

    Then if we compare the JUL LogRecord implicitly:

    @Benchmark
    public void logRecord(Blackhole blackhole) { LogRecord lr = new LogRecord(Level.FINE, "Started at {0}"); lr.setParameters(new Object[]{new Date(time)}); LOGGER.log(lr); time++; blackhole.consume(time); }

    This has the following Benchmark

    Benchmark                Mode  Cnt         Score        Error  Units 
    JulBenchmark.logRecord  thrpt   20  16422331.419 ± 148428.926  ops/s

    DO NOT USE AN UNGUARDED LogRecord WHEN IT WILL LIKELY NOT GET LOGGED

    Finally, Apache’s Log4J version 2 (which has overrides to avoid var-args all the way up to 10 parameters):

    @Benchmark
    public void debug1Arg_date(Blackhole blackhole) { LOGGER.debug("Started at %s", new Date(time)); time++; blackhole.consume(time); } @Benchmark
    public void debug2Arg_date(Blackhole blackhole) { LOGGER.debug("Started at %s", new Date(time), null); time++; blackhole.consume(time); } @Benchmark
    public void debug3Arg_date(Blackhole blackhole) { LOGGER.debug("Started at %s", new Date(time), null, null); time++; blackhole.consume(time); } @Benchmark
    public void debug4Arg_date(Blackhole blackhole) { LOGGER.debug("Started at %s", new Date(time), null, null, null); time++; blackhole.consume(time); } @Benchmark
    public void debug5Arg_date(Blackhole blackhole) { LOGGER.debug("Started at %s", new Date(time), null, null, null, null); time++; blackhole.consume(time); } @Benchmark
    public void debug6Arg_date(Blackhole blackhole) { LOGGER.debug("Started at %s", new Date(time), null, null, null, null, null); time++; blackhole.consume(time); } @Benchmark
    public void debug7Arg_date(Blackhole blackhole) { LOGGER.debug("Started at %s", new Date(time), null, null, null, null, null, null); time++; blackhole.consume(time); } @Benchmark
    public void debug8Arg_date(Blackhole blackhole) { LOGGER.debug("Started at %s", new Date(time), null, null, null, null, null, null, null); time++; blackhole.consume(time); } @Benchmark
    public void debug9Arg_date(Blackhole blackhole) { LOGGER.debug("Started at %s", new Date(time), null, null, null, null, null, null, null, null); time++; blackhole.consume(time); } @Benchmark
    public void debug10Arg_date(Blackhole blackhole) { LOGGER.debug("Started at %s", new Date(time), null, null, null, null, null, null, null, null, null); time++; blackhole.consume(time); } @Benchmark
    public void debugNArg_date(Blackhole blackhole) { LOGGER.debug("Started at %s", new Date(time), null, null, null, null, null, null, null, null, null, null); time++; blackhole.consume(time); }

    And the benchmarks:

    Benchmark                         Mode  Cnt          Score         Error  Units 
    Log4j2Benchmark.debug1Arg_date   thrpt   20  182786163.176 ± 1894038.335  ops/s Log4j2Benchmark.debug2Arg_date   thrpt   20  180716268.151 ± 5295999.398  ops/s Log4j2Benchmark.debug3Arg_date   thrpt   20  178064841.181 ± 6987288.015  ops/s Log4j2Benchmark.debug4Arg_date   thrpt   20  181537704.811 ± 4472120.312  ops/s Log4j2Benchmark.debug5Arg_date   thrpt   20  181803075.728 ± 3963211.935  ops/s Log4j2Benchmark.debug6Arg_date   thrpt   20  178229873.962 ± 8092548.001  ops/s Log4j2Benchmark.debug7Arg_date   thrpt   20  181018788.479 ± 5438279.737  ops/s Log4j2Benchmark.debug8Arg_date   thrpt   20  180443652.287 ± 4965518.359  ops/s Log4j2Benchmark.debug9Arg_date   thrpt   20  181456134.533 ± 2014764.085  ops/s
    Log4j2Benchmark.debug10Arg_date  thrpt   20  176706451.426 ± 7911521.599  ops/s
    Log4j2Benchmark.debugNArg_date   thrpt   20  123243343.482 ± 2051852.105  ops/s


    So for the logs not logged we have the following:

    • Fastest is JUL with single argument
    • Second fastest is SLF4J with one or two arguments (Logback is ever so slightly ahead but it’s splitting hairs)
    • Third place is Apache Log4J version 2 with up to 10 arguments
    • Forth place is JUL for more than 1 argument
    • Fifth place is SLF4J over JUL with more than 2 arguments
    • Sixth is Logback with more than 2 arguments
    • Last place is Apache Log4J with more than 10 arguments

    Pick your poison… and we have not considered the perhaps more important performance when actually logging!

    0

    Add a comment

  3. One of my colleagues had a question:

    Now for some context, when the Jenkins project started off, Kohsuke was working for Sun and felt it would be wrong, as a Sun employee, to use a logging framework other than that provided by the JVM as using a different logging framework could be seen as being implication that the built in Java Utils Logging framework was a steaming pile of excrement.

    Now while JUL is not a complete and utter steaming pile of excrement, at times it does indeed throw shades at being one.

    This post is not a defence of Java Utils Logging. This post is an analysis of how to use Java Utils Logging such that performance does not end up down the toilet.

    When you are using Java Utils Logging there are many many ways to actually log something:

    // simple style
    LOGGER.info("A message");
    // explicit level style
    LOGGER.log(Level.INFO, "A message");
    // explicit level and stack trace
    LOGGER.log(Level.INFO, "A message with a stack trace", e);
    // explicit level and single parameter
    LOGGER.log(Level.INFO, "A message with a {0} parameter", param);
    // explicit level and multiple parameters
    LOGGER.log(Level.INFO, "A message with parameter {0} and {1}", new Object[]{param0,param1});
    // explicit level, stack trace and parameter(s)
    LogRecord lr = new LogRecord(Level.INFO, "A message with parameter {0} and {1} and a stack trace");
    lr.setThrown(e);
    lr.setParams(new Object[]{param0,param1});
    LOGGER.log(lr);

    Now each of the above achieves basically the same thing, sending a message to the logs under differing use cases. The LOGGER.info() variant is intended to be used when you have a constant message to report… but invariably somebody starts doing the easy string concatenation with that style, rather than use the parameterised logging as the JUL framework intended, so you will sometimes see:

    LOGGER.info("Accepted incoming connection #" + connectionNumber + " from " + socket.getAddress() + " and processing with " + handler); 

    Rather than

    LOGGER.log(Level.INFO, "Accepted incoming connection #{0} from {1} and processing with {2}", new Object[]{connectionNumber,socket.getAddress(),handler});

    At the INFO level (which is normally logged by default) this may not be as critical, but once we move the code from development to production and notch the levels down the difference can be an order of magnitude or more.

    The reason becomes clear when we look at the source code for Logger:

     
    /**
    * Log a message, with no arguments.
    * <p>
    * If the logger is currently enabled for the given message
    * level then the given message is forwarded to all the
    * registered output Handler objects.
    * <p>
    * @param level One of the message level identifiers, e.g., SEVERE
    * @param msg The string message (or a key in the message catalog)
    */
    public void log(Level level, String msg) {
    if (!isLoggable(level)) {
    return;
    }
    LogRecord lr = new LogRecord(level, msg);
    doLog(lr);
    }
    /**
    * Log a message, with an array of object arguments.
    * <p>
    * If the logger is currently enabled for the given message
    * level then a corresponding LogRecord is created and forwarded
    * to all the registered output Handler objects.
    * <p>
    * @param level One of the message level identifiers, e.g., SEVERE
    * @param msg The string message (or a key in the message catalog)
    * @param params array of parameters to the message
    */
    public void log(Level level, String msg, Object params[]) {
    if (!isLoggable(level)) {
    return;
    }
    LogRecord lr = new LogRecord(level, msg);
    lr.setParameters(params);
    doLog(lr);
    }
    /**
    * Log an INFO message.
    * <p>
    * If the logger is currently enabled for the INFO message
    * level then the given message is forwarded to all the
    * registered output Handler objects.
    * <p>
    * @param msg The string message (or a key in the message catalog)
    */
    public void info(String msg) {
    log(Level.INFO, msg);
    }

    So that LOGGER.info() call requires that we build up the string first, and only after the string has been instantiated do we check if the logger is logging that level whereas the parameterised message just passes the string constant and the object array but doesn’t do much actual work.

    One would hope, given sufficient time, that the LOGGER.log(level,msg) call would also get inlined and the JVM might be smart enough to re-order the isLoggable check ahead of the string concatenation and everything would be equal in the long run.

    Well, that’s a nice theory, but users would need to wait for that to occur, plus your log statements are likely in the middle of large methods anyway and the Logger methods are not final so the JVM would need to put in some guards just in case somebody loaded a Logger subclass that would invalidate the optimisation

    So what is the impact anyway?

    Enter JMH we can run a micro-benchmark to see what the different styles perform like when our logger is not logging.

    Firstly we need to grab a baseline:

    public class JulBenchmark {

    private static final Logger LOGGER = Logger.getLogger(JulBenchmark.class.getName());

    static long time;

    @Benchmark
    public void noLogging(Blackhole blackhole) {
    time++;
    blackhole.consume(time);
    }
     @Benchmark
    public void blackholeStringConcat(Blackhole blackhole) {
    blackhole.consume("Started at " + new Date(time));
    time++;
    blackhole.consume(time);
    }
     @Benchmark
    public void blackholeStringFormat(Blackhole blackhole) {
    blackhole.consume(String.format("Started at %tc", time));
    time++;
    blackhole.consume(time);
    }
     @Benchmark
    public void blackholeSimpleArray(Blackhole blackhole) {
    blackhole.consume(new Object[]{time});
    time++;
    blackhole.consume(time);
    }
     @Benchmark
    public void blackholeDateArray(Blackhole blackhole) {
    blackhole.consume(new Object[]{new Date(time)});
    time++;
    blackhole.consume(time);
    }
    }

    In each of our benchmarks, we pass through the blackhole and we consume the time static field with it so that the time differences we observe are all just for the statements above the time++ and by incrementing the time value we prevent the JVM from optimising concatenation to a constant string value.

    The results:

    Benchmark                                      Mode  Cnt          Score          Error  Units
    JulBenchmark.blackholeDateArray               thrpt   20   97371938.780 ±  2903787.552  ops/s
    JulBenchmark.blackholeSimpleArray             thrpt   20  100902747.726 ±  1383605.583  ops/s
    JulBenchmark.blackholeStringConcat            thrpt   20    3727524.654 ±    62280.481  ops/s
    JulBenchmark.blackholeStringFormat            thrpt   20     601454.586 ±   5689.042  ops/s
    JulBenchmark.noLogging                        thrpt   20  366202499.287 ± 13837552.431  ops/s

    So creating an Array just to throw it down the black hole is two orders of magnitude better than building a string to be thrown down the black hole which is an order of magnitude better than using String.format. It doesn’t really matter whether our array creation even included wrapping the long as a Date.

    Our alarm bells should be ringing rather loudly by now… that String concatenation is a horrific cost to pay and the array creation is really quite cheap compared to doing nothing.

    Now let’s look at what happens if we do some logging at a level that goes nowhere:

    @Benchmark
    public void helperStringConcat(Blackhole blackhole) {
    LOGGER.fine("Started at " + new Date(time));
    time++;
    blackhole.consume(time);
    }
    @Benchmark
    public void levelStringConcat(Blackhole blackhole) {
    LOGGER.log(Level.FINE, "Started at " + new Date(time));
    time++;
    blackhole.consume(time);
    }
    @Benchmark
    public void levelMessage1Arg_custom(Blackhole blackhole) {
    LOGGER.log(Level.FINE, "Started at {0,date,EEE MMM dd HH:mm:ss zzz yyyy}", time);
    time++;
    blackhole.consume(time);
    }
    @Benchmark
    public void levelMessage1Arg_date(Blackhole blackhole) {
    LOGGER.log(Level.FINE, "Started at {0}", new Date(time));
    time++;
    blackhole.consume(time);
    }
    @Benchmark
    public void levelMessageNArg_custom(Blackhole blackhole) {
    LOGGER.log(Level.FINE, "Started at {0,date,EEE MMM dd HH:mm:ss zzz yyyy}", new Object[]{time});
    time++;
    blackhole.consume(time);
    }
    @Benchmark
    public void levelMessageNArg_date(Blackhole blackhole) {
    LOGGER.log(Level.FINE, "Started at {0}", new Object[]{new Date(time)});
    time++;
    blackhole.consume(time);
    }

    So what does the benchmark say for these?

    Benchmark                                      Mode  Cnt          Score          Error  Units
    JulBenchmark.helperStringConcat               thrpt   20    3878438.607 ±    57378.392  ops/s
    JulBenchmark.levelMessage1Arg_custom          thrpt   20  166976758.162 ±  6073258.725  ops/s
    JulBenchmark.levelMessage1Arg_date            thrpt   20  267808265.736 ±  4607295.478  ops/s
    JulBenchmark.levelStringConcat                thrpt   20    3904071.260 ±   112723.048  ops/s
    JulBenchmark.levelMessageNArg_custom          thrpt   20  167622786.200 ±  3108069.452  ops/s
    JulBenchmark.levelMessageNArg_date            thrpt   20  178923942.807 ±  2145153.495  ops/s

    So as we feared… that String concatenation is killing our performance… 

    But what is going on with the single arg date… let’s take a look at the bytecode:

      public void levelMessage1Arg_custom(org.openjdk.jmh.infra.Blackhole);
        descriptor: (Lorg/openjdk/jmh/infra/Blackhole;)V
        flags: ACC_PUBLIC    Code:      stack=5, locals=2, args_size=2
             0: getstatic     #4                  // Field LOGGER:Ljava/util/logging/Logger;
             3: getstatic     #15                 // Field java/util/logging/Level.FINE:Ljava/util/logging/Level;
             6: ldc           #22                 // String Started at {0,date,EEE MMM dd HH:mm:ss zzz yyyy}
             8: getstatic     #2                  // Field time:J
            11: invokestatic  #19                 // Method java/lang/Long.valueOf:(J)Ljava/lang/Long;
            14: invokevirtual #23                 // Method java/util/logging/Logger.log:(Ljava/util/logging/Level;Ljava/lang/String;Ljava/lang/Object;)V
            17: getstatic     #2                  // Field time:J
            20: lconst_1
            21: ladd
            22: putstatic     #2                  // Field time:J
            25: aload_1
            26: getstatic     #2                  // Field time:J
            29: invokevirtual #3                  // Method org/openjdk/jmh/infra/Blackhole.consume:(J)V
            32: return
          LineNumberTable:
            line 110: 0
            line 111: 17
            line 112: 25
            line 113: 32
          LocalVariableTable:
            Start  Length  Slot  Name   Signature
                0      33     0  this   Lcom/onedash/JulBenchmark;
                0      33     1 blackhole   Lorg/openjdk/jmh/infra/Blackhole;
        RuntimeVisibleAnnotations:
          0: #53()

    And Long.valueOf looks like

    public static Long valueOf(long l) {
    final int offset = 128;
    if (l >= -128 && l <= 127) { // will cache
    return LongCache.cache[(int)l + offset];
    }
    return new Long(l);
    }

    I wonder what would happen if we ignored the cache...

     
    @Benchmark
    public void levelMessage1Arg_customWAT(Blackhole blackhole) {
    LOGGER.log(Level.FINE, "Started at {0,date,EEE MMM dd HH:mm:ss zzz yyyy}", new Long(time));
    time++;
    blackhole.consume(time);
    }
    @Benchmark
    public void levelMessageNArg_customWAT(Blackhole blackhole) {
    LOGGER.log(Level.FINE, "Started at {0,date,EEE MMM dd HH:mm:ss zzz yyyy}", new Object[]{new Long(time)});
    time++;
    blackhole.consume(time);
    }

    If we look at the benchmarks for those:

    Benchmark                                      Mode  Cnt          Score          Error  Units
    JulBenchmark.levelMessage1Arg_customWAT       thrpt   20  273177646.084 ±  7367926.789  ops/s
    JulBenchmark.levelMessageNArg_customWAT       thrpt   20  178868531.695 ±  8171421.563  ops/s

    So bypassing the auto-boxing and using explicit boxing of a Long is actually about 7% faster when you are putting the Long in an array… but when passed using the single argument helper, explicit boxing is 63% faster!

    I wonder if somebody should tell IntelliJ to quit whining!

    IntelliJ complains of unnecessary boxing

    Finally, what about if we move the guard out explicitly:

    @Benchmark
    public void guardedLevelMessageNArg(Blackhole blackhole) {
    if (LOGGER.isLoggable(Level.FINE)) {
    LOGGER.log(Level.FINE, "Started at {0,date,EEE MMM dd HH:mm:ss zzz yyyy}", new Object[]{new Long(time)});
    }
    time++;
    blackhole.consume(time);
    }

    What does that benchmark like:

    Benchmark                              Mode  Cnt          Score         Error  Units
    JulBenchmark.guardedLevelMessageNArg  thrpt   20  295175956.393 ± 5381049.710  ops/s

    OK, so back to Baptiste’s original question:

    • The arrays are actually very cheap, at least compared to auto-boxing of say longs
    • The LOGGER.info() style is just not worth the risk, always give the level that makes it easier to switch to parameterized.
    • When not being logged, JUL logging can be very cheap e.g. compare noLogging benchmark with levelMessage1Arg_date or levelMessage1Arg_customWAT where we saw our throughput only drop by about 25%. This compares quite favourably with guardedLevelMessageNArg which only has a drop of 20%

    My recommendations are thus:

    IF YOU HAVE TO USE Java Utils Logging

    1. Bash people on the head so you can use a better logging framework
    2. Never use the LOGGER.info() style statements, too risky that somebody will be lazy and concatenate something in there
    3. In regular code: Don’t worry about the array creation
    4. In proven hot-spot code where logging defaults to off: Wrap all logging statements with a guard of (LOGGER.isLoggable(Level.___)) { … }. If you have multiple guards in the same method, extract the level check to a local variable.

    And please remember point #1

    0

    Add a comment

  4. If there are two only things guaranteed to cause all out war between developers, my vote is that those two things are:

    • Code formatting conventions
    • Version number conventions

    The first is resolved trivially (just shoot anyone suggesting the use of TABs). Until recently most people thought the second was resolved by semver… at least until the developer of one of the more commonly used JavaScript libraries threw the cat amongst the pigeons.

    V1

    The real issue here is that we are trying to cram far too much into a list of number like things (that we call a “version number” - note it’s not a plural). On top of that we have marketing types putting additional demands on these poor abused version numbers.

    Let’s start with some example schemes:

    Every time you change something increment the number by 1.

    This scheme is dead simple to implement if your source repository is Subversion, as the version number is simply the revision number.

    Things get a bit harder if you use a DVCS like Git. You can certainly count commits from the root commit, but you have to worry about squashed commits and history rewriting, so you end up with a canonical repository configured to not allow rewriting of history and the version number is determined by that repository

    For something like Software as a Service, there is a lot to be said with this model, as there is only ever one version in production, and that distribution model tends to follow upgrade only processes.

    Once you have to maintain more than one version (i.e. if you need to fix things in older versions… or if there is an emergency fix needed in production while you await the next big release to make it through testing) then that single version number becomes a liability. You need to have some tracking database to know that version 17 was actually a fixed release of version 10.

    We need some way to track those fixes...

    Well there’s an infinite supply of numbers, so let’s change the scheme

    Every time you release from the mainline, increment the number by 10. If releasing an update to an old release, increment the number by 1.

    Well that buys us some room, but if we ever have more than 9 fixes we’ll be in trouble.

    Notice now that we have switched from being able to determine the version number from a rule. The version number has now become subjective.

    We can refine the scheme to give us some more room, e.g. by switching to decimal numbers.

    Every time you release from the mainline, increment the number by 1. If releasing an update to an old release, increment the number by 0.1 

    If we consistently expect more than 9 updates to old releases then use an increment to 0.01. We can even be lazy about changing the increment and switch to 0.01 once we are at a .9 release

    So far we have been able to keep some invariants:

    Our version numbers are actual numbers.

    Newer things have bigger numbers than older things

    And if we are strict with policy, we can tell some more information from the version numbers. So for example if we say “only changes that  backport bug fixes from trunk” are allowed in older releases then you know that upgrading point releases should be safe.

    At this point we now leave the realm of the actual numbers.

    Semver and OSGi are amongst this class of schemes.

    Version numbers are [major].[minor].[patch]

    If you make a release that breaks backwards compatibility, increase the major number and reset the remaining numbers to 0.
    If you make a release that adds a feature but remains backwards compatible, increase the minor number and reset the patch number to 0.
    If you make a release that just fixes bugs, increase the patch number by 1.

    There are usually additional side rules to nail the scheme down somewhat, but the principle is the same. If you write something that depends on 5.0.0 you should be “safe” with any 5.?.? release. We are now relying on even more subjective criteria, namely the question of when we break backwards compatible behaviour.

    We can fool ourselves with tooling into the belief that there are automated checks of backwards compatibility, but that just captures changes to method signatures. It does not capture the documentation change that says you now need to call the ‘start’ method before using the ‘forward’ method.

    So yes, tooling can help us catch those cases where we should have updated the major version and forgot to. 

    Tooling can be more helpful with catching backwards compatible changes (i.e. new methods in an API) but even then there can be backwards compatible enhancements that are missed by tooling (‘You no-longer have to call the ‘start’ method before using the ‘forward’ method as an un-started widget will be started implicitly on the first call to ‘forward’” would be an example)

    By all means put tooling in place to help you catch those cases where you forgot to change the version number, but do not assume that the tooling absolves you from thinking about the version number.

    When a project is developing rapidly it can be very easy to make mistakes in identifying and categorizing breaking changes, even semver recognises this by having an opt-out of the 0.x.x version numbers… basically under semver as long as you are in a 0.x.x version number anything goes.

    Now layer on top the fact that semver does not distinguish the scope of the breaking change.

    You can have a breaking change where you fixed one method in a small corner of the API to require a previously optional parameter. That requires a major version bump.

    You can have a breaking change where you refactor the API splitting it from one class into two classes in a different namespace. That requires a major version bump.

    So when I see a major version bump I don’t know which of those has happened.

    There are really only two ways to fix that:

    1. Add another segment (splitting the major into "really major" and "sort of major”)
    2. Dual purpose the minor segment.

    So we are either at 4 segments or we loose the ability to differentiate changes that break backwards compatibility from those that are just adding functionality.

    Another thing to consider is that there is a difference in respect of version numbers for providers and consumers. So a change in an API almost always affects the provider of the API but with careful design it is often possible to ensure that the change is backwards compatible for consumers. OSGi is smart enough to make this distinction, semver however ignores this.

    So what we have is a situation where we want to expose information in a single “version number” but the information is contextual.

    Finally I present the AT&T history version number scheme (inherited by Avaya and Alcatel-Lucent):

    Version numbers are of the format [major].[minor].[service pack].[update].[patch].[build] (or depending on who you ask it can be [major].[minor].[service pack + update].[patch].[build])

    Increase the major version when you want to start cutting off support, the contract being to support the current and one back (or two back again depending on who you ask)

    Increase the minor version for a new release

    Increate the service pack for each service pack release. Updates to the service pack can also be released and should yield the equivalent, so Service Pack 2 Update 0 should be the same as Service Pack 1 Update 1 or Service Pack 0 Update 2. This distinction only makes sense when considering that the releases may be tied to physical hardware.

    Each patch that gets released gets its own unique number.

    You can build a patch multiple times before it actually gets released

    This scheme presents a totally different contract from the semver/OSGi scheme and is more focused on support and maintenance contract.

    What is the ideal?

    Well we all need a bikeshed to paint. My suggestion:

    1. The first segment should indicate what branches are being maintained. A project should indicate how many branches it is going to maintain and support, this could be:
      • just the current branch;
      • the current and one back; 
      • the current and two back; etc. 
      Increment this version when you want to drop support for a previous branch. A good time to do this is when you make major API refactorings. What support for the non-current branch means is determined by the project. It may be that older branches can get new features or it may be that only security fixes will be made to older branches. It could even be a mixture.
    2. The second segment should be incremented when there are breaking changes.
    3. The third segment should be incremented when there are new features and those new features do not break consumers.
    4. The forth segment should be incremented for bug fixes.
    5. An optional fifth segment to be used as a qualifier such as alpha, beta, rc, etc.

    This is not perfect, you still may want a separate version number for producers. But by adding the support contract to the version number you allow people to know when they should get concerned about running an older version. You also stop the rampant version number inflation that semver can lead to that seems to annoy some people… on the other hand we are now up to 4 version segments.

    If there is serious interest in this scheme I may dream up a name and publish a spec!

    Thoughts?

    0

    Add a comment

  5. One of the headline grabbing things that people seem to be raving about over Log4J 2.0 is async loggers.

    Now do not get me wrong, they are great… as long as you know what you are doing.

    If you are using async loggers you had better make sure you are only passing immutable objects to your loggers, because otherwise you may end up logging the state of your objects at the time of the log record being written and not the time you called log.

    Now there are many other good reasons why you should use immutable objects for your logging, but async logging is the tipping point.

    A few years ago I was burned by this exact class of problem in a conferencing call bridge, where we had a log message recording the participants that were actively talking (i.e. their volume was greater than the threshold). Spurious line noise would result in messages like:

    Bob has started talking, Bob’s volume is 1, threshold is 64
    Bob has stopped talking, Bob’s volume is 120, threshold is 64

    Which was a result of the mutable object that held the volume for Bob being changed by the media server in between creating the log record and writing the log record to disk.

    Consider the following simple test class:

    package test;

    import org.apache.logging.log4j.LogManager;
    import org.apache.logging.log4j.Logger;
    import java.util.concurrent.atomic.AtomicLong;
    public class App {
        private static final AtomicLong value = new AtomicLong();
        public String toString() {
            return Long.toString(value.get());
        }
        public long next() {
            return value.incrementAndGet();
        }
        public static void main(String[] args) {
            for (int i = 0; i < 32; i++) {
                new Thread() {
                    final Logger logger = LogManager.getLogger(App.class);
                    final App instance = new App();
                    @Override
                    public void run() {
                        for (int i = 0; i < 100000; i++) {
                            logger.warn("{} == {}", instance.next(), instance);
                        }
                    }
                }.start();
            }
        }
    }

    Now run this code...

    Here is the first few lines of logging output

    2014-07-28 15:59:45,729 WARN t.App [Thread-13] 13 == 13 
    2014-07-28 15:59:45,730 WARN t.App [Thread-29] 29 == 29
    2014-07-28 15:59:45,729 WARN t.App [Thread-15] 15 == 15
    2014-07-28 15:59:45,729 WARN t.App [Thread-6] 6 == 6
    2014-07-28 15:59:45,730 WARN t.App [Thread-30] 30 == 30
    2014-07-28 15:59:45,729 WARN t.App [Thread-20] 20 == 20
    2014-07-28 15:59:45,729 WARN t.App [Thread-8] 8 == 8
    2014-07-28 15:59:45,730 WARN t.App [Thread-28] 28 == 28
    2014-07-28 15:59:45,729 WARN t.App [Thread-19] 19 == 19
    2014-07-28 15:59:45,729 WARN t.App [Thread-18] 18 == 18
    2014-07-28 15:59:45,729 WARN t.App [Thread-5] 5 == 6
    2014-07-28 15:59:45,731 WARN t.App [Thread-13] 33 == 37
    2014-07-28 15:59:45,731 WARN t.App [Thread-8] 39 == 39
    2014-07-28 15:59:45,731 WARN t.App [Thread-28] 40 == 41
    2014-07-28 15:59:45,731 WARN t.App [Thread-18] 42 == 43
    2014-07-28 15:59:45,731 WARN t.App [Thread-5] 43 == 43

    Spot anything wrong with that?

    Now of course I have written the class to have mutable state… because I wanted to test if Log4J 2.0 was capturing the toString() at the time of logging… which it isn’t.

    So if you have an object with a toString() method that depends on mutable state, you will have to:

    • Check that the logger is enabled for the logging level
    • Call toString() by hand
    • Pass the toString() result as the parameter.
    To make my previous code work with Asynchronous loggers (other than by fixing the mutable state) I would need to log like this:
    if (logger.isWarnEnabled()) {
        logger.warn("{} == {}", instance.next(), instance.toString());
    }

    Yep, we are back to that old pattern.

    TL;DR Unless you know that every parameter object passed to a log statement is using immutable objects, enabling Asynchronous loggers in Log4J may well result in logs with messages you cannot trust.  

    Update

    Here is the log4j2.xml configuration file I used

    <?xml version="1.0" encoding="UTF-8"?>

    <!-- Don't forget to set system property
    -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector
    to make all loggers asynchronous. -->

    <Configuration status="WARN">
    <Appenders>
    <!-- Async Loggers will auto-flush in batches, so switch off immediateFlush. -->
    <RandomAccessFile name="RandomAccessFile" fileName="async.log" immediateFlush="false" append="false">
    <PatternLayout>
    <Pattern>%d %p %c{1.} [%t] %m %ex%n</Pattern>
    </PatternLayout>
    </RandomAccessFile>
    </Appenders>
    <Loggers>
    <Root level="info" includeLocation="false">
    <AppenderRef ref="RandomAccessFile"/>
    </Root>
    </Loggers>
    </Configuration>

    Which is the example on the Log4j Website

    0

    Add a comment

  6. We have all had the bad technical managers… the ones who tell you to go and do something… and then micromanage the task they have “delegated” to you.

    They are a pain to work for.

    One of the qualities of a good manager is to tailor their delegation to the people they are delegating to.

    So they task Gillian, a senior engineer, with “writing some smoke tests to verify that the application works when deployed to the app server” and they are happy to let Gillian work within that scope because they trust that Gillian will come back to seek clarification if the description is lacking, or if there are issues down the road.

    Kevin, a junior engineer, on the other hand, has proven over many many tasks to need more detail, so they give Kevin a more detailed written description of the task and they check in on Kevin as he is going along… because Kevin has proven that he needs a certain level of micromanagement… the good manager is all the time testing to see if Kevin can survive with less micromanagement.

    Note: one of the biggest cognitive distortions of bad managers is the thinking that, in order to treat everyone fairly, you must treat everyone equally. If we did that, then either Gillian gets micromanaged, which is unfair to Gillian, or Kevin gets under-managed, which is unfair to Kevin.

    So, what does all that have to do with liking Maven?

    Well, Maven is a build tool. If you need Maven to build a .war file, you should just tell it to build a .war file:

    <project>
      <modelVersion>4.0.0</modelVersion>
      <groupId>…</groupId>
      <artifactId>…</artifactId>
      <version>…</version>
      <packaging>war</packaging>
      <dependencies>
        ...
      </dependencies>
    </project>

    Why would you want to micromanage Maven and tell it every little thing it must do to build the .war file? Wait until you see what it did with your instructions; then, and only then, should you refine those instructions.

    So… If you don't like Maven, you would not be a good manager… The corollaries are not necessarily true though...

    • Liking Maven does not mean you will be a good manager… there are other qualities that make a good manager… understanding when and how to micromanage is only one of those qualities
    • Being a good manager does not mean you will like Maven… for example, if all your developers (who love to micromanage and thankfully are not managers, given that adoration of micromanagement) tell you that Maven is a pain… well, a good manager will hear what their team is saying and take it on board… after all, they are there to manage, not to do… though if they were ever to see a Maven build that had not been micromanaged by the developers, they might well form a different view.

  7. There are two ways to build a Maven project with Jenkins*
    • Use a free-style project with a Maven build step
    • Use a Maven-style project
    The first way runs the build as Maven intended. The second way adds a whole lot of hooks and can even modify the build in ways that Maven did not intend.
    The first way requires that you configure stuff yourself. The second way tries to “guess” what you want and auto-configure it.
    The first way is initially less user friendly, i.e. you have more UI to click through to get the full set of reports. The second way is initially more user friendly… but when things go wrong… well sorry out of luck.
    If something goes wrong with the first way, worst case you add a shell build step above the Maven build step that just runs SET, trigger a build, login to the build slave, switch to the user the build is running as, apply the environment your SET build step output and then run the Maven command that the build’s console log captured. That will give you an exact reproduction of the Maven build and you can debug why your build is not working.
    When something goes wrong with the second way, well good luck. By all means try to do the same as you would for a free-style project, but at the end of the day, there is no way you can replicate the injected hooks that Jenkins puts into your Maven build. You can get an approximate reproduction, and hey, that may just be enough to let you figure out what is wrong and fix your build… but there are cases where you cannot.
    Hence why, since 2007, I have been saying that the Maven job type is considered evil…
    • It is very attractive because it is easy to configure (so users use it) and gives nice per-module reports
    • When it blows up, and it will blow up, it blows up big
    -Stephen
    * well actually three ways§ if you include the literate job type
    § (Update 2017-08-01) well actually four ways if you include pipeline and the withMaven helper

  8. I was given a link to yet another article preaching about how 100% code coverage is the only goal (the article is preaching, not Tatu BTW)

    There is an issue I have with how the 100% coverage advocates present their argument, but we'll get to that later.

    The case for 100% coverage is put quite simply:

    Anything less than 100% coverage means that there are some lines in your code that are untested, and are you seriously willing to let code be released without being tested?

    Now if your code is 100 lines long, and you have 95% coverage, 5 lines doesn't seem too bad. In fact looking at the code coverage report you see those 5 lines, and you make a judgement call that they are an acceptable risk. No harm done, right?

    When the code grows to 1,000 lines long, 95% coverage is giving you less of that warm and fuzzy feeling, but a quick scan of the coverage report and you can visually check the 50 lines untested, and you make a judgement call that they are an acceptable risk. No harm done, right?

    With a 10,000+ LOC code base and 95% coverage and your boss wants to know if it is OK to push this code into production now, the 500+ uncovered lines that you need to review are now much more than a headache…

    Are you a convert now? Do you believe in 100% coverage? It's a nice argument. But there is a flaw… in some cases that is!

    When you are dealing with a dynamically typed language, especially some of the more expressive ones, the only tests you can have are the tests you write yourself. In those languages, the 100% coverage zealots have me sold (at least for code that needs to be maintained and is longer than 100 lines or so!)

    But in other languages we can, and do, have additional tests that are provided by the tooling:

    • Syntax tests that verify that every line of code is syntactically correct
    • Scoping tests (in languages that force declaring names before first use within a scope) that verify that each line only accesses those names within the correct scope for the line.
    • Type tests (in statically typed languages) 

    And that's not all… most of the code I write runs on the JVM, and there are more tests I can add to the mix there:

    • Static analysis tools, such as Checkstyle, PMD and Findbugs, provide an additional set of tests that I can run automatically on my code looking for common problems and possible mistakes. In fact I've found and fixed bugs with Findbugs in code that had 100% coverage already.
    • Annotations in code can be used not only to document the code contracts, but also to aid Findbugs in catching bugs. Specifically I am referring to the @CheckForNull and @NonNull annotations. I apply these annotations to the production code, and the toolchain I use applies tests to the code for free (see the sketch just after this list).
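
    For example, here is a minimal sketch of what I mean (the class and method names are hypothetical, and it assumes the Findbugs annotations jar is on the classpath):

    import edu.umd.cs.findbugs.annotations.CheckForNull;
    import edu.umd.cs.findbugs.annotations.NonNull;

    public class UserDirectory {

        /** May legitimately return null when no user matches, so callers must check. */
        @CheckForNull
        public String lookupEmail(@NonNull String userId) {
            // hypothetical lookup; null means "not found"
            return userId.isEmpty() ? null : userId + "@example.com";
        }

        public int emailLength(String userId) {
            // Findbugs can flag this dereference: lookupEmail() is @CheckForNull,
            // so calling length() without a null check is a possible NPE
            return lookupEmail(userId).length();
        }
    }

    The annotations cost nothing at runtime, but every time Findbugs runs it effectively adds a null-safety test over every call site of lookupEmail().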

    So when I am writing Java code, every line I write already has at least five tests covering it and I haven't even started adding unit tests into the mix!

    Now I am not arguing that the above tests are enough for your code on its own… but when you look at my unit test coverage of 83.7% and ask am I "happy to ship with 1,630 untested lines of code", I will answer that those 1,630 lines of code are tested; they may not be our best tests, but we have tests on them.

    Show me a real code base with 100% coverage, and I will show you a good number of crappy tests helping that code base get to 100% coverage...

    On the other hand, if you ask me am I happy to ship that Ruby on Rails / Node.JS / etc application into production with 99.5% coverage, I'll say no way are we shipping that code with 50 untested LOC.


  9. When you have multiple JVM providers (Apple & Oracle), you want to be able to switch between JDKs in each CLI session. The following shell function does the job:

    usejava ()
    {
        local sel=$1.jdk
        # Oracle-style JDK directories are named jdk<version>.jdk, so add the jdk prefix
        # when that is where the java binary actually lives
        if [ -x "/Library/Java/JavaVirtualMachines/jdk$sel/Contents/Home/bin/java" -a ! -x "/Library/Java/JavaVirtualMachines/$1/Contents/Home/bin/java" ]
        then
            sel=jdk$sel
        fi
        local base=/Library/Java/JavaVirtualMachines
        # Apple-supplied JVMs live under /System/Library rather than /Library
        if [ -x "/System/Library/Java/JavaVirtualMachines/$sel/Contents/Home/bin/java" ]
        then
            base=/System/Library/Java/JavaVirtualMachines
        fi

        if [ -z "$1" -o ! -x "$base/$sel/Contents/Home/bin/java" ]
        then
            # No argument, or no matching JDK: list the installed versions as a syntax hint
            local prefix="Syntax: usejava "
            for i in /Library/Java/JavaVirtualMachines/* /System/Library/Java/JavaVirtualMachines/*
            do
                if [ -x "$i/Contents/Home/bin/java" ]
                then
                    /bin/echo -n "$prefix$(basename $i | sed -e "s/^jdk//;s/\.jdk$//;")"
                    prefix=" | "
                fi
            done
            /bin/echo ""
        else
            if [ -z "$JAVA_HOME" ]
            then
                export PATH=$base/$sel/Contents/Home/bin:$PATH
            else
                # Swap the previous JDK's bin directory on the PATH for the new one
                export PATH=$(echo $PATH|sed -e "s:$JAVA_HOME/bin:$base/$sel/Contents/Home/bin:g")
            fi
            export JAVA_HOME=$base/$sel/Contents/Home
            # Show the selected Java version in the Terminal window title
            echo -n -e "\033]0;$(java -version 2>&1 | sed -e "s/.*\"\(.*\)\".*/Java \1/;q")\007"
        fi
    }

    There is additional fun to be had, given that most Java-based launchers that try to fix JAVA_HOME when it is not set will guess the Apple JVM path… so the following Java program can help:

    public class FixJavaHome {
      public static void main(String[] args) {
        String javaHome = System.getProperty("java.home");
        if (javaHome.endsWith("/jre")) {
          javaHome = javaHome.substring(0, javaHome.length() - "/jre".length());
        }
        System.out.println("export JAVA_HOME=\"" + javaHome + '\"');
      }
    }
    

    Install like so

    mkdir -p ~/bin/FixJavaHome && cd ~/bin/FixJavaHome && cat > FixJavaHome.java <<EOF
    public class FixJavaHome {
      public static void main(String[] args) {
        String javaHome = System.getProperty("java.home");
        if (javaHome.endsWith("/jre")) {
          javaHome = javaHome.substring(0, javaHome.length() - "/jre".length());
        }
        System.out.println("export JAVA_HOME=\"" + javaHome + '\"');
      }
    }
    EOF
    javac FixJavaHome.java
    cd -

    If you add the following to your ~/.bash_profile

    eval $(java -cp ~/bin/FixJavaHome/ FixJavaHome)
    echo -n -e "\033]0;$(java -version 2>&1 | sed -e "s/.*\"\(.*\)\".*/Java \1/;q")\007"
    

    Then your JAVA_HOME should be set up from the start, as well as your Terminal window title


  10. A link to a free online iOS app development training course that looks worth checking out: http://j.mp/TfdoxL
