Realizing the software of the future...today

Leanness and Code Quality of Java Serialization Frameworks

Running SCC & SonarQube on selections

By Nico Vaidyanathan Hidalgo

In part 1 of this series, I defined fast, lean, and high quality as the primary desired traits for object serialization. This article dives into leanness. Mary & Tom Poppendieck's Lean Software Development inspired much of my thinking in this regard.

Leanness is important for improving security, performance, and comprehensibility. I evaluate the leanness of the selected libraries by comparing the size of their codebases in terms of lines of code, number of classes/abstractions, and dependency footprint. Generally, the minimum necessary is preferred, but a few important caveats are worth noting. Fewer lines of code are not always better: fewer lines might be defect-dense or difficult to understand. Similarly, fewer classes/abstractions are not necessarily superior: fewer classes may signal a lack of separation between interface and implementation, or point to GOD CLASS/BIG BALL OF MUD anti-patterns.

I count lines of code and classes with SCC and SonarQube, then present each tool's results in a table along with some analysis. Finally, I provide a ranked list based on my criteria for leanness.
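SCC does this counting far more carefully than any quick script, separating blanks, comments, and code and estimating complexity. Still, to make the basic idea of "limiting only to Java files" concrete, here is a minimal sketch that walks a directory tree with `java.nio.file.Files.walk`, keeps regular files ending in `.java`, and sums their physical lines. The class and method names are hypothetical; it assumes UTF-8 sources and needs Java 16+ for records.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Rough size census of a source tree: number of .java files and their total physical lines. */
public class JavaSizeCensus {

    /** Result holder: file count and physical line count (blanks and comments included). */
    record Census(long files, long lines) {}

    static Census count(Path root) throws IOException {
        List<Path> javaFiles;
        // Files.walk holds open directory handles, so close the stream when done.
        try (Stream<Path> paths = Files.walk(root)) {
            javaFiles = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".java")) // only Java files
                    .collect(Collectors.toList());
        }
        long lines = 0;
        for (Path file : javaFiles) {
            // Files.lines decodes as UTF-8; malformed input surfaces as UncheckedIOException.
            try (Stream<String> fileLines = Files.lines(file)) {
                lines += fileLines.count();
            }
        }
        return new Census(javaFiles.size(), lines);
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        Census census = count(root);
        System.out.printf("%d Java files, %d lines%n", census.files(), census.lines());
    }
}
```

Note that this counts every physical line, which is closer to SCC's "Lines" column than to a code-only metric.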

SCC

Java IO

SCC output for Java IO
Limiting only to Java files in the java.io package, using a COCOMO project type of embedded, and an average developer salary of $150K USD

Gson

SCC output for Gson
Limiting only to Java files, using a COCOMO project type of semi-detached, and an average developer salary of $150K USD

Jackson

SCC output for Jackson Core
SCC output for Jackson Databind
Running on both the jackson-core project and the jackson-databind project, limiting only to Java files, using a COCOMO project type of embedded, and an average developer salary of $150K USD

Johnzon

SCC output for Johnzon
Limiting only to Java files, using a COCOMO project type of semi-detached, and an average developer salary of $150K USD

Kryo

SCC output for Kryo
Limiting only to Java files, using a COCOMO project type of semi-detached, and an average developer salary of $150K USD

Discussion

Having studied Boehm et al.'s COCOMO model during my graduate work in software engineering, I find its calculations interesting, but honestly I don't put much weight on them. Boehm's work is valiant given the scope of the field, but linear regression models calibrated on 63 projects in the 1970s and 161 projects in the 1990s simply do not fill me with confidence.

The models are interesting in that they estimate Gson and Java IO at roughly the same development cost, even though Gson is estimated to take approximately 2 years longer to produce about 10K more lines of lower-complexity code. Johnzon's estimate is roughly the same as Jackson's despite Johnzon having 7K fewer lines; both appear to require about 20 developers and around 30 person-years. Kryo's cost estimate is surprisingly low given its quality and performance.
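SCC derives its cost, schedule, and head-count estimates from COCOMO, using the project type and salary noted under each screenshot. The sketch below reproduces Boehm's basic COCOMO equations with the standard textbook coefficients for the three project types, which helps show why the project type matters as much as size. It is an illustration, not SCC's implementation: the class and method names are hypothetical, and the overhead multiplier in the cost formula is an assumed input (2.4 in the example).

```java
/** Basic COCOMO (Boehm, 1981): effort and schedule as power laws of size in KLOC. */
public class Cocomo {

    /** Project type with Boehm's basic-model coefficients a, b, c, d. */
    enum Mode {
        ORGANIC(2.4, 1.05, 2.5, 0.38),
        SEMI_DETACHED(3.0, 1.12, 2.5, 0.35),
        EMBEDDED(3.6, 1.20, 2.5, 0.32);

        final double a, b, c, d;

        Mode(double a, double b, double c, double d) {
            this.a = a;
            this.b = b;
            this.c = c;
            this.d = d;
        }
    }

    /** Effort in person-months: E = a * KLOC^b. */
    static double effortPersonMonths(long linesOfCode, Mode mode) {
        double kloc = linesOfCode / 1000.0;
        return mode.a * Math.pow(kloc, mode.b);
    }

    /** Schedule in calendar months: D = c * E^d. */
    static double scheduleMonths(double effortPersonMonths, Mode mode) {
        return mode.c * Math.pow(effortPersonMonths, mode.d);
    }

    /** Cost = effort (months) x monthly salary x overhead. ASSUMPTION: overhead is a caller-chosen multiplier. */
    static double cost(double effortPersonMonths, double annualSalary, double overhead) {
        return effortPersonMonths * (annualSalary / 12.0) * overhead;
    }

    public static void main(String[] args) {
        long loc = 14_439; // Java IO line count reported by SCC in this article
        for (Mode mode : Mode.values()) {
            double effort = effortPersonMonths(loc, mode);
            double months = scheduleMonths(effort, mode);
            System.out.printf("%-13s effort=%.1f PM, schedule=%.1f months, people=%.1f, cost=$%,.0f%n",
                    mode, effort, months, effort / months, cost(effort, 150_000, 2.4));
        }
    }
}
```

Because both `a` and `b` grow from organic to embedded, the same line count yields a noticeably larger estimate under the embedded type. Estimates for Java IO and Jackson (run as embedded) and for the other libraries (run as semi-detached) are therefore not directly comparable.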

For my analysis, I'm more interested in grading the leanness of the projects based on their size and complexity. I want to optimize for the fewest lines of code, the fewest files, and the lowest complexity per line. This leads to the following ranking:

Library | Lines of Code | Files | Complexity/line
Java IO | 14,439 | 89 | 1030.26
Kryo | 23,706 | 131 | 1816.33
Gson | 25,951 | 217 | 1329.95
Johnzon | 43,636 | 398 | 2452.08
Jackson (Core + Databind) | 51,808 + 141,710 | 283 + 1,101 | 2081.25 + 6282.13
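The ordering in this table can be reproduced mechanically. This sketch encodes the SCC figures from the table, summing Jackson's core and databind numbers, and sorts them lexicographically: lines of code first, then files, then complexity. The lexicographic tie-breaking is my assumption about how to combine the three criteria, the names are hypothetical, and it needs Java 16+ for records and `Stream.toList()`.

```java
import java.util.Comparator;
import java.util.List;

/** Ranks libraries by leanness using the SCC figures from the table. */
public class LeannessRanking {

    /** One table row; Jackson's core and databind figures are summed into a single row. */
    record Library(String name, int linesOfCode, int files, double complexity) {}

    /** Smaller is leaner: compare by lines, then files, then complexity. */
    static final Comparator<Library> LEANEST_FIRST =
            Comparator.comparingInt(Library::linesOfCode)
                    .thenComparingInt(Library::files)
                    .thenComparingDouble(Library::complexity);

    static List<Library> rank(List<Library> libraries) {
        return libraries.stream().sorted(LEANEST_FIRST).toList();
    }

    public static void main(String[] args) {
        List<Library> table = List.of(
                new Library("Gson", 25_951, 217, 1329.95),
                new Library("Jackson (Core + Databind)", 51_808 + 141_710, 283 + 1101, 2081.25 + 6282.13),
                new Library("Java IO", 14_439, 89, 1030.26),
                new Library("Johnzon", 43_636, 398, 2452.08),
                new Library("Kryo", 23_706, 131, 1816.33));
        int place = 1;
        for (Library lib : rank(table)) {
            System.out.printf("%d. %s (%,d lines, %d files)%n",
                    place++, lib.name(), lib.linesOfCode(), lib.files());
        }
    }
}
```

With this comparator, lines dominate: Kryo places ahead of Gson even though its complexity figure (1816.33) is higher than Gson's (1329.95). Sorting by complexity first would swap those two, so the choice of primary criterion matters.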

SonarQube

I downloaded and set up a local instance of SonarQube 9.7.1.62043 and, where possible, manually instrumented each project with JaCoCo 0.8.8 to provide code coverage numbers.

Java IO

I found myself simply unable to import the portion of OpenJDK containing only java.io into SonarQube. SonarQube integrates most easily with Java projects built with Gradle or Maven; the JDK itself is another matter. After spending some time trying to integrate the analysis into the JDK's Makefiles and to get JCov to report coverage statistics, I gave up. I would be happy to add the JDK's serialization analysis if someone can figure out how to instrument and import it correctly.

Gson

SonarQube for Gson
Import of Gson into SonarQube.

Jackson

SonarQube for Jackson
Import of jackson-core into SonarQube. I wasn't able to import the databind project due to errors with its Javadoc.

Johnzon

SonarQube for Johnzon
Import of Johnzon into SonarQube.

Kryo

SonarQube for Kryo
Import of Kryo into SonarQube. Note that Kryo's non-standard Maven project structure made it prohibitively difficult to instrument with JaCoCo, so I was unable to get coverage numbers.

Discussion

It is interesting to note how wildly the source line counts differ between SCC and SonarQube. The counts are much lower in SonarQube, perhaps reflecting a different, Java-specific scheme for counting lines in source files.
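One plausible contributor to the gap is what each tool counts as a line. SCC's default output reports separate Lines, Blanks, Comments, and Code columns, and SonarQube's Lines of Code metric excludes blank and comment-only lines, so comparing a total-lines figure with a code-only figure would make the difference look larger. The naive classifier below shows how heavily commented Java shrinks once blanks and comments are removed. Its names are hypothetical and it is not either tool's algorithm; in particular, it ignores comment markers inside string literals and text blocks, which a real counter must handle.

```java
import java.util.List;

/** Naive classification of Java source lines into blank, comment-only, and code lines. */
public class LineClassifier {

    record Counts(int total, int blank, int comment, int code) {}

    static Counts classify(List<String> lines) {
        int blank = 0, comment = 0, code = 0;
        boolean inBlockComment = false; // carries over between lines for /* ... */ comments
        for (String raw : lines) {
            String line = raw.strip();
            if (line.isEmpty()) {
                blank++;
                continue;
            }
            boolean hasCode = false;
            int i = 0;
            while (i < line.length()) {
                if (inBlockComment) {
                    int end = line.indexOf("*/", i);
                    if (end < 0) {
                        i = line.length(); // comment continues past this line
                    } else {
                        inBlockComment = false;
                        i = end + 2;
                    }
                } else if (line.startsWith("//", i)) {
                    break; // the rest of the line is a comment
                } else if (line.startsWith("/*", i)) {
                    inBlockComment = true;
                    i += 2;
                } else {
                    if (!Character.isWhitespace(line.charAt(i))) {
                        hasCode = true;
                    }
                    i++;
                }
            }
            if (hasCode) code++; else comment++;
        }
        return new Counts(lines.size(), blank, comment, code);
    }

    public static void main(String[] args) {
        List<String> source = List.of(
                "/**",
                " * Javadoc counts toward total lines but not code lines.",
                " */",
                "public class Example {",
                "",
                "    int x = 1; // trailing comment",
                "}");
        System.out.println(classify(source)); // prints Counts[total=7, blank=1, comment=3, code=3]
    }
}
```

Here only 3 of 7 physical lines are code, which is the kind of ratio that can separate a total-lines count from a code-only count.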

SonarQube applies some pretty strong opinions about what it classifies as Bugs or Code Smells, but these numbers can still be useful in evaluating the leanness of a library. Overall, the leanest library of highest quality would have the lowest bugs/line and smells/line ratios and the fewest duplications. These are calculated below:

Library | Bugs/line | Smells/line | Duplications (%) | Coverage (%)
Gson | .001 | .0461 | 2 | 78.3
Jackson (Core) | .0005 | .0857 | 15.8 | 77.9
Johnzon | .001 | .0416 | 3.5 | 63.9
Kryo | .002 | .1076 | 10 | n/a (not measured)
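The ratios in this table are simple densities: an issue count divided by lines of code. This sketch shows that computation plus one possible way to rank the rows, assuming bugs matter most, then smells, then duplication; coverage is left out because Kryo's was not measured. The names and the priority order are my assumptions rather than a SonarQube feature, and Java 16+ is required for records and `Stream.toList()`.

```java
import java.util.Comparator;
import java.util.List;

/** Issue densities and an assumed quality ordering for the SonarQube table. */
public class QualityRatios {

    /** SonarQube-derived figures for one library; coverage omitted because Kryo's is missing. */
    record Quality(String name, double bugsPerLine, double smellsPerLine, double duplicationPct) {}

    /** Density of an issue type: issue count divided by lines of code. */
    static double perLine(int issueCount, int linesOfCode) {
        if (linesOfCode <= 0) {
            throw new IllegalArgumentException("linesOfCode must be positive");
        }
        return (double) issueCount / linesOfCode;
    }

    /** ASSUMED priority: fewer bugs first, then fewer smells, then less duplication. */
    static final Comparator<Quality> BEST_FIRST =
            Comparator.comparingDouble(Quality::bugsPerLine)
                    .thenComparingDouble(Quality::smellsPerLine)
                    .thenComparingDouble(Quality::duplicationPct);

    public static void main(String[] args) {
        List<Quality> table = List.of(
                new Quality("Gson", .001, .0461, 2.0),
                new Quality("Jackson (Core)", .0005, .0857, 15.8),
                new Quality("Johnzon", .001, .0416, 3.5),
                new Quality("Kryo", .002, .1076, 10.0));
        table.stream().sorted(BEST_FIRST).toList().forEach(q -> System.out.println(q.name()));
    }
}
```

Under that priority the rows sort as Jackson (Core), Johnzon, Gson, Kryo. Because the printed ratios are rounded, the tie between Gson and Johnzon on bugs/line may not survive with raw counts, and weighting duplication more heavily would favor Gson.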

Where do we go from here?

Static code analysis tools can give us a rough idea of the leanness of software and initial impressions of its code quality, but as Grady Booch has often said in his On Architecture podcast, "The raw, running, naked code is The Truth." In the next piece, I dive into the architecture of these libraries and discover the generalizable pattern for serialization libraries.