Motivating a new Serialization API
It’s not reinventing the wheel if you’re smoothing a square
By Nico Vaidyanathan Hidalgo
In these works we’ll review the motivation, design, implementation, performance analysis, and optimization for building the fastest, leanest, and highest-quality serialization library demonstrably available in Java.
Case Study – Amazon
In the mid-2010s I worked on the Service Frameworks team at Amazon. Service Frameworks owned the core libraries and software frameworks for building Service Oriented Architectures. Our flagship product was Coral, a Remote Procedure Call Web Service framework used by tens of thousands of services in production.
Amazon built Coral in Java for internal usage as an alternative to SOAP web services. The closest external analogues are Apache Thrift and Google Protocol Buffers. Developers wrote server-side code in Java, then generated clients in multiple languages. Coral powered many products including Prime Now, Prime Video, and almost all of AWS. In fact, Coral ran almost the entire company. When I joined the team, almost a decade had passed since its original development; all of the original authors had left.
Like all successful software, Coral had technical debt. Principal Engineers aggressively analyzing performance for cost-reduction discovered that it contained hand-written object serialization code accounting for more than 40% of an average service’s runtime call graph and resource utilization. Leadership called for optimization.
Coral’s original design relied on a hand-crafted double dispatch VISITOR pattern affectionately called “Transmuters.” Coral’s original authors developed Transmuters while products such as Java Serialization, Thrift, Protocol Buffers, Jackson, and Gson already existed. However, Amazon’s technical leadership generally preferred not to rely on third-party code for important core functions. With time, the code and complexity of the components used to implement Transmuters grew significantly. Moreover, all of the original engineers had left. As a result, it became difficult for the owning team to reason about, debug, or optimize the code while simultaneously delivering new features and providing hands-on support for builders.
Consequently, a Tiger Team of engineers tackled the problem. After more than a year of work, involving hundreds of man-hours of effort in development, testing, release, and adoption of the new serialization technology, the resource utilization of the average service across the company dropped significantly. As a result, many services could be re-provisioned on cheaper hardware, saving millions. Response times from service calls dropped across the company, and because many of the affected services were directly used by the retail website and AWS, the drop correlated with increased revenue. Cheaper hardware and happier customers, from better technology.
This example is admittedly anecdotal. Due to the code being closed source, I cannot provide an in-depth scientific or engineering analysis of the problem and solution. It simply demonstrates my intimate familiarity with object serialization. The generalizable takeaway is that, at scale, common library/framework code that is slow, complex, and unwieldy can cost a company millions of dollars and many hours of maintenance effort.
Impact on Service Oriented Architectures and Microservices
As outlined by Steve Yegge in Stevey’s Google Platforms Rant, Amazon became well known in the 2000s for being the first major company to fully embrace a Service Oriented Architecture and do it right. This led Amazon to build the massive AWS organization that is the backbone of the modern internet. It embodied SOA success in a way that most of the rest of the industry has tried to emulate, and it pushed technical architectures towards microservices and Command Query Responsibility Segregation. AWS even monetized the microservice trend with Lambda, marketed as “function-as-a-service.” The smaller services get, the more impactful library/framework concerns such as object serialization become in proportion.
Consider the following simple RPC over HTTP web microservice.
List<String> lookupUSZipCode(String state, String city)
This microservice simply looks up values in a hard-coded hash table stored in memory. In such a service, the O(1) lookup becomes an almost insignificant part of the CPU and memory usage compared to the work done around it. The web server reads bytes from the socket, then parses them to form an HTTP Request object. It passes that object to the service framework, which translates it into the input object for the service. The service executes the “business logic” on that input object, and the framework translates the output object into an HTTP Response object for the web server to return. If the infrastructure code is comparatively slow, optimizations within the service itself, such as caching or switching to a faster collection implementation, mean relatively little. It’s building a castle on sand.
Search Engine Marketing/Optimization repeatedly discovered in the 2000s that faster and easier traversals through the goal funnel for customers correlated with increased revenues and customer satisfaction. As technology has become more and more pervasive in human life, with the advent of dating applications, electronic sports, video/audio streaming services, and digital assistants, the difference in satisfaction between high-latency and low-latency experiences has come to matter well beyond e-commerce. While network connectivity issues cannot be completely obviated by software, software can itself run considerably faster when core components such as object serialization run faster.
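To make the proportions concrete, here is a minimal sketch of the request path described above. It is not Coral’s API: the class name, the query-string input format, and the JSON-like output are all assumptions made for illustration, and a real framework would also handle URL decoding, escaping, and error cases.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: every name here is illustrative, not part of any real framework. */
final class ZipCodeLookupSketch {

    // The "business logic" data: a hard-coded, in-memory table keyed by "STATE|City".
    private static final Map<String, List<String>> ZIP_CODES = Map.of(
            "WA|Seattle", List.of("98101", "98104"),
            "OR|Portland", List.of("97201", "97205"));

    /** The service method from the text: a single O(1) hash lookup. */
    static List<String> lookupUSZipCode(String state, String city) {
        return ZIP_CODES.getOrDefault(state + "|" + city, List.of());
    }

    /** Framework work before the call: turn a raw query string into typed input. */
    static String[] deserializeInput(String rawQuery) {
        String state = "";
        String city = "";
        for (String pair : rawQuery.split("&")) {
            String[] keyValue = pair.split("=", 2);
            if (keyValue.length < 2) {
                continue; // ignore malformed pairs in this sketch
            }
            if (keyValue[0].equals("state")) {
                state = keyValue[1];
            } else if (keyValue[0].equals("city")) {
                city = keyValue[1];
            }
        }
        return new String[] {state, city};
    }

    /** Framework work after the call: turn the typed output into a response body. */
    static String serializeOutput(List<String> zipCodes) {
        StringBuilder body = new StringBuilder("[");
        for (int i = 0; i < zipCodes.size(); i++) {
            if (i > 0) {
                body.append(',');
            }
            body.append('"').append(zipCodes.get(i)).append('"');
        }
        return body.append(']').toString();
    }

    /** Handles one request; only the middle statement is business logic. */
    static String handle(String rawQuery) {
        String[] input = deserializeInput(rawQuery);
        List<String> output = lookupUSZipCode(input[0], input[1]);
        return serializeOutput(output);
    }

    public static void main(String[] args) {
        System.out.println(handle("state=WA&city=Seattle")); // prints ["98101","98104"]
    }
}
```

Even in this toy version, the translation code on either side of the lookup is several times larger than the lookup itself, which is the imbalance the paragraph above describes.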
Security: More Code = More Potential Surface Area of Attack
Software security is difficult and technical. A variety of attack vectors, such as buffer over/underflows, SQL injection, clickjacking, and privilege escalation, make naive code that does the easiest thing potentially vulnerable. Java serialization offers a unique case study. Most items Joshua Bloch wrote in Effective Java about serialization directly result from security vulnerabilities discovered by users after its release. Brian Goetz and Stuart Marks’ presentation, “Why we hate Java Serialization and what we might do about it”, effectively concluded that object serialization “tried to do too much”, leading to these security vulnerabilities. Java’s standard serialization is not the only library to suffer this way. The immensely popular Jackson serialization library had 80 CVEs listed as of November 2022. Both of these libraries are relatively large in terms of number of lines of code, increasing their attack surface area. As the conventional wisdom in information assurance says, “the less code there is, the less potential attack surface.”
In 2022, with terabyte hard drives and gigahertz processors, it may seem misguided or “a source of premature optimization” to consider a few kilobytes to hundreds of kilobytes worth of unused or potentially vulnerable class files as “wasteful.” There are some environments, however, where such concerns are more impactful. AWS Lambda supports microservices written in Java. Such microservices live on transient servers and may be reprovisioned on demand, leading to the possibility that the JVM will be started and stopped on practically every service invocation. JVM startup is generally the slowest part of a Java environment, and it scales linearly with the number of classes on the classpath. Any effort that reduces the number of classes loaded in such an environment can improve performance perceptibly.
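One rough way to observe how many classes a piece of code drags in is to ask the JVM itself. The sketch below uses the standard java.lang.management API; the class and helper names are illustrative assumptions, and the numbers it prints will vary by JVM version and configuration.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.util.regex.Pattern;

/** Hypothetical sketch for observing class loading; names are illustrative only. */
final class LoadedClassCountSketch {

    /** Number of classes currently loaded in this JVM. */
    static int loadedClassCount() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return bean.getLoadedClassCount();
    }

    /** Number of additional classes the JVM loaded while running the given work. */
    static long classesLoadedBy(Runnable work) {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        long before = bean.getTotalLoadedClassCount();
        work.run();
        return bean.getTotalLoadedClassCount() - before;
    }

    public static void main(String[] args) {
        System.out.println("Classes loaded at startup: " + loadedClassCount());

        // Touching a library for the first time forces its classes to load.
        long extra = classesLoadedBy(() -> Pattern.compile("[a-z]+").matcher("abc").matches());
        System.out.println("Classes loaded by first regex use: " + extra);
    }
}
```

Running the same measurement around the first call into a serialization library gives a crude estimate of what that library adds to a cold start.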
Simplicity, as defined in a software context by Rich Hickey in Simple Made Easy, relates to the minimal number of components and interactions necessary to accomplish a task. Kent Beck codifies simplicity as a value in Extreme Programming with a set of rules for simple design. Many professional software developers will argue that the value of simplicity is lower than other qualities such as correctness, performance, maintainability, or ease of implementation. Yet as I showed in my doctoral dissertation, evidence suggests that focus on simpler systems results in more correct and maintainable software. This is because code simplicity directly aligns with the principles of Cognitive Load Theory. More complex code is harder to reason about, debug, fix, and optimize. Simpler code enables all other evolution because programmers spend more time reading than writing code over the lifetime of a piece of software, as shown in repeated studies. Simple code effectively sequences and chunks functionality through a minimal set of aptly named and straightforward to follow abstractions that precisely and succinctly tell the story of a system. Such qualities are not only a “matter of taste”; they are directly measurable and can be derived from an analysis of software complexity metrics and the conscious application of certain stylistic rules such as those outlined in Robert Martin’s Clean Code.
I’ve listed simplicity after security and performance, reflecting the common perspective of ordered importance many professionals hold in the field. The actual order, in my view, is reversed. A paramount focus on simplicity results in code that is easier to make performant and secure. Essentially, two quotes by some of the most brilliant minds of the 20th century tell the story.
When I am working on a problem, I never think about beauty but when I have finished, if the solution is not beautiful, I know it is wrong. -- Buckminster Fuller
Any intelligent fool can make things bigger and more complex. It takes a touch of genius - and a lot of courage - to move in the opposite direction. -- E. F. Schumacher
Why High Quality?
This might seem like a rhetorical question, as very few developers intentionally write low-quality code. The interesting part comes when people are asked to define software quality. Software engineering researchers often define metrics such as McCabe’s Cyclomatic Complexity, yet many engineers have never heard of these constructs. In my professional experience, I’ve never seen these metrics described in a code review or any type of technical meeting. The most agreed upon sets of rules for “quality” often become the types of checks that can be automated and included in static code analysis tools such as Checkstyle, Fortify, and SonarQube. These tools are often configurable. They are also often disabled when software engineers discover disagreeable restrictions, such as capping the number of characters on a line at 80.
Nevertheless, software quality remains a highly sought after goal, even imprecisely defined. Kent Beck and other Agile software thought leaders often use descriptive language such as Code Smells and describe heuristics to make “high quality software.” Yet these descriptors often have an “I know it when I see it” element to them. I’ve personally witnessed (and been a part of quite a few) contentious conversations between engineers as to whether a certain piece of code meets a quality bar.
Most agree that higher quality software should “minimize cost and maximize benefit.” Most generally hold that higher quality software is easier to reason about and understand. My research provides some corroborating evidence. I found that higher-quality refactored code correlated with less time to identify and repair a bug in the widely used JodaTime library.
Most of my academic and professional career has been spent in pursuit of software quality. Although I have yet to find a predictive linear model, experience has shown me that quality does correlate with Kent Beck’s Design Rules. I’ll summarize them with some discussion on how to measure them.
Beck’s Design Rules
Runs All The Tests
“Runs All the Tests” can be measured by code’s line, statement, and branch coverage. The more coverage, the more confidence can be had that predictable and isolated failure cases have been thought through and mitigated. Tools such as JCov, Cobertura, and JaCoCo all make direct measurement of these metrics possible.
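The difference between line and branch coverage is easy to miss, so here is a small sketch of it. The method and class names are invented for illustration, and the coverage figures in the comments describe how a tool such as JaCoCo would typically count them rather than output captured from a real run.

```java
/** Hypothetical sketch contrasting line coverage with branch coverage. */
final class CoverageSketch {

    /** Replaces negative values with zero; the single if line holds two branches. */
    static int clampToNonNegative(int value) {
        int result = value;
        if (value < 0) result = 0;
        return result;
    }

    public static void main(String[] args) {
        // This call alone executes every line of the method (full line coverage),
        // but only the "condition is true" branch of the if statement.
        System.out.println(clampToNonNegative(-5)); // prints 0

        // Adding the non-negative case exercises the "condition is false" branch,
        // which is what branch coverage demands on top of line coverage.
        System.out.println(clampToNonNegative(7)); // prints 7
    }
}
```

A suite containing only the first call would look complete by line coverage while leaving half of the method’s behavior unchecked, which is why the three measures are best read together.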
“Reveals Intention” hearkens back to simplicity. Qualitatively, code that follows Robert Martin’s advice in Clean Code seems to “do it better.” It is harder to quantify directly. I’ve had some success towards quantification by comparing the perceived Cognitive Load based on a 7-point Likert Scale in my doctoral dissertation experiment. By rigidly applying the Miller’s Magic Number 7±2 metric to lines of code in a function, functions in a class, and classes in a package, I refactored JodaTime’s code into an experimental version where participants reported lower average cognitive load and were more frequently able to identify and fix a bug.
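As a rough illustration of what applying the 7±2 limit mechanically might look like, the sketch below counts the non-blank lines of a method body and checks the result against an upper bound of nine. This is not the tooling used in the dissertation experiment; the class name, the helpers, and the choice to ignore blank lines are all assumptions.

```java
/** Hypothetical sketch of a 7±2 size check; not the tooling from the experiment. */
final class MagicNumberSketch {

    /** Upper bound of Miller's range: 7 + 2. */
    static final int UPPER_BOUND = 9;

    /** Counts the non-blank lines in a block of source text. */
    static long countCodeLines(String source) {
        return source.lines().filter(line -> !line.isBlank()).count();
    }

    /** True when a count of lines, functions, or classes stays within the limit. */
    static boolean withinLimit(long elementCount) {
        return elementCount <= UPPER_BOUND;
    }

    public static void main(String[] args) {
        String methodBody = "int total = 0;\n"
                + "\n"
                + "for (int value : values) {\n"
                + "    total += value;\n"
                + "}\n"
                + "return total;\n";

        long lines = countCodeLines(methodBody);
        System.out.println(lines + " code lines, within limit: " + withinLimit(lines));
        // prints: 5 code lines, within limit: true
    }
}
```

The same withinLimit check applies unchanged to the number of functions in a class or classes in a package; only the counting step differs.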
Following the outcomes from that experiment, and the procedure laid out by Douglas Hubbard in “How to Measure Anything”, quantifying whether code reveals intention would involve convening a group of experts. They would review a training set of code using a subjective 7-point Likert scale, calibrate through group discussions to achieve rough consensus on gradations, then estimate the Cognitive Load for the code in question. If this looks suspiciously like a Classification problem from Machine Learning, it’s because the overall approach is quite similar. Although I’m not aware of any such efforts being conducted at a large scale, it is theoretically possible.
“No Duplication” is surprisingly tricky because it contains an easy part and a hard part. Static code analysis tools can easily detect copy-pasted repeated code. The IntelliJ IDEA Integrated Development Environment has such a feature enabled in its code checks for Java projects. Less obvious and more insidious Code Smells such as Shotgun Surgery, Divergent Change, and Parallel Inheritance Hierarchy require some analysis to divine the commonalities despite superficial differences.
“Fewest Elements” may be the easiest of the rules to quantify, once the others are followed. Within a library itself, Code Smell elimination techniques such as removing Dead Code and eliminating Speculative Generality align with Fewest Elements, so detecting those smells implies a failure to follow the rule. Counting the number of lines/methods/classes in a library can also give a rough indicator. It is not “exact”, as stylistic choices and different feature sets, such as support for different wire formats, may create situations where fewer elements does not imply feature parity. But it can be a good starting point.
While Beck’s Design Rules can form a baseline for evaluating new software without history, mature software that’s been deployed and used in anger offers additional data points. Compelling metrics include the number of defects, the average time it takes to fix identified and acknowledged defects, and the impact/severity of defects such as Common Vulnerabilities and Exposures (CVE). Each of these has caveats. The number of defects observed scales with the number of users, as suggested by Eric S. Raymond’s The Cathedral and the Bazaar. It’s possible that software without many reported issues is simply unused. Time to fix is scaled by resource availability and allocation. If a project doesn’t have sufficient development resources to support bug fixes and new features, or doesn’t allocate them efficiently, even “simple fixes” can take a while. Impact generally requires an in-depth analysis and some intuition to gauge accurately. Misclassification is not uncommon; sometimes seemingly “high impact” issues have a surprisingly small blast radius and vice versa.
In this piece I’ve laid out the motivation/requirements for a new object serialization API in Java. It should be fast because the cost of slow is massively impactful at scale. Serialization/translation of object models becomes an increasingly large part of the software performance profile the more partitioned functionality becomes. It should be lean because leaner software is friendlier for security, performance, and comprehensibility, which powers every other positive attribute. It should be high-quality because quality minimizes cost and maximizes effectiveness for software throughout its lifetime. Next, I examine how existing offerings meet these needs.
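The easy part can be sketched in a few lines. The example below flags any run of consecutive lines that has already appeared earlier in the same source text, after trimming whitespace and dropping blank lines. It is a deliberately naive illustration, not how IntelliJ IDEA or any other tool implements its check, and the class and method names are invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of naive copy-paste detection over fixed-size line windows. */
final class DuplicateWindowSketch {

    /**
     * Returns the 0-based positions (counted over trimmed, non-blank lines)
     * at which a window of windowSize lines repeats an earlier window.
     */
    static List<Integer> findRepeatedWindows(String source, int windowSize) {
        List<String> lines = new ArrayList<>();
        for (String line : source.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }

        Map<String, Integer> firstSeen = new HashMap<>();
        List<Integer> repeats = new ArrayList<>();
        for (int start = 0; start + windowSize <= lines.size(); start++) {
            String window = String.join("\n", lines.subList(start, start + windowSize));
            // putIfAbsent returns the earlier position when this window was seen before.
            if (firstSeen.putIfAbsent(window, start) != null) {
                repeats.add(start);
            }
        }
        return repeats;
    }

    public static void main(String[] args) {
        String source = "open();\nwrite();\nclose();\n\n  open();\n  write();\n";
        System.out.println(findRepeatedWindows(source, 2)); // prints [3]
    }
}
```

Nothing this simple catches the hard part: Shotgun Surgery and its relatives involve code that differs textually while changing together, which line matching cannot see.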