Book Review: Data-Oriented Programming

This is a review of the book Data-Oriented Programming (Reduce software complexity).

(My) Conclusion

This book is about a “new” programming paradigm called Data-Oriented Programming (a.k.a. DOP). I put the word new in quotes because the underlying concepts and principles have been around for much longer, in various forms and under different names.

DOP is a programming paradigm aimed at simplifying the design and implementation of software systems, where information is at the center in systems such as frontend or backend web applications and web services. Instead of designing information systems around software constructs that combine code and data (e.g., objects instantiated from classes), DOP encourages the separation of code from data.

The author doesn’t really explain the history of DOP, but it looks like the term was first coined in the gaming industry. One significant influence on the development of data-oriented programming principles was the publication of “Game Engine Architecture” by Jason Gregory in 2009.

The term “data-oriented programming” (DOP) was popularized by Noel Llopis, a software engineer, in his blog post titled “Data-Oriented Design (Or Why You Might Be Shooting Yourself in The Foot With OOP)” published in 2010. In this blog post, Llopis introduced the concept of data-oriented design as an alternative approach to Object-Oriented Programming (OOP) for certain types of performance-critical applications, especially in the context of video games and real-time simulations.

Part 1 – Flexibility

To introduce DOP (Data-Oriented Programming), the initial section of the book commences with an illustration of a project implemented solely in OOP (Object-Oriented Programming). The aim is to demonstrate that a system developed using OOP tends to become intricate and challenging to comprehend due to the mixing of code and data within objects. Additionally, such systems pose greater difficulties when adapting to changing system requirements.

The first principle of DOP advocates a distinct segregation between code (behavior) and data. Data is organized into data entities, while the code is housed within code modules. These code modules consist of stateless functions; these functions receive the data they manipulate as explicit arguments.
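A minimal Java sketch of this first principle (the names below are my own illustration, not necessarily the book's): the data is a plain immutable record with no behavior attached, and the behavior lives in a separate stateless module.

```java
// Data: a plain immutable record, with no behavior attached.
record Author(String firstName, String lastName, int bookCount) {}

// Code: a stateless module whose functions receive the data
// they manipulate as an explicit argument.
class AuthorModule {
    static String fullName(Author author) {
        return author.firstName() + " " + author.lastName();
    }

    static boolean isProlific(Author author) {
        return author.bookCount() > 100;
    }
}
```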

The second principle of DOP is to represent application data with generic data structures. The most common generic data structures are maps (aka dictionaries) and arrays (or lists). But other generic data structures (e.g., sets, trees, and queues) can be used as well.
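Following this second principle, the same entity can be represented with a generic map instead of a class-specific construct; again a minimal sketch of my own:

```java
import java.util.Map;

class GenericAuthor {
    // The author entity as a generic, string-keyed map.
    static Map<String, Object> asimov = Map.of(
            "firstName", "Isaac",
            "lastName", "Asimov",
            "bookCount", 500);

    // Generic data is accessed by field name, not by class members.
    static String fullName(Map<String, Object> author) {
        return author.get("firstName") + " " + author.get("lastName");
    }
}
```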

The third principle of DOP is that the data is immutable. Within the DOP framework, data modifications are achieved by generating new versions of the data. Although the reference held by a variable can be altered to point to a fresh version of the data, the actual value of the data remains unchanged.

This concept of structural sharing enables the efficient creation of new data versions in terms of memory and computation. Instead of duplicating common data between the two versions, they are shared, leading to better memory utilization.
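A tiny persistent list in Java illustrates the idea (a didactic sketch of mine, not the book's implementation): a new version created by prepending an element reuses, rather than copies, the previous version as its tail.

```java
// A minimal persistent (immutable) singly-linked list.
class PList<T> {
    final T head;
    final PList<T> tail;

    PList(T head, PList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    // Creating a "new version" allocates a single node; the rest
    // of the structure is shared with the previous version.
    static <T> PList<T> cons(T head, PList<T> tail) {
        return new PList<>(head, tail);
    }
}
```

Here cons("c", v1) produces a new list whose tail is the very same object as v1, which is exactly the structural sharing described above.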

Part 2 – Scalability

The second part of the book delves deeper into the practical application of DOP, addressing its primary limitations from a beginner’s perspective and proposing solutions for handling these challenges:

  1. DOP is primarily concerned with generic data structures, making it difficult to determine the specific types of data being used. In contrast, OOP allows for clear identification of data types associated with each data element.
  2. Implementing immutable data structures with structural sharing can result in performance issues, especially for large data structures.
  3. DOP’s reliance on in-memory data poses challenges when interacting with relational databases or other external services.

To tackle the first issue, the author introduces the fourth (and final) DOP principle: “Separate data schema from data representation.” This involves representing the data schema using JSON schemas. Various libraries are available for implementing JSON schema validators, see JSON Schema Validators.
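To illustrate the principle only (real code would use one of the JSON Schema validator libraries mentioned above; this toy sketch is mine), the schema can itself be plain data, kept separate from the entities it validates:

```java
import java.util.List;
import java.util.Map;

class ToySchema {
    // The schema is data too: field name -> expected type.
    static Map<String, Class<?>> authorSchema = Map.of(
            "firstName", String.class,
            "lastName", String.class,
            "bookCount", Integer.class);

    // Validation is a generic function over schema + data.
    static List<String> validate(Map<String, Class<?>> schema,
                                 Map<String, Object> data) {
        return schema.entrySet().stream()
                .filter(e -> !e.getValue().isInstance(data.get(e.getKey())))
                .map(e -> "invalid or missing field: " + e.getKey())
                .sorted()
                .toList();
    }
}
```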

To tackle the second concern, the author introduces “persistent data structures”. Persistent data structures are data structures that allow efficient modification and querying while preserving the previous versions of the structure. They are designed to support operations that create new versions of the structure rather than modifying the existing structure in place.

Some examples presented in the first part of the book are re-implemented using persistent data structures: in JavaScript using Immutable.js, and in Java using Paguro.

To address database and web-service operations, the same DOP principles are used. For instance, regardless of whether the data originates from a relational or non-relational database, it is represented using generic data structures like maps or lists. These generic data structures can then be manipulated using generic functions. These functions include operations such as generating a list composed of the values of specific data fields, producing a map variant by excluding a particular data field, or grouping maps into a list based on the values of a specific data field.
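A hedged sketch of such generic functions in Java (my own illustration using the streams API; function names are borrowed from the Lodash/Clojure tradition, not from the book's code):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GenericFns {
    // Collect the values of one field across a list of maps.
    static List<Object> pluck(List<Map<String, Object>> rows, String field) {
        return rows.stream().map(row -> row.get(field)).toList();
    }

    // Produce a map variant that excludes one field.
    static Map<String, Object> dissoc(Map<String, Object> map, String field) {
        Map<String, Object> copy = new HashMap<>(map);
        copy.remove(field);
        return copy;
    }

    // Group a list of maps by the value of one field.
    static Map<Object, List<Map<String, Object>>> groupBy(
            List<Map<String, Object>> rows, String field) {
        return rows.stream()
                .collect(Collectors.groupingBy(row -> row.get(field)));
    }
}
```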

Part 3 – Maintainability

The final section of the book delves into more advanced subjects, such as data validation using JSON Schema regex patterns, creating data model diagrams from JSON Schema (utilizing tools like JSON Schema Viewer or Malli), and generating unit tests based on schemas using JSON Schema Faker.

Additionally, certain topics, such as implementing polymorphism through multimethods, while not directly related to DOP, are nonetheless highly intriguing and educational.

How to fix “ClassNotFoundException” for Burp Suite extension using Jersey

Context

I am the maintainer of a Burp Suite extension that implements a REST API on top of Burp Suite. The goal of this REST API is to offer basic actions (retrieve a report, trigger a scan, retrieve the list of scanned URLs), and it is executed on a headless Burp Suite from a CI/CD pipeline.

From a technical point of view, the extension is implemented in Java; I’m using the JAX-RS specification to implement the REST APIs and Jersey as the JAX-RS implementation.

Problem

One of the REST entry points was returning a Set<OBJECT>, where OBJECT is a POJO specific to the extension. When a client called this entry point, the following exception was thrown:

Caused by: java.lang.ClassNotFoundException: org.eclipse.persistence.internal.jaxb.many.CollectionValue
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
at org.eclipse.persistence.internal.jaxb.JaxbClassLoader.loadClass(JaxbClassLoader.java:110)

Root Cause

A ClassNotFoundException is thrown when the JVM tries to load a class that is not available in the classpath or when there is a class-loading issue. I was sure that the missing class (CollectionValue) was in the extension classpath, so the root cause of the problem was a class-loading issue.

In Java, classes are loaded by a classloader. A Java classloader is a component of the Java Virtual Machine (JVM) responsible for loading Java classes into memory at runtime. The classloader’s primary role is to locate and load class files from various sources, such as the file system or the network.

Classloaders in Java typically follow a hierarchical delegation model. When a class is requested for loading, the classloader first delegates the request to its parent classloader. If the parent classloader cannot find the class, the child classloader attempts to load the class itself. This delegation continues recursively until the class is successfully loaded or all classloaders in the hierarchy have been exhausted.
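The parent chain of any classloader can be inspected by walking it upwards; a small self-contained sketch (the helper name is mine):

```java
import java.util.ArrayList;
import java.util.List;

class ClassLoaderWalk {
    // Walk the classloader parent chain, bottom to top.
    static List<String> hierarchy(ClassLoader start) {
        List<String> names = new ArrayList<>();
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            names.add(String.valueOf(cl.getName()));
        }
        // The bootstrap classloader is represented by null in the chain.
        names.add("bootstrap");
        return names;
    }
}
```

Calling hierarchy(Thread.currentThread().getContextClassLoader()) on a standard JVM typically yields something like app, platform, bootstrap.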

The classloader hierarchy of a thread that is serving a JAX-RS call looks like this:

The classloader hierarchy of the thread that is executing the Burp Suite extension looks like this:

So, the root cause of the ClassNotFoundException is that the classloader hierarchy of the threads serving the JAX-RS calls does not include the (Burp Suite) extension classloader, and so none of the classes from the extension classpath can be loaded by the JAX-RS calls.

Solution

The solution is to create a custom classloader that will be injected into the classloader hierarchy of the threads serving the JAX-RS calls. This custom classloader will implement the delegation pattern and will contain references to both the original JAX-RS (Jersey) classloader and the Burp Suite extension classloader.

The custom classloader will delegate all calls to the original Jersey classloader; in the case of the loadClass method (which throws a ClassNotFoundException), if the Jersey classloader cannot find a class, the call is delegated to the Burp Suite extension classloader.

The custom classloader will look like this:
public class CustomClassLoader extends ClassLoader {
  private final ClassLoader burpClassLoader;
  private final ClassLoader jerseyClassLoader;

  public CustomClassLoader(ClassLoader bcl, ClassLoader jcl) {
    this.burpClassLoader = bcl;
    this.jerseyClassLoader = jcl;
  }

  @Override
  public String getName() {
    return "CustomJerseyBurpClassloader";
  }

  @Override
  public Class<?> loadClass(String name) throws ClassNotFoundException {
    try {
      return this.jerseyClassLoader.loadClass(name);
    } catch (ClassNotFoundException ex) {
      // fall back to the Burp classloader if the class
      // cannot be loaded from the Jersey classloader
      return this.burpClassLoader.loadClass(name);
    }
  }

  // all the other methods just delegate to the jerseyClassLoader;
  // for example:
  @Override
  public URL getResource(String name) {
    return this.jerseyClassLoader.getResource(name);
  }
  // .......
}

Now that we have the custom classloader, what is missing is to replace the original Jersey classloader with the custom one for each REST call of the API. To do this, we will create a Jersey ContainerRequestFilter, which is called before the execution of each request.

The request filter will look like this:
public class ClassloaderSwitchFilter 
  implements ContainerRequestFilter {
  @Override
  public void filter(ContainerRequestContext requestContext) 
        throws IOException {
        Thread currentThread = Thread.currentThread();
        ClassLoader initialClassloader = 
              currentThread.getContextClassLoader();

        //custom classloader already injected
        if (initialClassloader instanceof CustomClassLoader) {
            return;
        }

        ClassLoader customClassloader =
                new CustomClassLoader(
                        CustomClassLoader.class.getClassLoader(),
                        initialClassloader);
        
        currentThread.setContextClassLoader(customClassloader);
  }
}

Introduction to Web Assembly for Java engineers

Introduction

The goal of this ticket is to present the different technological components of WebAssembly in comparison with the Java technological stack.

Why compare WebAssembly with Java? I think that WebAssembly has a better chance of fulfilling the slogan “Write once, run anywhere”, which was coined more than 25 years ago to illustrate the cross-platform benefits of the Java language.

WebAssembly is a standard that contains a virtual Instruction Set Architecture (ISA) for a stack machine. WebAssembly is designed to run on a virtual machine. The virtual machine allows WebAssembly to run on a variety of computer hardware and digital devices but today the most common way to execute WebAssembly code is from browsers.

In a nutshell, the comparison will be done using the following points of interest, and the next image summarizes this:

  • Executable Code
  • Programming Languages
  • ToolChains/Compilers
  • Execution Environment

Executable Code

Both technologies, WebAssembly and Java, have the notion of executable code.

In Java this is called bytecode and is part of the JVM specification; see Chapter 4 “The class File Format” and Chapter 6 “The Java Virtual Machine Instruction Set” of The Java Virtual Machine Specification.

In WebAssembly this is called Wasm. Actually, there are two formats: a binary format and a human-readable text format called WAT (WebAssembly Text).

Java bytecode and WebAssembly (WASM) are both low-level, platform-independent binary formats but there are some notable differences:

  • Java bytecode is strongly typed. It has a well-defined type system that enforces type safety. WebAssembly is designed with a more loosely typed system. It operates on a set of basic value types, including integers, floats, and vectors.
  • Java bytecode has built-in support for object-oriented programming features, including classes, interfaces, and inheritance. WebAssembly is more low-level compared to Java bytecode and lacks the rich type system found in Java bytecode.
  • Java bytecode runs in the Java Virtual Machine (JVM), which manages memory automatically, including garbage collection. WebAssembly provides a linear memory model, which is essentially a resizable array of bytes. It allows more direct memory access and manipulation.
  • The JVM abstracts the memory management, making it relatively opaque to developers. In WebAssembly the developers have explicit control over memory allocation and deallocation making it potentially more error-prone.

Programming Languages

To develop applications, Java developers have to use the Java language. In contrast, WebAssembly is intentionally crafted to serve as a versatile and language-agnostic platform suitable for a broad spectrum of programming languages.

WebAssembly supports an array of programming languages, including but not limited to C/C++, R, TypeScript (using the AssemblyScript language), Scala, Kotlin, and even Java.

Furthermore, WebAssembly offers a human-readable text format known as WAT. It is designed to be a more readable and writable representation of WebAssembly code than the binary format.

ToolChains/Compilers

To transform Java source code into bytecode, Java developers use a compiler. WebAssembly has a similar concept: compilers or toolchains that transform source code into Wasm. Here are a few examples of toolchains:

  • wat2wasm – a command-line tool provided by WABT (the WebAssembly Binary Toolkit); its purpose is to convert WebAssembly Text Format code to the binary WebAssembly format (Wasm). WABT also includes a wasm2wat tool, which converts Wasm to WAT.
  • Emscripten – an open-source compiler toolchain that translates C and C++ code into WebAssembly (Wasm) or JavaScript.
  • wasm-pack – to generate WebAssembly from Rust language.
  • AssemblyScript – a TypeScript-like language specifically designed for WebAssembly.
  • TeaVM – an ahead-of-time compiler for Java bytecode that emits JavaScript and WebAssembly that runs in a browser. Moreover, the source code is not required to be Java, so TeaVM successfully compiles Kotlin and Scala.

Execution Environment

In the Java case, the execution environment is the Java Virtual Machine. In the case of WebAssembly, there are multiple ways to execute an application.

The initial execution environment for which WebAssembly was created is the browser. All modern browsers offer support for WebAssembly execution, and the execution performance is near-native.

Running WebAssembly in browsers has a few constraints:

  • WebAssembly runs in a sandboxed environment within the browser for security reasons. While this is generally beneficial, it also imposes restrictions on certain operations, such as direct access to the DOM or file system. Interactions with the browser environment are typically done through JavaScript.
  • WebAssembly modules cannot directly access browser APIs. Interactions with the DOM, events, and other browser features are typically done through JavaScript, requiring careful coordination between the two.
  • Browsers impose memory constraints on WebAssembly applications to ensure a secure and stable user experience. The memory allocated to a WebAssembly module is limited, and exceeding these limits can result in termination of the module.
  • Loading and parsing WebAssembly modules can take time, especially for larger applications. The initial loading time may be impacted, affecting the user experience.

Node.js has support for WebAssembly on the server side through the built-in WebAssembly API. This API allows you to load and interact with WebAssembly modules directly in your Node.js applications.

Last but not least, the WebAssembly Working Group, which is part of the World Wide Web Consortium (W3C), created the WebAssembly System Interface (WASI). The goal of WASI is to provide a standardized set of interfaces that allows WebAssembly modules to interact with the host environment in a secure and platform-independent manner.

WASI defines a system interface that includes a set of system calls, similar to traditional operating system interfaces. The standard also provides a sandboxed execution environment for WebAssembly modules, ensuring that they have limited and controlled access to the host system.

WASI aims to be platform-independent, allowing WebAssembly modules to run on different operating systems without modification. This is achieved by defining a standardized set of system calls that abstract away the specifics of the underlying host system.

Various WebAssembly execution environments, also known as runtimes, are incorporating the WebAssembly System Interface (WASI). Notable examples include wasmtime, a standalone WebAssembly runtime developed by the Bytecode Alliance; lucet-WASI, a high-performance WebAssembly compiler and runtime created by Fastly; and wasi-libc, serving as the WASI reference implementation.

It’s worth mentioning that Docker started implementing WASI last year, enabling native execution of WebAssembly (wasm) files. For additional information, you can refer to the details provided in the announcement of Docker+Wasm Technical Preview 2.

Lessons learned from using Jenkins on containers a.k.a CloudBees CI

Recently, I encountered some issues with a (Jenkins) declarative pipeline running on CloudBees CI, specifically for a Python project. The CloudBees CI instance was operating within an OpenShift Platform (OCP) cluster, and I was employing the Kubernetes plugin for managing agents/slaves.

Here are the lessons learned, some of them linked to OCP/K8S, some linked to CloudBees CI, some linked to Python, and others linked to all these technologies put together:

  1. Make sure that the application packaged as an image runs properly when executed from a plain Docker/Podman system.
  2. The container/s will run inside a pod, so the (Dockerfile) WORKDIR instruction (if defined) in the container/s will be ignored. If you need to define a working directory, you should specify it in the pod definition via pod.spec.template.spec.containers.workingDir.
  3. The default working directory for the running container/s will be (Jenkins) ${env.WORKSPACE}.
  4. If the pipeline code is fetched from a Git repository, then the repository will be automatically mapped as a volume inside the container/s under the folder ${env.WORKSPACE}.
  5. For container/s running Python, the sys.path variable will be: '' (the empty string), ${env.WORKSPACE}, container.PYTHONPATH, in this specific order, where '' stands for the current directory, which by default will be ${env.WORKSPACE} (see point number 3). So the default sys.path will be: ${env.WORKSPACE}, ${env.WORKSPACE}, container.PYTHONPATH.
  6. The (nasty) side effect of the previous point is that if there are Python modules with the same name both in the Git repository (which is mounted in the container, see point number 4) and in the container/s, then the module/s from the Git repository will be used for execution and not the ones from the container/s.
  7. If you want to revert the situation from the previous point, the only way is to play with the first element of the sys.path variable, which will always be the current directory (''). If the first element of sys.path is /, then the container/s modules will be used for execution instead of the Git repository modules.
  8. The sys.path variable is computed at runtime by the Python interpreter, so it cannot be modified in advance (as the PYTHONPATH environment variable can be) prior to the execution of a program.

How to properly use (Java) Text Blocks with String.format

Introduction

As of Java 15, there is a new feature called Text Blocks (also sometimes called Multi-Line Strings). A text block is declared with """:

String multiline = """
                line1
                line2
                """;

Since Java 1.5, the String class has a format method. Java’s String.format() is a static method that returns a formatted String using the given locale, format string, and arguments.

Problem

It is a bad practice (see SpotBugs FS: Format string should use %n rather than \n) to use platform-specific <EOL> character(s) within strings to be formatted. For example, if your string to be formatted contains the Linux EOL character (\n), it might be wrongly interpreted if the code is executed on a Windows platform, on which the EOL character is \r\n.

In format strings, it is generally preferable to use %n, which will produce the platform-specific line separator at runtime.

Now, text blocks will have multiple lines, so what is the right way to still use multi-line strings and have portable format strings?

Solution

  • use the %n format specifier to represent a newline character
  • use the \ escape character so that the newlines inserted by the IDE are ignored. The \<line-terminator> escape sequence explicitly suppresses the inclusion of an implicit newline character.
String multiline = """
                line1%n\
                line2%n\
                """;
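Putting the two rules together, a complete sketch (my own example, not from the original post):

```java
class TextBlockFormat {
    // %n produces the platform line separator at runtime, and the
    // trailing \ suppresses the newline the text block itself adds.
    static String render(String name, int age) {
        String template = """
                name: %s%n\
                age: %d%n\
                """;
        return String.format(template, name, age);
    }
}
```

render("Alice", 30) yields two lines terminated with the platform's line separator, so the same code behaves consistently on Linux and Windows.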