(My) OWASP Belgium Chapter meeting notes

These are my notes from the OWASP Belgium Chapter meeting of the 16th of June.

OWASP Summit 2017 debrief

The talk was a debrief of the OWASP Summit 2017, which was held in London: more than 200 participants, 176 working sessions, 6 rooms. To see all the outcomes of the summit you can check the Summit Outcomes.

Some info about some of the discussed topics:

  • OWASP Top 10 2017
    • discussions about the process
    • have a broader audience, not developers only
    • more can be found here.
  • mobile security testing guide
    • guide updated
    • new content added; more topics, like best practices for the use of OAuth2 (??)
    • more can be found here.
  • define agile security practices
    • participants redefined the session goals to discuss security practices for agile development teams.
  • SAMM 2
    • more can be found here.
  • app sec education
    • what is the perfect/best curriculum to teach app sec at school.
  • security GitHub integration
    • drafted a letter to reach out to GitHub with a request for comments.
    • more can be found here.
  • threat modeling (TM) sessions
    • OWASP wants to be more visible on threat modeling.
    • TM OWASP pages revamp
    • TM templates
    • TM iot devices
    • TM diagram techniques
    • TM cheat sheets & lightweight TM
    • new slogan: “The sooner the better, never too late”
  • OWASP playbook series
    • an actionable, consistent process for getting started with various application security topics.
    • more can be found here, here and here.
  • OWASP Testing guide v5

Threat modeling lessons from Star Wars

This was an introductory talk about threat modeling, with the goal of demystifying the idea that threat modeling is hard and can be done only by very smart/trained people.

You can start threat modeling by answering 4 questions:

  1. What are you building?
    • You must somehow represent/draw the item that you want to build.
    • DFDs (data flow diagrams) are the most common way to represent the system being built, but other options are available, like swim-lane diagrams.
    • You can use any kind of diagram that fits your needs.
  2. What can go wrong?
    • Find the threats using STRIDE, attack trees, the CAPEC kill chain or checklists.
    • A small introduction to the STRIDE mnemonic (Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, Elevation of privilege) was given.
  3. What are you going to do about it?
  4. Did you do an acceptable job at 1-3?

The second part of the talk was called “Top 10 lessons” and actually contained a list of 10 misconceptions about threat modeling:

  1. Think like an attacker
    • it is very difficult to think like an attacker, and it doesn’t help you to know what you have to do.
  2. You’re never done threat modeling
    • the 4 stages of threat modeling:
      • model
      • identify threats
      • mitigate
      • validate
  3. The way to threat model is…
    • should focus on what delivers value by helping people find good threats
    • for each threat modeling phase (model, identify, mitigate, validate) there are different techniques to do the job.
  4. Threat modeling as one skill
    • there are different techniques: DFDs, attack trees, etc.
  5. Threat modeling is born not taught
    • threat modeling is like playing a violin; you need to train yourself and you will not be able to play correctly from the beginning.
    • practice, practice, practice
  6. The wrong focus
    • focus on the software being built, not on the assets that you want to protect or on thinking about your attackers.
  7. Threat modeling is for specialists
    • threat modeling should be like version control: anyone can and should threat model.
  8. Threat modeling without context
    • see threat modeling not in a vacuum but as part of a chain that can help different teams (dev team, operations team) fix (security) problems.
  9. Laser like focus on threats
    • requirements drive threats.
    • threats expose requirements.
    • threats need mitigations.
    • un-mitigatable threats drive requirements.
  10. Threat modeling at the wrong time
    • you must start threat modeling early.

Main take-aways: anyone can and should threat model; all the necessary techniques can be learned.

5 (software) security books that every (software) developer should read

I must admit that the title is a little bit catchy; a better title would have been “5 software security books that every developer should be aware of“. Depending on your interest you might want to read these books entirely, or you could just know that they exist. There must be tons of software security books on the market, but this is my short list of books about software security that I think every developer interested in software security should be aware of.

Hacking – the art of exploitation This book explains the basics of different hacking techniques, especially non-web hacking techniques: how to find vulnerabilities (and defend against them) like buffer overflows or stack-based buffer overflows, how to write shellcode, and some basic concepts of cryptography and cryptography-related attacks, like the man-in-the-middle attack on an SSL connection. The author tried to make the text accessible to non-technical people, but some programming experience (ideally C/C++) is required in order to get the best out of this book. You can see my full review of the book here.

Iron-Clad Java: Building secure web applications This book presents the hacking techniques and the countermeasures for web applications; you can see this book as complementary to the previous one: the first one covers non-web hacking techniques, this one covers (only) web hacking techniques: XSS, CSRF, how to protect data at rest, SQL injection and other types of injection attacks. In order to get the most out of the book some Java knowledge is required. You can see my full review of the book here.

Software Security – Building Security In This book explains how to introduce security into the SDLC: how to introduce abuse cases and security requirements in the requirements phase, and how to introduce risk analysis (also known as threat modeling) in the design phase and the software qualification phase. I really think that each software developer should at least read the first chapter of the book, where the author explains why the old way of securing applications (seeing software applications as “black boxes” that can be protected using firewalls and IDS/IPS) cannot work anymore in today's software landscape. You can see my full review of the book here: Part 1, Part 2 and Part 3.

The Tangled Web: A Guide to Securing Modern Web Applications This is another technical book about security in which you will not see a single line of code (Software Security – Building Security In is another one), but it is highly instructive, especially if you are a web developer. The book presents all the “bricks” of today's Internet: HTTP, WWW, HTML, cookies, scripting languages, how these bricks are implemented in different browsers, and especially how the browsers implement security mechanisms against rogue applications. You can see my full review of the book here.

Threat modeling – designing for security Threat modeling techniques (also known as architectural risk analysis) have been around for some time, but what has changed in recent years is the accessibility of these techniques to software developers. This book is one of the reasons why threat modeling is accessible to developers. The book is very dense, but it supposes that you have no prior knowledge of the subject. If you are interested in the threat modeling topic you can check this ticket: threat modeling for mere mortals.

(My) OWASP Belgium Chapter meeting notes

These are my notes from the OWASP Belgium Chapter meeting of the 29th of May.

HTTP for the good or the bad

The talk was about (mostly PHP) webshells and how the bad guys are using them.

Common (webshell) features:

  • file manipulation
  • system command execution
  • DB administration
  • network scanning

How the bad guys try to protect access to the webshell URL once it is installed on a vulnerable server:

  • obfuscation
  • use random get parameters
  • use the .htaccess file
  • user agent
  • fully qualified domain names
  • (HTTP) referrer header
  • custom HTTP headers – use a custom HTTP header to grant access to the webshell URL.
  • fake arguments
  • IP geolocation – use an external service to geolocate the connecting client.
  • blacklisted IPs – use a (black)list of IPs from which clients cannot connect.

(Common) mistakes made by the webshell developers:

  • use deprecated functions.
  • all of them suffer from XSS vulnerabilities (but these are hard to exploit).
  • no httpOnly cookies.
  • weak authentication; no protection of the password against brute-force attacks.
    • the password check is done via a hash check (very often the real password is in the code as a comment).


Panopticon – a cross-platform disassembler

Panopticon goals:

  • disassemble the code
  • do a static analysis of the code
  • have a very user-friendly UI.

Panopticon “special” features:

  • semantics-based analysis; approximate what happens at run time without executing the code.
  • display, compare and run execution traces.
  • scripting support: Ruby/Python/JS

Book review : The Tangled Web: A Guide to Securing Modern Web Applications

This is a review of The Tangled Web: A Guide to Securing Modern Web Applications book.

(My) Conclusion

This book does a great job explaining how the “bricks” of the Internet (HTTP, HTML, WWW, cookies, scripting languages) work (or don't) from the security point of view. It also gives a very systematic coverage of browser (in)security, even if some of the information is starting to be outdated. The book's audience is web developers who are interested in the inner workings of browsers in order to write more secure code.

Chapter 1 Security in the world of web applications

The goal of this chapter is to set the scene for the rest of the book. The main ideas revolve around the fact that security is a non-algorithmic problem and that the best ways to tackle security problems are very empirical (learning from mistakes, developing tools to detect and correct problems, and planning to have everything compromised).

Another part of the chapter is dedicated to the history of the web, because for the author it is very important to understand the history behind the well-known “bricks” of the Internet (HTTP, HTML, WWW) in order to understand why they are completely broken from the security point of view. For a long time the evolution of Internet standards was dominated by vendors or stakeholders who did not care much about the long-term prospects of the technology; see the Wikipedia Browser Wars page for a few examples.

Part I Anatomy of the web (Chapters 2 to 8)

The first part of the book is about the building blocks of the web: the HTTP protocol, the HTML language, CSS, the scripting languages (JavaScript, VBScript) and the external browser plug-ins (Flash, Silverlight). For each of these building blocks, the author presents how they are implemented and how they work (or don't) in different browsers, what standards are supposed to drive their development, and how these standards are very often incomplete or oblivious to security requirements.

In this part of the book the author speaks only briefly about security features, since the second part of the book is supposed to focus on security.

Part II Browser security features (Chapters 9 to 15)

The first security feature presented is the SOP (Same-Origin Policy), which is also the most important mechanism to protect against hostile applications. The SOP behaviour is presented for DOM documents, for XMLHttpRequest and for Web Storage, along with how the security policies for cookies can also impact the SOP.

A less known topic that is treated is SOP inheritance: how the SOP is applied to pseudo-URLs like about:, javascript: and data:. The conclusion is that each browser treats SOP inheritance in different (and potentially incompatible) ways, and it is preferable to create new frames or windows by pointing them to a server-supplied blank page with a definite origin.

Other less known browser features (that can affect security) are explained in depth: the way browsers recognize the content of a response (a.k.a. content sniffing), navigation to sensitive URI schemes like “javascript:”, “vbscript:”, “file:”, “about:” and “res:”, and the way browsers protect themselves against rogue scripts (in the case of rogue-script protection the author points out the inefficiency of the protections).

The last part is about the different mechanisms that browsers use in order to give special privileges to some specific web sites; the mechanisms explained are form-based password managers, hard-coded domain names and the Internet Explorer zone model.

Part III Glimpse of things to come (Chapters 16 to 17)

This part is about the developments done by the industry to enhance the security of browsers.

For the author there are two ways that browser security could evolve: extend the existing framework(s), or try to restrict the existing framework(s) by creating new boundaries on top of the existing browser security model.

For the first alternative, the following solutions are presented: the W3C Cross-Origin Resource Sharing specification, the Microsoft response to CORS called XDomainRequest (which, by the way, was deprecated by Microsoft) and the W3C Uniform Messaging Policy.

For the second alternative, the following solutions are presented: the W3C (formerly Mozilla) Content Security Policy, (WebKit) sandboxed frames and Strict Transport Security.

The last part is about how new planned APIs and features could impact browser and application security. Briefly explained are “Binary HTTP”, WebSocket (which was not yet a standard when the book was written), JavaScript offline applications and P2P networking.

Chapter 18 Common web vulnerabilities

The last chapter is a nomenclature of different known vulnerabilities, grouped by the place where they can happen (server side, client side). For each item a brief definition is given and links are provided to the previous chapters where the item was discussed.

(My) CSSLP Notes – Secure Software Concepts

Note: These notes were strongly inspired by the following book: CSSLP Certification All in One.

General Security Concepts

Basics

The security of IT systems can be defined using the following attributes:

  • confidentiality – how the system prevents the disclosure of information.
  • integrity – how the system protects data from unauthorized alteration.
  • availability – access to the system by authorized personnel.
  • authentication – process of determining the identity of a user. Three methods can be used to authenticate a user:
    • something you know (ex: password, pin code)
    • something you have (ex: token, card)
    • something you are (ex: biometrics mechanisms)
  • authorization – process of applying access control rules to a user process to determine if a particular user process can access an object.
  • accounting (auditing) – records historical events on a system.
  • non-repudiation – preventing a subject from denying a previous action with an object in a system.

System principles

  • session management – design and implementation of controls to ensure that the communications channels are secured from unauthorized access and disruption of communications.
  • exception management – the process of handling any errors that could appear during the system execution.
  • configuration management – identification and management of the configuration items (initialization parameters, connection strings, paths, keys).

Secure design principles

  • good enough security – there is a trade off between security and other aspects associated with a system. The level of required security must be determined at design time.
  • least privilege – a subject should have only the necessary rights and privileges to perform a specific task.
  • separation of duties – for any given task, more than one individual needs to be involved.
  • defense in depth (layered security) – apply multiple dissimilar security defenses.
  • fail-safe – when a system experiences a failure, it should fail to a safe state; all the attributes associated with the system security (confidentiality, integrity, availability) should be appropriately maintained.
  • economy of mechanism – keep the design of the system simple and less complex; reduce the number of dependencies and/or services that the system needs in order to operate.
  • complete mediation – check permissions each time a subject requests access to an object.
  • open design – the design is not a secret, the implementation of the safeguard is (ex: cryptography algorithms are open but the keys used are secret).
  • least common mechanism – minimize the amount of mechanism common to more than one user and depended on by all users. Every shared mechanism (especially one involving shared variables) represents a potential information path between users and must be designed with great care to be sure it does not unintentionally compromise security.
  • psychological acceptability – accessibility to resources should not be inhibited by security mechanisms. If security mechanisms hinder the usability or accessibility of resources, then users may opt to turn off those mechanisms.
  • weakest link – attackers are more likely to attack a weak spot in a software system than to penetrate a heavily fortified component.
  • leverage existing components – component reuse has many advantages, including increased efficiency and security. From the security point of view, component reuse reduces the attack surface.
  • single point of failure – a system design should not be susceptible to a single point of failure.

Security Models

Access Control Models

Access controls define what actions a subject can perform on specific objects.

  • Bell-LaPadula confidentiality model – It is focused on maintaining the confidentiality of objects. Bell-LaPadula operates by observing two rules: the Simple Security Property and the * Security Property.
    • The Simple Security Property states that there is “no read up”: a subject at a specific classification level cannot read an object at a higher classification level.
    • The * Security Property is “no write down”: a subject at a higher classification level cannot write to a lower classification level. (A minimal code sketch of these two rules follows this list.)
  • Take-Grant – these systems specify the rights that a subject can transfer to or from another subject or object. The model is based on a representation of the controls in the form of directed graphs, with the vertices being the subjects and the objects. The edges between them represent the rights between the subjects and objects. The representation of rights takes the form of {t (take), g (grant), r (read), w (write)}.
  • Role-based Access Control – users are assigned a set of roles they may perform. The roles are associated with the access permissions necessary to perform the tasks.
  • MAC (Mandatory Access Control) Model – in MAC systems the owner or subject cannot determine whether access is to be granted to another subject; it is the job of the operating system to decide.
  • DAC (Discretionary Access Control) Model – in DAC systems the owner of an object can decide which other subjects may have access to the object and what specific access they may have.
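As an illustration, here is a minimal sketch (my own, not taken from any particular book) expressing the two Bell-LaPadula rules as simple comparisons on numeric classification levels:

// Minimal sketch of the two Bell-LaPadula rules, using integers as
// classification levels (higher number = more sensitive).
public class BellLaPadulaSketch {

    // Simple Security Property: "no read up"
    static boolean canRead(int subjectLevel, int objectLevel) {
        return subjectLevel >= objectLevel;
    }

    // * Security Property: "no write down"
    static boolean canWrite(int subjectLevel, int objectLevel) {
        return subjectLevel <= objectLevel;
    }

    public static void main(String[] args) {
        final int secret = 2, topSecret = 3;
        System.out.println(canRead(secret, topSecret));  // false: no read up
        System.out.println(canWrite(topSecret, secret)); // false: no write down
    }
}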

Integrity Models

  • Biba integrity model – (sometimes referred to as Bell-LaPadula upside down) was the first formal integrity model. Biba is the model of choice when integrity protection is vital. The Biba model has two primary rules: the Simple Integrity Axiom and the * Integrity Axiom.
    • The Simple Integrity Axiom is “no read down”: a subject at a specific classification level cannot read data at a lower classification. This protects integrity by preventing bad information from moving up from lower integrity levels.
    • The * Integrity Axiom is “no write up”: a subject at a specific classification level cannot write to data at a higher classification. This protects integrity by preventing bad information from moving up to higher integrity levels.
  • Clark-Wilson – an informal model that protects integrity by requiring subjects to access objects via programs. Because the programs have specific limitations on what they can and cannot do to objects, Clark-Wilson effectively limits the capabilities of the subject. Clark-Wilson uses two primary concepts to ensure that the security policy is enforced: well-formed transactions and separation of duties.

Information Flow Models

Information in a system must be protected when at rest, in transit and in use.

  • The Chinese Wall model – designed to avoid conflicts of interest by prohibiting one person, such as a consultant, from accessing multiple conflict-of-interest categories (CoIs). The Chinese Wall model requires that CoIs be identified so that once a consultant gains access to one CoI, they cannot read or write to an opposing CoI.


Risk Management

Vocabulary

  • risk – possibility of suffering harm or loss
  • residual risk – risk that remains after a control was added to mitigate the initial risk.
  • total risk – the sum of all risks associated with an asset.
  • asset – resource an organization needs to conduct its business.
  • threat – circumstance or event with the potential to cause harm to an asset.
  • vulnerability – any characteristic of an asset that can be exploited by a threat to cause harm.
  • attack – attempting to use a vulnerability.
  • impact – loss resulting when a threat exploits a vulnerability.
  • mitigate – action taken to reduce the likelihood of a threat.
  • control – measure taken to detect, prevent or mitigate the risk associated with a threat.
  • risk assessment – process of identifying risks and mitigating actions.
  • qualitative risk assessment – subjectively determining the impact of an event that affects assets.
  • quantitative risk assessment – objectively determining the impact of an event that affects assets.
  • single loss expectancy (SLE) – linked to quantitative risk assessment, it represents the monetary loss or impact of each occurrence of a threat.
    • SLE = asset value * exposure factor
  • exposure factor – linked to quantitative risk assessment, it is a measure of the magnitude of a loss.
  • annualized rate of occurrence (ARO) – linked to quantitative risk assessment, it is the frequency with which an event is expected to occur on an annualized basis.
    • ARO = number of events / number of years
  • annualized loss expectancy (ALE) – linked to quantitative risk assessment, it represents how much an event is expected to cost per year (a small worked example follows this list).
    • ALE = SLE * ARO
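As a quick worked example (with made-up numbers): for an asset valued at $200,000 and an exposure factor of 0.1, SLE = 200,000 * 0.1 = $20,000; if such an event is expected to happen once every two years, ARO = 1 / 2 = 0.5, so ALE = 20,000 * 0.5 = $10,000 per year.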

Types of risks:

  • Business Risks:
    • fraud
    • regulatory
    • treasury management
    • revenue management
    • contract management
  • Technology Risks:
    • security
    • privacy
    • change management

Types of controls

Controls can be classified by the types of actions they perform. Three classes of controls exist:

  • administrative
  • technical
  • physical

For each of these classes, there are four types of controls:

  • preventive (deterrent) – used to prevent the vulnerability from being exploited.
  • detective – used to detect the presence of an attack.
  • corrective (recovery) – correct a system after a vulnerability is exploited and an impact has occurred; backups are a common form of corrective controls.
  • compensating – designed to act when a primary set of controls has failed.

Risk management models

General risk management model

The steps contained in a general risk management model:

  1. Asset identification – identify and classify all the assets, systems and processes that need to be protected.
  2. Threat assessment – identify the threats and vulnerabilities associated with each asset.
  3. Impact determination and quantification
  4. Control design and evaluation – determine which controls to put in place to mitigate the risks.
  5. Residual risk management – evaluate residual risks to identify where additional controls are needed.

Risk management model proposed by Software Engineering Institute

SEI model steps :

  1. Identify – enumerate potential risks.
  2. Analyze – convert the risk data gathered into information that can be used to make decisions.
  3. Plan – decide the actions to take to mitigate the risks.
  4. Track – monitor the risks and the mitigation plans.
  5. Control – make corrections for deviations from the risk mitigation plans.

Security Policies and Regulations

One of the most difficult aspects of prosecuting computer crimes is attribution. Meeting the burden of proof required in criminal proceedings, beyond a reasonable doubt, can be difficult, given that an attacker can often spoof the source of the crime or leverage different systems under someone else’s control.

Intellectual property

Intellectual property is protected by U.S. law under one of four classifications:

  • patents – Patents provide a monopoly to the patent holder on the right to use, make, or sell an invention for a period of time in exchange for the patent holder’s making the invention public.
  • trademarks – Trademarks are associated with marketing: the purpose is to allow for the creation of a brand that distinguishes the source of products or services.
  • copyrights – represent a type of intellectual property that protects the form of expression in artistic, musical, or literary works, and is typically denoted by the circle-C symbol. Software is typically covered by copyright as if it were a literary work. Two important limitations on the exclusivity of the copyright holder’s monopoly exist: the doctrines of first sale and fair use. The first sale doctrine allows a legitimate purchaser of copyrighted material to sell it to another person. If the purchasers of a CD later decide that they no longer care to own the CD, the first sale doctrine gives them the legal right to sell the copyrighted material even though they are not the copyright holders.
  • trade secrets – business-proprietary information that is important to an organization’s ability to compete. Software source code or firmware code are examples of computer-related objects that an organization may protect as trade secrets.

Privacy and data protection laws

Privacy and data protection laws are enacted to protect information collected and maintained on individuals from unauthorized disclosure or misuse.

Several important pieces of privacy and data protection legislation include :

  • U.S. Federal Privacy Act of 1974 – protects records and information maintained by U.S. government agencies about U.S. citizens and lawful permanent residents.
  •  U.S. Health Insurance Portability and Accountability Act (HIPAA) of 1996 – seeks to guard protected health information from unauthorized use or disclosure.
  • Payment Card Industry Data Security Standard (PCI-DSS) – the goal is to ensure better protection of card holder data through mandating security policy, security devices, control techniques and monitoring of systems and networks.
  • U.S. Gramm-Leach-Bliley Financial Services Modernization Act (GLBA) – requires financial institutions to protect the confidentiality and integrity of consumer financial information.
  • U.S. Sarbanes-Oxley Act of 2002 (SOX) – the primary goal of SOX is to ensure adequate financial disclosure and financial auditor independence.

Secure Software Architecture – Security Frameworks

  • COBIT (Control Objectives for Information and Related Technology) – assists management in bridging the gap between control requirements, technical issues and business risks.
  • COSO (Committee of Sponsoring Organizations of the Treadway Commission) – COSO has established an Enterprise Risk Management – Integrated Framework against which companies and organizations may assess their control systems.
  • ITIL (Information Technology Infrastructure Library) – describes a set of practices focusing on aligning IT services with business needs.
  • SABSA (Sherwood Applied Business Security Architecture) – framework and methodology for developing risk-driven enterprise information security architecture.
  • CMMI (Capability Maturity Model Integration) – process metric model that rates the process maturity of an organization on a 1 to 5 scale.
  • OCTAVE (Operationally Critical Threat, Asset and Vulnerability Evaluation) – suite of tools, techniques and methods for risk-based information security assessment.


Software Development Methodologies

Secure Development Lifecycle Components

  • software team awareness and education – all team members should have appropriate training. The key element of team awareness and education is to ensure that all the members are properly equipped with the correct knowledge.
  • gates and security requirements – the term gate signifies a condition that one must pass through. To pass a security gate, a review of the appropriate security requirements is conducted.
  • threat modeling – design technique used to communicate information associated with threats throughout the development team (for more info you can check my other ticket: threat modeling for mere mortals).
  • fuzzing – a test technique where the tester applies a series of inputs to an interface in an automated fashion and examines the output for undesired behaviors (a minimal sketch follows this list).
  • security reviews – process to ensure that the security-related steps are being carried out and not being short-circuited.
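To make the fuzzing idea concrete, here is a minimal sketch of my own; the parse method is a hypothetical stand-in for the interface under test:

import java.util.Random;

public class MiniFuzzer {

    public static void main(String[] args) {
        final Random random = new Random(42); // fixed seed for reproducible runs
        for (int i = 0; i < 10_000; i++) {
            // generate a random input of random length
            final byte[] input = new byte[random.nextInt(64)];
            random.nextBytes(input);
            try {
                parse(input); // the (hypothetical) interface under test
            } catch (RuntimeException e) {
                // undesired behavior: this input crashes the code under test
                System.err.println("Input " + i + " triggered: " + e);
            }
        }
    }

    // placeholder for the real code under test
    private static void parse(byte[] input) {
    }
}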

Software Development Models

  • waterfall model – is a linear application development model that uses rigid phases; when one phase ends, the next begins.
  • spiral model – repeats steps of a project, starting with modest goals, and expanding outwards in ever wider spirals (called rounds). Each round of the spiral constitutes a project, and each round may follow traditional software development methodology such as Modified Waterfall. A risk analysis is performed each round.
  • prototype model – a working model of the software with some limited functionality. Prototyping is used to allow users to evaluate developer proposals and try them out before implementation.
  • agile model
    • Scrum – built around small teams of developers, called the Scrum Team. They are supported by a Scrum Master, a senior member of the organization who acts like a coach for the team. Finally, the Product Owner is the voice of the business unit.
    • Extreme Programming (XP) – method that uses pairs of programmers who work off a detailed specification.

Microsoft Security Development Lifecycle

SDL is a software development process designed to enable development teams to build more secure software and address security compliance requirements.

SDL is built around the following three elements:

  • (security) by design – the security thinking is incorporated as part of design process.
  • (security) by default – the default configuration of the software is by design as secure as possible.
  • (security) in deployment – security and privacy elements are properly understood and managed through the deployment process.

SDL components:

  • training – security training for all personnel, targeted to their responsibilities associated with the development effort.
  • requirements
    • establishment of the security and privacy requirements for the software.
    • creation of quality gates and bug bars. Defining minimum acceptable levels of security and privacy quality at the start helps a team understand risks associated with security issues, identify and fix security bugs during development, and apply the standards throughout the entire project. Setting a meaningful bug bar involves clearly defining the severity thresholds of security vulnerabilities (for example, no known vulnerabilities in the application with a “critical” or “important” rating at time of release) and never relaxing it once it’s been set.
    • development of security and privacy risk assessment. Examining software design based on costs and regulatory requirements helps a team identify which portions of a project will require threat modeling and security design reviews before release and determine the Privacy Impact Rating of a feature, product, or service.
  • design – establish design requirements, perform attack surface analysis/reduction and use threat modeling.
  • implementation – application of secure coding practices and the use of static program checkers to find common errors.
  • verification – perform dynamic analysis (tools that monitor application behavior for memory corruption, user privilege issues, and other critical security problems), fuzz testing and conduct attack surface review.
  • release – conduct final security review and create an incident response plan.
  • response – execute incident response plan.

How to remotely connect to an in-memory HSQLDB database

Very often in-memory instances of HSQLDB are used in the context of unit tests; the unit test starts a database instance (eventually on a random port), provisions the database with some data, runs the test against the database and then stops it.

Now, from time to time you need to debug the unit tests, and sometimes you also need to manually run some queries on the in-memory database (using the HSQLDB manager or any other software).

So here are the steps to follow in order to be able to connect to an in-memory HSQLDB instance:

  • Start the DB in “remote open db” mode. This is driven by the “server.remote_open” property, which is false by default (you can look at the code of the org.hsqldb.server.ServerProperties class to see other properties that might be interesting). The code that starts a database instance in remote open mode will look something like:
// imports needed by the snippet below
import org.hsqldb.persist.HsqlProperties;
import org.hsqldb.server.Server;

HsqlProperties props = new HsqlProperties();
props.setProperty("server.remote_open", true);
// ... add more properties here
Server server = new Server();
server.setProperties(props);
server.start();
  • Connect to the database instance. Now that the database has started, the next step is to connect to it. Because we started an in-memory instance, the connection URL is a memory database URL that looks like jdbc:hsqldb:mem:instanceName. You will not be able to connect using this URL because neither the host nor the port are part of it; instead you should use a server database URL that looks like jdbc:hsqldb:hsql://host:port/instanceName (a minimal connection sketch follows this list).
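Here is a minimal connection sketch; the host, port, instance name and the default SA/empty-password credentials are assumptions to adapt to your own setup:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class RemoteHsqldbClient {

    public static void main(String[] args) throws Exception {
        // server database URL pointing to the in-memory instance by name
        final String url = "jdbc:hsqldb:hsql://localhost:9001/instanceName";
        try (Connection connection = DriverManager.getConnection(url, "SA", "");
             Statement statement = connection.createStatement();
             ResultSet rs = statement.executeQuery(
                     "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.SYSTEM_TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString("TABLE_NAME"));
            }
        }
    }
}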


How to find (buggy) calls of java.lang.Object.equals() with incompatible types

Introduction

All this started from a change (and a mistake) made by one of my co-workers, a small change in the code with rather dramatic consequences.

So, imagine we have a (Java) class T having a private field f of type String:

public class T {
    private String f;

    public void setF(final String f) {
        this.f = f;
    }
    public String getF() {
        return f;
    }
    // ...
}

Now, imagine (again) that the type of the f field changes from String to something else, let’s say Boolean; the compiler will help you find the places where the setF method is called with the wrong parameter type and also the places where calls to getF no longer comply with the new signature.

But there is at least one case where the compiler cannot help you: the call of the java.lang.Object.equals method. Something like “aString”.equals(t.getF()), in the case when the f field is a Boolean, will always return false, because we try to check equality for incompatible types.
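To make the trap concrete, here is a minimal sketch; T2 is a hypothetical copy of T after the type change:

public class EqualsBugDemo {

    // hypothetical copy of T after the change: the field is now a Boolean
    static class T2 {
        private Boolean f;

        public void setF(final Boolean f) {
            this.f = f;
        }
        public Boolean getF() {
            return f;
        }
    }

    public static void main(String[] args) {
        final T2 t = new T2();
        t.setF(Boolean.TRUE);
        // this compiles without any warning, but a String can never be equal
        // to a Boolean, so it prints "false" no matter what value f holds
        System.out.println("aString".equals(t.getF()));
    }
}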

The goal of this ticket is to find solutions to catch this kind of problem at runtime or (ideally) at compile time. Anyway, if you want to catch this kind of problem in your code and you do not want to reinvent the wheel (as I will do in this ticket), just use FindBugs. FindBugs is capable of catching this kind of error (EC_UNRELATED_TYPES) and many, many more (see FindBugs Bug Descriptions).

How to find the buggy calls at runtime

The only way that I see to catch the buggy equals calls in a generic way is to use AOP. The basic idea is to write an aspect that intercepts all the calls to the equals methods and checks that the caller and the callee are of the same type.

For the implementation I chose the AspectJ framework (for a very good introduction to AspectJ I strongly recommend the AspectJ in Action book) and the code looks like this:

import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;
import org.aspectj.lang.annotation.Pointcut;
import org.github.adriancitu.equals.WrongUseOfEqualsException;

@Aspect
public class EqualsCheckerAspect {

    // intercept every execution of boolean equals(Object) on Object or any
    // of its subclasses, capturing the target object and the method argument
    @Pointcut(
            "execution(public boolean Object+.equals(Object)) "
            + "&& target(target) "
            + "&& args(methodArgument)"
            + "&& !within(EqualsCheckerAspect)")
    public void executeEqualsPointcut(
            final Object target,
            final Object methodArgument) {
    }

    // before equals() runs, fail fast if the two sides have different types;
    // a null argument is left alone because x.equals(null) is a legal call
    @Before("executeEqualsPointcut(target,methodArgument)")
    public void executeEqualsAdvice(
            final Object target,
            final Object methodArgument) throws Exception {

        if (target != null && methodArgument != null
                && !target.getClass().equals(methodArgument.getClass())) {
            throw new WrongUseOfEqualsException(
                    "Tried to call the equals method on different class types");
        }
    }
}

I will briefly try to explain what the previous code is doing; this is not a tutorial about AspectJ or AOP, so some of the definitions will be a little bit approximate; AOP purists, please forgive me.

The aspect (which is a Java class annotated with the @Aspect annotation) contains 2 parts: the pointcut (the method annotated with the @Pointcut annotation) and the advice (the method annotated with the @Before annotation).

The pointcut represents the parts of the program flow that we want to intercept; in our case we want to intercept all the calls to methods having the signature boolean equals(Object) on any instance of the java.lang.Object class and of any subclass (that’s the meaning of the + in boolean Object+.equals). The target and args are AspectJ structures that help capture the caller and the callee of the equals method.

The advice represents the code that will be executed when a pointcut is intercepted. In our case we would like to verify that the caller and the callee are of the same type before the execution of the equals method (this is the meaning of the @Before annotation). The advice code, represented by the executeEqualsAdvice method, is quite simple: it just retrieves the caller and the callee objects (already captured by target and args) and checks that they are instances of the same class.

This solution will fix the problem but it has some drawbacks:

  • the check is done at runtime
  • you have a dependency on AspectJ

How to find the buggy calls at compile time

It would be really nice to be able to catch this problem directly at compile time, ideally (though not mandatorily) without any additional technical dependency.

Our salvation comes from the javac plug-in mechanism, which is new in Java 8. The javac plug-in mechanism allows a user to specify one or more plug-ins on the javac command line, to be started soon after the compilation has begun. There is no official tutorial about how to craft and use a javac plug-in; the only information that I’ve found is this Javadoc link and this blog ticket.

The javac plug-in mechanism gives access to the abstract syntax tree of the compiled program by implementing the visitor pattern, so we will use this mechanism to catch and check the executions of the equals methods. The entire code of the plug-in can be found in this GitHub repository.

The steps for crafting a (javac) plug-in are the following:

  1. The entry point of the plug-in should implement the com.sun.source.util.Plugin interface. The code of the class can be found here: RuntimeEqualsCheckPlugin.java
  2. Create a class that implements the com.sun.source.util.TaskListener interface in order to perform additional behavior after the type-checking phase. The code of the class can be found here: EqualsMethodCallTaskListener.java (a minimal sketch of steps 1 and 2 follows this list).
  3. Create an abstract syntax tree visitor by extending com.sun.source.util.TreePathScanner<R,P> and extend the behavior for method invocations. The code for this class can be found here: CodePatternTreeVisitor2.java
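To give an idea of what steps 1 and 2 look like, here is a minimal, simplified sketch; the real classes are in the GitHub repository linked above, and the call to the tree visitor is elided:

import com.sun.source.util.JavacTask;
import com.sun.source.util.Plugin;
import com.sun.source.util.TaskEvent;
import com.sun.source.util.TaskListener;

public class RuntimeEqualsCheckPlugin implements Plugin {

    @Override
    public String getName() {
        // the name used with the -Xplugin: javac flag
        return "RuntimeEqualsCheckPlugin";
    }

    @Override
    public void init(JavacTask task, String... args) {
        // register a listener that is notified about compilation phases
        task.addTaskListener(new TaskListener() {
            @Override
            public void started(TaskEvent e) {
                // nothing to do before a phase starts
            }

            @Override
            public void finished(TaskEvent e) {
                // after the ANALYZE (type-checking) phase the AST is typed,
                // so it is safe to scan it for equals() invocations
                if (e.getKind() == TaskEvent.Kind.ANALYZE) {
                    // new CodePatternTreeVisitor2(...).scan(...); (elided)
                }
            }
        });
    }
}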

The most important part of the CodePatternTreeVisitor2.java class is the overridden visitMethodInvocation method. I will show you the code, but the most important lesson that I learned is that in this code you have to work with trees; everything is a subtype of the Tree interface.

@Override
public Void visitMethodInvocation(MethodInvocationTree methodInvocationTree, Void aVoid) {
    final ExpressionTree methodSelect = methodInvocationTree.getMethodSelect();
    switch (methodInvocationTree.getKind()) {
        case METHOD_INVOCATION:
            switch (methodSelect.getKind()) {
                case MEMBER_SELECT:
                    // t1.equals or field.equals
                    final MemberSelectTree memberSelectTree = (MemberSelectTree) methodSelect;
                    // it's an equals method invocation
                    if (isEqualsCall(
                            new TreePath(getCurrentPath(), methodSelect),
                            methodInvocationTree.getArguments() != null
                                    ? methodInvocationTree.getArguments().size() : 0)) {
                        // t1 or field
                        final ExpressionTree expression = memberSelectTree.getExpression();
                        final TypeMirror callerType =
                                trees.getTypeMirror(
                                        new TreePath(getCurrentPath(), expression));
                        final Optional<TypeMirror> argumentType =
                                computeArgumentType(methodInvocationTree.getArguments());
                        if (argumentType.isPresent()
                                && !callerType.equals(argumentType.get())) {
                            System.err.println("Try to call equals on different parameter types at line "
                                    + getLineNumber(methodInvocationTree)
                                    + " of file "
                                    + currCompUnit.getSourceFile().getName()
                                    + "; this is a bug!");
                        }
                    }
            }
    }
    return super.visitMethodInvocation(methodInvocationTree, aVoid);
}

The steps to execute the (javac) plug-in are the following:

  1. Set up a file called com.sun.source.util.Plugin located in META-INF/services/ of your plug-in code, because the plug-in is located via java.util.ServiceLoader. The content of the file should be the fully qualified name of the plug-in class (com.github.adriancitu.equals.com.github.adriancitu.equals.runtime.RuntimeEqualsCheckPlugin in this case).
  2. Create a jar with all the plug-in files and the com.sun.source.util.Plugin file.
  3. Execute your plug-in against javac, using the -processorpath flag to indicate the path where the plug-in JAR file is located and the -Xplugin flag to indicate the name of the plug-in to run. In our case the command will be something like:
    javac -processorpath plugIn.jar
    -Xplugin:RuntimeEqualsCheckPlugin ./NoDependenciesTest.java

So finally it was possible to have a compile-time solution; the only drawback that I see for the plug-in solution is that the API is not very user-friendly.