To make applications available and accessible from almost everywhere, companies deploy their applications on the web. Deployments vary considerably, but the most common structure for a web application is the three-tier architecture illustrated in Figure 2.1. The first tier is the presentation tier, which contains the visual components rendered by the browser. The second is the logic tier, which contains the application's business logic. The third is the storage tier, where the business logic stores data as needed.
As Figure 2.1 shows, each tier only communicates with the tiers adjacent to it. The logic tier therefore has to act as a safeguard for the storage tier, where valuable and possibly sensitive information is stored. This sensitive information might, for example, consist of usernames, email addresses, personal identity numbers and credit card information.
The scope of this thesis lies in the logic tier, where both trusted and untrusted data is processed; this is the tier where validation is needed to ensure security. The programming language used for the logic tier varies, but Java is a common choice and the language chosen for this thesis.
2.1.1 Structured Query Language
Communication between the logic tier and the storage tier is done through a standardized language called Structured Query Language, commonly known as SQL. SQL was created to access and manipulate databases programmatically, and the majority of today's databases use it. The language works by building queries that specify the required information or task; each query is then evaluated and handled by the SQL engine.
2.2 CIA Triad
Discussions regarding information security often rely on the CIA Triad. CIA stands for confidentiality, integrity, and availability, as displayed in Figure 2.2. Confidentiality ensures that data is only accessed by authorized individuals. Integrity specifies that application data should be accurate and unaltered. Availability is the ability to access the application and the application data.
Figure 2.2: An illustration of the CIA Triad, a model used when discussing information security.
2.3 Security Vulnerabilities
The Open Web Application Security Project, known as OWASP, is an online community which aims to provide knowledge on how to secure web applications. OWASP has produced reports on the top ten security risks for web applications; the latest was published in 2017. The report describes the ten most common security risks for the given year, including how each risk is exploited and possible prevention methods. This thesis focuses on security risks number one and seven from that report, Injection and Cross-Site Scripting, which deal with vulnerabilities regarding information disclosure and code injection.
2.3.1 SQL Injection Attacks
The most common security risk is Injection. In an Injection Attack, the attacker's input changes the intent of the execution. Typical results of Injection Attacks are file destruction, lack of accountability, denial of access and data loss.
Injection attacks target a broad set of different areas, but the area discussed and analyzed in this thesis is SQL Injection. SQL Injections can be divided into two subgroups: SQL Injection and Blind SQL Injection.
An SQL Injection occurs when an SQL query is tampered with so that it returns content or executes a command on the database that was not intended. Listing 2.1 displays an SQL query which is open to SQL Injection, because the variable userId is never validated before it is propagated into the query [8, 42].
Listing 2.1: Pseudo code vulnerable to SQL Injection through malicious usage of userInput.
userId = userInput
"SELECT * FROM Users WHERE userId = " + userId
The query works as intended if the user input, labeled userInput, is a valid integer (in this application, user ids are integers). An example of malicious user input is "10 or 1 = 1", which results in the query seen in Listing 2.2.
Listing 2.2: An example of SQL Injection where the whole Users table is returned
SELECT * FROM Users WHERE userId = 10 or 1 = 1
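For illustration, the following Java sketch reproduces Listings 2.1 and 2.2. The class and method names are hypothetical; the point is only that string concatenation lets the input become part of the SQL syntax instead of remaining a value.

```java
// Hypothetical Java rendering of Listing 2.1: the query is built by string concatenation.
class ConcatenatedQuery {
    static String buildQuery(String userInput) {
        String userId = userInput; // no validation, exactly as in Listing 2.1
        return "SELECT * FROM Users WHERE userId = " + userId;
    }

    public static void main(String[] args) {
        // Benign input: the WHERE clause matches a single user id.
        System.out.println(buildQuery("10"));
        // Malicious input: "or 1 = 1" turns the WHERE clause into a tautology (Listing 2.2).
        System.out.println(buildQuery("10 or 1 = 1"));
    }
}
```

Nothing in buildQuery distinguishes data from SQL keywords, which is exactly the weakness the following prevention techniques address.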
Because 1 = 1 is always true, the WHERE clause evaluates to true for every row, and the query returns the whole table of users. This problem can be prevented in a couple of different ways. The first is input validation: by verifying the user input as in Listing 2.3, the query is protected from SQL Injection.
Listing 2.3: An example of SQL Injection prevention through variable sanitization.
userId = userInput
if isInteger(userId)
    "SELECT * FROM Users WHERE userId = " + userId
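A Java sketch of the validation in Listing 2.3 could look as follows. It assumes, as above, that user ids are integers, and uses Integer.parseInt as the isInteger check; the class and method names are hypothetical. Concatenating the parsed int, rather than the raw string, means that only an optional minus sign and digits can reach the query text.

```java
import java.util.Optional;

// Hypothetical Java rendering of Listing 2.3, assuming user ids are ints.
class ValidatedQuery {
    // Plays the role of isInteger: parsing succeeds only for a well-formed int.
    static Optional<Integer> parseUserId(String userInput) {
        try {
            return Optional.of(Integer.parseInt(userInput));
        } catch (NumberFormatException e) {
            return Optional.empty(); // e.g. "10 or 1 = 1", or null
        }
    }

    // The query is only built when validation succeeds, and it is built from
    // the parsed int, never from the raw input string.
    static Optional<String> buildQuery(String userInput) {
        return parseUserId(userInput).map(id -> "SELECT * FROM Users WHERE userId = " + id);
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("10"));          // a query is produced
        System.out.println(buildQuery("10 or 1 = 1")); // no query is produced
    }
}
```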
A second common alternative is to use SQL Parameters, where the user input is passed to the SQL engine separately from the query text instead of being concatenated into it. This leaves the verification and validation of the input up to the SQL engine. An example written with SQL Parameters can be seen in Listing 2.4.
Listing 2.4: An example of SQL Injection prevention through SQL Parameters.
userId = userInput
sqlQuery = "SELECT * FROM Users WHERE userId = @0"
db.Execute(sqlQuery, userId)
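Listing 2.4 uses a generic db.Execute call. In Java, the standard JDBC counterpart is java.sql.PreparedStatement with ? placeholders, as sketched below. Two assumptions apply: the id is parsed to an int because the column is assumed to hold integers, and, since no database is available here, a hypothetical recording Connection built with java.lang.reflect.Proxy stands in for a real JDBC driver. With a real database, the Connection would come from DriverManager or a DataSource.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class ParameterizedQuery {
    static final String SQL = "SELECT * FROM Users WHERE userId = ?";

    // The SQL text is fixed; the user-controlled value is only ever bound as a parameter.
    static PreparedStatement prepareFindUser(Connection conn, String userInput) throws SQLException {
        int userId = Integer.parseInt(userInput); // throws NumberFormatException for "10 or 1 = 1"
        PreparedStatement ps = conn.prepareStatement(SQL);
        ps.setInt(1, userId);
        return ps;
    }

    // Hypothetical test double: a Connection that records the SQL text it receives
    // and the parameters bound to the statement it returns. Other calls return null.
    static Connection recordingConnection(List<String> log) {
        ClassLoader cl = ParameterizedQuery.class.getClassLoader();
        return (Connection) Proxy.newProxyInstance(cl, new Class<?>[] {Connection.class},
            (proxy, method, args) -> {
                if (method.getName().equals("prepareStatement")) {
                    log.add("prepare: " + args[0]);
                    return Proxy.newProxyInstance(cl, new Class<?>[] {PreparedStatement.class},
                        (p, m, a) -> {
                            if (m.getName().equals("setInt")) {
                                log.add("bind " + a[0] + ": " + a[1]);
                            }
                            return null;
                        });
                }
                return null;
            });
    }

    public static void main(String[] args) throws SQLException {
        List<String> log = new ArrayList<>();
        prepareFindUser(recordingConnection(log), "10");
        System.out.println(log);
    }
}
```

The recorded SQL text never contains the user input; the value travels separately through setInt. Even if the parse step were dropped and setString used instead, "10 or 1 = 1" would be compared as a literal value rather than interpreted as SQL.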
Blind SQL Injection
Blind SQL Injection is very similar to SQL Injection. The only difference is that the attacker does not receive the requested information in clear text from the database. Instead, the information is inferred by monitoring variables such as how long the response takes or what kind of error messages it returns. An example of the first kind is an SQL query that tells the SQL engine to sleep depending on a condition, as seen in Listing 2.5 [8, 42].
Listing 2.5: An example of Blind SQL Injection where the query response is delayed five seconds if a user with id one is in the Users table.
IF EXISTS (SELECT * FROM Users WHERE userId = 1) WAITFOR DELAY '0:0:5'
The second variant of Blind SQL Injection analyzes error messages and, based on what they return, builds an image of the targeted data. This is mostly done by testing different combinations of true and false queries [8, 42].
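The true/false variant can be simulated without a database. In the sketch below, the hypothetical Oracle class stands in for a vulnerable application: each of its methods corresponds to an injected condition such as LENGTH(password) > n or ASCII(SUBSTRING(password, i, 1)) > c (the exact SQL functions vary between databases), and the attacker only learns whether the condition held. The sketch assumes the secret consists of printable ASCII characters.

```java
// Simulation of boolean-based Blind SQL Injection; all names are hypothetical.
class BlindInjectionSimulation {
    // Stand-in for the vulnerable application. The attacker never reads the secret,
    // only the true/false outcome of each injected condition.
    static final class Oracle {
        private final String secret;
        int queries = 0;

        Oracle(String secret) {
            this.secret = secret;
        }

        boolean lengthGreaterThan(int n) {
            queries++;
            return secret.length() > n;
        }

        boolean charGreaterThan(int index, int code) {
            queries++;
            return secret.charAt(index) > code;
        }
    }

    // Reconstructs the secret from yes/no answers only.
    static String extract(Oracle oracle) {
        int length = 0;
        while (oracle.lengthGreaterThan(length)) {
            length++;
        }
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < length; i++) {
            // Binary search over printable ASCII (32..126); the answer stays in [lo, hi].
            int lo = 32, hi = 126;
            while (lo < hi) {
                int mid = (lo + hi) / 2;
                if (oracle.charGreaterThan(i, mid)) {
                    lo = mid + 1;
                } else {
                    hi = mid;
                }
            }
            result.append((char) lo);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        Oracle oracle = new Oracle("s3cret");
        System.out.println(extract(oracle) + " recovered with " + oracle.queries + " queries");
    }
}
```

With binary search, each character costs at most seven true/false queries, which is why this attack remains practical even though every single response reveals only one bit.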
The introduction of the Same-Origin Policy, however, did not stop the attackers. The next wave of attacks was mostly directed towards chat rooms, where it was possible to inject malicious Cross-Site Scripts into the message input form. The server itself would then reflect the script when displaying the message to other users, thereby bypassing the Same-Origin Policy.
There are three different types of Cross-Site Scripting: reflected, stored, and DOM-based Cross-Site Scripting.
Reflected Cross-Site Scripting
Reflected Cross-Site Scripting is mainly conducted through a malicious link that a user accesses. The malicious link exploits a vulnerable input of the targeted web application, which reflects the malicious content from the input back to the user.
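To show what reflecting means, the sketch below models a hypothetical search page that echoes a q parameter from the link back into its HTML. It also shows HTML escaping, one common sanitizer for this case. The escaping covers only HTML element content; attribute, URL and JavaScript contexts need their own encoding rules.

```java
// Hypothetical handler for a search page that echoes the "q" parameter back to the user.
class ReflectedXssDemo {
    // Vulnerable: the parameter is inserted into the HTML verbatim.
    static String renderUnsafe(String q) {
        return "<p>Results for " + q + "</p>";
    }

    // Replaces HTML metacharacters with entities so the browser shows them as text.
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    static String renderSafe(String q) {
        return "<p>Results for " + escapeHtml(q) + "</p>";
    }

    public static void main(String[] args) {
        // Payload an attacker could place in the q parameter of a crafted link.
        String payload = "<script>alert(1)</script>";
        System.out.println(renderUnsafe(payload)); // the browser would run the script
        System.out.println(renderSafe(payload));   // the browser would display it as text
    }
}
```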
Stored Cross-Site Scripting
Stored Cross-Site Scripting means that malicious scripts are stored in the targeted web application's database. The malicious script is then loaded and presented to each user who accesses the application.
DOM-based Cross-Site Scripting
DOM-based Cross-Site Scripting is very similar to Reflected Cross-Site Scripting, but the payload does not necessarily have to be reflected by the application server. Instead, DOM-based Cross-Site Scripting modifies the DOM tree in the user's browser and exploits the user through it.
2.4 Taint Tracking
Taint tracking, also known as taint analysis, is a technique for combating the problems presented above. It analyzes the flow of information in the application. The goal of taint tracking is to prevent attacks such as Injection and Cross-Site Scripting by enforcing the use of sanitizers on all untrusted input data. Taint tracking can be implemented in two forms: static or dynamic. Static taint tracking is an evaluation tool that can be included in the integrated development environment, where it notifies the developer of possible security vulnerabilities. Dynamic taint tracking, on the other hand, runs alongside the application during execution. It analyzes the input data to discover vulnerabilities at runtime and achieves higher accuracy than static tracking. The advantage of the static form is that it runs before runtime; its disadvantage is lower accuracy in tracking taint.
Taint trackers operate by tracking untrusted data and acting upon data that tries to enter sensitive code areas without first being sanitized. Perl and Ruby are two programming languages which have been adapted to use taint checking [33, 24]. Some tools enable taint checking for other platforms; one of them is TaintDroid for the Android platform.
The process of taint tracking consists of four steps, described in Table 2.1. The first step is to mark all data from untrusted sources as tainted, which is done through a taint flag attached to the variables. The second step is detainting data, which is only done after the data has been sanitized by predefined sanitizers. The third step is taint propagation, where tainted data propagates its taint flag onto all data it comes in contact with. The fourth and last step is checking the taint flags in areas called sinks, which are entry points to sensitive code [32, 49]. What to do when a tainted variable tries to pass through a sink varies depending on the application, but remedial actions should be taken. Depending on the application owner's choice, these actions can be logging the event, throwing an error, or replacing the tainted values with safe predefined values.
Tainting: Marking all data from sources as tainted.
Detainting: Marking all data from sanitizers as non-tainted.
Taint Propagation: Propagating taint to all data coming in contact with tainted data.
Assert Non-taint: Asserting that data passing through sinks is non-tainted.
An example of the taint propagation process can be seen in Listing 2.6, where getAttribute is a source, executeQuery is a sink and validate is a sanitizer. On line one, the input from the source is flagged as tainted, and the taint propagates onto userId. The sanitizer on line two validates userId and removes the taint flag. Lastly, the sink on line three executes the query since its argument is not tainted. If a user sends a malicious userId containing "101 OR 1 = 1", the validator sanitizes the string and the sink command executes safely. However, removing line two would result in tainted data entering the sink. Without a dynamic taint tracker, the malicious user would receive the entire list of users. With a dynamic taint tracker, on the other hand, the sink halts the execution and thereby prevents unwanted information disclosure.
Listing 2.6: A code example of correctly handling user input before accessing a sensitive code area.
1 userId = getAttribute("userId");
2 validate(userId);
3 executeQuery("SELECT * FROM Users WHERE userId = "
      + userId);
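The four steps in Table 2.1 and the scenario of Listing 2.6 can be sketched in plain Java. This is a simplified model, not how a real dynamic taint tracker is built: instead of instrumenting the JVM's String class, it uses a hypothetical wrapper, TString, with an explicit taint flag, and the sink applies one of the three remedial actions described above. Unlike the validator described in the text, the hypothetical validate here rejects non-numeric ids instead of rewriting them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the four taint tracking steps using an explicit wrapper type.
class TaintDemo {
    // A string value carrying a taint flag.
    static final class TString {
        final String text;
        final boolean tainted;

        TString(String text, boolean tainted) {
            this.text = text;
            this.tainted = tainted;
        }

        // Taint propagation: the result is tainted if either operand is tainted.
        TString concat(TString other) {
            return new TString(text + other.text, tainted || other.tainted);
        }
    }

    // Remedial actions a sink may take, chosen by the application owner.
    enum Policy { LOG, THROW, REPLACE }

    static final List<String> events = new ArrayList<>();
    static final String SAFE_QUERY = "SELECT * FROM Users WHERE 1 = 0"; // hypothetical safe value

    // Tainting: everything returned by a source (like getAttribute) is marked as tainted.
    static TString source(String requestValue) {
        return new TString(requestValue, true);
    }

    // Constants written by the developer are trusted.
    static TString literal(String s) {
        return new TString(s, false);
    }

    // Detainting: only the sanitizer clears the flag, and only for input it accepts.
    static TString validate(TString v) {
        if (!v.text.matches("\\d+")) {
            throw new IllegalArgumentException("not a numeric user id: " + v.text);
        }
        return new TString(v.text, false);
    }

    // Assert non-taint: the sink checks the flag before the sensitive operation.
    // Returns the query text that would be sent to the database.
    static String executeQuery(TString query, Policy policy) {
        if (query.tainted) {
            switch (policy) {
                case LOG:
                    events.add("tainted data reached sink: " + query.text);
                    break; // log the event and continue
                case THROW:
                    throw new SecurityException("tainted data reached sink");
                case REPLACE:
                    return SAFE_QUERY;
            }
        }
        return query.text;
    }

    public static void main(String[] args) {
        TString prefix = literal("SELECT * FROM Users WHERE userId = ");

        // Listing 2.6 as written: source, sanitizer, sink.
        TString userId = validate(source("101"));
        System.out.println(executeQuery(prefix.concat(userId), Policy.THROW));

        // Line two removed: the tainted value reaches the sink and is blocked.
        TString raw = source("101 OR 1 = 1");
        try {
            executeQuery(prefix.concat(raw), Policy.THROW);
        } catch (SecurityException e) {
            System.out.println("blocked: " + e.getMessage());
        }
    }
}
```

Running main prints the sanitized query followed by "blocked: tainted data reached sink", mirroring the halting behaviour described for a dynamic taint tracker.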
Java has been in use since the mid-1990s. Its creators' objective was to develop a new, improved programming language that simplified the developers' task while keeping a familiar C/C++ syntax. Java is still one of the most common programming languages today.
Java is a statically typed language, which means that the type of every variable is known at compile time and that no variable can be used before it has been declared. A variable is either of a primitive type or of a reference type that refers to an object. Java supports the following eight primitive types: byte, short, int, long, float, double, boolean and char.
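The following small example declares one variable of each primitive type plus one reference-typed variable, and reads the bit widths from the wrapper classes' SIZE constants. The class name is arbitrary; boolean has no SIZE constant because its size is not fixed by the language specification.

```java
// Illustrates static typing and the eight primitive types.
class PrimitiveTypes {
    // Bit widths of the numeric primitives and char, taken from the wrapper classes.
    static String widths() {
        return "byte " + Byte.SIZE + ", short " + Short.SIZE + ", int " + Integer.SIZE
                + ", long " + Long.SIZE + ", float " + Float.SIZE + ", double " + Double.SIZE
                + ", char " + Character.SIZE;
    }

    public static void main(String[] args) {
        byte b = 1;           // 8-bit signed integer
        short s = 2;          // 16-bit signed integer
        int i = 3;            // 32-bit signed integer
        long l = 4L;          // 64-bit signed integer
        float f = 5.0f;       // 32-bit IEEE 754 floating point
        double d = 6.0;       // 64-bit IEEE 754 floating point
        boolean ok = true;    // true or false
        char c = 'J';         // 16-bit UTF-16 code unit
        String name = "Java"; // reference type: the variable refers to an object

        // i = name;          // rejected at compile time: a String cannot be assigned to an int
        System.out.println(widths());
    }
}
```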
2.5.1 Java Virtual Machine
There exists a plethora of Java Virtual Machine implementations, but the official one, developed by Oracle, is HotSpot. One of the core ideas of Java during its development was "write once, run anywhere". The slogan was created by Sun Microsystems, which at the time was the company developing Java and the Java Virtual Machine. The idea behind the Java Virtual Machine was to make one language platform independent and then port the Java Virtual Machine to as many platforms as possible. The Java Virtual Machine is a virtual machine with its own components: heap storage, stack, program counter, method area, and runtime constant pool.
Figure 2.3 illustrates the architecture of the Java Virtual Machine. The Class Loader loads the compiled Java code into the Java Virtual Machine Memory. The Execution Engine reads the loaded bytecode from the Java Virtual Machine Memory and executes the application instructions. The Java Virtual Machine has built-in support for Java Agents, tools that run between the Java Virtual Machine and the executed Java application. An Agent is loaded and given access to the application by the Class Loader. The Class Loader triggers the implemented Java Agent, which allows each class file loaded by the Class Loader to be instrumented before it is loaded into the Java Virtual Machine [50, 22].
Java instrumentation is a way to modify the execution of an application without knowing or modifying the application code itself. Good use cases for Java instrumentation are, for example, monitoring agents, event loggers and taint trackers. Instrumentation is provided by an official Java package, java.lang.instrument, which offers the services needed to modify the bytecode of program instructions. It is conducted by implementing an Agent that can transform every class loaded by the Class Loader before the class is used for the first time. However, there is a library of classes which cannot be instrumented by an Agent: rt.jar, which contains the Base Java Runtime Environment needed to start up the Java Virtual Machine, including the Class Loader. The Base Java Runtime Environment therefore needs to be instrumented before the Java application is run.
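The Agent side of this mechanism can be sketched with the java.lang.instrument API: a premain method registers a ClassFileTransformer, whose transform method receives the bytes of each class file before the class is defined. The agent below is hypothetical and only records class names instead of rewriting bytecode. To load it, it would be packaged in a jar whose manifest contains a Premain-Class entry naming the agent class, and the JVM started with -javaagent pointing at that jar. The class is kept package-private here only so the snippet compiles as a single file; a deployed agent class is normally public.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical agent that only records which classes pass through the Class Loader.
class LoggingAgent {
    // Thread-safe list, since classes may be loaded concurrently.
    static final List<String> seen = new CopyOnWriteArrayList<>();

    // Entry point the JVM calls before main() when the agent is given with -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new LoggingTransformer());
    }

    static final class LoggingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            seen.add(className); // internal form, e.g. "com/example/Demo"
            // A taint tracker would return rewritten bytecode here instead.
            return null; // null tells the JVM to keep the original class file
        }
    }

    public static void main(String[] args) {
        // Without -javaagent, premain is never called, so exercise the transformer directly.
        byte[] result = new LoggingTransformer().transform(null, "com/example/Demo", null, null, new byte[0]);
        System.out.println((result == null ? "unchanged: " : "rewritten: ") + seen);
    }
}
```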
The Java Agent operates on bytecode, which makes writing it time-consuming for the developer. To ease the task of instrumentation, the bytecode instrumentation library Javassist is used [20, 23].
Several libraries can help the developer create a Java Agent by providing methods to manipulate Java bytecode. The library used in this thesis is Javassist. Javassist stands for Java Programming Assistant and provides two levels of API: a source level and a bytecode level. We used the source level API, which provides the functionality of manipulating Java bytecode with little bytecode knowledge.
The Javassist source level API provides classes representing classes, methods, and fields (CtClass, CtMethod and CtField). These API classes contain methods used to determine whether the given class, method or field should be instrumented. The classes representing methods also contain the methods insertBefore, insertAfter and insertAt, which insert Java code at the beginning, at the end or at a specific line of the method.