Studying attacks targeting fingerprinting-based authentication systems


Protecting data access

Encryption. The first defenses concern the way data is transmitted on the Web. The HTTPS protocol adds the TLS/SSL encryption layer on top of the classic HTTP protocol [56]. It allows users to exchange data under an encryption layer that protects against cookie hijacking and other threats. Felt et al. [98] measured the adoption of HTTPS on the Web. They collected data from Chrome and Firefox users from 2014 to 2017, and showed that the proportion of pages loaded over HTTPS keeps growing, reaching 58-90% on Chrome depending on the OS used, and 50-57% on Firefox. They also measured disparities according to the region and country of the user. In addition to HTTPS, the HTTP Strict Transport Security (HSTS) header has been designed to make browsers use HTTPS by default. HSTS is an HTTP header that a server adds to its response; it tells the client to always reach it over HTTPS for a certain period of time. The browser receiving the response stores this information and sends future requests to this domain directly over HTTPS. Felt et al. [98] showed that the HSTS header is only available on 3% of the websites of the Alexa Top 1M.

Restricting cookie access. Browsers also implement several defenses to protect cookies against interception and other malicious usages [57]. The Secure attribute of the Set-Cookie response header allows websites to define cookies that must only be sent over HTTPS, preventing their interception through network monitoring. Websites can use the HttpOnly attribute to prevent access to cookies from JavaScript, protecting them from being stolen via XSS attacks. Finally, websites can use the Domain, Path, and the experimental SameSite attributes to restrict the scope of cookies and protect against CSRF attacks.
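As an illustration of these mechanisms, the sketch below shows how a server could emit the HSTS header together with a hardened session cookie. It is a minimal sketch assuming a Flask application; the header values, cookie name, and cookie value are illustrative and not taken from the thesis.

```python
# Minimal sketch (assumed Flask app) showing the HSTS header and the
# Secure / HttpOnly / SameSite cookie attributes discussed above.
from flask import Flask, make_response

app = Flask(__name__)

@app.route("/login")
def login():
    resp = make_response("authenticated")
    # HSTS: instruct the browser to reach this domain over HTTPS only,
    # for one year (max-age is expressed in seconds).
    resp.headers["Strict-Transport-Security"] = "max-age=31536000; includeSubDomains"
    # Session cookie: sent only over HTTPS (secure), hidden from
    # JavaScript (httponly), and restricted to same-site requests.
    resp.set_cookie("session_id", "opaque-token",
                    secure=True, httponly=True, samesite="Strict")
    return resp
```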

Bots protection techniques

Due to the growing importance of bots, several measures aim to control their capabilities and to defend against them. The robots.txt file allows websites to define a list of folders and files that crawlers and bots should not access. Bots are expected to read the content of this file and adapt their behavior accordingly before crawling and indexing the website. While search engine bots often respect the instructions of the robots.txt file [52, 29, 69], nothing forces them to do so. Additionally, the robots.txt file is publicly available, so malicious agents can see which resources a website wishes to protect against crawlers.
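As a concrete example of the expected behavior, the sketch below shows how a well-behaved crawler can consult robots.txt before fetching a page, using Python's standard urllib.robotparser module. The domain, user agent, and paths are illustrative assumptions.

```python
# Sketch of a well-behaved crawler checking robots.txt before fetching a page.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.org/robots.txt")
rp.read()  # download and parse the robots.txt file

# Only fetch the page if the rules allow this user agent to do so.
if rp.can_fetch("MyCrawler/1.0", "https://example.org/private/report.html"):
    print("allowed to crawl")
else:
    print("disallowed by robots.txt")
```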
IP address reputation. Bot detection techniques can rely on the reputation of IP addresses to classify users. Bots used to be run behind proxies and specific IP addresses that were easy to detect and block. Nowadays, bots often use many IP addresses, including residential ones with an excellent reputation, which highlights the limits of this technique.

Traffic analysis. Other techniques analyze the behavior of users to detect suspicious patterns that could be the work of bots. They monitor server logs as well as user interactions, such as mouse movements or keyboard usage [109], to detect suspicious behaviors, such as too many pages browsed in a limited period of time or an absence of mouse movements or clicks. However, these techniques need a large amount of data before being able to accurately classify the traffic.

Captchas. Captchas are Turing tests that aim at distinguishing bots from humans. When a website has doubts about the nature of a user, it can present a captcha that the user has to solve. If she succeeds, she is considered as a human; otherwise she might be a bot. Several techniques exist and propose various tests to the user. Three of them are presented in Figure 2.3: a textual captcha, Google's reCAPTCHA, and a Geetest captcha. However, the technique has shortcomings: i) Bursztein et al. [91] found that users might fail to solve captchas; ii) more importantly, Sivakorn et al. [139] showed in 2016 that it was possible to solve Google's reCAPTCHA automatically, again questioning the efficiency of the technique.
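To make the traffic-analysis approach concrete, the sketch below classifies a session with two of the simple heuristics mentioned above: an abnormally high page rate and an absence of mouse events. The thresholds and the session structure are assumptions made for this sketch, not values from the literature.

```python
# Illustrative bot-detection heuristic based on server-side traffic analysis.
# Thresholds and the Session structure are assumptions made for this sketch.
from dataclasses import dataclass

@dataclass
class Session:
    pages_visited: int       # number of pages requested
    duration_seconds: float  # elapsed time of the session
    mouse_events: int        # mouse moves/clicks reported by the client

def looks_like_bot(session: Session,
                   max_pages_per_minute: float = 30.0) -> bool:
    minutes = max(session.duration_seconds / 60.0, 1e-6)
    too_fast = session.pages_visited / minutes > max_pages_per_minute
    no_interaction = session.mouse_events == 0 and session.pages_visited > 5
    return too_fast or no_interaction

# Example: 200 pages in 2 minutes without any mouse event is flagged.
print(looks_like_bot(Session(pages_visited=200, duration_seconds=120, mouse_events=0)))
```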

Strengthening password creation

As passwords remain the dominant authentication factor, a common technique to improve the security of an authentication system is to strengthen the password creation rules. Shay et al. [137] conducted a study of 470 students and staff at Carnegie Mellon University (CMU) to measure their opinion about the new password policy of the university. The university system previously had no rule for creating a password. The university updated its policy with the following rules to create and update a password (a sketch of a corresponding validation check is given after the list):
• The new password must contain at least 8 characters, and include at least one upper-case letter, one lower-case letter, one digit and one symbol,
• After removing the non-alphabetic characters, the password cannot match a dictionary word,
• The password cannot contain 4 or more occurrences of the same character.
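The sketch below expresses these three rules as a Python check. The dictionary lookup is reduced to a tiny in-memory word set for illustration; a real deployment would rely on a full dictionary, which is an assumption of this sketch.

```python
# Sketch of a validator for the CMU-style password policy described above.
# The dictionary is reduced to a tiny illustrative set of words.
import re
from collections import Counter

DICTIONARY = {"password", "welcome", "dragon"}  # placeholder word list

def is_valid_password(password: str) -> bool:
    # Rule 1: at least 8 characters with upper, lower, digit and symbol.
    if len(password) < 8:
        return False
    if not (re.search(r"[A-Z]", password) and re.search(r"[a-z]", password)
            and re.search(r"\d", password) and re.search(r"[^A-Za-z0-9]", password)):
        return False
    # Rule 2: after removing non-alphabetic characters, the result
    # must not match a dictionary word.
    letters_only = re.sub(r"[^A-Za-z]", "", password).lower()
    if letters_only in DICTIONARY:
        return False
    # Rule 3: no character may appear 4 times or more.
    if any(count >= 4 for count in Counter(password).values()):
        return False
    return True

print(is_valid_password("Dr4g0n!pass"))   # True: meets all three rules
print(is_valid_password("P4ssword!!!!"))  # False: '!' appears 4 times
```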
In their responses, participants reported needing an average of 1.77 attempts to create a password under the new policy, and estimated at 1.25 the number of attempts needed to authenticate on the system. Similarly to the study presented in Section 2.2.1.2, more than 80% of the participants admitted reusing passwords across different authentication systems. The authors also studied the composition of the passwords thanks to the answers of the participants. On average, the passwords contain 10.49 characters, which is more than the requirement of 8 characters. They are formed of 5.94 lower-case letters, 1.54 upper-case letters, 2.70 digits, and 1.70 symbols.
While strengthening the password creation rules is crucial for security, several attacks and threats presented in Section 2.2.2, such as phishing or data leaks, collect the complete password. In this context, the use of a password alone exhibits several flaws.

Multi-Factor Authentication

Multi-Factor Authentication (MFA) aims at fixing these flaws. Instead of relying on a single authentication factor, MFA leverages additional factors to enhance web authentication [128]. During a web authentication attempt, each factor must be verified to validate the identity of the user. The authentication system can use as many factors as necessary, such as 2 factors (2FA) or 3 factors (3FA), but has to find a good balance between user experience and security. If the actions required to authenticate are too complex or too numerous, users will be reluctant to adopt the authentication scheme.
Authentication factors can be divided into 3 main categories:
• Knowledge factor: This is something the user knows, such as a password;
• Ownership factor: This is something the user owns, such as an email account, a smartphone or a physical token. To prove that she owns this factor, the user is often asked to enter a code received through the factor. The code can be sent by SMS or via a smartphone application to prove the user owns a mobile phone, or by email to prove she owns the email account. This code is often called a One-Time Password (OTP): it is only valid for this specific authentication attempt, usually with a short validity period. When the user types this code on the authentication page, the server compares the submitted code to the one it sent to the factor. If they match, the user can authenticate (a sketch of such a verification is given after this list);
• Inherence factor: This is something the user is. She does not have to do anything to obtain this factor because it is based on her nature. It concerns properties of the user, such as a biological fingerprint or a behavioral pattern. These systems often require specific scanners to collect the property of the user during an authentication attempt. This category also includes factors concerning the context of the authentication attempt, such as the IP address of the user.
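The sketch below illustrates the OTP flow described in the ownership-factor item: the server generates a short-lived code, sends it through the owned factor, and later compares the submitted value to the one it issued. The code length, validity period, and in-memory storage are assumptions made for this example; the delivery step is left abstract.

```python
# Sketch of server-side OTP generation and verification for an ownership factor.
# Code length, validity period and in-memory storage are illustrative assumptions.
import hmac
import secrets
import time

OTP_VALIDITY_SECONDS = 300          # the code expires after 5 minutes
_pending_otps = {}                  # user -> (code, expiration timestamp)

def issue_otp(user: str) -> str:
    code = f"{secrets.randbelow(10**6):06d}"          # random 6-digit code
    _pending_otps[user] = (code, time.time() + OTP_VALIDITY_SECONDS)
    return code  # in practice, sent by SMS, smartphone application, or email

def verify_otp(user: str, submitted: str) -> bool:
    code, expires_at = _pending_otps.pop(user, (None, 0))
    if code is None or time.time() > expires_at:
        return False                 # no pending code, or the code expired
    # Constant-time comparison of the submitted code with the issued one.
    return hmac.compare_digest(code, submitted)

# Example flow: the user receives the code and types it back.
sent = issue_otp("alice")
print(verify_otp("alice", sent))    # True on the first, timely attempt
print(verify_otp("alice", sent))    # False: the code is single-use
```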
All the factors presented above do not improve security in the same way, and have different impacts on the user experience. Some factors, such as behavioral patterns or those based on the context of the authentication attempt, are called implicit because they do not require a user-specific action to be collected and verified, and hence do not degrade the user experience during the authentication attempt. In contrast, many factors are said to be explicit because they require an action of the user to be collected: typing a password or an OTP, or putting a finger on the fingerprint scanner. Explicit factors are often more secure than implicit factors because they cannot be collected as easily, meaning an attacker must rely on a user interaction to collect the explicit factor. Because of this user interaction, the usage of an explicit factor degrades the user experience. The main challenge for authentication systems lies in the balance between the security improvement and the user experience degradation.

Authentication systems can integrate factors in an immediate or delayed mode. An immediate factor is verified during the authentication attempt, while a delayed factor is checked during the session. To minimize the impact on the user experience, the implementation of a factor in a delayed mode often implies the use of an implicit factor, such as an IP address check.
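As an illustration of a delayed, implicit factor, the sketch below re-checks the client IP address on each request of an established session and reports a mismatch, which could then trigger an explicit verification. The session store and the reaction policy are assumptions of this sketch.

```python
# Sketch of a delayed, implicit factor: an IP address check performed
# during the session rather than at the authentication attempt.
# The session store and the reaction policy are illustrative assumptions.

_sessions = {}   # session_id -> IP address observed at login

def register_session(session_id: str, login_ip: str) -> None:
    _sessions[session_id] = login_ip

def check_request(session_id: str, request_ip: str) -> bool:
    """Return True if the request IP matches the one seen at login."""
    return _sessions.get(session_id) == request_ip

# Example: a session opened from one IP, later used from another one.
register_session("abc123", "203.0.113.7")
print(check_request("abc123", "203.0.113.7"))   # True: same network context
print(check_request("abc123", "198.51.100.9"))  # False: ask for an explicit factor
```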


Table of contents:

List of figures
List of tables
1 Introduction 
1.1 Context
1.2 Objectives
1.3 List of Scientific Publications
1.4 List of Tools and Prototypes
1.5 Outline
2 State of the Art 
2.1 Context
2.1.1 Birth of the Web
2.1.2 Web evolution
2.2 Web authentication
2.2.1 Concept
2.2.2 Threats and attacks
2.2.3 Protecting data access
2.2.4 Bots protection techniques
2.2.5 Improving authentication
2.3 Browser fingerprinting
2.3.1 Definition
2.3.2 Properties
2.3.3 Attributes
2.4 Browser fingerprinting studies
2.4.1 Measuring browser fingerprinting properties
2.4.2 Detection and classification
2.5 Browser fingerprinting countermeasures
2.5.1 Blocking scripts
2.5.2 Unifying attributes value
2.5.3 Changing attributes value over time
2.5.4 Induced information leaks
2.6 Browser fingerprinting usages
2.6.1 User Tracking
2.6.2 Bot Detection
2.6.3 User Authentication
2.7 Conclusion
3 FP-Redemption: Studying Browser Fingerprinting Adoption for the Sake of Web Security 
3.1 A Dataset of Secure Web Pages
3.1.1 Websites Under Study
3.1.2 Web Page Acquisition
3.1.3 Monitored Fingerprinting Attributes
3.1.4 Resulting Dataset Description
3.2 Classification of Fingerprinters
3.2.1 Incremental Script Classification
3.2.2 Script Classification Results
3.2.3 Algorithm results validation
3.3 Analysis of Secure Web Pages
3.3.1 Browser Fingerprinting Attributes
3.3.2 Similarities of Browser Fingerprinting Scripts
3.3.3 Origins of Browser Fingerprinting Scripts
3.3.4 Web page type and website category & country impact
3.3.5 Additional Security Mechanisms
3.4 Websites resilience against 2 attack models
3.4.1 Stolen credentials
3.4.2 Cookie hijacking
3.5 Discussion
3.5.1 Intents in fingerprinting usages
3.5.2 Fingerprinting is barely used for security
3.5.3 Deficiencies in the state of the art
3.6 Conclusion
4 FP-Controlink: Studying fingerprinting under a controlled environment to link fingerprints 
4.1 Methodology
4.1.1 Controlled environment
4.1.2 Browser versions
4.1.3 Attributes
4.1.4 Data collection
4.2 Causes of fingerprints diversity
4.2.1 Desktop evaluation
4.2.2 Mobile evaluation
4.2.3 Layers responsible for an attribute change
4.3 Fingerprints evolution through browser versions
4.3.1 Release versions
4.3.2 Nightly/beta versions
4.3.3 Categorizing attributes
4.4 A browser fingerprints linking algorithm
4.4.1 Main goal
4.4.2 Design
4.4.3 Parameters
4.5 Evaluation of the linking algorithm
4.5.1 Datasets
4.5.2 Key performance metrics
4.5.3 Parameters values
4.5.4 In-the-wild results
4.6 Discussion
4.6.1 Ethical consideration
4.6.2 Choosing parameters value
4.6.3 Linking algorithm improvements.
4.7 Conclusion
5 Advanced risk-based authentication using browser fingerprinting 
5.1 Authentication scheme
5.1.1 Design
5.1.2 Challenges
5.2 Implementation
5.2.1 Legacy Authentication Systems
5.2.2 Rising to the challenges
5.2.3 Authentication scheme and CAS plugin
5.3 Evaluation
5.3.1 Dataset constitution
5.3.2 Key Performance Metrics
5.3.3 Trusted network fingerprints and authentication attempts
5.3.4 Linking algorithm scores
5.3.5 Collection and analysis time
5.4 Discussion
5.4.1 Ethical considerations
5.4.2 Security versus user experience
5.4.3 Client-side-generated information
5.4.4 Device management rules
5.4.5 Compromised device
5.4.6 Adding features to the authentication scheme
5.5 Conclusion
6 Conclusion 
6.1 Contributions
6.1.1 FP-Redemption: Studying Browser Fingerprinting Adoption for the Sake of Web Security
6.1.2 FP-Controlink: Studying fingerprinting under a controlled environment to link fingerprints
6.1.3 Advanced risk-based authentication using browser fingerprinting
6.2 Short-term perspectives
6.2.1 Discovering new fingerprinting JavaScript attributes
6.2.2 Studying attacks targeting fingerprinting-based authentication systems
6.2.3 Investigating Web Assembly technology
6.3 Long-term perspectives
6.4 Concluding note
Bibliography
