Detection of Malicious Web Pages by Companies and Researchers 


Web Attacks and Hosting Providers

Several works have studied the threats that affect websites around the world, as well as the users visiting infected pages [86, 92–94]. Research has also focused on the ways in which criminals exploit search engines to reach their victims, by poisoning the search results for popular queries [50], and on possible solutions to detect such poisoning [66]. The work by Moore et al. [75] starts from the analysis of logs collected from phishing websites and demonstrates that web attackers typically search for vulnerable websites by submitting specific search terms to common search engines. The authors also find that websites compromised for the purpose of hosting phishing pages are much more likely to be re-compromised in the future if attackers can identify them through such specific searches. John et al. [52], instead, studied how attackers find, compromise, and misuse vulnerable servers on the Internet, by employing honeypots that actively attract attackers and dynamically generate honeypot pages related to what the attackers search for, in order to lure them and collect attack logs. This work is closely related to our analysis of the behavior of web attackers, and is discussed in more detail in Section 2.2.
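The kind of log analysis performed by Moore et al. can be illustrated with a small sketch: assuming Apache-style combined access logs, one can extract the query terms carried in search-engine Referer headers and flag queries that look like vulnerability "dorks". The log format, search-engine list, and dork keywords below are assumptions made for illustration, not the authors' exact methodology.

```python
import re
from urllib.parse import urlparse, parse_qs

# Hypothetical sketch: extract search-engine queries from the Referer field of
# Apache "combined" access-log lines and flag dork-like terms. The log format
# and the dork keywords are assumptions for illustration only.
LOG_RE = re.compile(r'"[A-Z]+ (?P<path>\S+) HTTP/[\d.]+" \d+ \d+ "(?P<referer>[^"]*)"')
SEARCH_ENGINES = {"www.google.com", "www.bing.com", "search.yahoo.com"}
DORK_HINTS = ("inurl:", "intitle:", "powered by", "index of")

def extract_search_queries(log_lines):
    """Yield (query, requested_path) for hits that arrived from a search engine."""
    for line in log_lines:
        m = LOG_RE.search(line)
        if not m:
            continue
        referer = urlparse(m.group("referer"))
        if referer.netloc not in SEARCH_ENGINES:
            continue
        query = parse_qs(referer.query).get("q", [""])[0].lower()
        if query:
            yield query, m.group("path")

def looks_like_dork(query):
    """Crude heuristic: search operators or fingerprint strings suggest a dork."""
    return any(hint in query for hint in DORK_HINTS)

if __name__ == "__main__":
    with open("access.log") as fh:
        for query, path in extract_search_queries(fh):
            if looks_like_dork(query):
                print(f"possible dork query {query!r} led to {path}")
```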
Researchers have also studied how criminals combine these malicious activities into attack campaigns capable of infecting tens of thousands of hosts [115].
In a recent paper, Bau et al. [7] evaluate current commercial tools for detecting vulnerabilities in web applications. This is related to what web hosting providers can do in order to detect, or even prevent, attacks on their customers’ websites. As their work shows, however, the tested commercial tools mainly rely on black-box approaches and are not able to find all possible vulnerabilities. Recently, a web hosting provider [21] announced an improvement of its hosting offering, adding free automated website vulnerability scanning, fixing, and recovery. Such a service presumably runs as a white-box approach on the network and server side, and corresponds to what, in our work, we refer to as “add-on” security services. Unfortunately, this service was announced after our experiments had already been completed, and it was therefore not possible to integrate it into our results.
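To illustrate the black-box testing approach that the tools evaluated by Bau et al. rely on, the following is a minimal sketch of a single reflected-XSS probe: it injects a unique marker into a query parameter and checks whether the marker comes back unescaped. The target URL, the parameter name, and the use of the third-party requests library are assumptions for illustration; a real scanner performs far more tests than this.

```python
import uuid
import requests  # third-party HTTP client, assumed available

# Hypothetical black-box probe: no access to server-side code, only to the
# HTTP response. A missing reflection (or server-side filtering) therefore
# yields a false negative, which is one reason such tools miss vulnerabilities.
def probe_reflected_xss(url, param):
    marker = f"<bbtest-{uuid.uuid4().hex[:8]}>"
    resp = requests.get(url, params={param: marker}, timeout=10)
    return marker in resp.text

if __name__ == "__main__":
    # Placeholder target for illustration only.
    vulnerable = probe_reflected_xss("http://victim.example/search", "q")
    print("parameter appears reflected unescaped" if vulnerable else "no reflection observed")
```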

Behavior of Web Attackers

Honeypots are nowadays the tool of choice for detecting attacks and suspicious behaviors on the Internet. Most of the research aimed at observing or analyzing attackers’ actions has been carried out using honeypot systems. The purpose of these systems is, typically, to collect information about all the actions performed by an attacker on a vulnerable system, or on a seemingly vulnerable one, in the form of event logs, network traces, and files created or modified during the attack.

Honeypots can be classified into two categories: client honeypots, which detect exploits by actively visiting websites or executing files, and server honeypots, which attract attackers by exposing one or more vulnerable (or apparently vulnerable) services. The behavior of web attackers can be studied by employing honeypots of the second type, as in this case we are interested in observing what attackers do on the server after a website has been compromised.

Several server-side honeypots have been proposed in the past years, allowing for the deployment of honeypots for virtually any possible service. In particular, we can distinguish two main classes of server honeypots: low-interaction and high-interaction ones. The first class only simulates services, and can thus observe incoming attacks but cannot really be exploited. These honeypots usually have limited capabilities, but are very useful for gathering information about network probes and automated attack activities. Examples include honeyd [91], a framework for virtual honeypots that is able to simulate the networking stack of different operating systems; Leurre.com [89], a project for the development and deployment of a distributed honeypot network spanning several countries and collecting network-level attack information; and SGNET [60], a scalable framework for the deployment of low-interaction honeypot networks, where each honeypot is able to collect almost the same amount of information as a real high-interaction honeypot when dealing with server-based code-injection attacks.
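As an illustration of what “simulating a service” means in practice, the following is a minimal sketch of a low-interaction server honeypot (an assumption for illustration, not honeyd, Leurre.com, or SGNET themselves): it answers every connection with a static SMTP-like banner and logs whatever the client sends, so it can record probes and automated attacks while offering nothing that can actually be exploited.

```python
import socket
import datetime

# Minimal low-interaction honeypot sketch: a fake SMTP greeting plus a log of
# every probe. Port, banner, and log format are illustrative choices.
BANNER = b"220 mail.example.com ESMTP Postfix\r\n"

def serve(host="0.0.0.0", port=2525, logfile="honeypot.log"):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(5)
    with open(logfile, "a") as log:
        while True:
            conn, addr = srv.accept()
            conn.settimeout(10)
            conn.sendall(BANNER)          # pretend to be a real service
            try:
                data = conn.recv(4096)    # record whatever the client sends
            except socket.timeout:
                data = b""
            log.write(f"{datetime.datetime.utcnow().isoformat()} {addr[0]} {data!r}\n")
            log.flush()
            conn.close()

if __name__ == "__main__":
    serve()
```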


Detection of Drive-by-Download Attacks

In the last few years, the detection of web pages used to launch drive-by-download attacks has become an active area of research, and several new approaches have been proposed. In Chapter 6, we will describe the design and implementation of a fast filter for the large-scale detection of malicious web pages. The purpose of this system is to proactively crawl the web and filter the collected pages in search of potentially malicious ones. As such, it can be very useful to researchers and antivirus companies wishing to collect and discover new malicious pages on the Internet. This section briefly reviews the papers and projects that have been published in this field of research in recent years.
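To make the idea of a fast prefilter concrete, the following is a minimal sketch of static feature extraction over a crawled page. The features (hidden iframes, eval() calls, unusually long string literals) and the hand-tuned decision rule are illustrative assumptions standing in for the feature set and trained classifiers presented in Chapter 6.

```python
import re
from html.parser import HTMLParser

# Hypothetical lightweight prefilter: cheap static features that are often
# correlated with drive-by-download pages, used to decide whether a page
# deserves more expensive (dynamic) analysis.
class FeatureExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.num_iframes = 0
        self.num_hidden_iframes = 0
        self.script_chunks = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "iframe":
            self.num_iframes += 1
            style = (attrs.get("style") or "").replace(" ", "")
            if "display:none" in style or attrs.get("width") == "0":
                self.num_hidden_iframes += 1
        elif tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.script_chunks.append(data)

def extract_features(html):
    parser = FeatureExtractor()
    parser.feed(html)
    js = "".join(parser.script_chunks)
    strings = re.findall(r'"([^"]*)"|\'([^\']*)\'', js)
    return {
        "num_iframes": parser.num_iframes,
        "num_hidden_iframes": parser.num_hidden_iframes,
        "num_eval_calls": len(re.findall(r"\beval\s*\(", js)),
        "max_string_length": max((len(a or b) for a, b in strings), default=0),
    }

def looks_suspicious(features):
    """Toy stand-in for a trained classifier: flag pages for deeper analysis."""
    return (features["num_hidden_iframes"] > 0
            or features["num_eval_calls"] > 3
            or features["max_string_length"] > 512)
```

In a real pipeline, a rule like looks_suspicious() would be replaced by models trained on labeled pages, and only the pages it flags would be forwarded to a full dynamic analysis system.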

Table of contents:

1 Introduction 
1.1 Malicious Code on the Web
1.2 Attack Model
1.3 Goals
1.4 Contributions
2 Related Work 
2.1 Web Attacks and Hosting Providers
2.2 Behavior of Web Attackers
2.3 The User Point of View
2.3.1 User-based Risk Analysis
2.3.2 User Profiling
2.4 Detection of Drive-by-Download Attacks
2.4.1 Dynamic approaches
2.4.2 Static approaches
2.4.3 Alternative approaches
3 Web Attacks From a Provider’s Point of View 
3.1 Introduction
3.2 Setup and Deployment
3.2.1 Test Cases
3.2.2 Attack Detection Using State-of-the-Art Tools
3.2.3 Test Scheduling and Provider Solicitation
3.3 Evaluation
3.3.1 Sign-up Restrictions and Security Measures
3.3.2 Attack and Compromise Detection
3.3.3 Solicitation Reactions
3.3.4 Re-Activation Policies
3.3.5 Security Add-on Services
3.4 Lessons Learned, Conclusions
4 Web Attacks From the Attacker’s Point of View 
4.1 Introduction
4.2 HoneyProxy
4.2.1 Containment
4.2.2 Data Collection and Analysis
4.3 System Deployment
4.3.1 Installed Web Applications
4.3.2 Data Collection
4.4 Exploitation and Post-Exploitation Behaviors
4.4.1 Discovery
4.4.2 Reconnaissance
4.4.3 Exploitation
4.4.4 Post-Exploitation
4.5 Attackers Goals
4.5.1 Information gathering
4.5.2 Drive-by Downloads
4.5.3 Second Stages
4.5.4 Privilege Escalation
4.5.5 Scanners
4.5.6 Defacements
4.5.7 Botnets
4.5.8 Phishing
4.5.9 Spamming and message flooding
4.5.10 Link Farming & Black Hat SEO
4.5.11 Proxying and traffic redirection
4.5.12 Custom attacks
4.5.13 DOS & Bruteforcing tools
4.6 Conclusions
5 Web Attacks from the User’s Side 
5.1 Introduction
5.2 Dataset and Experiments Setup
5.2.1 Data Labeling
5.2.2 Risk Categories
5.3 Geographical and Time-based Analysis
5.3.1 Daily and Weekly Trends
5.3.2 Geographical Trends
5.4 Feature Extraction for User Profiling
5.5 Evaluation
5.5.1 Feature Correlations
5.5.2 Predictive Analysis
5.6 Discussion and Lessons Learned
5.7 Conclusions
6 Detection of Malicious Web Pages by Companies and Researchers 
6.1 Introduction
6.2 Approach
6.2.1 Features
6.2.2 Discussion
6.3 Implementation and setup
6.4 Evaluation
6.5 Conclusions
7 Conclusions and Future Work 
8 Summary (Résumé) 
8.1 Introduction
8.1.1 Malicious Code on the Web
8.1.2 Attack Model
8.1.3 Goals
8.1.4 Contributions
8.2 Hosting Providers
8.2.1 Introduction
8.2.2 Results
8.3 Attackers
8.3.1 Introduction
8.3.2 Results
8.4 Users
8.4.1 Introduction
8.4.2 Results
8.5 Security Companies and Researchers
8.5.1 Introduction
8.5.2 Results
8.6 Conclusions

