Signature Based Detection (SBD)
Static analysis enables signature-based code classification. For example, if a malicious piece of code is found within an executable, an antivirus drops the complete package. Nevertheless, this mechanism is not immune to code obfuscation. Behavioral (dynamic) analysis complements it by examining malevolent patterns at run time, although it is itself limited by stealth and anti-debugging techniques. This part outlines ransomware characteristics extracted mostly from static analysis. Signature-based detection belongs to the Delivery phase since the payload (encryption) is not executed. The major drawback of signature-based detection is its inability to detect zero-day attacks: a system is protected only once the signature database has been updated with the published signatures of previously unseen malware.
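As an illustration of this drawback, the sketch below reduces a signature to a whole-file SHA-256 digest. This is a simplifying assumption: real antivirus signatures are usually byte patterns or rules rather than file hashes, and the payload and the names `signature_db` and `is_known_malware` are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical payload standing in for a previously analyzed malicious file.
known_sample = b"MZ hypothetical ransomware body"

# The signature database only contains digests of samples already seen.
signature_db = {sha256_hex(known_sample)}


def is_known_malware(data: bytes) -> bool:
    """Flag a file only if its digest is already in the database."""
    return sha256_hex(data) in signature_db


# A trivially modified variant (one appended byte) has a different digest.
variant = known_sample + b"\x00"

print(is_known_malware(known_sample))  # True
print(is_known_malware(variant))       # False
```

The variant goes undetected until its own signature is published, which is the situation an obfuscated or zero-day sample creates.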
Medhat et al. present a static-based framework with a multi-level alert system to detect ransomware. Their work relies on the concept of shared patterns/code among ransomware, which represent static features. Four elements are kept: cryptographic signatures, API functions, file keywords, and file extensions. Their detection tool is based on YARA rules. A limitation of their work is the omission of obfuscated or packed samples, which represent a significant number of ransomware samples in the wild.
Subedi et al. utilize reverse engineering tools to provide distinct identifiers for various ransomware families. For a given ransomware, they extract the assembly-level instructions, the libraries used, and the functions called. Association rule mining is applied to the DLLs (Dynamic-Link Libraries) to construct a known signature (a sequence of DLLs) of the malicious software. Cosine similarity is used to measure the similarity between the frequency vectors of the assembly code of benign and malevolent software. Their CRSTATIC tool can detect crypto-ransomware without executing the sample, based on the features described above.
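The cosine similarity step can be sketched as follows. This is a minimal illustration, not the CRSTATIC implementation: it assumes each sample has already been disassembled into a list of opcode mnemonics, and the two opcode lists are invented for the example.

```python
import math
from collections import Counter


def cosine_similarity(freq_a: Counter, freq_b: Counter) -> float:
    """Cosine of the angle between two opcode frequency vectors."""
    shared = set(freq_a) & set(freq_b)
    dot = sum(freq_a[op] * freq_b[op] for op in shared)
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty listing is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical opcode listings for two disassembled samples.
sample_a = Counter(["mov", "xor", "xor", "call"])
sample_b = Counter(["mov", "push", "call", "ret"])

print(round(cosine_similarity(sample_a, sample_b), 3))
```

A value close to 1 indicates nearly identical opcode distributions, while a value close to 0 indicates that the two samples share few instructions.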
Another study focuses on distinguishing a benevolent application from ransomware based on discriminant characteristics of the Portable Executable (PE) file. In the static analysis part, the PE file is unpacked and disassembled to extract the metadata from the header fields. Accordingly, 60 static properties are identified to enable an accurate classification (for example, bytes on the last page, pages in file, and relocations in the DOS header; size of the optional header and number of sections in the file header), along with nine ransomware-specific ones (presence of a packer, DLLs used for network communication, commands for registry modification). However, if obfuscation was used, the authors performed a dynamic analysis to extract the rest of the features. The sample is then executed in an isolated environment with the Sysinternals tools in place. An extended analysis revealed suspicious DLLs at run time, Windows registry changes, and the alteration of directories.
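A few of the header fields named above can be read directly from the documented PE layout. The sketch below is an illustration under stated assumptions: it covers only five of the 60 properties, it is not the authors' extraction tool, and `build_minimal_header` is a hypothetical helper that fabricates a synthetic header so the example runs without a real executable.

```python
import struct


def parse_pe_header_fields(data: bytes) -> dict:
    """Extract a handful of static properties from DOS and COFF headers."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # DOS header: e_cblp, e_cp and e_crlc follow the 2-byte magic.
    bytes_on_last_page, pages_in_file, relocations = struct.unpack_from(
        "<3H", data, 2
    )
    # e_lfanew at offset 0x3C points to the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # COFF file header (20 bytes) follows the 4-byte signature.
    (_machine, number_of_sections, _timestamp, _symbols_ptr,
     _symbols_count, size_of_optional_header, _characteristics) = (
        struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    )
    return {
        "bytes_on_last_page": bytes_on_last_page,
        "pages_in_file": pages_in_file,
        "relocations": relocations,
        "number_of_sections": number_of_sections,
        "size_of_optional_header": size_of_optional_header,
    }


def build_minimal_header(sections: int = 3, opt_size: int = 224) -> bytes:
    """Fabricate a synthetic DOS + COFF header (demo data only)."""
    dos = bytearray(0x40)
    dos[0:2] = b"MZ"
    struct.pack_into("<3H", dos, 2, 0x90, 3, 0)
    struct.pack_into("<I", dos, 0x3C, 0x40)
    coff = struct.pack("<HHIIIHH", 0x14C, sections, 0, 0, 0, opt_size, 0x0102)
    return bytes(dos) + b"PE\0\0" + coff


fields = parse_pe_header_fields(build_minimal_header())
print(fields)
```

In a classification pipeline, such a dictionary would be one row of the feature matrix; a packed or obfuscated sample may report misleading values, which is why the authors fall back on dynamic analysis.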
Another work compares binaries to check for similarity among the samples. To this end, import hashing, fuzzy hashing, and YARA rules have been used. Even though each of these methods has its limitations, a similarity of 92% was achieved among samples of the same family. Fuzzy hashing outperformed the other methods in terms of time efficiency, memory, and hash/rule size. A continuation of their work is the analysis of the polymorphic aspect of the various samples acquired. It is done by performing ransomware clustering using the combination of two fuzzy techniques: fuzzy hashing and the fuzzy c-means (FCM) clustering method. They are able to aggregate multiple samples of the same family in different distinct clusters. The accuracy of the clustering varies with the family and the number of clusters chosen.
The following step in ransomware mitigation relies on monitoring API calls, which reveal the interaction between the malware and the victim's computer. Many attackers rely on the services provided by the Microsoft Cryptographic API, such as random number generation and AES encryption, to complete their payload execution. Writing custom code is prone to errors; thus, attackers prefer to use built-in services to accomplish their tasks. Therefore, researchers analyzed the API calls, including their patterns and frequency, to classify processes (section 2.1). Carefully monitoring Windows events helps extract patterns that accurately distinguish the habitual behavior of a user from that of ransomware (section 2.2). These methods are summarized in Table 2.2.
Chen et al. monitor the API calls made by ransomware to generate API call flow graphs (CFG). It is a proactive solution that provides early-stage detection while the ransomware is still setting up its environment. They improve ransomware detection by analyzing the API call flow graph with machine learning techniques. They develop their own API Monitor tool to gather the calls made during experiments executed on a virtual machine. A weighted directed graph represents the sequence of API calls. The weight of an edge corresponds to how frequently a given API 1 is followed by API 2. The CFG is converted to a feature vector whose values are normalized and rescaled from zero to one. Subsequently, feature selection is performed to retain the features that enable a distinct separation between malicious and benevolent software. The Simple Logistic (SL) algorithm outperforms the rest of the classifiers (decision tree DT, random forest RF, support vector machine SVM).
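The graph construction and rescaling can be sketched as below. The API trace is invented, and min-max scaling is an assumption: the text above only states that values are normalized and rescaled from zero to one, not which formula is applied.

```python
from collections import Counter


def transition_weights(calls):
    """Edge weights of the directed graph: how often API i is followed by API j."""
    return Counter(zip(calls, calls[1:]))


def to_feature_vector(weights, edge_order):
    """Flatten the graph to a vector and min-max rescale it to [0, 1]."""
    raw = [weights.get(edge, 0) for edge in edge_order]
    lowest, highest = min(raw), max(raw)
    if highest == lowest:
        return [0.0] * len(raw)  # constant vector: nothing to rescale
    return [(value - lowest) / (highest - lowest) for value in raw]


# Hypothetical API trace recorded for one process.
trace = [
    "CreateFile", "ReadFile", "CryptEncrypt", "WriteFile",
    "ReadFile", "CryptEncrypt", "WriteFile",
]

weights = transition_weights(trace)
edge_order = sorted(weights)  # in practice, a fixed order shared by all samples
vector = to_feature_vector(weights, edge_order)

for edge, value in zip(edge_order, vector):
    print(edge, value)
```

Using one fixed `edge_order` for every sample keeps the vectors comparable, so a classifier can be trained on them after feature selection.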
In a like manner, Maniath et al. rely on the sequence of API calls to flag ransomware behavior. They utilize a modified version of the Cuckoo sandbox to extract those calls from the JSON reports of 157 ransomware samples. The sequence of API calls is converted to a chain of integers (each integer refers to a specific system call). Missing inputs occur in the dataset because each malware is programmed to operate distinctly. Thus, to handle those missing inputs (for example, a sequence of five calls compared to one of 200), 0s are appended to the record since they do not influence the record's value. By applying the LSTM (Long Short-Term Memory) algorithm, promising results are achieved.
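The encoding and padding step can be sketched as follows. The traces are invented, and truncating sequences longer than the target length is an assumption made for the example; the text above only describes appending zeros to short sequences.

```python
def build_vocabulary(traces):
    """Map each distinct API call to a positive integer; 0 is kept for padding."""
    vocabulary = {}
    for trace in traces:
        for call in trace:
            vocabulary.setdefault(call, len(vocabulary) + 1)
    return vocabulary


def encode_and_pad(trace, vocabulary, length):
    """Convert a trace to integers and append zeros up to a fixed length."""
    ids = [vocabulary[call] for call in trace][:length]  # truncation: assumption
    return ids + [0] * (length - len(ids))


# Hypothetical traces of different lengths.
traces = [
    ["NtOpenFile", "NtReadFile", "NtWriteFile"],
    ["NtOpenFile", "CryptGenKey"],
]

vocabulary = build_vocabulary(traces)
encoded = [encode_and_pad(trace, vocabulary, 3) for trace in traces]
print(encoded)  # [[1, 2, 3], [1, 4, 0]]
```

Fixed-length integer sequences of this kind are the usual input format for a recurrent model such as an LSTM.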
Takeuchi et al. also rely on the sequence of API calls to depict ransomware-like behavior. Their contribution is the representation of API calls by a vector that quantifies the sequence of those calls (including the number of q-grams in the execution logs). A mapping is performed using n-grams. For a software using two distinct API calls, A = {a, b}, the possible 2-grams are (a, a), (a, b), (b, a), and (b, b). For an execution log that contains (a, b) and (b, a) but neither (a, a) nor (b, b), the resulting vector is [0, 1, 1, 0]. A major drawback is that two distinct API call strings can have the same output vector. They solve this issue by adding the count of the performed calls. Since the number of API calls varies widely between applications, standardized vectors are calculated to obtain a balanced set. SVM is used to differentiate between the vectors belonging to ransomware and those belonging to a benign application.
Similarly, Vinayakumar et al. and Hampton et al. analyze ransomware activity considering API call patterns and their frequency [76, 77]. Tests performed on the sequence of API calls show that ransomware identification is possible through call frequency. Additionally, some system calls are made solely by ransomware (InternetOpen, CryptDeriveKey, CryptGenKey).
Al-Rimy et al. propose a zero-day-aware crypto-ransomware behavioral detection framework. Their model is divided into three submodules: preprocessing, feature engineering, and detection. For early detection, they do not rely only on the API calls collected during the preprocessing phase. They add a layer consisting of data-centric detection (a method that focuses on the data using entropy or similarity) and anomaly detection based on a deviation from normal behavior. However, no tests were performed to prove the validity or the accuracy of their framework, even though it has promising characteristics.
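The q-gram representation attributed to Takeuchi et al. above, and the collision it suffers from, can be reproduced in a few lines. This is a sketch of the idea with invented call sequences; the standardization step and the SVM are left out.

```python
from collections import Counter
from itertools import product


def qgram_vectors(calls, alphabet, q=2):
    """Return (presence, count) vectors over all q-grams of the alphabet."""
    counts = Counter(
        tuple(calls[i:i + q]) for i in range(len(calls) - q + 1)
    )
    grams = list(product(alphabet, repeat=q))  # (a,a), (a,b), (b,a), (b,b)
    presence = [1 if counts[gram] else 0 for gram in grams]
    frequency = [counts[gram] for gram in grams]
    return presence, frequency


alphabet = ["a", "b"]

# Two distinct call sequences over the same two API calls.
short_presence, short_counts = qgram_vectors(list("aba"), alphabet)
long_presence, long_counts = qgram_vectors(list("ababab"), alphabet)

print(short_presence, long_presence)  # [0, 1, 1, 0] [0, 1, 1, 0]
print(short_counts, long_counts)      # [0, 1, 1, 0] [0, 3, 2, 0]
```

The presence vectors collide for the two sequences, whereas the count vectors separate them, which is why the counts are added to the representation.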
Al-Rimy et al. propose a combination of behavioral and anomaly-based mechanisms to achieve an accurate ransomware detection rate while maintaining a low false alarm rate. The Cuckoo sandbox is used for the experiments, where all the samples are executed for 5 seconds to collect the API call information. Each API call is treated as a feature. Term Frequency–Inverse Document Frequency (TF-IDF) is used to build a vector for the training and test phases. The vector contains the weight of each API, calculated by applying the TF-IDF formula. Mutual Information (MI) is adopted to extract significant features. As for the anomaly-based estimator, only benign software is used to carry out the experiments. This estimator flags a deviation from normal behavior. The fusion of both mechanisms improved the detection results, providing better classification. In some cases, however, specific user actions (for example, a mouse click) trigger the execution of ransomware. In such cases, a duration of 5 seconds is not adequate for collecting the API calls.
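The TF-IDF weighting can be sketched as below, treating each sample's API trace as a document. The textbook form tf × log(N/df) is assumed here; the exact variant used by the authors is not specified above, and the traces are invented.

```python
import math
from collections import Counter


def tf_idf(traces):
    """Return the sorted API vocabulary and one TF-IDF vector per trace."""
    n_docs = len(traces)
    # Document frequency: number of traces in which each API appears.
    doc_freq = Counter(api for trace in traces for api in set(trace))
    vocabulary = sorted(doc_freq)
    vectors = []
    for trace in traces:
        counts = Counter(trace)
        total = len(trace)
        vectors.append([
            (counts[api] / total) * math.log(n_docs / doc_freq[api])
            if total else 0.0
            for api in vocabulary
        ])
    return vocabulary, vectors


# Hypothetical API traces collected from two samples.
traces = [
    ["CreateFile", "CryptEncrypt", "CryptEncrypt"],
    ["CreateFile", "ReadFile"],
]

vocabulary, vectors = tf_idf(traces)
for api, weights in zip(vocabulary, zip(*vectors)):
    print(api, [round(w, 3) for w in weights])
```

An API present in every trace, such as `CreateFile` here, receives a weight of zero, so the vector emphasizes calls that discriminate between samples.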
Cabaj et al. use a honeypot technique in addition to an automatic runtime system to analyze and detect ransomware through its network activity. Their approach is built on virtual machines that download and test ransomware samples on Windows XP. They reveal that CryptoWall uses domain names rather than IP addresses. The sample carries out multiple actions, such as obtaining the IP address of the victim's machine and contacting the hardcoded servers. Therefore, by blocking the DNS requests made by CryptoWall, the authors are able to enumerate all the contacted servers. The communication between the parties is encrypted. All the proxies hosting malicious scripts are identified.
File System Honeypot
Monitoring file system activity, apart from system calls, is crucial for an overall detection mechanism. In fact, if an attacker learns which patterns or sequences of system calls bypass the security measures deployed on the system, early detection of the malware is improbable. A honeypot is a resource used by administrators to detect unauthorized access to a system. Lee and Hong introduce a novel mechanism to make efficient decoy files. Two search methods are extracted from malware source code. The first one consists of searching for specific file extensions such as .pptx, .docx, and .txt. It saves the location of these files and encrypts them one by one at the end of this process. The second method encrypts a file as soon as it is found. Since the search is performed in order or in reverse order of Windows-1252 (a character encoding of the Latin alphabet), decoy files are created using the first or the last character in Windows-1252. Preferably, they should be located in the parent folder rather than in sub-folders due to ransomware traversal patterns. The size and attributes of decoy folders can be updated to meet the new requirements of the ransomware in the wild and flag them as soon as possible.
Lee and Hong's work is complemented by the investigations of Moore and Al Kossairi et al. [97,98]. Moore's work relies on a honeypot folder monitored by the File Server Resource Manager (FSRM), followed by an analysis of the changes in the Windows Event Logs. A tiered response to detection is developed based on the number of modified files. FSRM is a tool that prevents an already executing malware from infecting the entire file server. EventSentry raises a warning if an attempt is made to modify a specific object. The threshold is defined based on regular observation of users' behavior. An abnormality is flagged when activity deviates to two or three times the normal level. Although the method is practical, it can be bypassed if the malware does not attempt to access these areas.
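The tiered response can be sketched as a simple ratio test against the observed baseline. The tier names and the actions they suggest are hypothetical; only the two-times and three-times deviation levels come from the description above.

```python
def response_tier(modified_files: int, baseline: float) -> str:
    """Classify activity in the honeypot folder against a usage baseline.

    baseline is the number of file modifications observed during regular
    user activity over the same time window.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    ratio = modified_files / baseline
    if ratio >= 3:
        return "block"   # hypothetical tier: cut the client off the share
    if ratio >= 2:
        return "warn"    # hypothetical tier: notify the administrator
    return "normal"


# Hypothetical baseline of 10 modified files per observation window.
for observed in (8, 20, 45):
    print(observed, response_tier(observed, baseline=10))
```

Because the thresholds scale with the baseline, the same rule adapts to folders with very different levels of legitimate activity.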
Al Kossairi et al., by contrast, monitor decoy folders with a file system event handler watcher that is applicable only on Windows. The properties of decoy folders have been identified (variability, differentiability from benign ones). Low-interaction decoy files (containing random data) and high-interaction decoy files (containing fake data) are used for the proof of concept. The decoy folders are positioned at the beginning of each directory so that they are the first to be intercepted by the ransomware. These files contain misleading information about credentials or even IP addresses. They provide an efficient detection mechanism. However, it depends on the FindFirstFile and FindNextFile functions used on Windows to list files or search directories. In addition, if an attacker used a reversed search, the victim would be alerted only at the end of the encryption process, leaving only the decoy files intact.
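The effect of the traversal order on decoy placement can be illustrated with a simplified model that sorts file names by their Windows-1252 bytes. This is an assumption made for the example: the order actually returned by FindFirstFile and FindNextFile depends on the file system, and all file names below are invented.

```python
def cp1252_key(name: str) -> bytes:
    """Sort key: the Windows-1252 byte encoding of the file name."""
    return name.encode("cp1252")


user_files = ["report.docx", "budget.xlsx", "notes.txt"]

# Decoys named with a low and a high Windows-1252 character.
first_decoy = "!decoy.docx"        # '!' is 0x21
last_decoy = "\u00ffdecoy.docx"    # 'ÿ' is 0xFF, the last code point

forward = sorted(user_files + [first_decoy, last_decoy], key=cp1252_key)
backward = forward[::-1]

print(forward[0] == first_decoy)   # True
print(backward[0] == last_decoy)   # True
```

With a decoy at each end of the ordering, a decoy is reached first whether the search runs forward or in reverse, which addresses the reversed-search weakness noted above under this simplified model.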
Table of contents:
List of Figures
List of Tables
I Literature Review
1 About Ransomware
1 Ransomware’s Workflow
2 Ransomware Timeline
2 Ransomware Detection Mechanisms
1 P1: Delivery
1.2 Data Backup
1.3 Access Control List (ACL)
1.4 Microsoft Volume Shadow Copy (VSS)
1.5 Signature Based Detection (SBD)
2 P2: Deployment
2.1 API Calls
2.2 Windows Events
3 P3: Destruction
3.1 Network Traffic Analysis
3.2 Network Honeypot
3.3 File System Honeypot
3.4 Moving Target Defense (MTD)
3.5 Files Monitoring (Encryption, I/O requests)
3.6 Hardware Performance Counters (HPC)
3.7 Multiple Stage/ IOC (indicators of compromise)
3.8 Keys Backup
4 P4: Dealing
4.1 Bitcoin Tracking
5 Network Intrusion Detection System (NIDS) and Datasets
5.1 Types of NIDS
3 Work Environment, Ransomware Samples and Evasion
1 Dynamic Analysis (DA) Tools
1.1 Virtual Machine (VM)
1.2 Hypervisor/Virtual Machine Monitor (VMM)
1.3 Bare-Metal (BM)
2 What About Evasion?
2.1 Chosen Bare-Metal Platform: MoM
2.2 Ransomware Samples
1 File System Based Solution for Ransomware Detection
1 Need for Dynamic Analysis
2 Decoy score
2.1 Whitelists and Blacklists
2.2 Proposed Algorithm
3 From Decoy Score to Supervised Learning
3.1 Learning Phase
4 File System Traversal Velocity
5 Graph Similarity
5.1 Hierarchical Graph
5.2 Adjacency Similarity and Classification
6 Data Collection
7 Experimental results
7.1 Decoy Folders
7.2 Supervised Learning
7.3 File System Traversal Velocity
7.4 Ransomware’s Graph
8 Limitations & Conclusion
2 Network Based Solution for Ransomware Detection
1 Ransomware Network Traffic Dataset
1.1 Data Generation
2 Proposed methodology
2.1 Data filtering
2.2 Ransomware Network Session Reconstruction
2.3 Supervised Machine Learning
3 Experimental results
3.1 Zero-Day Ransomware Detection
3.2 Alert Time
3.3 Results Overview
4 Is There A Correlation Between System & Network Logs?
3 From Ransomware to Doxware
1 From Data Encryption to Data Exfiltration
1.1 Where to Find Sensitive Data and How to Track It?
1.2 Data exfiltration
2.1 From Ransomware To Doxware
2.2 Data Formats Choice
2.3 Natural Language Processing (NLP) and Information Retrieval (IR)
3 Content Analysis Proposal
3.1 Target Threat Model
3.2 Chosen Corpus: Contracts
3.3 Lexical Generation
3.4 Document Content Evaluation
3.5 Metadata Analysis
3.6 Proposal Summary
4 Proposal Analysis
4.1 Created Lexicons
4.2 Test Bench Results
5 Protecting User Assets
6 Honeypot in the case of a Doxware attack
6.1 Honeypot Key Elements
6.2 Proposal Overview
6.2.1 Decoy Folder Name Generation
6.2.2 Decoy File Content Generation
6.3 Evaluation and Discussion
4 Honeypot Ransomware Detection
1 Ransomware Detection Tools
1.1 Environment Setup
1.2 Anti-Ransomware Tools
1.2.1 Padvish AntiCrypto 220.127.116.113
1.2.2 CyberReason RansomFree 18.104.22.168
1.2.3 AntiRansom V3
1.3 Bypassing Decoy Folders
2 Classification Decoy/Non Decoy
2.3 Supervised Learning Results
2.3.1 Test Bench Number 1
2.3.2 Test Bench Number 2
3 Decoy Files Recommendations
5 Future Perspectives and Conclusion
1 Future Perspectives
1.1 Windows-Based Ransomware Countermeasures Roadmap
1.1.1 P1: Delivery
1.1.2 P2: Deployment
1.1.3 P3: Destruction
1.1.4 P4: Dealing
1.2 Mobile Ransomware
1.2.1 API Calls
1.2.2 Multiple IOC (indicators of compromise)
A Extended French Résumé
1 Revue de la Littérature