Security and privacy consequences of flow transformations


Protection of user privacy thanks to address obfuscation

Since IP addresses are also used to identify and track Internet users, it is important to provide ways to obfuscate them. Of course, other ways to track users exist, for example through cookies [Set09] or the web browser configuration [Pet10]. Nevertheless, a privacy-friendly Internet with secure IP addresses is required first, in order to build privacy protection on the other layers. Some full-featured protections already exist, based on David Chaum's idea of Mixes [Cha81]. The most popular is probably the Tor network [DMS04], based on onion routing. In this system, a packet does not go directly to the destination: it is first encrypted several times and then sent through several intermediate servers (see Figure 2.1). At the end of this path, an exit point extracts the payload of the packet and sends it to the real destination. From the destination's point of view, the source address is that of the Tor exit server. The Tor network protects the user's privacy as long as no one controls enough Tor nodes or is able to monitor the incoming and outgoing traffic and correlate the flows with each other in time [DRH].
Based on the same Mixes idea but built on a static path of trusted servers (see Figure 2.2), the AN.ON project [BFK01] provides the same kind of privacy protection. The path of a user's communication is not random as in Tor, but the servers in the "Mix Cascade" can be trusted by users, since they know the organizations providing the service. This service protects users as long as the Mix servers do not collude to track them.
Although these networks are designed to resist powerful attackers (such as governmental organizations; one example of surveillance deployed worldwide is that of the National Security Agency (NSA) [Gre13]), this comes at the price of a complex design and deployment, and of drawbacks for users. The first problem is usability: the user has to know the tools and how to configure them. The second is the latency and bandwidth overhead, because these systems slow down the connection. The last is congestion on these Mix networks. To obtain a large anonymity set, and therefore good privacy protection, a user has to share the Mix network with many other users. In the Tor network, the consequence is a very slow connection: even though there are many Tor nodes, they are still too few for the number of users, and not all nodes have a high-bandwidth connection to the Internet. In the AN.ON project, it is possible to pay for a better quality of service, thanks to high-performance servers funded by users, but this financial cost can discourage users. All these costs are probably too high for ordinary users who want to protect themselves against attackers weaker than the NSA. A prominent example of such weaker attackers is a web service that tries to re-identify its users, such as Google or Facebook. These companies are not in a man-in-the-middle position and cannot monitor international traffic. However, since a majority of websites embed objects, such as graphics, provided by Google and Facebook services, they are in a very good position to track all the activities and interests of users.

Architecture and components for the obfuscation

To increase user privacy, we propose to assign an individual external address to each flow. The connected device assigns a new flow label to each independent flow of packets. It still uses the same address for every flow, so addresses remain stable locally.
To provide privacy protection, a middlebox is inserted at the border of a trusted network (see Figure 2.3). The middlebox assigns a new external address to each pair (internal IP address, flow label) and rewrites the source addresses of the outgoing packets and the destination addresses of the incoming packets. Because the middlebox is in the position of a border router, it receives all the packets from the local network. Therefore, it does not need to send extra neighbor discovery packets. In contrast, if the rewriting happened on the end devices, the solution would require some active neighbor discovery.
Since some applications can be incompatible with address rewriting (similar to the implications of NAT in IPv4 [Hai00]), a flow label set to zero is a signal that forbids rewriting. This special label can be used if a temporary address is undesirable, for example in case of IP source address filtering on the destination device or an incompatible application layer. To summarize, the intelligence to discriminate flows and optimize privacy resides in the end devices, while all rewriting is done by the middlebox, i.e. under the control of the network administrator. There is no need to change the local address assignment policy. The middlebox should be located between the local firewall and the Internet, which avoids having to rewrite the firewall policies.
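The rewriting rule summarized above can be sketched in a few lines of Python. This is only an illustrative model, not the implementation described in this work: the class name `Middlebox`, its method names, and the documentation prefix 2001:db8:1:2::/64 are hypothetical, and the external address is assumed to be a random 64-bit interface identifier under a /64 site prefix.

```python
import ipaddress
import secrets


class Middlebox:
    """Toy model: one external address per (internal address, flow label)."""

    def __init__(self, prefix: str):
        # Site prefix used to build external addresses (assumed to be a /64).
        self.prefix = ipaddress.IPv6Network(prefix)
        self.out_ctx = {}  # (internal address, flow label) -> external address
        self.in_ctx = {}   # external address -> internal address

    def _new_external(self) -> ipaddress.IPv6Address:
        # Random 64-bit interface identifier appended to the prefix.
        # Only collisions with existing contexts are checked here.
        while True:
            candidate = self.prefix.network_address + secrets.randbits(64)
            if candidate not in self.in_ctx:
                return candidate

    def outgoing(self, src: str, flow_label: int) -> str:
        """Return the source address to write into an outgoing packet."""
        internal = ipaddress.IPv6Address(src)
        if flow_label == 0:
            # A zero flow label is the signal that forbids rewriting.
            return str(internal)
        key = (internal, flow_label)
        if key not in self.out_ctx:
            external = self._new_external()
            self.out_ctx[key] = external
            self.in_ctx[external] = internal
        return str(self.out_ctx[key])

    def incoming(self, dst: str) -> str:
        """Return the destination address to write into an incoming packet."""
        external = ipaddress.IPv6Address(dst)
        # Without a context the address is left untouched, and the packet
        # follows the standard routing and firewall policies.
        return str(self.in_ctx.get(external, external))
```

With this model, two packets from the same device carrying the same non-zero label leave with the same external address, while a different label yields a different one.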

Computation on the middlebox

Since the “intelligence” of flow classification resides in the connected devices, the middlebox does not need to parse packet headers in depth or to follow TCP streams statefully. However, it has to maintain a context to perform the rewriting (cf. Table 2.1). For each outgoing packet with an unknown pair (internal IP address, flow label), abbreviated (IPint, label), the middlebox creates a context and generates a random interface identifier. Concatenated with the prefix, this random identifier forms an address called the external IP address (abbreviated IPext). The stored context is a 3-tuple (IPint, label, IPext), and all subsequent packets matching the pair (IPint, label) are rewritten with IPext. For each incoming packet, the middlebox rewrites the destination address with IPint if a context exists; otherwise it applies the standard routing and firewall policies. In both directions, the middlebox has to adjust the transport layer checksum, since IPv6 addresses are part of the checksum. This adjustment is a simple operation, explained in Section 2.4.3. Note that a flow (defined as all packets sharing the same source IP address and the same flow label) can consist of several TCP connections (or connections of other transport protocols). For example, we recommend using the same flow for all elements of a given web page. The middlebox itself does not care about upper protocol layers, because the flow assignment is done on the end device.
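To illustrate why the checksum adjustment is cheap, the sketch below uses the standard incremental update of the Internet checksum (RFC 1624, equation 3) rather than a full recomputation. It is an assumption that this matches the method of Section 2.4.3; the function names are hypothetical, and the special handling of a zero UDP checksum is ignored.

```python
import ipaddress


def fold(x: int) -> int:
    """Fold carries back into 16 bits (ones' complement addition)."""
    while x >> 16:
        x = (x & 0xFFFF) + (x >> 16)
    return x


def words(addr: str) -> list:
    """Split an IPv6 address into its eight 16-bit words."""
    raw = ipaddress.IPv6Address(addr).packed
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, 16, 2)]


def adjust_checksum(checksum: int, old_addr: str, new_addr: str) -> int:
    """Update a checksum after replacing old_addr by new_addr.

    RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m'), applied to each 16-bit word.
    """
    acc = ~checksum & 0xFFFF
    for old, new in zip(words(old_addr), words(new_addr)):
        acc += (~old & 0xFFFF) + new
    return ~fold(acc) & 0xFFFF


def full_checksum(src: str, dst: str, other_words: list) -> int:
    """Reference: recompute from scratch over both addresses and the
    remaining 16-bit words (rest of the pseudo-header, header, payload)."""
    total = sum(words(src)) + sum(words(dst)) + sum(other_words)
    return ~fold(total) & 0xFFFF
```

The adjustment touches only the sixteen words of the old and new addresses, whatever the packet size, which is why the middlebox never needs to read the payload.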

Flow label assignment by application

The connected device is the best place to discriminate flows and to assign flow labels. For example, a peer-to-peer application probably needs to use the same address for several TCP connections, a Web browser knows whether one connection is related to another, etc. In our case, the best way would be to modify each application so that it assigns flow labels efficiently. However, this solution is not realistic. The lack of a standardized API for flow labels is discussed in Chapter 4. We found some workarounds, and we propose a solution that assigns one flow label to each application in Section 4.6.
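As a minimal sketch of what a per-application label could look like, the code below draws one random non-zero 20-bit label and shows where it sits in the first 32 bits of the IPv6 header (version, traffic class, flow label). The function names are hypothetical, and the sketch deliberately does not attach the label to a socket, since that step depends on the operating system API discussed in Chapter 4.

```python
import secrets
import struct

FLOW_LABEL_MASK = 0xFFFFF  # the flow label is a 20-bit field


def pick_flow_label() -> int:
    """Draw a random 20-bit label, never zero (zero forbids rewriting)."""
    label = 0
    while label == 0:
        label = secrets.randbits(20)
    return label


def first_header_word(label: int, traffic_class: int = 0) -> bytes:
    """First 32 bits of an IPv6 header:
    version (4 bits) | traffic class (8 bits) | flow label (20 bits)."""
    word = (6 << 28) | ((traffic_class & 0xFF) << 20) | (label & FLOW_LABEL_MASK)
    return struct.pack("!I", word)


def read_flow_label(header: bytes) -> int:
    """Extract the flow label from the start of a raw IPv6 header."""
    (word,) = struct.unpack("!I", header[:4])
    return word & FLOW_LABEL_MASK
```

An application, or a wrapper acting on its behalf, would call `pick_flow_label()` once and reuse the result for all of its connections, so that the middlebox maps them to the same external address.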

Table of contents:

Abstract
Acknowledgments
Résumé
Introduction
List of Figures
List of Tables
1 State of the art
1.1 Flow management on the Internet
1.2 Consequences of using IP addresses identifiers
1.3 Flow transformations: identifiers for special cases
1.3.1 Private addresses and the renumbering problem
1.3.2 Locator/Identifier Separation Protocol
1.3.3 Host Identity Protocol
1.3.4 IP address exhaustion and the NAT
1.3.5 Multihoming
1.4 Security and privacy consequences of flow transformations
1.4.1 Security of flow transformations
1.4.2 Do not broadcast identifiers to protect privacy
1.5 New flow properties offered by IPv6
1.5.1 New IPv6 features
1.5.2 Local Network Protection
1.5.3 Cryptographic protection of addresses
1.5.4 Site Multihoming by IPv6 Intermediation
1.5.5 Encoding more information in addresses
1.5.6 Solving the renumbering problem thanks to NPTv6
1.6 Clock based flow identification
1.6.1 Shim6 extension for privacy improvements
1.6.2 Address hopping thanks to the application layer
1.6.3 A Moving Target IPv6 Defense
1.7 Toward dynamical identifiers
1.7.1 Dynamical identifiers without adding headers
1.7.2 From temporary to ephemeral addresses
2 IP address obfuscation on middleboxes, managed by end devices
2.1 Protection of user privacy thanks to address obfuscation
2.2 Architecture and components for the obfuscation
2.2.1 Overview of the solution
2.2.2 Computation on the middlebox
2.2.3 Flow label assignment by application
2.3 Implementation with the Linux Kernel
2.3.1 Address rewriting and flow management by the middlebox
2.3.2 Our Netfilter Kernel module
2.4 Evaluation and consequences
2.4.1 Risk of address collision
2.4.2 Compatibility analysis with the current Internet
2.4.3 Performance of the middlebox
2.5 Conclusion
3 Spoofing protection based on address spreading
3.1 Address spoofing on the Internet and countermeasure
3.1.1 Standard identification of a flow
3.1.2 Other spoofing protections
3.1.3 Definition of address spreading
3.2 Discussion on the best places to enable the spreading
3.2.1 On the end device
3.2.2 Solution with a patched router
3.2.3 On the communication path
3.2.4 Delegation of addresses prefixes to the end device
3.3 Detailed process of the protocol
3.3.1 General principles
3.3.2 Step by step initialization
3.3.3 Detailed packet processing on spreaders
3.3.4 Detailed steps of packets processing (incoming packets)
3.3.5 Identification of a flow thanks to the flow label field
3.4 Loss of packets due to desynchronization
3.4.1 Theoretical loss due to false positive detection
3.5 Conclusion on address spreading
4 Flow labels on Linux and various operating systems
4.1 The lack of standardized API
4.1.1 Historical evolution of RFCs
4.1.2 Current standardization
4.2 Diversity of implementation on operating systems
4.2.1 The KAME project
4.2.2 Flow label to Solaris
4.2.3 MAC OS X
4.2.4 No flow label on Microsoft Windows
4.2.5 Comparison of operating system implementations
4.3 Need of the Linux API refactoring
4.3.1 Historical implementation on the Linux Kernel
4.3.2 Principles of the implementation
4.3.3 Sharing and permission system
4.3.4 The current implementation in details
4.4 Limitations of the Linux Kernel API
4.4.1 Restrictions incompatible with RFCs
4.4.2 Lack of options in flow label Management
4.5 Modifications on the Kernel
4.5.1 Removing old restrictions
4.5.2 Adding options to read flow label
4.5.3 The reflecting option
4.6 Preload library
4.6.1 Preloading the library on dynamically compiled software
4.6.2 Implementation and installation of the library
4.6.3 Evaluation of the library performance
4.7 Conclusion on the flow labels implementation
5 Evaluation of address spreading
5.1 Evaluation of packet loss
5.1.1 Test beds
5.1.2 Spreading consequences
5.1.3 Backward temporal windows for delayed packets
5.1.4 Forward temporal window to avoid desynchronization packet loss
5.1.5 Configuration requirement to fully avoid packet loss
5.2 AES encryption cost
5.3 Automatic resynchronization
5.3.1 Detection of desynchronization
5.3.2 Rules for resynchronization
5.3.3 Implementation and tests
5.3.4 Limitation of detection and resynchronization
5.4 Benefits of address spreading with IPsec
5.4.1 Identification of IPsec flows
5.4.2 Spreading to protect IPsec devices
5.4.3 Performance benefits of spreading to protect IPsec devices
5.5 Conclusion: performance of the spreading
6 Conclusion
Bibliography