Expressiveness in Symmetric Searchable Encryption

Get Complete Project Material File(s) Now! »

Searchable Encryption

Symmetric searchable encryption (SSE), or its generalization to structured encryption (STE)1, is a tuple of three probabilistic algorithms running in a polynomial time: Setup, Lookup and Search. At a higher level, Setup is an algorithm that takes the plaintext data set as input, and outputs the encrypted data structure along with the encrypted files. Lookup (called also Token) is an algorithm that takes as input a boolean function of keywords and outputs a secure token. Search (called also Query) is an algorithm that takes as input both the secure token and the encrypted data structure, and outputs the desired encrypted files. We give a more formal definition of SSE in Section 3.2.2. From a research perspective, there are many challenges and balances inherent to each of these algorithms. The setup overhead, expressiveness of the query, computation and communication overhead, I/O efficiency and leakage (security) are fundamental challenges for SSE constructions.
In the following, we give more details about each of these challenges.

Expressiveness of the Query

Expressiveness in SSE englobes the different types of queries that an encrypted structure can be queried on (or support). Query expressiveness includes, but not limited, to conjunction, dis-junction, and boolean queries. Recently, many researchers got interested in building encrypted structures for different types of queries such as range [59], similarity [22, 136], shortest distance in graphs [42, 104], and semantic queries [111]. From a theoretical perspective, all queries can be represented as a boolean expression. As an instance, an integer range query to find all files containing a value within the interval [a, b] is equivalent to a conjunctive query composed of all integers within the interval such that {a, a + 1, · · · , b 1, b}. Enhancing query expressiveness has significant impact on security and efficiency. Naively, any complex query can be constructed while leveraging single keyword SSE. In general, boolean queries can be represented in a CNF or DNF formulas 2, and therefore multiple keywords within the boolean query can be searched for in an atomic manner i.e., by invoking single keyword SSE over every term within the boolean query. For instance, if the query equals w1 AND w2, then one can separate the query into three sub-queries such that the client runs as a sub-routine a single keyword SSE protocol on w1, then redoes the same process on w2. The client gets two sets containing all encrypted files matching respectively keywords w1 and w2, separately. The client finally outputs the intersection of both of these sets. While this simple solution provides correctness, it has some non-trivial efficiency downsides. First, if we consider a worst-case scenario where w1 and w2 match more or less the en-tire encrypted data set, then the client might have to download the entire encrypted data set where the intersection only consists of a couple of encrypted files. Clearly, this can have a devastating impact on communication bandwidth between the client and the server. One however, can work with an SSE that reveals the answer, and therefore the server will perform the intersection instead of the client, but this would have an important implication on the security of the construction as it leaks more than needed. As the server being traditionally considered as the main adversarial entity in data externalization, one needs to reduce as much as possible the amount of information that can be leaked to the server during the search phase. In particular, fetching documents for every query can lead to disclosing unnecessary information to the server3. The server can infer, for instance, the number of documents that an intersection of any two disjoint keywords in the query contains. An adversary with some auxiliary information can infer more information. Reducing leakage is extremely important as it has recently been shown that it can be used to recover plaintext queries or the outsourced data itself by [36, 78, 113]. Ideally, a conjunctive query in an SSE construction only consists of one encrypted query that the server processes without disclosing which keyword in the conjunction matches which encrypted file. That is, an ideal leakage in a conjunctive SSE only equals the final result of the query.
To sum up, expressiveness comes up with many security and efficiency challenges. The ulti-mate goal is to reduce leakage while not impacting efficiency. Finding out the right balance greatly depends on the underlying application.

Setup overhead

To securely outsource a data set, the user needs to generate an encrypted data structure (or encrypted searchable indexes). The Setup algorithm is an off-line step that is performed once in the lifetime of the system. It takes as input the plaintext data set, and involves different operations depending on the degree of expressiveness desired for the STE construction. These operations gen-erally include data parsing, stemming operations, evaluating cryptographic primitive operations. In the case of single keyword search, these operations are at least linear in the size of the data set. However for more expressiveness, the setup can be polynomial in the size of the data set. This can take a considerable amount of time especially for large data set. Researchers [38] reported that it can take days of processing time using powerful servers to generate the encrypted data struc-ture. Moreover, the size of the encrypted data structure often equals the size of the entire data set which represents also an additional storage cost. For average users with a normal commodity machine, the setup represents a considerable one-time cost and can take much more time than the one reported so far by the aforementioned research papers.
Some STE constructions can enable the client to perform the setup operation in an on-line manner, i.e, generate the encrypted data structure in a step-by-step manner. This setup is possible if the STE is a forward-secure dynamic scheme, or only dynamic but under the constraint that the client must not perform any search operation as long as the setup did not terminate yet. Forward secrecy is a property that allows the client to unlink updates from previous search queries. That is, an adversary cannot infer any information from the updates knowing the search tokens.
Another research challenge in the setup phase is the creation of the encrypted data structure that preserves the same search efficiency whether the structure is stored in the main memory or in the hard disk. This research challenge is known in literature as the locality challenge. The security of the encrypted data structure consists in part of scrambling data such that it will be uni-formly distributed. That is, when the encrypted data structure is stored in the hard drive, accessing non-contiguous memory blocks results in considerable delays. Consequently, the search becomes extremely slow when compared to the same search if the structure is kept in the main memory. Re-cent results show that the most efficient schemes in theory are not necessarily the best in practice in case of large data sets [40].

READ Findings of the conducted survey to the German population

Computation overhead

Computation overhead is one of the main comparison metric in SSE constructions. When SSE was first proposed in literature [130], the search complexity was linear in the size of the entire data set. In particular, if we consider that all n files are textual, and each has size m, then the search can be performed in O(m · n) for single keyword search. Four years later, Goh [66]4 succeeded to reduce the search overhead to be only linear in the number of stored documents. Moreover, the report introduced the first index based SSE and proposed further improvements that were in the appendices that can further reduce the computation to be only logarithmic in n. In 2006, Curtmola et al. [50] presented optimal search complexity construction, i.e., the computation performed by the server is only linear in the result set. While computation overhead has been more or less solved for single keyword search, for more expressive constructions it is not. Boolean SSE schemes are still not achieving optimal search efficiency and represents an on-going challenge in the SSE community. The difficulty behind this challenge consists of finding the right balance between storage, security and search efficiency.

Storage overhead

Storage overhead varies depending on the expressiveness of the construction as well as the de-sired level of security. It is very challenging to assert that an STE scheme has the best storage over-head while neglecting the level of security it offers. Reducing storage overhead of the encrypted data structure is also another research dimension of SSE constructions. As an instance, there are many expressive SSE constructions that can require exponential storage for optimal search com-putation, but clearly such solutions are not realistic. In general, for single keyword SSE, a widely acceptable storage complexity is in O(n), which translates to be the total size of the entire data set. If a user has, for example, 10 GByte of files to store, then the encrypted data structure will have a size of 1 GByte, as an instance. The storage overhead of single keyword SSEs has been recently reduced by Cash et al. [37]. To sum up, it is very challenging to find the right balance between computational overhead and storage overhead.

Table of contents :

Abstract
Acknowledgements
Dedication
List of Tables
List of Figures
Abstract
1 R´esum´e
1.1 Contributions
1.1.1 SSE : contributions
1.1.2 ORAM : contributions
2 Introduction
2.1 Research contributions
2.1.1 SSE contributions
2.1.2 ORAM contributions
2.2 Ongoing Work
2.3 Dissertation Outline.
3 Research Challenges and Background
3.1 Research dimensions
3.1.1 Searchable Encryption
3.1.2 Oblivious RAM
3.2 Scheme Definitions
3.2.1 Setting definition
3.2.2 Structured Encryption (STE)
3.2.3 Tree-based Oblivious RAM
3.3 Security Definitions
3.3.1 STE security definition
3.3.2 ORAM security definition
3.4 Cryptographic Primitives
4 Related Works
4.1 Searchable Encryption
4.2 Oblivious RAM
5 Expressiveness in Symmetric Searchable Encryption
5.1 BSSE: Boolean SSE
5.1.1 Contribution Summary
5.1.2 BSSE construction
5.1.3 Security Analysis
5.1.4 BSSE Performance
5.2 3SE: Semantic SSE
5.2.1 Contribution Summary
5.2.2 Stemming Algorithms
5.2.3 Semantic SSE: Construction Overview
5.3 SED: Substring SSE
5.3.1 Contribution Summary
5.3.2 Substring search over Encrypted Data – Pre-construction
5.3.3 Substring Search Over Encrypted Data #1 – SED-1
5.3.4 Substring Search Over Encrypted Data #2 – SED-2
5.3.5 Security analysis for SED-2
5.3.6 SED-2 Generalization
5.3.7 Performance Analysis
6 Oblivious RAM
6.1 Resizable ORAM
6.1.1 Motivation and Findings
6.1.2 Resizable ORAM
6.1.3 Adding
6.1.4 Pruning
6.2 Recursive ORAM
6.2.1 Contribution Summary
6.2.2 Recursive Binary Trees
6.2.3 -ary Trees
6.2.4 Security Analysis
6.2.5 Performance Analysis
6.3 Constant bandwidth ORAM
6.3.1 Contribution Summary
6.3.2 Background: Onion ORAM
6.3.3 Constant Communication ORAM
6.3.4 C-ORAM analysis
6.3.5 Evaluation
7 Conclusion
References .