
## Formal Concept Lattices Generation Algorithms

A naive way to generate concept lattices is to follow the definition of formal concepts. The first step is to compute either the intents or the extents, and use them to build the set of formal concepts. To achieve this, we can construct the set of extents as $E = \{Y'' \mid Y \in 2^O\}$ and the set of concepts as $\{\langle e', e\rangle \mid e \in E\}$. Alternatively, the set of intents would be $I = \{X'' \mid X \in 2^A\}$ and the set of concepts $\{\langle i, i'\rangle \mid i \in I\}$. The numbers of possible extents and intents are respectively $|2^O| = 2^{|O|}$ and $|2^A| = 2^{|A|}$. The size of the set of concepts is thus, in the worst case, $2^{\min(|O|,|A|)}$. Therefore, assuming the time complexity of $(\cdot)'$ and $(\cdot)''$ is in $O(1)$, computing $E$ first is in $O(2^{|O|})$ and computing $I$ first is in $O(2^{|A|})$. In either case, there cannot be a polynomial algorithm to build the set of concepts, because the size of the output (the set of formal concepts) is exponential in the size of the input (the FC). The second step of our naive algorithm is to compute the order by comparing each pair of extents with $\subseteq$.
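The two steps above can be sketched in Python. This is a toy sketch, assuming the FC is given as sets of objects and attributes plus a set of incidence pairs; the function and variable names are ours, not from any standard library:

```python
from itertools import combinations

def prime_objects(Y, incidence, attributes):
    """Y': the attributes shared by all objects in Y."""
    return frozenset(a for a in attributes
                     if all((o, a) in incidence for o in Y))

def prime_attributes(X, incidence, objects):
    """X': the objects having all attributes in X."""
    return frozenset(o for o in objects
                     if all((o, a) in incidence for a in X))

def naive_concepts(objects, attributes, incidence):
    """Enumerate all formal concepts <extent, intent> by closing
    every subset of objects (the 2^|O| loop from the text), then
    compute the order by comparing extents with set inclusion."""
    concepts = set()
    objs = list(objects)
    for r in range(len(objs) + 1):
        for Y in combinations(objs, r):
            intent = prime_objects(Y, incidence, attributes)       # Y'
            extent = prime_attributes(intent, incidence, objects)  # Y''
            concepts.add((extent, intent))
    order = {(c1, c2) for c1 in concepts for c2 in concepts
             if c1[0] < c2[0]}  # strict inclusion of extents
    return concepts, order
```

The double loop over all subsets of objects makes the exponential cost explicit: the sketch is only usable on very small contexts.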

There exists a wide variety of algorithms to generate concept lattices, or at least the set of concepts. For an overview of those algorithms see, e.g., [33].

**Major Neural Networks Architectures**

A convolutional neural network (CNN) is a NN that applies convolutions to an input. For an overview of CNNs see, e.g., [47]. Following the principle of convolutions, for a given input element, a CNN produces an output based on a learned kernel and the neighborhood of the input element. It is possible to compute the size of the output and the size of the neighborhood taken into account by the CNN. A CNN can handle inputs of varying sizes, and the output size is proportional to that of the input. Additionally, CNNs maintain the relations between neighboring elements and are invariant to translation. In other words, a CNN will produce the same output for a given neighborhood at different locations in the input. CNNs are widely used in image processing because this property allows them to learn filters that detect features independently of their position in the input. A basic CNN is shown in Figure 1.4.
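These properties can be illustrated with a minimal 1D convolution in plain Python; the "edge detector" kernel below is a hypothetical example of what a CNN might learn, not a trained one:

```python
def conv1d(signal, kernel):
    """Valid 1D convolution (no padding): slide the kernel over the
    input and take a dot product with each neighborhood."""
    k = len(kernel)
    return [sum(kernel[i] * signal[j + i] for i in range(k))
            for j in range(len(signal) - k + 1)]
```

The output length is determined by the input length (here `len(signal) - len(kernel) + 1`), and shifting a pattern in the input simply shifts its responses in the output, which is the translation invariance discussed above.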

The famous long short-term memory recurrent NN (LSTM) [17] and gated recurrent unit NN (GRU) [7] are architectures of a family called recurrent NNs (RNNs). This family of models handles sequences of inputs of variable length and is designed to learn dependencies within the input sequence. By transmitting and updating a vector called the hidden state from one step of the sequence to the following one, each output depends on the current input as well as the previous ones. A variant of the LSTM, called bi-directional LSTM (BLSTM) [17], uses a second LSTM to process the input sequence in the reverse direction. It can thus handle both forward and backward dependencies in the sequence. A major flaw of all RNNs is their limited ability to model long-range dependencies; variants like the LSTM and GRU mitigate this memory issue to some extent. A block diagram representing how RNNs handle sequences is shown in Figure 1.5a, and the inner workings of a GRU and an LSTM are shown in Figure 1.5b and Figure 1.5c, respectively.
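The hidden-state recurrence can be sketched with a scalar toy RNN (the weights below are arbitrary illustrative values, not learned ones):

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One recurrent step: the new hidden state mixes the current
    input x_t with the previous hidden state h_prev."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(xs, w_x=0.5, w_h=0.8, b=0.0):
    """Process a variable-length sequence; through h, each hidden
    state depends on the current input and all previous ones."""
    h = 0.0
    hidden_states = []
    for x in xs:
        h = rnn_step(x, h, w_x, w_h, b)
        hidden_states.append(h)
    return hidden_states
```

Feeding a nonzero input followed by zeros shows both properties at once: the hidden state keeps carrying information from the first input, but its influence fades step after step, which is the long-range dependency issue mentioned above.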

Attention mechanisms [1], which have been developed to handle this issue, consider a full sequence and attribute attention weights to each of its elements. The attention weights are usually computed with a dot product between a query and the elements of the sequence, though there are numerous variants of attention. These weights are typically used either to reweight the sequence directly or to compute a weighted average of the sequence. The summaries produced by attention mechanisms are called contexts. Attention is usually cheap to compute, and analyzing the attention weights makes it possible to determine how much each element contributes to the final result. Due to this property, attention can be used to make an RNN's result interpretable. Attention is powerful enough to be used alone, as in the transformer network [44] and, more recently, the reformer network [30]. An example of attention for neural machine translation is shown in Figure 1.6.
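A minimal dot-product attention can be written as follows; this is the basic scheme described above, without the scaling and learned projections that concrete variants add:

```python
import math

def softmax(xs):
    """Normalize scores into weights summing to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, sequence):
    """Dot-product attention: score each element against the query,
    normalize into attention weights, and return the weighted
    average of the sequence (the 'context') plus the weights."""
    scores = [sum(q * e for q, e in zip(query, elem)) for elem in sequence]
    weights = softmax(scores)
    context = [sum(w * elem[i] for w, elem in zip(weights, sequence))
               for i in range(len(query))]
    return context, weights
```

Inspecting `weights` directly shows which elements of the sequence contributed most to the context, which is exactly the interpretability property discussed above.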

We call unordered composition functions those operations that do not take into account the order of the input elements and can accommodate any number of them. Typical examples for vectors are the element-wise min, max, and average (also called min-, max-, and average-pooling, respectively). Unordered composition-based models combining an unordered composition of the inputs with an MLP are called deep averaging networks (DANs). They have proven their effectiveness in a variety of tasks, for instance sentence embedding [23], sentiment classification [6], and feature classification [15]. On the one hand, this family of architectures can process inputs of varying sizes at a relatively low computational cost, as opposed to recurrent models like the LSTM. On the other hand, the information related to the order of the input elements is lost. The DAN for sentence embedding from [23] is shown in Figure 1.7.
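A DAN reduces to "pool, then feed an MLP", which can be sketched in a few lines; the weights below are arbitrary placeholders standing in for learned parameters:

```python
def average_pool(vectors):
    """Unordered composition: element-wise average, indifferent to
    both the order and the number of input vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]

def mlp_layer(x, weights, bias):
    """One fully connected layer with a ReLU activation."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def dan(vectors, weights, bias):
    """Deep averaging network sketch: pool the inputs, then apply
    an MLP layer to the pooled vector."""
    return mlp_layer(average_pool(vectors), weights, bias)
```

Permuting the input vectors leaves the output unchanged, which demonstrates both the strength (any number of inputs, cheap) and the weakness (order information is lost) noted above.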

To summarize, CNNs, RNNs, attention mechanisms, and DANs are various architectures able to handle inputs of varying sizes, each with their advantages and drawbacks.

**Node-Centered Approaches**

The first group of graph NNs focuses on representing nodes. This kind of NN can be trained on a specific graph to provide representations of this graph's nodes [38, 39]. It is used to generate local representations, in particular for large and complex graphs, e.g., generating entity embeddings for knowledge graphs or node embeddings for community graphs. Graph NNs representing nodes can also be used to represent whole graphs, by combining the representations of the nodes. However, it is also possible to learn a more general model to represent nodes by training on multiple graphs [28].

A first method, proposed in DeepWalk [39], is to mimic natural language processing (NLP) methods for learning sentences. The neighborhood of a node is linearized by taking a sequence of nodes obtained by randomly walking through the graph. The sequences of nodes are then processed as if they were sentences, with SkipGram [38] in the case of DeepWalk.
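The linearization step can be sketched as a simple random walk over an adjacency-list graph; this is only the walk-generation part of the DeepWalk pipeline, the SkipGram training is not shown:

```python
import random

def random_walk(graph, start, length, rng=random):
    """Linearize a node's neighborhood as a random walk: a sequence
    of nodes, later treated like a sentence of words."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = graph[walk[-1]]
        if not neighbors:  # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk
```

Each walk is a valid path in the graph, so nodes that co-occur in many walks are structurally close, which is what the downstream SkipGram model exploits.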

A second approach is message passing, in which a node representation is built iteratively. The nodes' initial representations can be the features of the nodes (represented as vectors), or randomly generated values if the nodes have no given features. GraphSAGE [20] and Graphite [19] are examples of message-passing neural algorithms, with [19] also including a variational version of Graphite.
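The core iteration can be sketched with a plain averaging aggregator; this is a strong simplification of what GraphSAGE-style models do (they use learned, nonlinear aggregations), kept minimal to show the mechanism:

```python
def message_passing_step(graph, features):
    """One message-passing iteration: each node's new representation
    averages its own feature vector with those of its neighbors."""
    new_features = {}
    for node, neighbors in graph.items():
        messages = [features[n] for n in neighbors] + [features[node]]
        dim = len(features[node])
        new_features[node] = [sum(m[i] for m in messages) / len(messages)
                              for i in range(dim)]
    return new_features
```

Repeating the step propagates information further: after `k` iterations, a node's representation depends on its `k`-hop neighborhood.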

A third framework is graph convolution, an adaptation of CNNs from grid-like structures such as images to the more irregular structure of graphs. In practice, the kernel is applied to the neighborhood of a node as defined in the graph, instead of the spatial neighborhood of a cell in a matrix. Graph Convolutional Network (GCN) [28] is the basic architecture implementing this approach, and Hypergraph Neural Network [11] is an adaptation of GCNs to hypergraphs. Graph Attention Network (GAT) [45], Masked GCN [48], SPAGAN [49], and Hierarchical GAN [25] are improvements of GCNs with attention mechanisms and other similar techniques. Those improvements allow the NN to select neighboring nodes based on the current node and the content of the neighbors.
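A graph convolution layer can be sketched as follows; this is a simplified variant of the GCN layer (plain neighborhood averaging instead of the degree-normalized propagation of [28], and no nonlinearity), intended only to show the kernel being shared across graph neighborhoods:

```python
def graph_conv_layer(adj, features, weight):
    """Simplified graph convolution: for each node, average the
    features of its closed neighborhood (self + neighbors), then
    apply a shared linear kernel, like a CNN kernel applied over
    graph neighborhoods instead of pixel neighborhoods."""
    n = len(adj)
    dim = len(features[0])
    out = []
    for i in range(n):
        neighborhood = [j for j in range(n) if adj[i][j] or j == i]
        aggregated = [sum(features[j][k] for j in neighborhood)
                      / len(neighborhood) for k in range(dim)]
        out.append([sum(w * a for w, a in zip(row, aggregated))
                    for row in weight])
    return out
```

The same `weight` matrix is applied at every node, just as a CNN reuses one kernel at every image location.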

### Whole Graph Approaches

Graph NNs representing whole graphs are used to represent graphs from a specific family (or class) of graphs. In other words, they are trained to handle graphs sharing some specific structural properties, e.g., community graphs. This kind of graph NN is used to perform tasks on a whole graph at once, e.g., classifying a graph or generating graphs of a specific family.

A first method is spectral convolution, introduced in [5], with a more recent implementation called ChebNet [43]. Spectral convolution is based on spectral graph theory, which represents a graph's adjacency using a spectrum. The spectrum of a graph provides a complete view of the graph's adjacency. A detailed introduction to spectral graph theory can be found, e.g., in [8]. Spectral convolution is an application of CNNs to this spectrum of the graph.

A second framework is hierarchical graph pooling, presented in [50]. This method is an iterative simplification of the graph by grouping strongly related nodes using clustering and pooling methods. Once a single group remains, the graph is fully simplified.

A third approach is to use a VAE directly on the adjacency matrix of a graph, as implemented in GraphVAE [29] and Constrained GraphVAE [36]. By flattening the adjacency matrix into a vector, it becomes possible to apply a VAE directly to the adjacency. This method performs well at generating graphs of specific families, despite being less involved than the other methods presented in this section. Indeed, it does not rely on specific properties of graphs to establish its architecture.

A fourth method is to use recurrent models on a linearization of the graph, as done in GraphRNN [51]. In this approach, the graph is first transformed into a sequence of nodes, and the nodes are then processed iteratively. GraphRNN processes each node of the sequence by computing its edges with the previous nodes in the sequence. It relies on a breadth-first search to linearize the graph, making connected nodes close in the sequence.
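The two ingredients of this linearization can be sketched as follows; this reproduces only the ordering logic, not the recurrent networks that GraphRNN trains on the resulting sequences:

```python
from collections import deque

def bfs_order(graph, start):
    """Linearize a graph breadth-first, so that connected nodes
    end up close together in the sequence."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order

def edges_to_previous(graph, order):
    """For each node, its edges to earlier nodes in the sequence:
    the part generated step by step in a GraphRNN-style model."""
    return [[prev for prev in order[:i] if prev in graph[node]]
            for i, node in enumerate(order)]
```

Each step of the sequence then only has to decide which of the already-emitted nodes the new node connects to.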

**Advantages of GraphRNN for Lattice Generation**

As a reminder, our goal is to learn a single neural model to generate lattices in general. We want the model to produce a whole lattice for each FC presented. From those constraints, we focus on graph NNs able to handle whole graphs at once. The architecture should also be able to generate graphs, and only GraphVAE, Constrained GraphVAE, and GraphRNN fit our criteria.

A drawback of GraphVAE and Constrained GraphVAE is that they are limited in the size of the graphs they can handle. Indeed, a VAE has a fixed-size input and output, so the two approaches are unable to handle graphs larger than the graphs they are trained on. Conversely, the recurrent nature of GraphRNN makes it very flexible with regard to the size of the generated graphs. A final argument in favor of GraphRNN is that it is easily extendable with node features. In our case, node features are the intents and extents of the concepts, and they are very important for our goal because they define the concepts. In practice, when generating a new node, we would add a network generating the intent or the extent to the link generation network of GraphRNN, as we explain in Subsection 2.3.2.

GraphRNN shows state-of-the-art performance when generating graphs of various families, and can handle graphs of any size. However, it is not designed to generate specific graphs on request, as we would like to do to generate lattices based on the FCs. Conversely, constrained VAEs (such as Constrained GraphVAE [36]) are generative models that can be constrained. Using such a model would allow us to generate lattices under the constraint of the FCs. To compensate for the limitations of GraphRNN with regard to our goal, we decided to adapt GraphRNN into such a constrained VAE. As described in Section 2.3, we first adapt GraphRNN into an auto-encoder, which is later adapted into a VAE and then further modified into a constrained VAE.

#### Transforming Lattices into Matrices

When training a NN on data that is not purely numerical, lattices in our case, it is necessary to first represent the data in a numerical form. The choice of the data, as well as the quality of the representation, noticeably affects the quality of the resulting model. Typically, a bad representation makes learning hard if not impossible, while a well-thought-out representation of the data is already half the job done. Also, the model will perform better and generalize better when trained on more, and more varied, data.

We decide to learn on randomly generated FCs and the corresponding lattices. The main reason is that gathering and preparing large amounts of usable real-world FCs is costly, while generating random data is cheap. We can afford to use randomly generated FCs because we want to learn the general process of FCA, which should behave the same no matter the source of the data.

In this section, we describe the lattice data we used for our experiments and how we encode it as matrices. First, we explain our generation process in Subsection 2.2.1. We then explain how we adapted the breadth-first search ordering method of GraphRNN in Subsection 2.2.2. We then discuss the encoding of the adjacency in Subsection 2.2.3 and of the intents and extents in Subsection 2.2.4.

**Table of contents :**

**1 Project Frame **

1.1 Introduction

1.2 Work Environment

1.2.1 LORIA, Orpailleur and Multispeech

1.2.2 Inria Project Lab HyAIAI

1.2.3 Tools, Repository and Testbed

1.3 Basic Background in Formal Concept Analysis

1.3.1 Formal Contexts and Formal Concepts

1.3.2 Formal Concept Lattices

1.3.3 Formal Concept Lattices Generation Algorithms

1.4 Basic Background in Deep Learning

1.4.1 Neural Networks

1.4.2 Deep Learning Algorithms

1.4.3 Usual Loss Functions

1.4.4 Binary Encoding and Softmax

1.4.5 Major Neural Networks Architectures

1.4.6 Auto-Encoders and Embeddings

1.4.7 Metric Learning

1.5 Problem Statement

**2 Initial Approach: Graph Generation **

2.1 State of the Art of Graph Modeling

2.1.1 Node-Centered Approaches

2.1.2 Whole Graph Approaches

2.1.3 Advantages of GraphRNN for Lattice Generation

2.2 Transforming Lattices into Matrices

2.2.1 Data Generation and Dataset

2.2.2 From Breadth-First Search to Level Ordering

2.2.3 Encoding the Lattice Adjacency

2.2.4 Encoding the Concepts

2.3 Using GraphRNN for Lattices Modeling

2.3.1 GraphRNN

2.3.2 GraphRNN Constrained VAE

2.4 Preliminary Experiments and Change of Approach

2.4.1 Reconstruction Performance Without Features

2.4.2 Formal Concept Representation and Change of Approach

**3 Bag of Attributes: Embeddings for Formal Contexts **

3.1 State of the Art: FCA2VEC

3.1.1 Object2vec and Attribute2vec

3.1.2 Closure2vec

3.1.3 Evaluation

3.2 Bag of Attributes Model

3.2.1 Architecture

3.2.2 Training Objective

3.3 Training

3.3.1 Training Process

3.3.2 Training Dataset

3.3.3 Data Augmentation

3.3.4 Issues With KL Divergence

3.4 Experiments

3.4.1 Reconstruction Performance

3.4.2 Metric Learning Performance

3.4.3 Experiments on Real-World Datasets

**4 Second Approach: Intents Generation **

4.1 Pilot Experiments

4.1.1 Variational Auto-Encoder Architecture

4.1.2 Convolutional Neural Network Architecture

4.1.3 Recurrent Neural Network Architecture and Attention Mechanisms

4.1.4 Conclusions on the Tested Architectures

4.2 Concept Number Upper Bound Prediction

4.3 Intents, Cover and Order Relation Generation

4.4 Training and First Experiments

**5 Conclusion and Discussion **

**Bibliography**