An evolution of AppSI
The University of Lille uses the Lightweight Directory Access Protocol (LDAP), a standard application protocol for accessing and maintaining distributed directory information services. The administration decided to evolve the LDAP schema to enable users to have multiple identifiers. The idea is that each person has a personal account and zero or many accounts that represent a function. These “function accounts” can be assigned to let a person use a specific software for example.
As AppSI stores the unique LDAP identifier of people working at the labora-tory, the database is directly impacted by this change. AppSI database also has its own way to identify a person in the database uniquely implemented as a column constrained by a primary key constraint.
Before digging into technical details of AppSI evolution, let us set the concepts one need to know to understand the evolution:
• person: When we use the term “person”, we mean a human for who the data has been encoded in the information system.
• person table: The table in the AppSI relational database that contains data of people working at the laboratory.
• id column: A column of the person table in AppSI database that is con-strained by a primary key constraint. The values held by this column are used to uniquely identify data of a person stored in the person table.
• uid column: A column of the person table in AppSI database that stores the LDAP identifier of each person working at the laboratory. The uid column is used as the login of the users for web applications using AppSI database. Thus, the pair of values (id, uid) are unique in AppSI database.
• uid attribute: An attribute of the LDAP schema allowing one to refer to a person’s data uniquely before the evolution.
Additionally to the LDAP schema evolution to enable users to have multiple identifiers, the uid attribute was renamed as login. This evolution necessitates to similarly rename the uid column of the person table as login. This column contains the primary identifier of a person. Such an induced evolution aims to ease the understanding and the maintenance of the database and its client applications.
To support multiple LDAP identifiers in AppSI, a new table named account_ alias is created. It gathers all the secondary identifiers of a person in the login_ alias column. The id_person column is a foreign key to the id column, pri-mary key of the person table.
Before this evolution, it was possible to find the id of a person from their LDAP identifier. After the evolution, since a person may have several LDAP identifiers, it is necessary to use a stored procedure to find the id of a person from one of their LDAP identifier (the main or a secondary one). For this purpose, a view was created to ease the correspondence between one of the LDAP identifiers of a person and his/her primary key. Figure 2.1 provides a sketch of this evolution.
Of course, once this modification is integrated, entities of the database using the uid column of the person table have to be adapted in order either to use the new login column, or to use the new account_alias table with its login_alias column and a join with the person table.
Although this evolution is rather simple to understand, it is not trivial to imple-ment. To plan this evolution, the database architect established a roadmap of what he needed to do. During the whole evolution, he uses this roadmap to keep track of what was done and what remained to do. The roadmap was also updated when the database architect discovered he had forgotten something or when some step turned out to be more complex than originally planned.
Conditions of the Case Study
For practical reasons, we could not be present when the database architect per-formed the evolution and it was not possible to postpone it. The architect agreed to record his screen during the whole task. The result consists of three recordings of about 1 hour each (total video time is 3.5 hours). Unfortunately, for confidentiality reasons (personal data of university employees appearing at different moments of the videos), the videos can not be made publicly available.
We analyze recordings of the architect screen to understand problems he faces during an evolution. Analyzing video data is not easy because of their unstructured nature. Furthermore, we can observe what the architect does but we need to infer why he does it. This information is not encoded in the video.
In our case, we want to understand:
• Which tools the architect use to perform the evolution?
• How tools are used during the evolution?
• How the architect achieves the evolution?
The first step to be able to understand data in the video is to extract some struc-ture from it. We choose to split the video into entries to have a more structured version of data encoded in the recordings. As delimiter for these entries, we chose to use timestamps for which we observe changes in the screen display. For exam-ple when the database architect switched from one tool to another (e.g. text editor, shell terminal, or a database development tool), or from one tab of a tool to another tab (e.g. multiple files in a text editor). This delimiter is quite objective as we do not need to inject knowledge external to the video to split it. Because of that, our experiment is reproducible on another recording of another evolution for a different database conducted by a different architect. We segmented the video into a list of “entries” containing a timestamp and a de-scription of what happens on the screen between the timestamp of the entry and the next timestamp. The full transcript is available at https://github.com/juliendelplanque/ icsme2018data. From that list of entries, we perform a qualitative and a quantitative analysis to understand the evolution. Figure 2.2 provides a visual representation of our methodology to analyze the video.
Starting from the list of entries we transcribed, we perform a qualitative analy-sis of the evolution. This analysis is twofold. First, we decompose the evolution into “actions” which are groups of entries that are related. Then, we group these actions into activities which abstract the semantic of the architect’s actions at a coarser grain. Activities allow us to identify the process followed by the architect to implement the evolution and to analyze it.
Table of contents :
1.1 Software Engineering
1.2 Relational databases
1.4 Software Engineering for Relational Databases
1.6 Structure of the Thesis
1.7 List of Publications
2.1 Introducing AppSI database
2.3 Conditions of the Case Study
2.4 Qualitative Analysis
2.5 Quantitative Analysis
3 State of the Art
3.1 Database Design
3.2 Relational Database Reverse-Engineering
3.3 Relational Database Impact Analysis
4 A Behavior-Aware Meta-Model for Relational Databases
4.2 The Meta-Model
4.3 Meta-model Instantiation
4.4 Case studies
5 Identifying Quality Issues in Relational Databases
5.3 Case Studies
6 Recommendations for Evolving Relational Databases
6.1 Setting the context
6.2 Description of the Approach
7.3 Future Work
A Operators Catalog