COMPARING STRINGS IN ONTOLOGY MATCHING

Implementation

OWL

The OWL Web Ontology Language is, according to the requirements, the language used to describe the input ontologies. OWL is not only concerned with presenting information to humans, but also with letting applications process the content of that information. It is a Semantic Web standard for sharing and reusing data on the Web. Its foundations are taken from RDF, to which additional vocabulary with formal semantics is added, so a short overview of RDF comes first.

According to the RDF Schema specification (RDF Schema), “The Resource Description Framework (RDF) is a general-purpose language for representing information in the Web”. RDF is a standard for describing information about resources, which are typically identified by URIs. This information is organized into resources, statements (or properties) and individuals. Each statement is represented by an arc with three parts: the subject (the resource the arc leaves from), the predicate (the property that labels the arc), and the object (the resource or literal value the arc points to). A set of statements forms an RDF model.

With this picture of an RDF model in mind, OWL models are easy to understand. OWL has three sublanguages, in increasing order of expressiveness: OWL Lite, OWL DL, and OWL Full. OWL Full can be viewed as an extension of RDF, and every OWL (Lite, DL, or Full) document is an RDF document. In general, everything that can be expressed in RDF can also be expressed in OWL, together with more complex concepts about the classes of the ontologies. OWL is widely regarded as one of the most expressive languages for ontology description.
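The subject, predicate and object structure of an RDF statement can be illustrated with a few lines of plain Java. This is only a minimal sketch and not the Jena API: the class TinyRdfModel, its methods, and the names ex:Hotel and ex:Accommodation are hypothetical and chosen for illustration; only rdfs:subClassOf is taken from the RDFS vocabulary.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

/** Toy RDF model (hypothetical, not Jena): nothing more than a set of statements. */
final class TinyRdfModel {

    /** One statement: an arc from a subject to an object, labelled by a predicate. */
    static final class Statement {
        final String subject;
        final String predicate;
        final String object;

        Statement(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Statement)) return false;
            Statement s = (Statement) o;
            return subject.equals(s.subject)
                    && predicate.equals(s.predicate)
                    && object.equals(s.object);
        }

        @Override
        public int hashCode() {
            return Objects.hash(subject, predicate, object);
        }
    }

    private final Set<Statement> statements = new LinkedHashSet<>();

    /** Adds a statement; adding the same triple twice has no further effect. */
    void add(String subject, String predicate, String object) {
        statements.add(new Statement(subject, predicate, object));
    }

    boolean contains(String subject, String predicate, String object) {
        return statements.contains(new Statement(subject, predicate, object));
    }

    int size() {
        return statements.size();
    }

    public static void main(String[] args) {
        TinyRdfModel model = new TinyRdfModel();
        // Hypothetical statements about an accommodation ontology.
        model.add("ex:Hotel", "rdfs:subClassOf", "ex:Accommodation");
        model.add("ex:Hostel", "rdfs:subClassOf", "ex:Accommodation");
        System.out.println("Statements in the model: " + model.size());
    }
}
```

Treating the model as a set mirrors the description above: a model is fully determined by which statements it contains, so repeating a statement does not change it.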

Editing the ontologies: Protégé OWL

To implement the ontologies that are compared by the polygon method, we use the Protégé OWL editor and knowledge-base framework. This free, open-source editor provides the tools needed to create ontologies and modify their main characteristics. It offers a graphical interface as well as a way to view the underlying OWL code. Throughout the thesis, different examples and comparison scenarios are used to walk the reader through the steps that lead to the final result. The main example deals with some hypothetical aspects of accommodation. The different accommodation ontologies share some concepts and differ in others. We now introduce the ontologies of the example as they were edited with Protégé OWL. Note that the first ontology given to the polygon method is treated as the standard ontology, and the second one is compared to it. Similarity is therefore described as “how similar the second ontology is to the standard one”. The standard ontology is named Accommodation1.

Access the ontologies from Java: Jena

As the general requirements specify, the method has to be implemented in Java. The requirements also specify that the input ontologies have to be written in OWL. Consequently, the OWL code has to be accessible from Java. This problem is addressed by Jena, a Semantic Web framework for Java. This open-source framework for building Semantic Web applications provides a programmatic environment for RDF, RDFS and OWL. Jena was first developed as a Java API for RDF, but later versions include further functionality such as the Jena2 ontology API, which provides an interface for Semantic Web application developers. That makes it an ideal toolkit for processing the ontologies created in Protégé-OWL. In Jena, the Model interface is used to access the statements in a collection of RDF data. If OWL data has to be processed instead of plain RDF data, the OntModel interface is used. It extends Model and gives access to the main features of an ontology: classes, properties and individuals. Its methods provide the functionality needed to access these features in different ways. The first step in using it is to create an ontology model with the Jena ModelFactory. The polygon method compares two different ontologies, so two ontology models have to be created, one for each of them.
OntModel m1 = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
OntModel m2 = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
From now on, the ontologies are available through Java methods on their respective models. At this point it is worth taking an overview of the general method being developed. The polygon method is based on comparing the features of both ontologies, the standard one and the one compared to it. As already pointed out, these features can be grouped into classes, properties and individuals. Hence, classes in ontology1 are compared to classes in ontology2, and the same procedure is followed for properties and individuals. There is no need to compare classes in ontology1 with properties in ontology2 or vice versa: a match across different kinds of feature would not indicate that the ontologies are more similar, so it adds no relevant data to our study. Comparisons are therefore restricted to features of the same kind in both ontologies. Moreover, subclasses are also considered, grouped by their superclass. Two subclasses are only compared to each other if some similarity is found between their superclasses. Once the models are created, the information in each ontology can be easily retrieved. We use the following methods to retrieve the existing classes, properties and individuals.
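// Walk the classes of the first ontology (the same calls apply to m2)
for (Iterator i = m1.listClasses(); i.hasNext(); ) {
    OntClass c = (OntClass) i.next();
    // Direct subclasses of c (true = direct only)
    for (Iterator s = c.listSubClasses(true); s.hasNext(); ) { OntClass sub = (OntClass) s.next(); }
    // Individuals that are instances of c
    for (Iterator n = c.listInstances(); n.hasNext(); ) { Object ind = n.next(); }
}
// Object properties of the first ontology
for (Iterator f = m1.listObjectProperties(); f.hasNext(); ) { Object prop = f.next(); }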
By means of these methods the names of the features are stored and afterwards compared with those of the other ontology, bearing in mind that we are only interested in comparing corresponding groups of features.
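The restriction to corresponding groups, and the rule that subclasses are compared only under similar superclasses, can be sketched in plain Java once the names have been stored. This is an illustrative sketch under stated assumptions, not the thesis implementation: the class FeaturePairing, its methods, and the sample names are hypothetical, and the similarity test is passed in as a parameter because the comparison method itself is chosen later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

/** Hypothetical helper that decides which stored names get compared. */
final class FeaturePairing {

    /** All pairs (a, b) of names of the same kind: a from the standard ontology, b from the other. */
    static List<String[]> pairs(List<String> standard, List<String> compared) {
        List<String[]> result = new ArrayList<>();
        for (String a : standard) {
            for (String b : compared) {
                result.add(new String[] {a, b});
            }
        }
        return result;
    }

    /** Subclass pairs, generated only under superclasses that the predicate judges similar. */
    static List<String[]> subclassPairs(Map<String, List<String>> subs1,
                                        Map<String, List<String>> subs2,
                                        BiPredicate<String, String> similar) {
        List<String[]> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e1 : subs1.entrySet()) {
            for (Map.Entry<String, List<String>> e2 : subs2.entrySet()) {
                if (similar.test(e1.getKey(), e2.getKey())) {
                    result.addAll(pairs(e1.getValue(), e2.getValue()));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical superclass -> subclasses maps for the two ontologies.
        Map<String, List<String>> subs1 = new LinkedHashMap<>();
        subs1.put("Room", Arrays.asList("SingleRoom", "DoubleRoom"));
        subs1.put("Service", Arrays.asList("Breakfast"));
        Map<String, List<String>> subs2 = new LinkedHashMap<>();
        subs2.put("room", Arrays.asList("Suite"));
        subs2.put("Owner", Arrays.asList("Landlord"));

        // Placeholder similarity test (an assumption): case-insensitive equality.
        List<String[]> result = subclassPairs(subs1, subs2, String::equalsIgnoreCase);
        for (String[] p : result) {
            System.out.println(p[0] + " <-> " + p[1]);
        }
    }
}
```

Classes, properties and individuals would each be passed to pairs separately, so a class name is never paired with a property name.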

Comparing Strings in Ontology matching

After the information retrieval, the names of the elements of each ontology are stored and converted to Java Strings. The next step is the comparison of these Strings. From the results of the comparison the polygons can be drawn, and from those representations the final matching result is obtained. String comparison is a much-discussed topic in ontology research. The reason is that strings can be compared in many different ways, and the efficiency and accuracy of the result depend on the situation and the parties involved. In this study, however, we deal only with string comparison oriented to ontology matching. The examples proposed and the cases considered all concern the names of the classes, subclasses, properties and individuals in the ontologies involved. Moreover, a great variety of methods can be used to compare ontologies and their elements. Consequently, before deciding which one to use, some studies have to be conducted to determine which of the existing methods gives the best results in the field of ontology matching.

Previous research

Previous studies have already examined the effectiveness of string comparison methods. Following Cohen (Cohen et al., 2003), we adopt a classification of the methods provided by SecondString, an open-source Java package of string matching methods. These methods follow a wide range of approaches and have been designed according to different criteria and perspectives, such as statistics, artificial intelligence, information retrieval, and databases. Previous classifications divide these methods into three groups according to how they establish the correspondence between strings: edit-distance metrics, token-based distance metrics and hybrid methods. An edit-distance metric considers the two strings to be compared, taking one as the input and the other as the output. Transformations are applied to turn the input into the output. The distance between the two strings is the cost of the cheapest sequence of edit commands that transforms the input into the output. These edit commands are copy, delete, substitute and insert. Depending on how the cost of the editing operations is defined, the SecondString package offers two edit-distance methods.
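To make the idea of an edit distance concrete, the sketch below computes the simplest variant, in which copy costs nothing and delete, insert and substitute each cost 1 (commonly known as the Levenshtein distance). The unit costs are an assumption made here for illustration; this is not the SecondString implementation, and the class name EditDistance is hypothetical.

```java
/** Hypothetical helper: unit-cost edit distance between two strings. */
final class EditDistance {

    /**
     * Cost of the cheapest sequence of copy (0), delete (1), insert (1) and
     * substitute (1) commands that transforms input into output.
     */
    static int levenshtein(String input, String output) {
        int[] prev = new int[output.length() + 1];
        int[] curr = new int[output.length() + 1];

        // Transforming an empty input into the first j characters of output takes j inserts.
        for (int j = 0; j <= output.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= input.length(); i++) {
            // Transforming the first i characters of input into an empty output takes i deletes.
            curr[0] = i;
            for (int j = 1; j <= output.length(); j++) {
                boolean same = input.charAt(i - 1) == output.charAt(j - 1);
                int copyOrSubstitute = prev[j - 1] + (same ? 0 : 1);
                int delete = prev[j] + 1;
                int insert = curr[j - 1] + 1;
                curr[j] = Math.min(copyOrSubstitute, Math.min(delete, insert));
            }
            // The row just computed becomes the previous row for the next character.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[output.length()];
    }

    public static void main(String[] args) {
        // Hypothetical class names from two accommodation ontologies.
        System.out.println(levenshtein("Accommodation", "Accomodation")); // prints 1
        System.out.println(levenshtein("kitten", "sitting"));             // prints 3
    }
}
```

A distance of 0 means the two names are identical; when names of different lengths are compared, the raw distance is often divided by the length of the longer string to obtain a score between 0 and 1.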

1. Introduction
1.1 BACKGROUND
1.2 PURPOSE/OBJECTIVES
1.3 LIMITATIONS
1.4 THESIS OUTLINE
2. Theoretical Background
2.1 ONTOLOGIES
2.2 MATCHING ONTOLOGIES
3. Implementation
3.1 OWL
3.2 EDITING THE ONTOLOGIES: PROTÉGÉ OWL
3.3 ACCESS THE ONTOLOGIES FROM JAVA: JENA
3.4 COMPARING STRINGS IN ONTOLOGY MATCHING
3.5 FROM THE COMPARISON RESULTS TO THE POLYGONS: JMATLINK
3.6 REPRESENTATION OF RESULTS: MATLAB
4. Conclusion and Discussion
4.1 GLUE
4.2 ANCHOR-PROMPT
4.3 S-MATCH
4.4 AXIOM-BASED ONTOLOGY MATCHING
4.5 FCA-MERGE
4.6 MAFRA
4.7 FUTURE WORK