Task 2: Real-World Challenge: Aligning Web Directories

Focus of this Alignment Task

The focus of this task is to evaluate the performance of existing alignment tools in a real-world taxonomy integration scenario. Our aim is to show whether ontology alignment tools can be applied effectively to the integration of "shallow ontologies".

The evaluation dataset was extracted from the Google, Yahoo and Looksmart web directories. The specific characteristics of the dataset are:

More than 4500 node matching tasks, where each node matching task is composed of the paths to root of the nodes in the web directories.
Expert mappings for all the matching tasks.
Simple relationships: Web directories contain essentially only one type of relationship, the so-called "classification relation".
Vague terminology and modeling principles: The matching tasks incorporate typical "real world" modeling and terminological errors.

These characteristics make the task challenging from a technological point of view, but they also provide guidance for tuning the matching approach, which needs to be taken into account. The papers describing the dataset construction methodology are TaxME and TaxME 2.

The Data

The node matching tasks are represented by pairs of OWL ontologies, in which the classification relation is modeled as the OWL subClassOf construct. All the OWL ontologies are therefore taxonomies, i.e., they contain only classes (no object or data properties) connected by the subclass relation. The dataset can be downloaded from here. The matching tasks are numbered from 1 to 4640. Thus, for example, the first matching task is to find a mapping between 1/source.owl and 1/target.owl and to output it to a file called 1/yourname.rdf.
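To make the structure of a node matching task concrete, the following is a minimal sketch of how the subclass edges of one such taxonomy could be read and turned into a path to root. It assumes the files are serialized as RDF/XML with named classes and rdfs:subClassOf elements that point to their parent via rdf:resource; the actual files may be laid out differently. The class names in SAMPLE and the helper names (task_files, subclass_edges, path_to_root) are hypothetical and serve only as illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical three-class taxonomy, standing in for a source.owl file.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:about="#Top"/>
  <owl:Class rdf:about="#Arts"><rdfs:subClassOf rdf:resource="#Top"/></owl:Class>
  <owl:Class rdf:about="#Music"><rdfs:subClassOf rdf:resource="#Arts"/></owl:Class>
</rdf:RDF>"""


def task_files(task_id, name="yourname"):
    """Return (source, target, output) paths for one task, e.g. 1/source.owl."""
    d = PurePosixPath(str(task_id))
    return str(d / "source.owl"), str(d / "target.owl"), str(d / f"{name}.rdf")


def local_name(uri):
    """Keep only the fragment after '#', if there is one."""
    return uri.rsplit("#", 1)[-1]


def subclass_edges(owl_xml):
    """Collect (child, parent) pairs from rdfs:subClassOf statements."""
    root = ET.fromstring(owl_xml)
    edges = []
    for cls in root.iter(f"{{{OWL}}}Class"):
        child = cls.get(f"{{{RDF}}}about") or cls.get(f"{{{RDF}}}ID")
        if child is None:
            continue  # anonymous class: nothing to name
        for sup in cls.findall(f"{{{RDFS}}}subClassOf"):
            parent = sup.get(f"{{{RDF}}}resource")
            if parent is not None:
                edges.append((local_name(child), local_name(parent)))
    return edges


def path_to_root(node, edges):
    """Follow parent links upward; assumes each class has at most one parent."""
    parent_of = dict(edges)
    path = [node]
    while path[-1] in parent_of and parent_of[path[-1]] not in path:
        path.append(parent_of[path[-1]])
    return path


edges = subclass_edges(SAMPLE)
print(task_files(1))
print(path_to_root("Music", edges))
```

For real files, the same functions could be applied to the text read from each source.owl and target.owl, looping over task numbers 1 to 4640.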

An alternative representation of the same data consists of two large-scale OWL taxonomies. They can be downloaded from here. Two taxonomies incorporating 10% of the node matching tasks can be downloaded from here.

Task and Submission of Results

The task is to find an alignment between the classes in the ontologies. Any information in the two models can be used to find the alignment. In addition, background knowledge may be used, provided it has not been created specifically for the alignment tasks (i.e., no hand-made mappings between parts of the ontologies). Admissible background knowledge includes "oracles" such as WordNet, Cyc, UMLS, etc. Further, results must not be tuned manually, for instance by removing obviously wrong mappings. Participants are encouraged to submit results for all representations of the dataset (i.e., both for the node matching tasks and for the large-scale taxonomies).

Results of the alignment should be represented using the common format for alignments and sent to yatskevi@dit.unitn.it for evaluation by September 1st. Quantitative indicators of matching quality (Precision and Recall) will be returned to participants by September 15th.
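As a rough illustration of the two quality indicators, the sketch below computes Precision and Recall in their textbook set-based form, treating an alignment as a set of (source class, target class) pairs. This is an assumption for illustration only: the organizers' evaluation procedure may differ in detail, and the function name and the example pairs are hypothetical.

```python
def precision_recall(found, reference):
    """Set-based Precision and Recall of a found alignment against a reference.

    Both arguments are iterables of (source_class, target_class) pairs.
    """
    found, reference = set(found), set(reference)
    correct = found & reference
    # Guard against empty sets to avoid division by zero.
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical pairs: two of the three found mappings are in the reference,
# and the reference holds four mappings in total.
found = {("Arts", "Art"), ("Music", "Music"), ("Games", "Sports")}
reference = {("Arts", "Art"), ("Music", "Music"),
             ("Games", "Recreation"), ("Cinema", "Movies")}

p, r = precision_recall(found, reference)
print(f"precision={p:.3f} recall={r:.3f}")
```

Precision penalizes wrong mappings that were returned, while Recall penalizes reference mappings that were missed, so the two are best read together.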