DBBO

DBBO stands for DBpedia Based Ontology.

The diversity and amount of data on the Web are both growing continuously, and there has been a paradigm shift from publishing isolated data to publishing interlinked data through a variety of knowledge sources such as Linked Open Data (LOD). The DBpedia dataset currently plays a central role in the LOD cloud; it is populated from a large amount of collaboratively edited material (i.e., Wikipedia). Because of the ever-growing size and broad scope of Wikipedia's coverage, the DBpedia dataset has been applied to an increasingly wide range of web applications.


The DBpedia dataset contains a community-curated, cross-domain ontology that homogenizes the description of information in the knowledge base (KB); it is one of the largest multilingual ontologies developed to date. The 2014 version of this ontology covers 685 classes, which form a subsumption hierarchy, and 2,795 different properties. This ontology has become a de facto reference vocabulary; however, its usefulness as a multilingual pivot is limited. Although a large number of instances in different language editions are connected through owl:sameAs links, matches at the class level are rare. Multilingualism is supported through language-tagged rdfs:label properties, as in the following excerpt:

<owl:Class rdf:about="http://dbpedia.org/ontology/Actor">
       <rdfs:label xml:lang="en">actor</rdfs:label>
       <rdfs:label xml:lang="fr">acteur</rdfs:label>
       <rdfs:label xml:lang="ja">俳優</rdfs:label>
       <rdfs:label xml:lang="ko">영화인</rdfs:label>
       <!-- ... -->
</owl:Class>


This shows that the class “Actor” has corresponding terms in several languages, such as “영화인” in Korean and “acteur” in French. The number of labeled classes varies significantly across languages, and cross-lingual labels are missing for some editions, such as Chinese. The DBpedia ontology (DBO) is continuously evolving owing to its collaborative (wiki) paradigm and ongoing internationalization. However, it suffers from a scarcity of multilingual labels, because it was derived from popular English-language infoboxes. This limits the ability of other language editions to adapt the DBO to their local knowledge resources and makes it difficult to homogenize those resources as a conceptual extension of the DBO. Thus, identifying the globally representative parts of the DBO is important for expanding the multilingual ontologized space in LOD.
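To make the point about uneven labeling concrete, the following minimal sketch counts, for each language tag, how many classes carry an rdfs:label in that language. It uses only Python's standard xml.etree.ElementTree parser on a small RDF/XML excerpt. The Actor labels follow the excerpt above; the Person class and its labels are hypothetical additions for illustration, and a real analysis would run over the full DBpedia ontology file instead.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace URIs used in RDF/XML serializations of OWL ontologies.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical two-class excerpt (Person and its labels are illustrative only).
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:about="http://dbpedia.org/ontology/Actor">
    <rdfs:label xml:lang="en">actor</rdfs:label>
    <rdfs:label xml:lang="fr">acteur</rdfs:label>
    <rdfs:label xml:lang="ja">俳優</rdfs:label>
    <rdfs:label xml:lang="ko">영화인</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="http://dbpedia.org/ontology/Person">
    <rdfs:label xml:lang="en">person</rdfs:label>
    <rdfs:label xml:lang="fr">personne</rdfs:label>
  </owl:Class>
</rdf:RDF>"""


def labeled_classes_per_language(rdf_xml: str) -> Counter:
    """Count, per language tag, how many owl:Class elements have a label in it."""
    root = ET.fromstring(rdf_xml)
    counts: Counter = Counter()
    for cls in root.iter(f"{{{OWL}}}Class"):
        # A class counts once per language, even with several labels in that language.
        langs = {lbl.get(XML_LANG) for lbl in cls.findall(f"{{{RDFS}}}label")}
        langs.discard(None)  # ignore labels without a language tag
        counts.update(langs)
    return counts


if __name__ == "__main__":
    for lang, n in sorted(labeled_classes_per_language(SAMPLE).items()):
        print(lang, n)
```

On this excerpt it prints en 2, fr 2, ja 1 and ko 1; a language with no labels at all, such as zh, simply does not appear in the counts.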

Generally, the terminological components (henceforth, the TBox) of an existing ontology can be translated and tailored to other languages, expanding its multilingual coverage and thus increasing knowledge access across languages. A multilingual pivot ontology must therefore accurately represent the globally shared concept structure, yet remain reusable in different languages, so that connections can be made between local-language knowledge resources and ontological KBs when they enter the LOD cloud.
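One way to picture the translation step is as a worklist: for a target language, list the TBox classes that still lack a label, and meanwhile fall back to the English label when displaying them. The sketch below assumes a hypothetical in-memory label table (class IRI to language-tagged labels); the function names and the Film and Person entries are illustrative, not part of the DBpedia tooling.

```python
from typing import Dict, List, Optional

# Hypothetical label table: class IRI -> {language tag: label}.
# In practice it would be extracted from the ontology's rdfs:label triples.
Labels = Dict[str, Dict[str, str]]

LABELS: Labels = {
    "http://dbpedia.org/ontology/Actor": {"en": "actor", "fr": "acteur", "ja": "俳優", "ko": "영화인"},
    "http://dbpedia.org/ontology/Person": {"en": "person", "fr": "personne"},
    "http://dbpedia.org/ontology/Film": {"en": "film"},
}


def translation_worklist(labels: Labels, target_lang: str) -> List[str]:
    """Class IRIs with no label in target_lang, sorted for stable output."""
    return sorted(iri for iri, by_lang in labels.items() if target_lang not in by_lang)


def display_label(labels: Labels, iri: str, lang: str, fallback: str = "en") -> Optional[str]:
    """Label in `lang`, else the `fallback` label, else None for unknown classes."""
    by_lang = labels.get(iri, {})
    return by_lang.get(lang, by_lang.get(fallback))


if __name__ == "__main__":
    print(translation_worklist(LABELS, "ko"))  # classes still needing a Korean label
    print(display_label(LABELS, "http://dbpedia.org/ontology/Film", "ko"))
```

Here the Korean worklist contains Film and Person, and Film is displayed with its English label "film" until a translation is added.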

In this work, we identify the DBO classes that are globally representative across language editions (the resulting subset is called DBBO). The identification combines several rankings that analyze the knowledge graph to measure the popularity of instances from multiple perspectives. A consensus global ranking is then produced via rank aggregation, and from it we construct a representative subset of the DBO. This subset captures universally popular information, which should improve the multilingual reuse of the ontology itself and make it easier and faster to expand the ontological domain of local-language knowledge sources. We evaluated the approach by measuring the coverage lost through the selection process: for larger subset sizes, the subset retained almost the same coverage, with no appreciable loss of efficiency when the data were adapted to multilingual purposes.
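The paragraph does not name the individual ranking measures or the aggregation rule, so the following is only a sketch under stated assumptions. It scores classes on a hypothetical toy instance graph from two illustrative perspectives (the number of instances per class, and the summed PageRank of a class's instances), merges the two rankings with the Borda count, a classic rank-aggregation rule, keeps the top-k classes, and measures coverage as the share of instances whose class survives the selection. The data, function names, and the choice of PageRank and Borda are assumptions for illustration, not the DBBO implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Hypothetical toy data: each instance has one DBO class and outgoing links.
INSTANCE_TYPE: Dict[str, str] = {
    "a1": "Actor", "a2": "Actor", "f1": "Film", "f2": "Film", "f3": "Film",
    "c1": "City", "b1": "Bridge",
}
LINKS: Dict[str, List[str]] = {
    "a1": ["f1", "f2"], "a2": ["f1", "f3"], "f1": ["c1"], "f2": ["c1", "a1"],
    "f3": ["a2"], "c1": ["b1"],  # b1 has no out-links (a dangling node)
}


def pagerank(links: Dict[str, List[str]], nodes: List[str],
             damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank; a node without out-links spreads its rank uniformly."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v) or nodes
            share = damping * rank[v] / len(targets)
            for t in targets:
                nxt[t] += share
        rank = nxt
    return rank


def scores_by_instance_count(types: Dict[str, str]) -> Dict[str, float]:
    """Assumed perspective 1: a class is as popular as its number of instances."""
    scores: Dict[str, float] = defaultdict(float)
    for cls in types.values():
        scores[cls] += 1.0
    return dict(scores)


def scores_by_pagerank(types: Dict[str, str], links: Dict[str, List[str]]) -> Dict[str, float]:
    """Assumed perspective 2: sum of the PageRank of a class's instances."""
    pr = pagerank(links, list(types))
    scores: Dict[str, float] = defaultdict(float)
    for inst, cls in types.items():
        scores[cls] += pr[inst]
    return dict(scores)


def to_ranking(scores: Dict[str, float]) -> List[str]:
    """Classes by descending score; ties broken alphabetically for determinism."""
    return sorted(scores, key=lambda c: (-scores[c], c))


def borda(rankings: Iterable[List[str]]) -> List[str]:
    """Borda count: in a ranking of n classes, 0-based position i earns n - 1 - i points."""
    points: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for i, cls in enumerate(ranking):
            points[cls] += n - 1 - i
    return sorted(points, key=lambda c: (-points[c], c))


def coverage(subset: Set[str], types: Dict[str, str]) -> float:
    """Share of instances whose class is kept in the selected subset."""
    return sum(cls in subset for cls in types.values()) / len(types)


if __name__ == "__main__":
    rankings = [to_ranking(scores_by_instance_count(INSTANCE_TYPE)),
                to_ranking(scores_by_pagerank(INSTANCE_TYPE, LINKS))]
    consensus = borda(rankings)
    for k in range(1, len(consensus) + 1):
        # Keep the top-k consensus classes and report the resulting coverage.
        print(k, consensus[:k], round(coverage(set(consensus[:k]), INSTANCE_TYPE), 3))
```

Sweeping k exposes the trade-off the evaluation is concerned with: coverage never decreases as the subset grows and reaches 1.0 when every class is kept, so the interesting question is how small the subset can be while losing little coverage.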

This work has been accepted and will be presented at KÉKI.

Dataset

Evaluation data for DBBO is available for download
