CoreNet

From SWRC

Introduction

CoreNet: Core Multilingual Semantic WordNet

CoreNet is a network of words organized by their semantics, and it can serve as a useful resource in natural language processing. In particular, it can act as an important knowledge base for information retrieval and machine translation. (2004 ~ )

Details

Statistics

Word-to-Concept System

         Noun            Verb            Adjective       All
         Words   Senses  Words   Senses  Words   Senses  Words   Senses
Korean   21,401  51,607  1,758   5,290   813     2,801   23,938  58,985
Chinese  34,041  38,368  288     765     80      119     34,409  39,252

Predicate Case Frame

         Verb            Adjective       All
         Words   Senses  Words   Senses  Words   Senses
Korean   193     765     780     1,144   973     1,909
Chinese  288     -       80      -       368     -

The Structure of CoreNet

CoreNet has been constructed according to the following principles:

  • Word sense mapping to concept
The major purpose of CoreNet is to resolve semantic ambiguities, which it does through two functions. The first is word sense mapping: every sense of each word in the dictionary is mapped to at least one concept. For example, the senses of the word "school" are mapped to three concepts: place, organization, and building. The second is to provide the syntactic-semantic structure of predicates, based on the predicate-argument structure.
  • Corpus-based
The vocabulary and its senses are extracted from the KAIST corpus. For example, the argument structure of the Korean verb "gada" ("to go") is extracted from the corpus as follows:
GOING([HORSE/MAMMAL, BUS/VEHICLE]=SUBJ)
HORSE and BUS are terms extracted from the corpus, and MAMMAL and VEHICLE are the concept names to which the words horse and bus are mapped. This yields a more fine-grained sense categorization than that of dictionaries.
  • Multilingualism
All concepts are aligned across three languages: Japanese, Korean, and Chinese. All nouns and predicates of the three languages are categorized into one common concept hierarchy. Verbs of the three languages are also linked to each other based on senses and concepts.
  • Mono-concept system for multi-languages
In general, concept systems and word nets are constructed for nouns. CoreNet, however, uses a single concept system for nouns, verbs, and adjectives. Furthermore, this one concept system has been used and updated so that all three languages continue to share it.
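
The word-to-concept mapping and the case frame for "gada" described in the principles above can be sketched roughly as follows. This is a minimal illustration in Python, not the CoreNet data format or the CoreNet Java API: the names `WORD_TO_CONCEPTS`, `CaseFrame`, `accepts`, and `render` are hypothetical, and the toy entries are hand-written from the examples in the text rather than taken from the CoreNet database.

```python
from dataclasses import dataclass, field

# Toy word-to-concept mapping (hypothetical data modelled on the examples
# in the text; real CoreNet entries are far more numerous and detailed).
WORD_TO_CONCEPTS = {
    "school": ["PLACE", "ORGANIZATION", "BUILDING"],
    "horse": ["MAMMAL"],
    "bus": ["VEHICLE"],
}


@dataclass
class CaseFrame:
    """One predicate sense plus the concepts allowed in each argument slot."""
    predicate: str                      # surface form, e.g. "gada"
    concept: str                        # concept of this sense, e.g. "GOING"
    slots: dict[str, set[str]] = field(default_factory=dict)

    def accepts(self, role: str, word: str) -> bool:
        """Return True if any concept of `word` is allowed in slot `role`."""
        allowed = self.slots.get(role, set())
        return any(c in allowed for c in WORD_TO_CONCEPTS.get(word, []))


def render(frame: CaseFrame, role: str, words: list[str]) -> str:
    """Format a frame in the WORD/CONCEPT notation used in the text."""
    parts = []
    for word in words:
        for concept in WORD_TO_CONCEPTS.get(word, []):
            if concept in frame.slots.get(role, set()):
                parts.append(f"{word.upper()}/{concept}")
    return f"{frame.concept}([{', '.join(parts)}]={role})"


# Assumed frame for "gada": subjects seen in the corpus were a mammal
# and a vehicle, so those two concepts fill the SUBJ slot.
gada = CaseFrame("gada", "GOING", {"SUBJ": {"MAMMAL", "VEHICLE"}})

print(render(gada, "SUBJ", ["horse", "bus"]))
print(gada.accepts("SUBJ", "school"))
```

In this sketch, a word fits an argument slot only when one of its concepts is listed for that slot, which is one simple way a case frame could help narrow down the senses of an ambiguous argument.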



Corenet structure.jpg


The structure of CoreNet word-concept system

Recent Accomplishments

CoreNet-WordNet Mapping

CoreNet Java API

Current Progress

Official Websites

CoreNet official website

Old CoreNet website archive

CoreNet Java API/browser open source project

CoreNet Web Search