Augmenting WordNet with Polarity Information on Adjectives

Alekh Agarwal, Pushpak Bhattacharyya (I.I.T., Bombay)

Polarity of a word refers to its strength in a classification, typically in a good vs bad sense, for example in movie reviews. This paper describes a technique to effectively compute the polarity information for Adjectives. Carrying on from this, we propose to introduce a new kind of link in WordNet and associate a polarity score with each Adjective in the WordNet database. We show the inter-dependence of subjectivity and polarity of a word. We demonstrate the need for incorporating such information in WordNet, by showing its use in the classification of sentences as subjective and objective.

Verb Similarity on the Taxonomy of WordNet

Dongqiang Yang, David M. W. Powers (Flinders University of South Australia)

In this paper, we introduce two kinds of word similarity algorithms, SHE and RHE, to investigate the capability of WordNet in measuring verb similarity. In the absence of a standard verb set we have proposed two new verb similarity evaluation data sets.

RussNet as a Semantic Component of the Text Analyser for Russian

Irina V. Azarova (St-Petersburg State University, Russia), Vadim Ivanov, Ekaterina A. Ovchinnikova (IdeoGraph Company, Russia), Anna A. Sinopalnikova (Brno University of Technology, Czech Republic)

In this paper we present a text analysis system developed on the basis of the AGFL grammar and RussNet -- a wordnet-like lexicon for the Russian language. We describe a basic architecture of the system, in particular the characteristics of its semantic components -- lexico-semantic, morpho-semantic, and syntactic-semantic ones. Principally, the effectiveness of the system benefits from the fact that its semantic modules are extended with syntactic-semantic descriptions -- that of valency frames and predicate proposition formalism. The output structures may be used for various NLP tasks including text mining, fact extraction etc.

WordNet as a Base Lexicon Model for the Computation of Verbal Predicates

Raquel Amaro (Center of Linguistics of the University of Lisbon)

Assuming that the lexicon is a complex and dynamic knowledge system, the lexicon model and the information that is stated in it become crucial for the computation of meaning. In this paper we discuss how the WordNet model can support a decompositional approach to troponymy, the enrichment of the lexical entries with Qualia information, and the establishment of a lexical inheritance device, without much additional work. More specifically, we intend to show how the choice of the model and its further enrichment is motivated by the reflex of the semantic content on the syntactic behaviour of the lexical items, namely through co-occurrence restrictions, and address issues on argument structure, Aktionsart shifts, co-troponym's incompatibilities and shadow arguments' realization. We present a level of inheritance structure in the semantic representation of the lexical items and propose a semantic representation for prepositional arguments, using the Generative Lexicon notation. We also address the ability of the WN model to be used as a semantic type hierarchy.

Wordnet.Br: An Exercise of Human Language Technology Research

Bento Carlos Dias-da-Silva (Universidade Estadual Paulista, Brazil)

This paper reports the ongoing project (since 2002) of developing a wordnet for Brazilian Portuguese (Wordnet.Br) from scratch. In particular, it describes the process of constructing the Wordnet.Br core database, which has 44,000 words organized in 18,500 synsets. Accordingly, it briefly sketches the project's overall methodology, its lexical resources, the synset compilation process, and the Wordnet.Br editor, a GUI (graphical user interface) which aids the linguist in the compilation and maintenance of the Wordnet.Br. It concludes with the planned further work.

Gazetteer Linkage to WordNet

Beth M. Sundheim (SPAWAR Systems Center, USA), Scott Mardis, John Burger (The MITRE Corporation, USA)

One of the new WordNet features to be found in version 2.1 is the instance relation, which replaces the hypernym relation for noun synsets that denote instances rather than types (Miller and Hristea, forthcoming). The creation of this distinction serendipitously coincided with project work at SPAWAR Systems Center (SSC) and the MITRE Corporation to produce a tailored gazetteer database of place names for use in research on question answering by participants in a U.S. government-sponsored research program (Irie and Sundheim 2004). The millions of place names contained in this database, called the Integrated Gazetteer Database (IGDB) (Mardis and Burger), are drawn from publicly available sources provided by the National Geospatial-Intelligence Agency, the U.S. Geological Survey, the CIA World Factbook, and the Tipster Text research program. The IGDB project includes a task that is being carried out in collaboration with Princeton University to incorporate the instance synsets that define places into the database as an additional source of gazetteer information.

WordNet as a Geographical Information Resource

Davide Buscaldi, Paolo Rosso, Emilio Sanchis Arnal (Universidad Politécnica de Valencia, Spain)

Geographical entities often appear in very different forms in text collections, such as when a foreign name is used instead of the English one, or when the citation of some region or place omits the name of the larger geographical entity containing it. This is a known problem in the field of Information Retrieval. The use of an ontology like WordNet can help in addressing this issue. In this paper we propose an automatic method to expand the geographical terms in queries by using the WordNet ontology and another method that expands the terms during the indexing phase. The proposed methods exploit the synonymy, meronymy and holonymy relationships provided by WordNet, together with some information extracted from the gloss.
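The two expansion strategies can be pictured with a small sketch. Everything here is an assumption made for illustration: the toy relation tables stand in for WordNet lookups, the function names are hypothetical, and the choice of which relation feeds which phase is a guess, since the abstract does not say.

```python
# Toy relation tables standing in for WordNet lookups (hypothetical data).
SYNONYMS = {"florence": ["firenze"]}
HOLONYMS = {"florence": ["tuscany"], "tuscany": ["italy"]}  # part -> containing whole
MERONYMS = {"tuscany": ["florence", "pisa"]}                # whole -> parts


def holonym_closure(term):
    """Follow holonym links upward and collect every containing region."""
    seen = []
    queue = list(HOLONYMS.get(term, []))
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(HOLONYMS.get(current, []))
    return seen


def expand_for_index(term):
    """Index-time expansion (assumed): add synonyms and containing regions."""
    return [term] + SYNONYMS.get(term, []) + holonym_closure(term)


def expand_for_query(term):
    """Query-time expansion (assumed): add synonyms and contained places."""
    return [term] + SYNONYMS.get(term, []) + MERONYMS.get(term, [])
```

With these toy tables, a document mentioning only "florence" would also be indexed under "firenze", "tuscany" and "italy", while a query for "tuscany" would also search for "florence" and "pisa".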

DanNet -- a Wordnet Project for Danish

Bolette Sandford Pedersen, Sanni Nimb (University of Copenhagen, Denmark), Jørg Asmussen, Nicolai Hartvig Sørensen, Lars Trap-Jensen, Henrik Lorentzen (Det Danske Sprog- og Litteraturselskab, Copenhagen, Denmark)

This paper describes a recently initiated wordnet project for Danish called DanNet. The project is a collaboration between a university institution and a literary and linguistic society.

Hindi Verb Knowledge Base and Noun Incorporation in Hindi

Debasri Chakrabarti, Vaijayanthi Sarma, Pushpak Bhattacharyya (Indian Institute of Technology, Bombay)

The work reported in this paper deals with Hindi verbs. The paper can be divided into two parts. The first part is a description of a lexicon where the verbs are arranged hierarchically according to their super-ordinate terms. In this hierarchy, verbs are first listed according to their specific senses. This lexicon is named the Hindi Verb Knowledge Base (HVKB). HVKB uses the constructs and Knowledge Base of the Universal Networking Language (UNL), an interlingua, the CIIL (Central Institute of Indian Languages) Corpora and, partially, the Hindi Wordnet (henceforth HWN). In the second part of the paper we present a structural description of Hindi verbs. One type of complex verb, namely the Noun+Verb combination, is studied in detail. The motivation behind this study was to look for a principled approach to storing the Noun+Verb combinations in HWN. Different syntactic tests are applied to identify such verbs. Various syntactic and semantic properties of this group emerged through these tests. These properties help to identify this group as a process of lexical compounding. These lexical compounds are then stored in the hierarchy. Finally, the paper shows how HVKB will prove beneficial for HWN.

Improving the Basque WordNet by Corpus Annotation

Eneko Agirre, Izaskun Aldezabal, Jone Etxeberria, Eli Izagirre, Karmele Mendizabal, Eli Pociello, Mikel Quintian (University of the Basque Country, Spain)

This paper describes the methodology adopted to jointly develop the Basque WordNet and a hand-annotated corpus (the Basque Semcor). This joint development allows for better motivated sense distinctions, and a tighter coupling between both resources. The methodology involves editing, tagging and refereeing tasks. We are currently halfway through the nominal part of the 300,000 word corpus (roughly equivalent to a 500,000 word corpus for English).

Lexicalization and Multiword Expressions in the Basque WordNet

Eneko Agirre, Izaskun Aldezabal, Eli Pociello (University of the Basque Country, Spain)

In this paper we propose a solution for the representation of a wide range of multiword expressions (lexicalized or not) in the Basque WordNet. (Note that we use multiword expression as a general term to denominate those constructions, either lexicalized or not, containing more than one word, with word defined as "any string of characters between two blanks" (Fontenelle et al., 94).) We first argue in favor of including non-lexicalized multiword expressions, and propose very simple criteria based on existing dictionaries to distinguish those that are lexicalized from those that are not. We then motivate and propose a representation based on EuroWordNet relations to represent their inner structure. This rich representation will allow for further populating the MEANING Multilingual Central Repository with additional semantic relations.

Towards Building a WordNet for Persian Adjectives

Ali Famian (Tarbiyat Modares University, Iran), Daruosh Aghajaney (Jahad Higher Education Institute, Iran)

This article reports on a project for building a WordNet for Persian adjectives. Three monolingual Persian dictionaries, as well as a Farsi linguistic corpus, are employed here to extract the required entries. This WordNet provides the semantic classes of adjectives, their synonyms, antonyms and frequency. The database management system employed here is MS SQL Server 2000. The system is implemented using Microsoft's .NET Framework in the Visual C# language and is developed for both desktop and web-based platforms. It allows the end-user to export the results of a specific query, both in Persian and Latinized alphabets, into a CSV or XML file for further reference.

Building the Slovene Wordnet: First Steps, First Problems

Tomaž Erjavec (Jožef Stefan Institute, Slovenia), Darja Fišer (University of Ljubljana, Slovenia)

We report on the prototype Slovene wordnet which currently contains about 5,000 top-level concepts. The resource is based on the Serbian wordnet which has been automatically translated with the help of a bilingual dictionary, the literals ranked according to the frequency of corpus occurrence, and results manually corrected. The paper also discusses some problems encountered along the way and points out some possibilities of automated acquisition and refinement of synsets in the future.

WordNet Based Comparison of Language Variation: A Study Based on CCD and CWN

Jia-Fei Hong, Chu-Ren Huang (Academia Sinica, Taiwan), Yang Liu (Peking University, China)

This paper compares the lexica of the Chinese Concept Dictionary (CCD) and the Chinese WordNet (CWN) by way of WordNet. CCD is a WordNet-like semantic lexicon developed by the Institute of Computational Linguistics, Peking University. CWN is a bilingual wordnet linked to the SUMO ontology and developed within the Academia Sinica Bilingual Ontological WordNet. In this paper, we use the WordNet database to show several situations for CCD and CWN, such as: the same translation in both, zero translation in only CCD or CWN, and a unique translation in only CCD or CWN. Through these analyses, we identify the distinctive usage of English translations for traditional Chinese characters and simplified Chinese characters.

Data Representations for WordNet: A Case for RDF

Alvaro Graves, Claudio Gutierrez (Universidad de Chile)

This paper discusses current versions of WordNet from a data modelling perspective. We show that these versions do not consider basic data model desiderata for their design, like flexibility, extensibility and interoperability. We claim that a data model for WordNet must also consider the inherent network structure of WordNet data. Thus we make the case for an RDF model for WordNet and present a concrete version of WordNet in RDF format.
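To make the network view concrete, here is a minimal sketch of WordNet-like data held as subject-predicate-object triples and queried by pattern. The identifiers and predicate names are hypothetical placeholders, not the vocabulary of the RDF version the paper presents, and a plain Python set stands in for a real RDF store.

```python
# Hypothetical identifiers; not the vocabulary of any published WordNet RDF version.
TRIPLES = {
    ("synset:dog", "rel:hypernym", "synset:canine"),
    ("synset:canine", "rel:hypernym", "synset:carnivore"),
    ("synset:dog", "rel:containsWordSense", "sense:dog-1"),
}


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def hypernym_path(synset):
    """Walk rel:hypernym edges upward, taking the first parent at each step."""
    path = []
    while True:
        parents = match(s=synset, p="rel:hypernym")
        if not parents:
            return path
        synset = parents[0][2]
        path.append(synset)
```

Because every fact is a triple, adding a new relation type only means adding triples with a new predicate; no schema change is needed, which is the kind of extensibility the abstract argues for.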

Learning Information Extraction Patterns Using WordNet

Mark Stevenson, Mark A. Greenwood (University of Sheffield, UK)

Information Extraction (IE) systems often use patterns to identify relevant information in text but these are difficult and time-consuming to generate manually. This paper presents a new approach to the automatic learning of IE patterns which uses WordNet to judge the similarity between patterns. The algorithm starts with a small set of sample extraction patterns and uses a similarity metric, based on a version of the vector space model augmented with information from WordNet, to learn similar patterns. This approach is found to perform better than a previously reported method which relied on information about the distribution of patterns in a corpus and did not make use of WordNet.
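One way to read "a vector space model augmented with information from WordNet" is as a cosine-like score in which related words, not only identical ones, contribute to the match. The sketch below uses a soft-cosine formulation chosen for illustration; it is an assumption, not necessarily the metric used in the paper, and the word-similarity table is made-up data standing in for a WordNet-based measure.

```python
import math

# Hypothetical pairwise word similarities standing in for a WordNet-based measure.
WORD_SIM = {("hire", "employ"): 0.9, ("company", "firm"): 0.8}


def word_sim(a, b):
    """Similarity of two words: 1.0 if identical, table value if listed, else 0.0."""
    if a == b:
        return 1.0
    return WORD_SIM.get((a, b), WORD_SIM.get((b, a), 0.0))


def bilinear(p, q):
    """Sum of word similarities over all word pairs drawn from the two patterns."""
    return sum(word_sim(a, b) for a in p for b in q)


def pattern_similarity(p, q):
    """Soft-cosine score between two patterns given as lists of words."""
    denom = math.sqrt(bilinear(p, p) * bilinear(q, q))
    return bilinear(p, q) / denom if denom else 0.0


def rank_candidates(seeds, candidates):
    """Order candidate patterns by their best similarity to any seed pattern."""
    return sorted(
        candidates,
        key=lambda c: max(pattern_similarity(s, c) for s in seeds),
        reverse=True,
    )
```

A learner built on this would start from the seed patterns, rank unseen candidates with `rank_candidates`, accept the top ones, and repeat; the acceptance threshold and stopping rule are left open here because the abstract does not state them.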

Towards a Sensorimotor WordNetSM

Gutemberg Guerra-Filho, Yiannis Aloimonos (University of Maryland, USA)

We have empirically discovered that the space of human actions has a grammatical structure. This is a motoric space consisting of the evolution of the joint angles of the human body in movement. Furthermore, the process of assembling individual human movements into higher level descriptions resembles in a natural sense the process of speech recognition. Thus the space of human activity has its own phonemes, morphemes, words (verbs, nouns, adjectives, adverbs), and sentences formed by its own syntax. This has a number of implications for the grounding problem and cognition in general. With regard to WordNet, the theory points to a future Sensorimotor WordNet which contains a map between the nodes of the current WordNet and the space consisting of human action. In this paper, we suggest initial steps towards closing the semantic gap by grounding language with visuomotor information. The grounding takes place on a set of primitive words which are selected here through verb classification of the WordNet lexicon. A formal approach to the identification of primitive words would consider the basic atoms of WordNet extensions. However, one further extension is required to incorporate grounded information into WordNet in the direction of a sensorimotor WordNet, designated here as WordNetSM.

Lexical Knowledge of Personality Traits

Heili Orav (University of Tartu, Estonia)

This article studies the vocabulary and concepts of personality traits in Estonian. The choice of character words and concepts that are in active use can give us an idea which traits are considered important for Estonians and how the concepts of personality traits are organized on 'the map of character landscape'. Whereas most common personality traits are expressed by adjectives, I will focus here mostly on lexical semantics of personality adjectives and examine them in accordance with the principles of WordNet. This research was supported by Estonian Science Foundation Grant No 5534.

Passive Verb Sense Distinction in Korean WordNet

Eun-Ryoung Lee, Ae-Sun Yoon, Hyuk-Cheol Kwon (Korean Language Processing Laboratory)

During semi-automatic translation of Princeton English WordNet (PWN) into Korean verbs, we noticed that the verbs of accusativity/inaccusativity alternation in English were mapped to two or more Korean verbs of different morpho-syntactic features and thus different senses. These mismatches in mapping show the need to reconstruct the lexical semantic structure of PWN for the Korean verb wordnet, so that the lexical semantic features of each verb can be distinguished. The sense distinction of Korean verbs based on their morpho-syntactic features also contributes to improving the consistency of PWN and ensures the accuracy of the Korean wordnet.

An Approach towards Applying and Constructing Multilingual Indo-WordNet

Manish Sinha, Mahesh Reddy, Pushpak Bhattacharyya (Indian Institute of Technology Bombay)

In the work reported here, we present three important related issues.

  • We present an effective method of construction of the Marathi WordNet using the Hindi WordNet, both of which are being developed at IIT Bombay. Henceforth we will refer to them as MWN and HWN respectively.
  • The Synset identity is the key to connect WordNets.
  • We present an interface to browse linked Hindi and Marathi WordNets (Bilingual WordNet) simultaneously for a given word either in Hindi or in Marathi.
As an application, we present Word Sense Disambiguation (WSD) of nouns in Hindi. The system has been evaluated on the Corpora provided by Central Institute of Indian Languages and the results are encouraging.

Some Considerations in Structuring a Terminological Knowledge Base

Rita Marinelli (Istituto di Linguistica Computazionale C.N.R., Italy), Giovanni Spadoni (Sauro Spadoni s.r.l. Shipping Agency, Italy)

Exploiting the computational instruments of ItalWordNet (IWN), we built a terminological Database containing about 3000 lemmas. This allowed us to go beyond the concept of "dictionary", and obtain data not only described (by the definition), but also codified (by relations), easily managed automatically and linked to the corresponding closest concepts in English through the Inter-Lingual Index (ILI). We started by designing the top level of the terminological database, identifying the most relevant and representative domain concepts. User demand has determined the need to manage the ever-increasing new technical terminology, which also includes very different domains such as the juridical or the economic one. Up to now our database is connected, by means of the 'plug_in' relations, to the general ontology which IWN inherited from EuroWordNet. Now we outline a new domain ontology design, for better defining the boundary of this research, setting the base of the terminological concepts and gaining more functional information. Before defining the ontology, a preliminary reflection is needed on the concepts of 'term' and 'domain', the 'relevance' of each term, and the knowledge potential of the terminological lexicon, together with the possibility of manipulating this knowledge with huge cognitive effects, specifying how to represent it as a concrete (suitable to be instantiated) data structure. The set of characteristics recognized in our terminological Database and verified leads us to qualify it as a Knowledge Base System, that is, a body of represented knowledge, based on a conceptualized view of the world, with axioms and inference rules that produce new knowledge from existing knowledge.

WordNet.PT New Directions

Palmira Marrafa, Raquel Amaro, Rui Pedro Chaves, Susana Lourosa, Catarina Martins, Sara Mendes (University of Lisbon, Portugal)

This paper reports the current Portuguese WordNet (WordNet.PT) research and development directions, which mainly regard the enrichment of the WordNet model with event and argument structures (section 1), the codification of cross-part-of-speech relations (section 2) and the exploitation of WordNet.PT in concrete applications (section 3).

Adjectives in WordNet.PT

Sara Mendes (University of Lisbon, Portugal)

Most authors agree that adjective semantic analysis and representation is far from being a trivial issue. Since the semantic organisation of adjectives seems to be unlike that of nouns and verbs, as noted by Fellbaum et al (1993) and Miller (1998), this paper focuses on the encoding of adjectives in wordnets. We discuss the strategies used in WordNet.PT. Our proposal aims at mirroring adjectives' definitional features in the database, allowing adjective classes to emerge from the relations expressed in the network. In order to do so, we use some of the semantic relations introduced in the Princeton WordNet, but we also propose some new pointers. (This research was supported by Fundação para a Ciência e a Tecnologia (grant SFRH/BD/8524/2002).)

Semantic Based Text Classification Using WordNets: Indian Language Perspective

S. Mohanty, P. K. Santi, Ranjeeta Mishra, R.N. Mohapatra, Sabyasachi Swain (Utkal University, India)

Automatic text classification is an area that has received a great deal of attention in recent research due to the current growth of the Internet, which has resulted in a huge amount of information that is challenging to access efficiently. This paper describes experimental results on how to create an efficient and effective automatic tool that is able to classify large documents quickly. Our method is built on lexical chains that link significant words about a particular topic with the help of the hypernym relation in WordNet. We have tested the method on the Indian language Sanskrit using SanskritNet, extracting and scoring lexical chains with the necessary design decisions.
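As a rough illustration of hypernym-based lexical chains, the sketch below groups words that share a hypernym and scores each chain by its length. The hypernym table is toy English data standing in for SanskritNet, and the one-level grouping and length-based score are assumptions made for the example; the paper's actual chaining and scoring decisions are not given in the abstract.

```python
from collections import defaultdict

# Toy hypernym table (hypothetical data standing in for a wordnet lookup).
HYPERNYM = {
    "cow": "animal",
    "horse": "animal",
    "river": "water_body",
    "lake": "water_body",
}


def build_chains(words):
    """Group the words of a document into chains keyed by their shared hypernym."""
    chains = defaultdict(list)
    for word in words:
        if word in HYPERNYM:
            chains[HYPERNYM[word]].append(word)
    return dict(chains)


def strongest_chain(words):
    """Return (hypernym, members) for the longest chain, or None if no chain forms."""
    chains = build_chains(words)
    if not chains:
        return None
    return max(chains.items(), key=lambda item: len(item[1]))
```

A classifier along these lines could assign a document to the topic of its strongest chain; words missing from the table are simply ignored.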

Semantic Relations in Glosses and Explanations: Do They Help?

Neeme Kahusk, Kadri Vider (University of Tartu, Estonia)

This paper gives an overview of the current state of the Estonian Wordnet and discusses the problem of word definitions in EstWN glosses and word explanation experiments. In this paper, the role of semantic relations in word explanations is discussed. Verbs and nouns are extracted from word definitions (glosses) of the Estonian WordNet, and linked with semantic relations of the key word. The results are compared with a word explanation experiment where subjects have to explain as many words as they can within a limited time. Our aim is to tag word senses in the Estonian WordNet definition field and find the semantic relations they have with the literals. We compare the semantic relations found in dictionary definitions with those that people give when they have to explain a word under time pressure. Besides improving our wordnet, we hope to find better guidelines for forming word explanations in general.

Research of Multi-Lingual Information Processing Methodologies and Database Development for Globalizing Korean Studies

Woonho Choi (Institute of Korean Culture), Beom-mo Kang, Hochol Choe (Korea University)

This technical report introduces the Multi-lingual (Korean, Japanese, Chinese and English) information processing project which has been carried out by the Institute of Korean Culture of Korea University since the year 2000. The aim of this Project is to develop a syntactic and morphological analyser to analyse multi-lingual sentences and to construct a Multilingual Lexical Database containing 50,000 entries by August 2006.

Research on Processing a Multiple Nominative Case Construction in Korean-English Machine Translation by Using WordNet

Donghyeok Lee, Janggeun Oh, Hochol Choe, Junghye Choi (Korea University)

In this paper, we focus on explaining the processing of the Korean multiple nominative case construction using WordNet. Multiple nominative case constructions, in which the nominative case marker -ga is allocated not only to the subject but also to other syntactic functions, generate much confusion and make Korean-English Machine Translation, a process that depends mainly on the case markers attached to words, more difficult. Also, the predicates in the multiple nominative case constructions can sometimes be ambiguous. To overcome those difficulties, we classified the multiple nominative case constructions into 4 types by argument analysis and then processed them in Korean-English Machine Translation by using WordNet.

An Empirical Study for the Automatic Acquisition of Topic Signatures

Montse Cuadros, German Rigau (Univ. of the Basque Country, Spain), Lluís Padró (Technical University of Catalonia, Spain)

The main goal of this work is to compare different methods for building Topic Signatures, which are vectors of weighted words acquired from large corpora. We used two different software tools, ExRetriever [Cuadros+'04] and Infomap [Dorow+'03], for acquiring Topic Signatures from corpus. Using these tools, we retrieve sense examples from large text collections. We also include in the comparison the Topic Signatures acquired previously by [Agirre+'04b] from the web. The three systems construct queries for each word sense using WordNet. ExRetriever and Infomap acquire the sense examples from the British National Corpus. The quality of the acquired Topic Signatures is indirectly evaluated on the Word Sense Disambiguation English Lexical Task of Senseval-2.

Meaningful Results for Information Retrieval in the MEANING Project

Piek Vossen (Irion Technologies, Netherlands), David Farwell (TALP Research group, Spain), German Rigau, Inaki Alegria, Eneko Agirre (IXA group, Spain), Manuel Fuentes (Agencia EFE, Spain)

The goal of the MEANING project (IST-2001-34460) is to develop tools for the automatic acquisition of lexical knowledge that will help Word Sense Disambiguation (WSD). The acquired lexical knowledge from various sources and various languages is stored in the Multilingual Central Repository (MCR) (Atserias et al 04), which is based on the design of the EuroWordNet database. The MCR holds wordnets in various languages (English, Spanish, Italian, Catalan and Basque), which are interconnected via an Inter-Lingual-Index (ILI). In addition, the MCR holds a number of ontologies and domain labels related to all concepts. During the MEANING project, the MCR has been enriched in various cycles. This paper describes the integration and evaluation of the MCR in a commercial classification and (cross-lingual) information retrieval system, developed by Irion Technologies. We carried out a series of task-based evaluations on English and Spanish news collections, for which indexes were built with and without the results of MEANING. The evaluations show that both recall and precision are significantly higher when using the enriched semantic networks in combination with WSD.

Linking and Harmonizing Different Lexical Resources: a Comparison of Verbal Entries in ItalWordNet and PAROLE-SIMPLE-CLIPS

Adriana Roventini, Nilda Ruimy (Istituto di Linguistica Computazionale, CNR, Italy)

In recent years, in the framework of Computational Linguistics, many lexical resources have been developed which aim at coding complex lexical semantic information according to different linguistic models (WordNet, Frame Semantics, Generative Lexicon, etc.). However, these resources are often neither easily accessible nor available in their entirety. Yet, from the point of view of the continuous growth of the technology (Semantic Web), their visibility, availability, integration and harmonization are becoming of utmost importance. ItalWordNet and PAROLE/SIMPLE/CLIPS are two resources which, tackling lexical semantics from different perspectives and being at least partially complementary, could profit from being linked to each other. In this paper we address the issue of linking these resources, focusing on the most problematic part of the lexicon: the second order entities. In particular, after a brief description of the two resources, their different approaches to verb semantics are described; an accurate comparison of a set of verbal entries is carried out, with a view to evaluating the possibilities and the advantages of a semiautomatic link; finally, the results and the future work are illustrated.

Toward Domain Specific Thesaurus Construction: Divide-and-Conquer Method

Pum-Mo Ryu, Jae-Ho Kim, Yoonyoung Nam, Jin-Xia Huang, Saim Shin, Sheen-Mok Lee, Key-Sun Choi (Computer Science Division, KAIST, KORTERM/BOLA)

This paper describes a new thesaurus construction method in which small, class-based thesauruses are constructed and merged into a whole based on a domain classification system. This method has advantages in that 1) taxonomy construction complexity is reduced, 2) each class-based thesaurus can be reused in other domain thesauruses, and 3) term distribution per class in the target domain is easily identified. The method is composed of three steps: a term extraction step, a term classification step, and a taxonomy construction step. All steps balance automatic processing and manual verification. We constructed a Korean IT domain thesaurus based on the proposed method. Because terms are extracted from Korean newspaper and patent corpora in the IT domain, the thesaurus includes many Korean neologisms. The thesaurus consists of 81 upper level classes and over 1,000 IT terms.

Recognizing Transliteration Equivalence for Enriching Domain-Specific Thesauri

Jong-Hoon Oh (National Institute of Information and Communications Technology, Japan), Key-Sun Choi (Korea Advanced Institute of Science and Technology)

Transliteration is used to translate proper names and technical terms, especially from languages in Roman alphabets to languages in non-Roman alphabets, such as from English to Korean, Japanese, and Chinese. "Transliteration equivalence" refers to a set of the same words that include all possible transliterated forms and the original word. Many Korean domain-specific terms are composed of transliterations. Therefore, handling transliterations and their transliteration equivalence is essential to constructing and enriching Korean domain-specific thesauri. In this paper, we propose an algorithm for recognizing transliteration equivalence or transliteration pairs in domain-specific dictionaries using machine transliteration. Machine transliteration can serve as one of the components in a transliteration pair acquisition method by offering a machine-generated transliterated form. Because the transliteration pair acquisition task is to find phonetic cognates in two languages, it is important to phonetically convert words in one language to those in the other language, as machine transliteration does, in order to compare their phonetic equivalence. Our method shows about 99% precision and 73% recall rate.

A Study on a Conceptual Map of Korean Words

Sun-Mee Bae, Chung-Kon Shi, Key-Sun Choi (KAIST, Korea)

A multi-lingual lexical semantic wordnet called CoreNet has been developed by KAIST KORTERM. CoreNet is constructed on one shared semantic hierarchy originating from the NTT thesaurus. The Korean wordnet in CoreNet consists of 2,937 conceptual nodes (semantic categories) with 12 depth levels and of 51,172 senses for nouns, 5,290 for verbs, and 2,081 for adjectives in Korean. As a primary work for constructing a conceptual map of Korean words, this paper aims to show the concept distributions of Korean words in CoreNet based on the depths and semantic categories. The analysis results on concept distributions show that WORK<ABSTRACT> and HUMAN ACTIVITY are the most broadly distributed concepts in nouns and verbs, while ABSTRACT RELATION, STATE, and ATTRIBUTE are the most broadly distributed in adjectives. This study provides the statistical data indispensable for constructing a conceptual map of Korean words. Moreover, it makes it possible to understand the overall structure of the Korean wordnet, to review the proper specification of semantic categories and the correct assignment of concepts for Korean words, and to plan the next version of the Korean wordnet.

Knowing a Word vs. Accessing a Word: Wordnet and Word Association Norms as Interfaces to Electronic Dictionaries

Anna Sinopalnikova, Pavel Smrž (Brno University of Technology, Czech Republic)

Various groups of users, ranging from professional translators and writers to language learners, use dictionaries in their everyday work. The electronic form of the dictionaries facilitated and accelerated the access to their content considerably and brought new ways of the dictionary search. However, the current products are still unable to offer a full-featured search by meaning which would be advantageous in many cases. This paper describes our experiments on access-supporting enhancements of electronic dictionaries that are based on wordnets and word association norms. Results of evaluation experiments for two European languages - English and Russian - are presented. The comparison with the fulltext- and corpus-based access methods shows that the proposed ways of the dictionary search often provide the best word-access strategy.

Using WordNet for Opinion Mining

Pavel Smrž (Brno University of Technology, Czech Republic)

This paper deals with lexical resources applied to opinion mining -- the identification and extraction of opinions from free texts. Opinion mining comprises the segmentation of documents, passages, sentences, or phrases into objective (factual) and subjective parts, and the evaluation of the subjective attitude toward a given fact. We briefly introduce an automatic system that was designed to crawl various information sources available on the Web -- newspapers, Internet blogs and forums -- to collect and identify different opinions on a given topic and to report the diversity of opinions across languages and countries. Special attention is paid to the linguistic resources used, especially to wordnet extensions that play a crucial role in the identification of subjective expressions.

PictNet: Semantic Infrastructure for Pictogram Communication

Toshiyuki Takasaki (NPO Pangaea, Japan)

PictNet is an online pictogram communication system designed for children from any country; it achieves equitable communication by reflecting each user's cultural background with multilingual support. There were issues with search accuracy and with the consistent manageability of the PictNet pictogram repository. Online pictogram surveys and experimental pictogram-creation activities for children indicated that users would be able to search for pictograms more easily if the repository had a semantic framework covering not only the relationships among concepts in PictNet, but also the relationships among visual properties of pictograms and background concepts of pictograms such as cultural and emotional information. It was also suggested that WordNet, an online lexical reference system, supports most of the new concepts requested by children while keeping its consistency. It was confirmed that findability and scalability are achieved by grounding the PictNet repository in WordNet.

Analogical Reasoning with a Synergy of WordNet and HowNet

Hyesook Kim, Shanshan Chen, Tony Veale (University College Dublin, Ireland)

WordNet and HowNet are large-scale lexical resources that adopt complementary perspectives toward semantic representation. WordNet is differential, inasmuch as it provides a rich taxonomic structure but little in the way of explicit propositional content. HowNet is constructive, and dedicates its representational energies to the explicit definition of relational structures for each lexico-conceptual entry. For purposes of analogy, no one approach is best. Rather, a synergy of both is required, since analogy is a knowledge-hungry process that demands both taxonomic richness and causally descriptive propositional structure. In this paper we consider how such a synergy might be achieved, and how it can be exploited to support a robust model of analogical reasoning.

A Typology of Lexical Analogy in WordNet

Tony Veale (University College Dublin, Ireland)

Analogy and metaphor are extremely knowledge-hungry processes, so one should question whether lightweight lexical ontologies like WordNet are sufficiently rich to support them. In this paper we argue that resources like WordNet are suited to the processing of certain kinds of lexical analogies and metaphors, for which we propose a spatially-motivated typology and a corresponding computational model. We identify two kinds of dimension that are important in lexical analogy -- lexicalized (taxonomic) dimensions and ad-hoc (goal-specific) dimensions -- and describe how these can be automatically identified, extracted and exploited in WordNet.

Construction of the Hungarian EuroWordNet Ontology and its Application to Information Extraction

Zoltán Alexin, János Csirik, György Szarvas (University of Szeged, Hungary), András Kocsor (MTA-SZTE, Research, Hungary), Márton Miháltz (MorphoLogic Ltd., Hungary)

This report describes a recent Hungarian project begun in the spring of 2005. The goals of the project are to produce a Hungarian version of the EuroWordNet ontology database, to extend it with concepts specific to the business domain, and to develop a demonstration version of an ontology-based Information Extraction (IE) system. The system will extract condensed data from short business articles concerning company mergers, acquisitions, balance reports, new products, new plants and so on. A consortium of three leading Hungarian human language technology institutions won substantial governmental support that will last until 2007.

The LOIS Project

Wim Peters (University of Sheffield, UK)

In Search for More Knowledge: Regular Polysemy and Knowledge Acquisition

Wim Peters (University of Sheffield, UK)

This paper describes the process of the extraction of implicit knowledge from WordNet and EuroWordNet. This knowledge is an extension of the explicit knowledge structures already provided by the wordnets in the form of synsets and semantic relations, and is contained both within (Euro)WordNet's hierarchical structure and the glosses that are associated with each WordNet synset. The extended knowledge comes in the form of frame structures containing regular polysemic patterns and automatically extracted relations that link the participating concepts in these patterns.

Developing PersiaNet: The Persian Wordnet

Farhad Keyvan (XselData Corporation, USA), Habib Borjian (Independent Consultant), Manuchehr Kasheff (Columbia University, USA), Christiane Fellbaum (Princeton University, USA)

This paper outlines work on PersiaNet, a wordnet for Modern Persian.

Adding Dense, Weighted Connections to WordNet

Jordan Boyd-Graber, Christiane Fellbaum, Daniel Osherson, Robert Schapire (Princeton University, USA)

WordNet, a ubiquitous tool for natural language processing, suffers from sparsity of connections between its component concepts (synsets). Through the use of human annotators, a subset of the connections between 1000 hand-chosen synsets was assigned a value of "evocation" representing how much the first concept brings to mind the second. These data, along with existing similarity measures, constitute the basis of a method for predicting evocation between previously unrated pairs.

Automating Ontological Annotation with WordNet

Antonio Sanfilippo, Stephen Tratz, Michelle Gregory, Alan Chappell, Paul Whitney, Christian Posse, Patrick Paulson, Bob Baddeley, Ryan Hohimer, Amanda White (Pacific Northwest National Laboratory, USA)

Semantic Web applications require robust and accurate annotation tools that are capable of automating the assignment of ontological classes to words in naturally occurring text (ontological annotation). Most current ontologies do not include rich lexical databases and are therefore not easily integrated with word sense disambiguation algorithms that are needed to automate ontological annotation. WordNet provides a potentially ideal solution to this problem as it offers a highly structured lexical conceptual representation that has been extensively used to develop word sense disambiguation algorithms. However, WordNet has not been designed as an ontology, and while it can be easily turned into one, the result of doing this would present users with serious practical limitations due to the great number of concepts (synonym sets) it contains. Moreover, mapping WordNet to an existing ontology may be difficult and requires substantial labor. We propose to overcome these limitations by developing an analytical platform that (1) provides a WordNet-based ontology offering a manageable and yet comprehensive set of concept classes, (2) leverages the lexical richness of WordNet to give an extensive characterization of concept class in terms of lexical instances, and (3) integrates a class recognition algorithm that automates the assignment of concept classes to words in naturally occurring text. The ensuing framework makes available an ontological annotation platform that can be effectively integrated with intelligence analysis systems to facilitate evidence marshaling and sustain the creation and validation of inference models.

The Nature of Cross-Lingual Lexical Semantic Relations: A Preliminary Study Based on English-Chinese Translation Equivalents

Chu-Ren Huang, Wan-Ying Lin, Jia-Fei Hong, I-Li Su (Academia Sinica, Taiwan)

In this paper, we propose a new approach to comparative lexical semantics. In particular, a wordnet-like framework is adopted to study the nature of cross-lingual lexical semantic relations. The synsets of an existing monolingual wordnet are often aligned with their translation equivalents in a target language in order to bootstrap a bilingual wordnet. Previous studies adopting this approach include the Spanish WordNet (SpWN, Atserias et al., 1997) and MultiWordNet (MWN, Pianta et al., 2002). Such studies drew attention to the importance of cross-lingual lexical semantic relations between two translation equivalents. In this paper, we examine and analyze the contrasts and the cross-lingual semantic relations between the English WN synsets and their Chinese translation equivalents. Generalizations are made based on the distribution of parts of speech, semantic relations, and concepts in terms of the SUMO ontology. Our account sheds first light on the nature of the conceptual basis for non-synonymous translation, as well as for bilingual wordnet mapping.

Romanian WordNet: New Developments and Applications

Dan Tufiş, Verginica Barbu Mititelu, Luigi Bozianu, Cătălin Mihăilă (Romanian Academy Research Institute for Artificial Intelligence)

Among the existing ontologies, the multilingual lexical ontologies have a special status. Structured in a similar way to standard ontologies, the lexical ones are distinguished by the fundamental requirement that each conceptualized entity is lexicalized by one or more synonymous words (a synset) of the natural language vocabulary. Multilingually aligned wordnets, such as EuroWordNet or BalkaNet, represent one step further, with great promise in the domain of multilingual processing. This paper gives an account of the development and current status of the Romanian wordnet, aligned to the Princeton WordNet 2.0 (PWN2.0), and discusses some of its applications.

Wordnet Enhanced Automatic Crossword Generation

Aoife Aherne, Carl Vogel (University of Dublin, Ireland)

We report on a system for automatically generating and displaying crosswords from a database, supplied by the system manager, of potential clues and the corresponding words that index those clues. The system relies on the lexical relations encoded in WordNet to enhance the aesthetics of the resulting crossword by making it easier to automatically identify a grid that may be populated with words and clues that have a thematic focus. The system architecture is provided in overview, as is an empirical evaluation.

Semi-Automated English-Russian WordNet Construction: Initial Resources, Software and Methods of Translation

Sergey Yablonsky, Andrey Sukhonogov (Petersburg Transport University, Russia)

The idea of transforming Princeton WordNet (PWN) into a multilingual lexical ontology began to be put into practice in the EuroWordNet project. Today there are more than 15 national versions of WordNet, and all of them adhere to PWN to some extent. Conformity is achieved either by developing interlingual indexes or by using PWN itself as such an index. The purpose of the present work is the research and development of semi-automated methods for constructing an English-Russian version of the WordNet database (English-Russian WordNet -- ERWN) by mapping PWN to RWN, together with a preliminary test translation of PWN/RWN to estimate whether such a mapping can be constructed on the basis of PWN. It is shown that up to 70% of PWN synsets could be translated into Russian using the semi-automated translation methods.

A Proposal for the Automatic Distinction of Homomorphic Idiomatic and Non-idiomatic Phrases in WordNet

Benjamin R. Haskell, Christiane Fellbaum (Princeton University, USA), Chandra Barnett (California Institute of Technology)

Idiomatic phrases composed of several lexemes pose various problems for NLP. It is often not obvious whether a given sequence of words is intended for idiomatic or literal interpretation. We propose a solution that detects idioms based on the semantic classes of their constituents. After annotating the idioms in WordNet with this information, they can be compiled into a tree structure to efficiently identify the constructions.
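One way to picture compiling class-annotated phrases into a tree is a trie keyed on semantic-class labels. The sketch below is an assumption-laden illustration, not the authors' method: the toy lexicon, the pattern identifier, and the function names are hypothetical, the class labels are merely modelled on WordNet lexicographer file names, and `det` is a made-up label for function words.

```python
# Sentinel key marking the end of a stored pattern (cannot collide with labels).
_END = object()

# Hypothetical lexicon mapping each word to its semantic classes.
LEXICON = {
    "kick": {"verb.contact"},
    "eat": {"verb.consumption"},
    "the": {"det"},
    "bucket": {"noun.artifact"},
    "apple": {"noun.food"},
}

# Hypothetical annotated patterns: (sequence of class labels, pattern id).
PATTERNS = [
    (("verb.contact", "det", "noun.artifact"), "kick_the_bucket"),
]


def build_trie(patterns):
    """Compile class-label sequences into a nested-dict trie."""
    root = {}
    for labels, pattern_id in patterns:
        node = root
        for label in labels:
            node = node.setdefault(label, {})
        node[_END] = pattern_id
    return root


def match(trie, tokens, lexicon):
    """Return ids of patterns whose class sequence fits the whole token list."""
    frontier = [trie]
    for token in tokens:
        classes = lexicon.get(token, set())
        # Follow every trie branch whose label is one of the token's classes.
        frontier = [node[c] for node in frontier for c in classes if c in node]
        if not frontier:
            return []
    return [node[_END] for node in frontier if _END in node]


if __name__ == "__main__":
    trie = build_trie(PATTERNS)
    print(match(trie, ["kick", "the", "bucket"], LEXICON))
    print(match(trie, ["eat", "the", "apple"], LEXICON))
```

Because lookup walks the trie once per token, the cost depends on phrase length and the number of classes per word rather than on the number of stored patterns.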

Project Report on a Korean Science

Hanmin Jung, Won-Kyung Sung, Dong-In Park (Korea Institute of Science and Technology)

Our project has a long-term plan to construct a Korean science & technology thesaurus from 2005 to 2010. To design an elaborate thesaurus, we introduce conceptual and relational facets that are excluded or only partially included in WordNet, Core-Net, and other thesauri constructed by Chung et al. (2002) and Lee et al. (2000).

Alexandria as a Result of the Integration of WordNet and LDI

Dominique Dutoit (Memodata, France), Ourania Papadima (University of Stendhal, France)

This paper deals with several problems related to the integration of two resources, WordNet (Fellbaum 1998) and Le Dictionnaire Integral - LDI (Dutoit 1992). The LDI project began in 1988 in France and is based on different principles and goals from WordNet. Nevertheless, LDI and WordNet have much in common, and their integration is possible and highly anticipated as a way to share the information contained in these two models. In the first section we examine to what extent the two linguistic models differ and mention some of the reasons for their differences. The two implemented technical data models are schematised and compared. In the second section we explain the tasks involved in integrating the two models and the current state of the integration process. In the conclusion, we describe how it will soon be possible to consult the resulting merge and to update the data on the Internet.

Some Issues in the Construction of a Multi-Lingual Lexical-Semantic Net

Yu-de Bi, Key-Sun Choi (KORTERM, KAIST, Korea), Jian-guo Xiong (Luoyang FLU, China), Yang Liu (Peking University, China)

An advanced knowledge base, such as a lexical-semantic net, guarantees accuracy in semantic interpretation and setting of semantic relations. The paper provides a tentative analysis of the concepts, relation representations, conceptual system and cross-language translation as observed in the construction of a multi-lingual lexical-semantic net.

DEBVisDic -- First Version of New Client-Server Wordnet Browsing and Editing Tool

Aleš Horák, Karel Pala, Adam Rambousek, Martin Povolný (Masaryk University Brno, Czech Republic)

In this paper, we present a new wordnet development tool called DEBVisDic. It is built on the recently developed platform for client-server XML databases, called DEBII. This platform can cover many possible applications, of which we concentrate on the new, complete reimplementation of one of the most widespread wordnet editors and browsers -- VisDic. We argue for the benefits the new DEBII platform brings to wordnet editing and to XML databases in general. In the paper, we describe the state of the implementation and the internals and interfaces of the DEBVisDic tool. We also discuss its functionality and some distinctions in comparison with other dictionary writing systems.

Prepositional Phrase Attachment through Semantic Association using Connectionist Approach

Medimi Srinivas, Pushpak Bhattacharyya (Indian Institute of Technology)

Determining the correct attachment site for a Prepositional Phrase (PP) is one of the major sources of ambiguity in natural language parsing and analysis. In this paper, we describe a neural network based approach to prepositional phrase attachment for natural language text. Our approach disambiguates the attachment site for a PP through semantic association among the constituents, namely verb, noun and PP, using the WordNet semantic classes. It is essentially a corpus based approach. In most previous corpus based statistical approaches, accurate estimation of probabilities depended on the sufficiency of the data in terms of size and coverage of the features. Moreover, rule-based systems are inappropriate for handling uncertain knowledge, and managing and maintaining them is a very difficult task that poses many problems. Our method, using the semantic class properties of words, reduces the lexical (word) level data sparseness problem. Neural networks are also very good at capturing the complex nature of semantic association among words, and as a result capture the selectional restrictions. We have tested our method on the Wall Street Journal corpus; the experimental results show much better accuracy in PP attachment disambiguation, comparable to state-of-the-art approaches, which demonstrates the effectiveness of our approach.

Introducing the Arabic WordNet Project

William Black, Sabri Elkateb (University of Manchester, UK), Horacio Rodriguez, Musa Alkhalifa (University of Barcelona, Spain), Piek Vossen (Irion Technologies, Netherlands), Adam Pease (Articulate Software, USA), Christiane Fellbaum (Princeton University, USA)

Arabic is the official language of hundreds of millions of people in twenty Middle Eastern and North African countries, and is the religious language of all Muslims of various ethnicities around the world. Surprisingly little has been done for it in the field of computerised language and lexical resources. It is therefore motivating to develop an Arabic (WordNet) lexical resource that discovers the richness of Arabic as described in Elkateb (2005). This paper describes our approach towards building a lexical resource in Standard Arabic. Arabic WordNet (AWN) will be based on the design and contents of the universally accepted Princeton WordNet (PWN) and will be mappable straightforwardly onto PWN 2.0 and EuroWordNet (EWN), enabling translation on the lexical level to English and dozens of other languages. Several tools specific to this task will be developed. AWN will be a linguistic resource with a deep formal semantic foundation. Besides the standard wordnet representation of senses, word meanings are defined with machine-understandable semantics in first-order logic. The basis for this semantics is the Suggested Upper Merged Ontology (SUMO) and its associated domain ontologies. We will greatly extend the ontology and its set of mappings to provide formal terms and definitions equivalent to each synset.

