HanNanum



Introduction

HanNanum: Korean Morphological Analyzer

HanNanum is a Korean morphological analyzer and POS tagger built on a plug-in component architecture. It is written in Java, so it runs on any platform that has a JRE. (1999 ~ )

In natural language text, a word may have a different part of speech depending on its context. Part-of-speech (POS) tagging resolves this ambiguity. Once the text is tagged, the role of each word and the structure of the sentence are known. A Korean morphological analyzer is software that takes Korean text as input and segments it into morphemes. The results of morphological analysis and POS tagging are a basic and important input for natural language processing.

POS tagging methods must take the characteristics of the target language into account. In English, an inflectional language, it is relatively easy to obtain the morphemes of a sentence by splitting it on blanks. In Korean, an agglutinative language, more than one morpheme can be joined within a single word, and the joined morphemes affect each other's form. The combined ambiguity of morpheme detection and part-of-speech tagging therefore makes Korean text more complex to analyze.


Architecture of HanNanum (Java version)

HanNanum Workflow

The Java version of the HanNanum morphological analyzer adopts a plug-in component architecture for more flexible use. Users can set up a workflow for their own purpose from the plug-ins already developed, and developers can implement new plug-ins easily on top of the existing system and resources.
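
The idea of chaining plug-ins into a workflow can be sketched in a few lines of plain Java. This is only an illustration of the pattern, not the jhannanum API: the class `WorkflowSketch`, the interface `Plugin`, and the methods `append` and `run` are hypothetical names, and the two stages are trivial text clean-ups standing in for real analysis plug-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch of a plug-in workflow; none of these names come from jhannanum.
class WorkflowSketch {
    /** A plug-in is modeled here as a function from text to text. */
    interface Plugin extends UnaryOperator<String> {}

    private final List<Plugin> stages = new ArrayList<>();

    /** Appends a plug-in to the end of the workflow and returns this for chaining. */
    WorkflowSketch append(Plugin plugin) {
        stages.add(plugin);
        return this;
    }

    /** Passes the input through every stage in the order the stages were appended. */
    String run(String input) {
        String data = input;
        for (Plugin stage : stages) {
            data = stage.apply(data);
        }
        return data;
    }

    public static void main(String[] args) {
        WorkflowSketch workflow = new WorkflowSketch()
                .append(String::trim)                      // stand-in stage 1: strip the ends
                .append(s -> s.replaceAll("\\s+", " "));   // stand-in stage 2: collapse blanks
        System.out.println(workflow.run("  회의   일정은  다음과 같습니다.  "));
    }
}
```

Because every stage shares one interface, a stage can be swapped or added without touching the others, which is the flexibility the plug-in design aims for.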


Features of HanNanum

  • Platform independent: HanNanum runs on any system with JRE 1.6 or above.
  • Open resources: the resources are open, so users may edit and use them under the license.
  • Flexible architecture: new functionality is added by implementing a single plug-in.
  • Easy to use: add the library jhannanum.jar to your project and you are ready to go.
  • Supports both multi-thread and single-thread modes.
  • Based on UTF-8, which has higher compatibility than EUC-KR.



Workflow Examples

The following examples show HanNanum workflows that analyze Korean text for different purposes. You can easily test them using the example programs in kr.ac.kaist.swrc.jhannanum.demo.* or the GUIDemo included in the HanNanum release. To download HanNanum, visit KLDP Download (KO) or SourceForge Download (EN).

Morphological Analysis & POS Tagging

Workflow: SentenceSegmentor - InformalSentenceFilter - ChartMorphAnalyzer - UnknownProcessor - HMMTagger

Input:
프로젝트 전체 회의.
회의 일정은 다음과 같습니다.
			
日時: 2010년 7월 30일 오후 1시
場所: Coex Conference Room
Output:
프로젝트/ncn 전체/ncn 회의/ncn ./sf
회의/ncn 일정/ncn+은/jxc 다음/ncn+과/jct 같/paa+습니다/ef ./sf

日時/ncn+:/sp 2010/nnc+년/nbu 7/nnc+월/nbu 30/nnc+일/nbu 오후/ncn 1/nnc+시/nbu
場所/ncn+:/sp Coex/f Conference/f Room/f
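
The tagged output above follows a simple textual pattern: words are separated by blanks, morphemes within a word by `+`, and each morpheme is written as `text/tag`. A minimal sketch of reading one such line back into a data structure is given below. The class `TaggedLineParser` and its nested `Morpheme` type are hypothetical helpers written for this page, not part of jhannanum, and the sketch assumes that no morpheme text contains a blank or a `+`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the "text/tag+text/tag" output format shown above.
class TaggedLineParser {
    /** One morpheme together with its POS tag, e.g. text "회의" and tag "ncn". */
    static final class Morpheme {
        final String text;
        final String tag;

        Morpheme(String text, String tag) {
            this.text = text;
            this.tag = tag;
        }
    }

    /** Splits a line into words (on blanks) and each word into morphemes (on '+'). */
    static List<List<Morpheme>> parse(String line) {
        List<List<Morpheme>> words = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            List<Morpheme> morphemes = new ArrayList<>();
            for (String unit : word.split("\\+")) {
                // Use the last '/' so that a morpheme text containing '/' is kept intact.
                int slash = unit.lastIndexOf('/');
                if (slash < 0) {
                    continue; // not a text/tag pair, skip it
                }
                morphemes.add(new Morpheme(unit.substring(0, slash), unit.substring(slash + 1)));
            }
            if (!morphemes.isEmpty()) {
                words.add(morphemes);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String line = "회의/ncn 일정/ncn+은/jxc 다음/ncn+과/jct 같/paa+습니다/ef ./sf";
        for (List<Morpheme> word : parse(line)) {
            for (Morpheme m : word) {
                System.out.println(m.text + "\t" + m.tag);
            }
        }
    }
}
```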


Noun Extraction

Workflow: ChartMorphAnalyzer - UnknownProcessor - HMMTagger - NounExtractor

Input:
롯데마트가 판매하고 있는 흑마늘 양념 치킨이 논란이 되고 있다.
Output:
롯데마트, 판매, 흑마늘, 양념, 치킨, 논란
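
Conceptually, noun extraction is a filter over tagged output: keep the morphemes whose tag marks a noun and drop the rest. The sketch below illustrates that idea only. It is not the NounExtractor plug-in's code, the class and method names are hypothetical, and the choice of the `nc` prefix is an assumption based solely on the `ncn` tags that nouns carry in the tagged output earlier on this page. The input is one of those tagged lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: select morphemes by tag prefix from "text/tag+text/tag" output.
class NounFilterSketch {
    /** Returns the text of every morpheme whose tag starts with one of the given prefixes. */
    static List<String> extractByTagPrefix(String taggedLine, String... prefixes) {
        List<String> selected = new ArrayList<>();
        for (String word : taggedLine.trim().split("\\s+")) {
            for (String unit : word.split("\\+")) {
                int slash = unit.lastIndexOf('/');
                if (slash < 0) {
                    continue; // not a text/tag pair
                }
                String tag = unit.substring(slash + 1);
                for (String prefix : prefixes) {
                    if (tag.startsWith(prefix)) {
                        selected.add(unit.substring(0, slash));
                        break;
                    }
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        String tagged = "회의/ncn 일정/ncn+은/jxc 다음/ncn+과/jct 같/paa+습니다/ef ./sf";
        // Assumption: "nc" is used as the noun prefix purely for illustration.
        System.out.println(extractByTagPrefix(tagged, "nc")); // prints [회의, 일정, 다음]
    }
}
```

Passing the prefixes as a parameter keeps the sketch neutral about which tags the real plug-in treats as nouns.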


Morphological Analysis & POS Tagging

Workflow: ChartMorphAnalyzer - UnknownProcessor - HMMTagger

Input:
학교에서조차도 그 사실을 모르고 있었다.
Output:
학교에서조차도
	학교/ncn+에서/jca+조차/jxc+도/jxc

그
	그/mmd

사실을
	사실/ncn+을/jco

모르고
	모르/pvg+고/ecc

있었다
	있/px+었/ep+다/ef

.
	./sf


Morphological Analysis & POS Tagging (simple)

Workflow: ChartMorphAnalyzer - UnknownProcessor - HMMTagger - SimplePOSResult09

Input:
학교에서조차도 그 사실을 모르고 있었다.
Output:
학교에서조차도
	학교/N+에서조차도/J

그
	그/M

사실을
	사실/N+을/J

모르고
	모르/P+고/E

있었다
	있/P+었다/E

.
	./S
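
Comparing this output with the detailed output of the previous example suggests a simple rule: each tag is reduced to its first letter in upper case, and adjacent morphemes that end up with the same reduced tag are merged. The sketch below implements exactly that inferred rule. It is a reading of the two examples, not the actual SimplePOSResult09 implementation, and the class name `SimpleTagSketch` is hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch of the tag simplification inferred from the examples above.
class SimpleTagSketch {
    /** Turns e.g. "있/px+었/ep+다/ef" into "있/P+었다/E". */
    static String simplify(String analysis) {
        StringBuilder out = new StringBuilder();
        StringBuilder text = new StringBuilder();
        String current = null; // reduced tag of the group being collected
        for (String unit : analysis.trim().split("\\+")) {
            int slash = unit.lastIndexOf('/');
            if (slash < 0 || slash == unit.length() - 1) {
                continue; // no tag present, skip the unit
            }
            String reduced = unit.substring(slash + 1, slash + 2).toUpperCase(Locale.ROOT);
            if (!reduced.equals(current)) {
                flush(out, text, current); // the tag changed: emit the finished group
                current = reduced;
            }
            text.append(unit, 0, slash);   // merge the morpheme text into the group
        }
        flush(out, text, current);
        return out.toString();
    }

    /** Appends the collected group as "text/TAG" and resets the text buffer. */
    private static void flush(StringBuilder out, StringBuilder text, String tag) {
        if (tag == null || text.length() == 0) {
            return;
        }
        if (out.length() > 0) {
            out.append('+');
        }
        out.append(text).append('/').append(tag);
        text.setLength(0);
    }

    public static void main(String[] args) {
        System.out.println(simplify("학교/ncn+에서/jca+조차/jxc+도/jxc")); // 학교/N+에서조차도/J
        System.out.println(simplify("있/px+었/ep+다/ef"));                 // 있/P+었다/E
    }
}
```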



Quick Links



Official Websites