Introduction
HanNanum is a morphological analyzer and POS tagger built on a plug-in component architecture. It is written in Java, so it runs on any platform that has a JRE. (1999 ~ )
When analyzing natural language text, a word may have different parts of speech depending on the context. Part-of-speech (POS) tagging resolves this ambiguity: after tagging, we know the role of each word and the structure of the sentence. A Korean morphological analyzer is software that takes Korean text as input and segments it into morphemes. The results of morphological analysis and POS tagging are a basic and important input for natural language processing.
POS tagging methods must be adapted to the characteristics of the target language. In English, segmenting a sentence on blanks already comes close to yielding its morphemes. Korean, by contrast, is an agglutinative language: more than one morpheme can be joined within a single space-delimited word, and the joined morphemes affect each other's form. The ambiguity of morpheme detection, combined with the ambiguity of part-of-speech tagging, makes Korean text more complex to analyze.
Architecture of HanNanum (Java version)
The Java version of the HanNanum morphological analyzer adopted a plug-in component architecture for more flexible use. Users can set up a work flow for their own purpose from the plug-ins already developed, and developers can easily implement new plug-ins on top of the existing system and resources.
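The idea of composing a work flow from interchangeable plug-ins can be illustrated with a minimal sketch. This is not the jhannanum API: `ToyWorkflow`, `append`, `analyze`, and the two stages are hypothetical names, and each plug-in is simplified to a text-to-text function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Toy plug-in work flow. Hypothetical; NOT part of the HanNanum library. */
public class ToyWorkflow {
    // Assumption: every plug-in is modeled as a named String -> String stage.
    private final List<String> names = new ArrayList<>();
    private final List<UnaryOperator<String>> stages = new ArrayList<>();

    /** Adds a stage to the end of the work flow and returns this for chaining. */
    public ToyWorkflow append(String name, UnaryOperator<String> stage) {
        names.add(name);
        stages.add(stage);
        return this;
    }

    /** Runs the text through every stage in the order the stages were appended. */
    public String analyze(String text) {
        String current = text;
        for (UnaryOperator<String> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    /** Describes the work flow in the "A - B - C" notation used on this page. */
    public String describe() {
        return String.join(" - ", names);
    }

    public static void main(String[] args) {
        ToyWorkflow workflow = new ToyWorkflow()
            .append("Trimmer", String::trim)
            .append("WhitespaceNormalizer", s -> s.replaceAll("\\s+", " "));
        System.out.println(workflow.describe()); // Trimmer - WhitespaceNormalizer
        System.out.println(workflow.analyze("  회의   일정은  ")); // 회의 일정은
    }
}
```

Swapping, removing, or adding a stage changes the behavior without touching the other stages, which is the flexibility a plug-in architecture aims for. The real library distinguishes several kinds of plug-ins; consult its JAVADOC for the actual interfaces.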
Features of HanNanum
- Platform Independent: HanNanum can be run on any system with JRE 1.6 or above.
- Open resources: The resources are open, so users may edit and use them under the license.
- Flexible architecture: New functionality can be added by implementing just a plug-in.
- Easy to use: Add the library jhannanum.jar to your project and you are ready to go.
- Supports both multi-thread and single-thread modes.
- UTF-8 based: UTF-8 offers higher compatibility than EUC-KR.
Workflow Examples
The following are examples of HanNanum work flows that analyze Korean text for different purposes. You can easily test them using the example programs in kr.ac.kaist.swrc.jhannanum.demo.* or the GUIDemo in the HanNanum release. To download HanNanum, visit KLDP Download (KO) or SourceForge Download (EN).
Morphological Analysis & POS Tagging
Workflow: SentenceSegmentor - InformalSentenceFilter - ChartMorphAnalyzer - UnknownProcessor - HMMTagger
Input: 프로젝트 전체 회의. 회의 일정은 다음과 같습니다. 日時: 2010년 7월 30일 오후 1시 場所: Coex Conference Room
Output:
프로젝트/ncn 전체/ncn 회의/ncn ./sf 회의/ncn 일정/ncn+은/jxc 다음/ncn+과/jct 같/paa+습니다/ef ./sf 日時/ncn+:/sp 2010/nnc+년/nbu 7/nnc+월/nbu 30/nnc+일/nbu 오후/ncn 1/nnc+시/nbu 場所/ncn+:/sp Coex/f Conference/f Room/f
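In the tagged output, each space-separated token is a `+`-joined sequence of `morpheme/tag` pairs. Below is a minimal sketch of reading one such token back into pairs. `TaggedTokenParser` and `Morpheme` are hypothetical helper names, not HanNanum classes, and the sketch assumes that a morpheme never contains a literal `+`.

```java
import java.util.ArrayList;
import java.util.List;

/** Parses tokens such as "일정/ncn+은/jxc". Hypothetical helper, not library code. */
public class TaggedTokenParser {

    /** One morpheme and its tag, e.g. surface "일정" with tag "ncn". */
    public static final class Morpheme {
        public final String surface;
        public final String tag;

        Morpheme(String surface, String tag) {
            this.surface = surface;
            this.tag = tag;
        }

        @Override
        public String toString() {
            return surface + "/" + tag;
        }
    }

    /** Splits a token on '+', then splits each part at its LAST '/'. */
    public static List<Morpheme> parse(String token) {
        List<Morpheme> result = new ArrayList<>();
        for (String part : token.split("\\+")) {
            int slash = part.lastIndexOf('/');
            if (slash < 0) {
                throw new IllegalArgumentException("missing tag in: " + part);
            }
            result.add(new Morpheme(part.substring(0, slash), part.substring(slash + 1)));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Morpheme m : parse("일정/ncn+은/jxc")) {
            System.out.println(m.surface + " -> " + m.tag);
        }
    }
}
```

Splitting at the last slash keeps a token like `./sf` intact: the surface is `.` and the tag is `sf`.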
Noun Extraction
Workflow: ChartMorphAnalyzer - UnknownProcessor - HMMTagger - NounExtractor
Input: 롯데마트가 판매하고 있는 흑마늘 양념 치킨이 논란이 되고 있다.
Output:
롯데마트, 판매, 흑마늘, 양념, 치킨, 논란
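Conceptually, noun extraction can be viewed as a filter over the tagger's output. The sketch below illustrates that idea on an already tagged sentence. It is not the NounExtractor plug-in itself: `NounFilterSketch` is a hypothetical name, and the rule "keep morphemes whose tag starts with `nc`" is an assumption based on the `ncn` tags visible in the examples on this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Filters noun morphemes from tagged text. A sketch, not the NounExtractor plug-in. */
public class NounFilterSketch {

    // Assumption: tags beginning with "nc" (such as "ncn") mark common nouns.
    private static final String NOUN_TAG_PREFIX = "nc";

    /** Returns the surfaces of all noun morphemes, in order of appearance. */
    public static List<String> extractNouns(String taggedSentence) {
        List<String> nouns = new ArrayList<>();
        for (String token : taggedSentence.trim().split("\\s+")) {
            for (String part : token.split("\\+")) {
                int slash = part.lastIndexOf('/');
                if (slash < 0) {
                    continue; // skip anything that is not a morpheme/tag pair
                }
                String tag = part.substring(slash + 1);
                if (tag.startsWith(NOUN_TAG_PREFIX)) {
                    nouns.add(part.substring(0, slash));
                }
            }
        }
        return nouns;
    }

    public static void main(String[] args) {
        String tagged = "학교/ncn+에서/jca+조차/jxc+도/jxc 그/mmd 사실/ncn+을/jco "
                + "모르/pvg+고/ecc 있/px+었/ep+다/ef ./sf";
        System.out.println(extractNouns(tagged)); // prints [학교, 사실]
    }
}
```

Which tags the real plug-in treats as nouns is defined by the library and its tag set; see the reference manual for the authoritative list.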
Morphological Analysis & POS Tagging
Workflow: ChartMorphAnalyzer - UnknownProcessor - HMMTagger
Input: 학교에서조차도 그 사실을 모르고 있었다.
Output:
학교에서조차도 학교/ncn+에서/jca+조차/jxc+도/jxc
그 그/mmd
사실을 사실/ncn+을/jco
모르고 모르/pvg+고/ecc
있었다 있/px+었/ep+다/ef
. ./sf
Morphological Analysis & POS Tagging (simple)
Workflow: ChartMorphAnalyzer - UnknownProcessor - HMMTagger - SimplePOSResult09
Input: 학교에서조차도 그 사실을 모르고 있었다.
Output:
학교에서조차도 학교/N+에서조차도/J
그 그/M
사실을 사실/N+을/J
모르고 모르/P+고/E
있었다 있/P+었다/E
. ./S
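Comparing the detailed and simple outputs for the same sentence suggests a rule: each tag is reduced to its first letter in upper case, and adjacent morphemes that end up with the same reduced tag are merged. The sketch below implements that inferred rule. It is an assumption drawn from the examples on this page, not the actual SimplePOSResult09 implementation, and `TagSimplifierSketch` is a hypothetical name.

```java
import java.util.Locale;

/** Reduces detailed tags to one letter and merges equal neighbors. Inferred rule only. */
public class TagSimplifierSketch {

    /** Converts e.g. "있/px+었/ep+다/ef" into "있/P+었다/E". */
    public static String simplify(String token) {
        StringBuilder out = new StringBuilder();
        StringBuilder surface = new StringBuilder();
        String currentTag = null;
        for (String part : token.split("\\+")) {
            int slash = part.lastIndexOf('/');
            if (slash < 0 || slash == part.length() - 1) {
                throw new IllegalArgumentException("missing tag in: " + part);
            }
            // Assumption: the simple tag is the first letter of the detailed tag.
            String tag = part.substring(slash + 1, slash + 2).toUpperCase(Locale.ROOT);
            if (currentTag != null && !tag.equals(currentTag)) {
                flush(out, surface, currentTag);
            }
            surface.append(part, 0, slash); // merge surfaces that share a simple tag
            currentTag = tag;
        }
        if (currentTag != null) {
            flush(out, surface, currentTag);
        }
        return out.toString();
    }

    /** Writes the pending merged morpheme to the output and clears the buffer. */
    private static void flush(StringBuilder out, StringBuilder surface, String tag) {
        if (out.length() > 0) {
            out.append('+');
        }
        out.append(surface).append('/').append(tag);
        surface.setLength(0);
    }

    public static void main(String[] args) {
        System.out.println(simplify("학교/ncn+에서/jca+조차/jxc+도/jxc")); // 학교/N+에서조차도/J
        System.out.println(simplify("있/px+었/ep+다/ef"));                // 있/P+었다/E
    }
}
```

Applied to the detailed analysis of each word in the example sentence, this rule reproduces the simple output shown above.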
Quick Links
- Where can I download HanNanum? KLDP Download (KO) or SourceForge Download (EN)
- Is there a user reference manual? Reference Manual Download (KO)
- I want more detailed information about the HanNanum library. JAVADOC of HanNanum (EN)
- I have a question or suggestion on HanNanum. KLDP Forum (KO), SourceForge Open Discussion (EN), E-mail
- Which corpus is used to develop HanNanum? High quality morpho-syntactically annotated corpus
- What morpheme tag set does HanNanum use? Refer to the reference manual - Reference Manual (KO)
Official Websites
- KLDP Project Community (KO)
- SourceForge.net Project Community (EN)
- Previous HanNanum website I (KO)
- Previous HanNanum website II (KO)