Introduction
A project to build a Korean syntactic parser. Syntactic parsing plays a basic role in many natural language processing applications. (2012 ~ )
We aim to build a Korean syntactic parser. As a first step, we are training existing English parsers to run on Korean text.
Citation
If you use this Korean parser, please cite the following paper:
- DH Choi, J Park, KS Choi. Korean Treebank Transformation for Parser Training. ACL 2012, 2012. newdesign.aclweb.org
Refining Corpus
To train the existing English parsers, the Sejong Treebank is used. To convert the Sejong Treebank into Penn Treebank format, we use the program proposed in [Jungyeul Park, "Extraction of tree adjoining grammars from a treebank for Korean", COLING-ACL 06: Student Research Workshop, 2006]. We also refine the converted treebank further to get better results; the transformation algorithm is presented in the paper submitted to the SPMRL 12 workshop.
Phrase-Structure Grammar Parser
1. parser only
Among three PSG parsers (the Stanford parser, the Berkeley parser, and the Bikel-Collins parser), the Berkeley parser worked best for Korean, with an F1-score of approximately 78% under 10-fold cross-validation on the Sejong corpus. The following zip file contains the library files and required models. The following code shows how to use the toolkit:
// Initialization.
Configuration.hanBaseDir = "./models/ma/"; // Path for morphological analyzer models
BerkeleyParserWrapper bpw = new BerkeleyParserWrapper("KorGrammar_BerkF_FIN"); // Path for parser model
// Running the parser
String result = bpw.parse("신라면은 1개에 505칼로리의 열량을 갖고 있다.");
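The F1-score quoted above for the Berkeley parser can be illustrated with a small sketch. It assumes the figure is the usual labeled-bracket (PARSEVAL-style) F1, which this page does not state explicitly. The class name BracketF1 and the "LABEL:start:end" string encoding of a constituent are hypothetical choices made for this example only; they are not part of the toolkit.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the toolkit: labeled-bracket F1 for one sentence.
class BracketF1 {

    // Counts brackets as a multiset, because the same labeled span may occur
    // more than once in a tree (for example in unary chains).
    static Map<String, Integer> count(List<String> brackets) {
        Map<String, Integer> counts = new HashMap<>();
        for (String b : brackets) {
            counts.merge(b, 1, Integer::sum);
        }
        return counts;
    }

    // Each bracket is encoded as "LABEL:start:end" (an assumption of this sketch).
    static double f1(List<String> gold, List<String> predicted) {
        Map<String, Integer> goldCounts = count(gold);
        int matched = 0;
        for (Map.Entry<String, Integer> e : count(predicted).entrySet()) {
            matched += Math.min(e.getValue(), goldCounts.getOrDefault(e.getKey(), 0));
        }
        if (matched == 0) {
            return 0.0; // also covers empty gold or predicted lists
        }
        double precision = (double) matched / predicted.size();
        double recall = (double) matched / gold.size();
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Made-up brackets for a three-token sentence.
        List<String> gold = List.of("S:0:3", "NP:0:1", "VP:1:3", "NP:1:2");
        List<String> predicted = List.of("S:0:3", "NP:0:1", "VP:2:3");
        System.out.println(f1(gold, predicted));
    }
}
```

In a 10-fold setup, a score like this would typically be computed on each held-out fold and then averaged; whether the reported figure was averaged per fold or pooled over all sentences is not specified here.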
2. parser with morphological analyzer
This Korean Berkeley parser bundles the Hannanum morphological analyzer and runs it before parsing.
The following code shows how to use the toolkit:
From the command line:
java -jar BerkeleyParser_KorV2.jar "나는 한국을 사랑한다"
From Java code:
BerkeleyParserWrapper bpw = new BerkeleyParserWrapper(Configuration.parserModel); // Load the parser model named in Configuration.parserModel
String result = bpw.parse("나는 한국을 사랑한다."); // "I love Korea."
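Once a parse string is available, a common next step is to read the tagged words back out of it. The sketch below assumes that parse returns a Penn Treebank style bracketed string, which seems likely given the treebank conversion described above but is not confirmed on this page. The class name BracketLeaves is hypothetical, and the sample tree uses made-up English tags rather than real parser output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the toolkit: lists "TAG/word" pairs found
// in a bracketed tree string such as "(S (NP (NNP John)) (VP (VBZ sleeps)))".
class BracketLeaves {

    static boolean isParen(String token) {
        return token.equals("(") || token.equals(")");
    }

    static List<String> leaves(String tree) {
        List<String> out = new ArrayList<>();
        // Put spaces around parentheses so that a whitespace split isolates them.
        String[] tokens = tree.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        for (int i = 0; i + 3 < tokens.length; i++) {
            // A preterminal is the token pattern: "(" TAG WORD ")".
            if (tokens[i].equals("(")
                    && !isParen(tokens[i + 1])
                    && !isParen(tokens[i + 2])
                    && tokens[i + 3].equals(")")) {
                out.add(tokens[i + 1] + "/" + tokens[i + 2]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up example tree; the Korean model's actual tag set is not shown here.
        System.out.println(leaves("(S (NP (NNP John)) (VP (VBZ sleeps)))"));
    }
}
```

If the wrapper returns a different format (for example one tree per line with extra metadata), the tokenization step would need to be adapted accordingly.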
Dependency-Grammar Parser
We plan to train the Malt parser and the KNP parser next.
NLP Hub
2012 Korean Language Information Processing System Competition (Minister's Award)