KoreanParser




Korean Parser

A project to build a Korean syntactic parser. A syntactic parser is a basic component of many natural language processing applications. (2012 ~ )

Introduction

We aim to build a Korean syntactic parser. As a first step, we are training existing English parsers to run on Korean text.

Citation

If you use this Korean parser, please cite the following paper:

  - Korean Treebank Transformation for Parser Training
    DH Choi, J Park, KS Choi - ACL 2012, 2012 - newdesign.aclweb.org

Refining Corpus

To train the existing English parsers, the Sejong Treebank is used. It is converted into Penn Treebank format with the program proposed in [Jungyeul Park, "Extraction of tree adjoining grammars from a treebank for Korean", COLING-ACL 06: Student Research Workshop, 2006]. We additionally refine the treebank to get better results; the transformation algorithm is presented in the paper submitted to the SPMRL 12 workshop.

Phrase-Structure Grammar Parser

1. Parser only

Among three PSG parsers (Stanford parser, Berkeley parser, and Bikel-Collins parser), the Berkeley parser worked best for Korean, with an F1 score of approximately 78% under 10-fold cross-validation on the Sejong corpus. The zip file linked below contains the library files and required models. The following code shows how to use the toolkit:

  // Initialization
  Configuration.hanBaseDir = "./models/ma/"; // Path to the morphological analyzer models
  BerkeleyParserWrapper bpw = new BerkeleyParserWrapper("KorGrammar_BerkF_FIN"); // Path to the parser model

  // Run the parser on one sentence ("One Shin Ramyun contains 505 calories.")
  String result = bpw.parse("신라면은 1개에 505칼로리의 열량을 갖고 있다.");

Download
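
The F1 score quoted above is a bracket-matching measure. As a rough illustration only, the sketch below computes labeled-bracket precision, recall, and F1 for a single sentence. It is not the project's evaluation code: the class name BracketF1, the helper span, and the example spans are all hypothetical, and it treats the constituents of one sentence as a set, whereas a real evaluation would normally accumulate counts over the whole test fold.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of labeled-bracket (PARSEVAL-style) scoring for one sentence.
// Hypothetical helper, not part of the Korean parser toolkit.
final class BracketF1 {

    /** Encodes a constituent with the given label covering tokens [start, end). */
    static String span(String label, int start, int end) {
        return label + ":" + start + ":" + end;
    }

    /** F1 of the predicted constituents against the gold constituents. */
    static double f1(Set<String> gold, Set<String> predicted) {
        Set<String> matched = new HashSet<>(predicted);
        matched.retainAll(gold); // brackets with the same label and the same span
        if (matched.isEmpty()) {
            return 0.0;
        }
        double precision = (double) matched.size() / predicted.size();
        double recall = (double) matched.size() / gold.size();
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Made-up spans for a three-token sentence; the NP bracket differs.
        Set<String> gold = Set.of(span("S", 0, 3), span("NP", 0, 1), span("VP", 1, 3));
        Set<String> predicted = Set.of(span("S", 0, 3), span("NP", 0, 2), span("VP", 1, 3));
        // Two of three brackets match, so precision = recall = F1 = 2/3.
        System.out.println("F1 = " + f1(gold, predicted));
    }
}
```

In a 10-fold setup, one would presumably compute this kind of score on each held-out fold and average the ten results.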


2. Parser with morphological analyzer

This Korean Berkeley parser includes the Hannanum morphological analyzer and works together with it.

The following code shows how to use the toolkit:

From the command line:

  java -jar BerkeleyParser_KorV2.jar "나는 한국을 사랑한다"

From Java code:

  BerkeleyParserWrapper bpw = new BerkeleyParserWrapper(Configuration.parserModel);
  String result = bpw.parse("나는 한국을 사랑한다."); // "I love Korea."

Download
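
Both usage examples above return the parse as a String. The sketch below shows one way such a string could be turned into a tree structure, assuming the output is a Penn-Treebank-style bracketed tree. That format is an assumption based on the treebank conversion described earlier and is not confirmed by the toolkit. The class BracketedTree and the sample tree are hypothetical, and the sample uses placeholder tokens rather than real parser output.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: reads a bracketed tree string such as "(S (NP w1) (VP w2))".
// Assumes Penn-Treebank-style brackets; not part of the Korean parser toolkit.
final class BracketedTree {
    final String label; // constituent label, or the token itself for a leaf
    final List<BracketedTree> children = new ArrayList<>();

    BracketedTree(String label) {
        this.label = label;
    }

    /** Parses one bracketed tree; throws IllegalArgumentException on malformed input. */
    static BracketedTree read(String text) {
        String[] tokens = text.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        int[] pos = {0};
        BracketedTree tree = readNode(tokens, pos);
        if (pos[0] != tokens.length) {
            throw new IllegalArgumentException("unexpected input after the tree");
        }
        return tree;
    }

    private static BracketedTree readNode(String[] tokens, int[] pos) {
        if (pos[0] >= tokens.length || tokens[pos[0]].equals(")")) {
            throw new IllegalArgumentException("expected '(' or a token");
        }
        String tok = tokens[pos[0]++];
        if (!tok.equals("(")) {
            return new BracketedTree(tok); // a bare token is a leaf
        }
        if (pos[0] >= tokens.length) {
            throw new IllegalArgumentException("missing ')'");
        }
        // Some parsers wrap the tree in an unlabeled root, e.g. "( (S ...) )".
        String next = tokens[pos[0]];
        boolean unlabeled = next.equals("(") || next.equals(")");
        BracketedTree node = new BracketedTree(unlabeled ? "" : tokens[pos[0]++]);
        while (pos[0] < tokens.length && !tokens[pos[0]].equals(")")) {
            node.children.add(readNode(tokens, pos));
        }
        if (pos[0] >= tokens.length) {
            throw new IllegalArgumentException("missing ')'");
        }
        pos[0]++; // consume ")"
        return node;
    }

    /** Collects the leaf tokens from left to right. */
    void collectLeaves(List<String> out) {
        if (children.isEmpty()) {
            out.add(label);
        } else {
            for (BracketedTree child : children) {
                child.collectLeaves(out);
            }
        }
    }

    public static void main(String[] args) {
        // Pass a parser result as the first argument, or fall back to a made-up tree.
        String text = args.length > 0 ? args[0] : "(S (NP w1) (VP (NP w2) (VP w3)))";
        List<String> leaves = new ArrayList<>();
        read(text).collectLeaves(leaves);
        System.out.println(leaves);
    }
}
```

If the assumption about the output format holds, the value of result from the examples above could be passed to BracketedTree.read directly.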

Dependency-Grammar Parser

We plan to train MaltParser and the KNP parser next.

NLP Hub

2012 Korean Language Information Processing System Competition (Minister's Award)