Multilingual Synchronization

From SWRC


Revision as of 07:28, 12 April 2011

Introduction

Multilingual Synchronization

This research aims to synthesize the contents of Wikipedia across multiple language editions. (2010 ~ )

Welcome to the M-Sync Project!

M-Sync (Multilingual Synchronization focusing on Wikipedia) is a research project by the KAIST Semantic Web Research Center.


Project Description

The goal of this research is to synthesize the contents of Wikipedia across multiple language editions. The various language editions of Wikipedia can offer more precise and detailed information because they reflect different intentions, backgrounds, and cultures. Synthesizing them helps users enrich the information contained in any one edition with information from the others. Although articles may be shared among the language editions, Wikipedia is not a parallel corpus: most articles in different languages are created independently by different users and maintained independently by different communities, so two linked articles in two different languages contain different amounts of information. Some entries in one language even have no counterpart in another language. English articles account for 20% of all articles, so the other language editions suffer from a lack of information compared to the English edition. Editions with different numbers of articles cannot contain the same amount of information, which creates serious difficulties for users who seek information or knowledge from sources in different languages. Thus, synthesizing the information contained in datasets in different languages is valuable to the extent that it contributes to better results for applications such as information retrieval and query expansion in a multilingual environment.