Title | Where do the authors come from? |
Publication Type | Journal Article |
Year of Publication | 2009 |
Authors | Biryukov, M |
Journal | Journal of Digital Information Management |
Volume | 7 |
Issue | 4 |
Pagination | 211 - 218 |
Date Published | 2009 |
Keywords | Co-author network, DBLP database, Digital libraries, Language classification, Scientific publications |
Abstract | The continuing growth of scientific publications has made bibliographic databases and digital libraries widespread. At the same time, they are an object of research in their own right. In this paper we address the question "where do the authors come from?" via language identification of the author names. This is a two-step process that involves primary classification based on statistical models of languages, and classification refinement achieved through analysis of the co-author network built from the bibliographic records. The system for automatic language identification presented here handles 14 different languages and requires no dictionary of names for training. The statistical models are built from general-purpose corpora for all Western European, Chinese, Japanese and Turkish languages. The system is fine-tuned to achieve precision and recall above 90% for many languages, and performs better than some other systems aimed at language identification of personal names. Tests on the DBLP data set have shown that extending the language model with the co-author network helps to improve classification results, especially in cases of closely related languages and mixed names. They have also demonstrated the usability of the system in applications such as data cleaning and trend detection. |
URL | http://www.scopus.com/inward/record.url?eid=2-s2.0-77953242477&partnerID=40&md5=d76b422a52e3bdf48f54dbd4ba5cd9c1 |