Where do the authors come from?

Title: Where do the authors come from?
Publication Type: Journal Article
Year of Publication: 2009
Authors: Biryukov, M
Journal: Journal of Digital Information Management
Volume: 7
Issue: 4
Pagination: 211-218
Date Published: 2009
Keywords: Co-author network, DBLP database, Digital libraries, Language classification, Scientific publications
Abstract

The continuous growth of scientific publications has made bibliographic databases and digital libraries widespread. At the same time, they have become an object of research in their own right. In this paper we address the question "where do the authors come from?" by identifying the language of author names. This is a two-step process: primary classification based on statistical models of languages, followed by classification refinement through analysis of the co-author network built from the bibliographic records. The automatic language identification system presented here handles 14 different languages and requires no dictionary of names for training. The statistical models are built from general-purpose corpora for all Western European languages and for Chinese, Japanese and Turkish. The system is fine-tuned to achieve precision and recall above 90% for many languages, and it performs better than some other systems aimed at identifying the language of personal names. Tests on the DBLP data set have shown that extending the language model with the co-author network helps improve classification results, especially for closely related languages and mixed names. The tests have also demonstrated the usability of the system in applications such as data cleaning and trend detection.
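The abstract describes a two-step pipeline but does not say which statistical model is used or how co-author evidence is combined. The sketch below is only an illustration under stated assumptions: step one uses character trigram models with add-one smoothing, and step two mixes an author's own probabilities with the share of co-authors labeled with each language. The function names (`ngrams`, `train_models`, `classify`, `refine`), the mixing weight `alpha`, and the toy training words are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# ASSUMPTION: character trigram models with add-one smoothing; the paper's
# actual model type and parameters are not given in the abstract.

def ngrams(name, n=3):
    """Character n-grams of a lowercased name, padded with ^ and $ boundary marks."""
    padded = f"^{name.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def train_models(corpora, n=3):
    """Build one n-gram count table per language from {language: [words]}."""
    return {lang: Counter(g for w in words for g in ngrams(w, n))
            for lang, words in corpora.items()}

def log_score(name, counts, n=3):
    """Log-likelihood of a name under one language model (add-one smoothing)."""
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen n-grams
    return sum(math.log((counts[g] + 1) / (total + vocab)) for g in ngrams(name, n))

def classify(name, models, n=3):
    """Step 1 (primary classification): best language plus normalized probabilities."""
    scores = {lang: log_score(name, c, n) for lang, c in models.items()}
    top = max(scores.values())
    # Numerically stable softmax over log-likelihoods (uniform prior over languages).
    weights = {lang: math.exp(s - top) for lang, s in scores.items()}
    z = sum(weights.values())
    probs = {lang: w / z for lang, w in weights.items()}
    return max(probs, key=probs.get), probs

def refine(author, probs, coauthors, labels, alpha=0.3):
    """Step 2 (refinement, HYPOTHETICAL scheme): mix the author's own
    probabilities with the share of labeled co-authors per language."""
    votes = Counter(labels[c] for c in coauthors.get(author, ()) if c in labels)
    total = sum(votes.values())
    if total == 0:  # no labeled co-authors: keep the primary decision
        return max(probs, key=probs.get)
    combined = {lang: (1 - alpha) * p + alpha * votes[lang] / total
                for lang, p in probs.items()}
    return max(combined, key=combined.get)
```

For example, with toy corpora `{"de": ["schmidt", "schneider", "schulz"], "it": ["rossi", "bianchi", "romano", "ricci"]}`, `classify("rossini", ...)` picks `"it"`. The weight `alpha` controls how far the co-author network may overturn a narrow decision made from the name alone, which is the kind of case (closely related languages, mixed names) where the abstract reports the network helps.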

URL: http://www.scopus.com/inward/record.url?eid=2-s2.0-77953242477&partnerID=40&md5=d76b422a52e3bdf48f54dbd4ba5cd9c1

Collaborative Partner

Institute of Electronic and Information Technology (IEIT)