Three-stage Short Text Language Identification Algorithm

Title: Three-stage Short Text Language Identification Algorithm
Publication Type: Journal Article
Year of Publication: 2017
Authors: Hasimu, M, Silamu, W
Journal: Journal of Digital Information Management
Volume: 16
Issue: 6
Start Page: 354
Pagination: 354-372
Date Published: 12/2017
Type of Article: Research
Abstract

Text on the internet is written in different languages and scripts, and a language identification system is used to analyze and identify them. To improve the performance of text language identification, this paper proposes a three-stage short text language identification algorithm. The script of a given text is identified in the first stage of the algorithm. The language group to which it belongs, consisting of languages written in the same script, is identified in the second stage. In the third stage, the specific language of the given text is recognized from within the language group. Experimental results showed that our proposed method improves the accuracy of text language identification systems stage by stage, reduces the time and the size of the feature set needed to make a prediction, and achieves optimal accuracy.
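The three stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the features or classifiers used, so the sketch assumes Unicode character names for script detection and a simple character-trigram overlap score for the final stage. The training sentences, the `TRAINING` and `PROFILES` tables, and all function names are hypothetical and exist only for illustration.

```python
import unicodedata
from collections import Counter

# Hypothetical toy training data (assumption): script -> {language: sample text}.
TRAINING = {
    "LATIN": {
        "en": "the quick brown fox jumps over the lazy dog and this is the text",
        "de": "der schnelle braune fuchs springt über den faulen hund und das ist der text",
    },
    "CYRILLIC": {
        "ru": "быстрая коричневая лиса прыгает через ленивую собаку и это текст",
        "uk": "швидка коричнева лисиця стрибає через ледачого собаку і це текст",
    },
}


def char_ngrams(text, n=3):
    """Count character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


# One trigram profile per language, grouped by script (assumed feature set).
PROFILES = {
    script: {lang: char_ngrams(sample) for lang, sample in langs.items()}
    for script, langs in TRAINING.items()
}


def identify_script(text):
    """Stage 1: pick the most frequent script among the alphabetic characters.

    The script is approximated by the first word of each character's Unicode
    name, e.g. 'LATIN SMALL LETTER A' -> 'LATIN'.
    """
    counts = Counter(
        unicodedata.name(ch, "UNKNOWN").split()[0] for ch in text if ch.isalpha()
    )
    return counts.most_common(1)[0][0] if counts else None


def identify_group(script):
    """Stage 2: return the candidate languages written in the given script."""
    return sorted(PROFILES.get(script, {}))


def identify_language(text, script, group):
    """Stage 3: choose the language in the group with the largest trigram overlap."""
    grams = char_ngrams(text)

    def overlap(lang):
        profile = PROFILES[script][lang]
        return sum(min(count, profile[gram]) for gram, count in grams.items())

    return max(group, key=overlap) if group else None


def identify(text):
    """Run the three stages in order and return (script, language)."""
    script = identify_script(text)
    group = identify_group(script)
    return script, identify_language(text, script, group)
```

With these toy profiles, `identify("der hund ist faul")` returns `("LATIN", "de")`. Because the first stage restricts the candidates to one script, the final stage only compares the input against the profiles of that group, which is one plausible reading of how a staged design can reduce the feature set and prediction time mentioned in the abstract.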

Refereed Designation: Refereed

Collaborative Partner

Institute of Electronic and Information Technology (IEIT)
