Title | Three-stage Short Text Language Identification Algorithm |
Publication Type | Journal Article |
Year of Publication | 2017 |
Authors | Hasimu, M, Silamu, W |
Journal | Journal of Digital Information Management |
Volume | 16 |
Issue | 6 |
Start Page | 354 |
Pagination | 354-372 |
Date Published | 12/2017 |
Type of Article | Research |
Abstract | Text on the internet is written in different languages and scripts, and a language identification system is used to analyze and identify them. To improve the performance of text language identification, this paper proposes a three-stage short text language identification algorithm. The script of a given text is identified in the first stage of the algorithm. The language group to which it belongs, consisting of languages written in the same script, is identified in the second stage. In the third stage, the specific language of the given text is recognized from within the language group. Experimental results showed that our proposed method improves the accuracy of text language identification systems stage by stage, reduces the time and the size of the feature set needed to make a prediction, and achieves optimal accuracy. |
Refereed Designation | Refereed |
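
The three-stage cascade described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the features or classifiers, so the sketch assumes Unicode character names for script detection and character trigram overlap for the group and language stages. The training samples, the group names, and the helpers `detect_script`, `ngrams`, `overlap`, and `identify` are all hypothetical.

```python
import unicodedata
from collections import Counter

# Hypothetical toy data: script -> language group -> language -> sample text.
TRAINING = {
    "LATIN": {
        "germanic": {
            "en": "the quick brown fox jumps over the lazy dog and then the cat",
            "de": "der schnelle braune fuchs springt über den faulen hund und die katze",
        },
        "romance": {
            "es": "el rápido zorro marrón salta sobre el perro perezoso y el gato",
            "fr": "le rapide renard brun saute par dessus le chien paresseux et le chat",
        },
    },
    "CYRILLIC": {
        "slavic": {
            "ru": "быстрая коричневая лиса прыгает через ленивую собаку",
            "uk": "швидка коричнева лисиця стрибає через ледачого собаку",
        },
    },
}


def ngrams(text, n=3):
    """Count character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


# Trigram profiles mirroring the nesting of TRAINING.
PROFILES = {
    script: {
        group: {lang: ngrams(sample) for lang, sample in langs.items()}
        for group, langs in groups.items()
    }
    for script, groups in TRAINING.items()
}


def detect_script(text):
    """Stage 1: majority vote over the first word of each letter's Unicode name."""
    votes = Counter(
        unicodedata.name(ch, "UNKNOWN").split()[0] for ch in text if ch.isalpha()
    )
    return votes.most_common(1)[0][0] if votes else None


def overlap(query, profile):
    """Number of query n-grams (with multiplicity) also found in the profile."""
    return sum(min(count, profile[gram]) for gram, count in query.items())


def identify(text):
    """Return (script, group, language); later stages only see earlier winners."""
    script = detect_script(text)
    groups = PROFILES.get(script)
    if not groups:
        return (script, None, None)
    query = ngrams(text)

    # Stage 2: score each group by the merged profile of its member languages.
    def group_score(group):
        return overlap(query, sum(groups[group].values(), Counter()))

    group = max(groups, key=group_score)
    # Stage 3: choose the best language inside the winning group only.
    language = max(groups[group], key=lambda lang: overlap(query, groups[group][lang]))
    return (script, group, language)
```

Because each stage narrows the candidate set, the final stage compares the input against only the languages of one group, which is the intuition behind the smaller feature set and shorter prediction time reported in the abstract.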