Non-words Spell Corrector of Social Media Data in Message Filtering Systems

Title: Non-words Spell Corrector of Social Media Data in Message Filtering Systems
Publication Type: Journal Article
Year of Publication: 2018
Authors: …, Aritsugi, M
Journal: Journal of Digital Information Management
Volume: 16
Issue: 2
Start Page: 64
Pagination: 64-75
Date Published: 04/2018
Type of Article: Research
Abstract

We develop an extended spell checker and corrector for non-word errors in social media datasets, intended for use in message filtering systems, especially for cyberbullying detection. We use dictionary techniques to detect non-words, twelve spell-error checking and correction approaches to correct them, and n-gram and Levenshtein distance techniques to select the most suitable word among the corrected candidates. When the approaches return more than one corrected word, we use n-gram techniques to choose the most reasonable candidate based on the words in an n-gram database. In our previous work, selecting by Levenshtein distance picked the first corrected word, which was not a reasonable choice in some sentences. Therefore, this paper uses the n-gram database.
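The detect, correct, select pipeline in the abstract can be illustrated with a minimal Python sketch. It assumes a plain word list as the dictionary, a single edit-distance candidate generator standing in for the paper's twelve correction approaches, and bigram counts from a toy corpus standing in for the n-gram database. All function names and data here are hypothetical illustrations, not the authors' implementation. The sketch also reproduces the failure mode the abstract reports: when selecting by Levenshtein distance alone, equally close candidates resolve to whichever comes first.

```python
from __future__ import annotations

from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def generate_candidates(token: str, vocab: list[str], max_dist: int = 1) -> list[str]:
    # Hypothetical stand-in for the paper's twelve correction approaches:
    # every dictionary word within max_dist edits, kept in dictionary order.
    return [w for w in vocab if levenshtein(token, w) <= max_dist]


def pick_by_distance(token: str, cands: list[str]) -> str:
    # Distance-only baseline: among equally close words, min() returns the first one.
    return min(cands, key=lambda c: levenshtein(token, c))


def pick_by_ngram(prev_word: str, token: str, next_word: str,
                  cands: list[str], bigrams: Counter) -> str:
    # Prefer the candidate seen most often next to its neighbours;
    # fall back to edit distance when the bigram counts tie.
    def key(c: str) -> tuple[int, int]:
        context = bigrams[(prev_word, c)] + bigrams[(c, next_word)]
        return (-context, levenshtein(token, c))
    return min(cands, key=key)


def correct_sentence(tokens: list[str], vocab: list[str], bigrams: Counter) -> list[str]:
    vocab_set = set(vocab)
    out: list[str] = []
    for i, raw in enumerate(tokens):
        tok = raw.lower()
        if tok in vocab_set:      # dictionary check: known word, keep it
            out.append(tok)
            continue
        cands = generate_candidates(tok, vocab)
        if not cands:             # nothing close enough: leave the token unchanged
            out.append(tok)
            continue
        prev_w = out[-1] if out else "<s>"
        next_w = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
        out.append(pick_by_ngram(prev_w, tok, next_w, cands, bigrams))
    return out


# Toy corpus standing in for the n-gram database (illustrative data, not from the paper).
corpus = "go to bed . you are so bad . go to bed now".split()
vocab = sorted(set(corpus))
bigrams = Counter(zip(corpus, corpus[1:]))

print(pick_by_distance("bwd", generate_candidates("bwd", vocab)))  # bad
print(correct_sentence(["go", "to", "bwd"], vocab, bigrams))       # ['go', 'to', 'bed']
```

Here "bwd" is one edit away from both "bad" and "bed", so distance alone returns "bad" simply because it sorts first, while the bigram ("to", "bed") lets the context-based selection choose "bed". A practical system would need a much larger n-gram database, smoothing for unseen n-grams, and handling of a misspelled next token, which this sketch passes through uncorrected.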

URL: http://dline.info/fpaper/jdim/v16i2/jdimv16i2_2.pdf
Refereed Designation: Refereed

Collaborative Partner

Institute of Electronic and Information Technology (IEIT)
