Spam blog filtering with bipartite graph clustering and mutual detection between spam blogs and words

Title: Spam blog filtering with bipartite graph clustering and mutual detection between spam blogs and words
Publication Type: Journal Article
Year of Publication: 2010
Authors: Ishida, K
Journal: Journal of Digital Information Management
Volume: 8
Issue: 2
Pagination: 108 - 116
Date Published: 2010
Keywords: Bipartite graph, Clustering, Filtering, Mutual detection, Spam blog, Spam word
Abstract

This paper proposes a mutual detection mechanism between spam blogs and words with bipartite graph clustering for filtering spam blogs from updated blog data. Spam blogs are problematic in extracting useful marketing information from the blogosphere; they often appear to be rich sources of information based on individual opinion and social reputation. One characteristic of spam blogs is copied-and-pasted articles based on normal blogs and news articles. Another is multiple postings of the same article to increase the chances of exposure and income from advertising. Because of these characteristics, spam blogs share common words, and such blogs and words can form large spam bi-clusters. This paper explains how to detect spam blogs and spam words with mutual filtering based on such clusters. It reports that the maximum precision, or F-measure, of the filtering is 95%, based on a preliminary experiment with approximately six months' updated blog data and a more detailed experiment with one day's data. An advantage of this method for spam blog filtering, as compared to a machine learning approach, is also supported by experiments with SVM.
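The mutual detection idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the function name `mutual_detection`, the thresholds `word_ratio` and `blog_ratio`, the minimum of two blogs per spam word, and the hand-picked seed blogs are all assumptions made for illustration. In particular, the paper obtains its starting point from bipartite graph clustering, which this sketch replaces with a given seed set. Blogs and words form the two sides of a bipartite graph; a word is flagged when most blogs containing it are flagged, and a blog is flagged when most of its words are flagged, repeating until nothing changes.

```python
from collections import defaultdict


def mutual_detection(blog_words, seed_spam_blogs,
                     word_ratio=0.6, blog_ratio=0.5, max_rounds=20):
    """Illustrative mutual filtering on a blog-word bipartite graph.

    blog_words: dict mapping a blog id to the set of words it contains.
    seed_spam_blogs: blogs assumed to be spam at the start (hypothetical input).
    """
    # Invert the edges: word -> set of blogs that contain it.
    word_blogs = defaultdict(set)
    for blog, words in blog_words.items():
        for word in words:
            word_blogs[word].add(blog)

    seeds = set(seed_spam_blogs)
    spam_blogs = set(seeds)
    spam_words = set()
    for _ in range(max_rounds):
        # Flag a word if it is shared by at least two blogs and a large
        # enough fraction of those blogs is currently flagged as spam.
        new_words = {
            w for w, blogs in word_blogs.items()
            if len(blogs) >= 2
            and len(blogs & spam_blogs) / len(blogs) >= word_ratio
        }
        # Flag a blog if a large enough fraction of its words is flagged.
        # Seed blogs always stay flagged.
        new_blogs = seeds | {
            b for b, words in blog_words.items()
            if words and len(words & new_words) / len(words) >= blog_ratio
        }
        if new_words == spam_words and new_blogs == spam_blogs:
            break  # fixed point reached
        spam_words, spam_blogs = new_words, new_blogs
    return spam_blogs, spam_words


# Toy data: three near-duplicate blogs and two unrelated ones.
blogs = {
    "s1": {"cheap", "pills", "buy", "now"},
    "s2": {"cheap", "pills", "buy", "now"},
    "s3": {"cheap", "pills", "buy", "deal"},
    "n1": {"garden", "tomato", "recipe"},
    "n2": {"garden", "hiking", "recipe", "buy"},
}
detected_blogs, detected_words = mutual_detection(blogs, {"s1", "s2"})
print(sorted(detected_blogs))  # ['s1', 's2', 's3']
print(sorted(detected_words))  # ['buy', 'cheap', 'now', 'pills']
```

In this toy run the unseeded copy `s3` is picked up through the words it shares with the seeds, while `n2` stays unflagged even though it contains the word `buy`, because only one of its four words is flagged.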

URL: http://www.scopus.com/inward/record.url?eid=2-s2.0-79960276152&partnerID=40&md5=575c9621399a49b24a9d7c264810d304

Collaborative Partner

Institute of Electronic and Information Technology (IEIT)
