Spam blog filtering with bipartite graph clustering and mutual detection between spam blogs and words

Title	Spam blog filtering with bipartite graph clustering and mutual detection between spam blogs and words
Publication Type	Journal Article
Year of Publication	2010
Authors	Ishida, K
Journal	Journal of Digital Information Management
Volume	8
Issue	2
Pagination	108 - 116
Date Published	2010
Keywords	Bipartite graph, Clustering, Filtering, Mutual detection, Spam blog, Spam word
Abstract	This paper proposes a mutual detection mechanism between spam blogs and spam words, built on bipartite graph clustering, for filtering spam blogs out of updated blog data. Spam blogs hinder the extraction of useful marketing information from the blogosphere; they often appear to be rich sources of information based on individual opinion and social reputation. One characteristic of spam blogs is that their articles are copied and pasted from normal blogs and news articles. Another is that the same article is posted multiple times to increase exposure and advertising income. Because of these characteristics, spam blogs share common words, and such blogs and words can form large spam bi-clusters. This paper explains how to detect spam blogs and spam words through mutual filtering based on such clusters. It reports a maximum precision or F-measure of 95% for the filtering, based on a preliminary experiment with approximately six months of updated blog data and a more detailed experiment with one day's data. Experiments with an SVM also support an advantage of this method over a machine learning approach to spam blog filtering.
URL	http://www.scopus.com/inward/record.url?eid=2-s2.0-79960276152&partnerID=40&md5=575c9621399a49b24a9d7c264810d304
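
The mutual detection idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it replaces bi-cluster extraction with a simple seed-based propagation over the blog-word bipartite graph, and the function name `mutual_detection`, the thresholds, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def mutual_detection(blog_words, seed_spam_blogs,
                     word_threshold=0.5, blog_threshold=0.5, max_iters=20):
    """Illustrative mutual labelling of spam blogs and spam words.

    blog_words: dict mapping a blog id to the set of words it contains.
    seed_spam_blogs: blog ids assumed to be known spam (hypothetical input).
    A word is flagged when enough of the blogs using it are flagged, and a
    blog is flagged when enough of its words are flagged.
    """
    # Build the word side of the bipartite graph: word -> blogs using it.
    word_blogs = defaultdict(set)
    for blog, words in blog_words.items():
        for word in words:
            word_blogs[word].add(blog)

    seeds = set(seed_spam_blogs)
    spam_blogs, spam_words = set(seeds), set()
    for _ in range(max_iters):
        # Blogs -> words: flag words that mostly occur in flagged blogs.
        new_words = {w for w, blogs in word_blogs.items()
                     if len(blogs & spam_blogs) / len(blogs) >= word_threshold}
        # Words -> blogs: flag blogs that mostly consist of flagged words.
        new_blogs = seeds | {b for b, words in blog_words.items()
                             if words and
                             len(words & new_words) / len(words) >= blog_threshold}
        if new_words == spam_words and new_blogs == spam_blogs:
            break  # fixed point reached
        spam_words, spam_blogs = new_words, new_blogs
    return spam_blogs, spam_words


# Toy data (invented): three near-duplicate spam posts and two normal blogs.
blog_words = {
    "spam1": {"cheap", "pills", "buy", "now"},
    "spam2": {"cheap", "pills", "buy", "deal"},
    "spam3": {"cheap", "pills", "deal", "now"},
    "normal1": {"recipe", "pasta", "dinner"},
    "normal2": {"travel", "kyoto", "dinner", "now"},
}
spam_blogs, spam_words = mutual_detection(blog_words, {"spam1", "spam2"})
print(sorted(spam_blogs))
print(sorted(spam_words))
```

On this toy input the unlabelled copy `spam3` is picked up through the words it shares with the seeds, while the two normal blogs stay unflagged. Real data would need the bi-clustering step and tuned thresholds described in the paper itself.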
