Title | Spam blog filtering with bipartite graph clustering and mutual detection between spam blogs and words |
Publication Type | Journal Article |
Year of Publication | 2010 |
Authors | Ishida, K |
Journal | Journal of Digital Information Management |
Volume | 8 |
Issue | 2 |
Pagination | 108–116 |
Date Published | 2010 |
Keywords | Bipartite graph, Clustering, Filtering, Mutual detection, Spam blog, Spam word |
Abstract | This paper proposes a mutual detection mechanism between spam blogs and spam words, based on bipartite graph clustering, for filtering spam blogs from updated blog data. Spam blogs are a problem when extracting useful marketing information from the blogosphere; they often appear to be rich sources of information based on individual opinion and social reputation. One characteristic of spam blogs is articles copied and pasted from normal blogs and news articles. Another is multiple postings of the same article to increase the chances of exposure and advertising income. Because of these characteristics, spam blogs share common words, and such blogs and words can form large spam bi-clusters. This paper explains how to detect spam blogs and spam words with mutual filtering based on such clusters. It reports that the maximum precision, or F-measure, of the filtering is 95%, based on a preliminary experiment with approximately six months of updated blog data and a more detailed experiment with one day of data. Experiments with an SVM also support the advantage of this method over a machine learning approach to spam blog filtering. |
URL | http://www.scopus.com/inward/record.url?eid=2-s2.0-79960276152&partnerID=40&md5=575c9621399a49b24a9d7c264810d304 |
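
The mutual detection idea in the abstract (spam blogs reveal spam words, and spam words in turn reveal further spam blogs) can be illustrated with a small sketch. This is not the paper's algorithm: it replaces bipartite graph clustering with a simple iterative labelling over a blog-word bipartite graph, and the function name `mutual_detection`, the seed set, and the thresholds `word_ratio`, `blog_ratio`, and `min_blogs` are all assumptions made for illustration.

```python
from collections import defaultdict


def mutual_detection(blog_words, seed_spam_blogs,
                     word_ratio=0.6, blog_ratio=0.5,
                     min_blogs=2, max_iter=20):
    """Illustrative mutual labelling of spam blogs and spam words.

    blog_words: dict mapping a blog id to an iterable of its words.
    seed_spam_blogs: blog ids assumed to be known spam (hypothetical input).
    All thresholds are hypothetical, not values from the paper.
    """
    blog_words = {b: set(ws) for b, ws in blog_words.items()}

    # Other side of the bipartite graph: word -> blogs containing it.
    word_blogs = defaultdict(set)
    for blog, words in blog_words.items():
        for word in words:
            word_blogs[word].add(blog)

    seeds = set(seed_spam_blogs)
    spam_blogs, spam_words = set(seeds), set()

    for _ in range(max_iter):
        # A word is labelled spam if it is shared by enough blogs and
        # most of those blogs are currently labelled spam.
        new_words = {
            w for w, blogs in word_blogs.items()
            if len(blogs) >= min_blogs
            and len(blogs & spam_blogs) / len(blogs) >= word_ratio
        }
        # A blog is labelled spam if enough of its words are spam words.
        new_blogs = seeds | {
            b for b, words in blog_words.items()
            if words and len(words & new_words) / len(words) >= blog_ratio
        }
        if new_words == spam_words and new_blogs == spam_blogs:
            break  # labels stopped changing
        spam_words, spam_blogs = new_words, new_blogs

    return spam_blogs, spam_words
```

Calling `mutual_detection(blogs, seeds)` with a dictionary of blog ids to word collections returns the pair `(spam_blogs, spam_words)`. A faithful reproduction would instead extract large bi-clusters from the bipartite graph before filtering, as the abstract describes.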