| Title | Clustering frequent graph patterns |
| Publication Type | Journal Article |
| Year of Publication | 2007 |
| Authors | Liu, Y, Li, J, Zhu, J, Gao, H |
| Journal | Journal of Digital Information Management |
| Volume | 5 |
| Issue | 6 |
| Pagination | 335 - 346 |
| Date Published | 2007 |
| Keywords | Closed graph pattern, Frequent graph pattern, Graph mining, Representative graph pattern, δ-jump pattern |
| Abstract | In recent years, graph mining has attracted much attention in the data mining community. Several efficient frequent subgraph mining algorithms have been recently proposed. However, the number of frequent graph patterns generated by these graph mining algorithms may be too large to be effectively explored by users, especially when the support threshold is low. In this paper, we propose to summarize frequent graph patterns by a much smaller number of representative graph patterns. Several novel concepts such as δ-cover graph, jump value and δ-jump pattern are proposed for efficiently summarizing frequent graph patterns. Based on the property of all δ-jump patterns being representative graph patterns, we propose two efficient algorithms for summarizing frequent graph patterns, RP-FP and RP-GD. The RP-FP algorithm computes representative graph patterns from a set of closed frequent graph patterns, whereas the RP-GD algorithm directly mines representative graph patterns from graph databases. The RP-FP has tight approximation (summarization) bound but higher computational complexity. The RP-GD has no approximation bound guarantee but is far more efficient. We conducted extensive experimental studies using various real and synthetic datasels. Experimental results show that RP-FP and RP-GD are able to obtain compact summarization in both real and synthetic graph databases. When the number of closed graph patterns is very large, RP-GD is much more efficient than RP-FP, while achieving comparable summarization quality. |
| URL | http://www.scopus.com/inward/record.url?eid=2-s2.0-49149118250&partnerID=40&md5=cae15cbd3bb5b1a9e3f57a95dbe295b5 |




