Cluster based mixed coding schemes for inverted file index compression

Title: Cluster based mixed coding schemes for inverted file index compression
Publication Type: Journal Article
Year of Publication: 2008
Authors: Chen, J, Zhong, P, Cook, T
Journal: Journal of Digital Information Management
Volume: 6
Issue: 1
Pagination: 30 - 37
Date Published: 2008
Keywords: d-gap, Index compression, Inverted file, Inverted list
Abstract

The cluster property of document collections in today's search engines provides valuable information for index compression. By clustering the d-gaps of an inverted list based on a threshold, and then encoding clustered and non-clustered d-gaps using different methods, we can tailor the encoding to the specific properties of different d-gaps and achieve a better compression ratio. Based on this idea, in this paper we propose a cluster based approach and present two new codes for inverted file index compression: mixed gamma/flat binary code and mixed delta/flat binary code. Experimental results show that the two new codes achieve a better or equal compression ratio compared to interpolative code, which is considered the most efficient bitwise code at present. In addition, the two new codes have much lower complexity than interpolative code and therefore enable faster encoding and decoding. By adjusting the parameters of the mixed codes, even better results may be achieved. Experiments show promising results with our approaches.
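The building blocks named in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the paper's mixed code: the abstract does not give the bit layout, the threshold value, or how cluster boundaries are signalled, so the function names, the rule "a gap is clustered if it is at most the threshold", and the fixed width for flat binary are all hypothetical. The gamma code shown is the standard Elias gamma code (a run of zeros giving the length, followed by the binary value).

```python
def d_gaps(doc_ids):
    """Turn a strictly increasing list of document IDs into d-gaps
    (differences between consecutive IDs; the first gap is the first ID)."""
    gaps, prev = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def elias_gamma(n):
    """Standard Elias gamma code of a positive integer, as a bit string."""
    if n < 1:
        raise ValueError("Elias gamma is defined for integers >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def flat_binary(n, width):
    """Fixed-width binary code of n using exactly `width` bits."""
    if n < 0 or n >= (1 << width):
        raise ValueError("value does not fit in the given width")
    return format(n, "0{}b".format(width))


def split_by_threshold(gaps, threshold):
    """Group consecutive gaps into runs labelled clustered or not.

    Assumption (not from the paper): a gap counts as clustered
    when it is <= threshold.
    """
    runs = []
    for gap in gaps:
        clustered = gap <= threshold
        if runs and runs[-1][0] == clustered:
            runs[-1][1].append(gap)
        else:
            runs.append((clustered, [gap]))
    return runs
```

For example, `split_by_threshold(d_gaps([3, 5, 6, 7, 40, 42]), 4)` separates the small gaps from the single large gap of 33. A mixed scheme in the spirit of the abstract might then encode the small-gap runs with `flat_binary` and the remaining gaps with `elias_gamma`, though the exact format used by the authors is not stated here.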

URL: http://www.scopus.com/inward/record.url?eid=2-s2.0-56849117299&partnerID=40&md5=e6f756f0e7e96db7d5c31eadd73799d6

Collaborative Partner

Institute of Electronic and Information Technology (IEIT)
