A comparative study on key phrase extraction methods in automatic Web Site Summarization

Title: A comparative study on key phrase extraction methods in automatic Web Site Summarization
Publication Type: Journal Article
Year of Publication: 2007
Authors: Zhang, Y, Milios, E, Zincir-Heywood, N
Journal: Journal of Digital Information Management
Volume: 5
Issue: 5
Pagination: 323 - 332
Date Published: 2007
Keywords: Key phrase extraction, Web retrieval, Web site study
Abstract

Web site summarization is the process of automatically generating a concise and informative summary for a given Web site. It has attracted growing attention in recent years because effective summarization could enhance Web information retrieval tasks such as searching for Web sites. Extraction-based approaches to Web site summarization select the most significant sentences from the target Web site based on their density of key phrases drawn from a list that best describes the entire Web site. In this work, we benchmark five alternative key phrase extraction methods (TFIDF, KEA, Keyword, Keyterm, and Mixture) within an automatic Web site summarization framework we previously developed. We evaluate these underlying methods in a formal user study and demonstrate that Keyterm is the best choice for key phrase extraction, while Mixture should be used to obtain key sentences. We also discuss why one method performs better than another and what could be done to further improve the summarization system.
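
The extraction step described in the abstract, ranking sentences by their density of key phrases, can be illustrated with a minimal Python sketch. This is not the authors' framework: the function names (split_sentences, key_phrase_density, top_sentences), the naive sentence splitter, and the definition of density as the fraction of a sentence's words covered by key phrase occurrences are all assumptions made for illustration. The key phrase list is taken as given here, whereas in the paper it would come from one of the five benchmarked methods.

```python
import re


def split_sentences(text):
    """Naive sentence splitter (assumption): break after ., ! or ? plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def key_phrase_density(sentence, key_phrases):
    """Fraction of the sentence's words covered by key phrase occurrences.

    Hypothetical scoring rule for illustration only. Overlapping phrases are
    each counted, so the value can exceed 1.0 for heavily overlapping lists.
    """
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    if not words:
        return 0.0
    normalized = " ".join(words)
    covered = 0
    for phrase in key_phrases:
        phrase_words = re.findall(r"[a-z0-9]+", phrase.lower())
        if not phrase_words:
            continue
        # Match the phrase as a run of whole words in the normalized sentence.
        pattern = r"\b" + " ".join(phrase_words) + r"\b"
        covered += len(re.findall(pattern, normalized)) * len(phrase_words)
    return covered / len(words)


def top_sentences(text, key_phrases, k=2):
    """Return the k sentences with the highest key phrase density."""
    sentences = split_sentences(text)
    ranked = sorted(
        sentences,
        key=lambda s: key_phrase_density(s, key_phrases),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    page_text = (
        "Our lab studies machine learning. "
        "Visit the campus map. "
        "Machine learning courses cover text mining."
    )
    phrases = ["machine learning", "text mining"]
    for sentence in top_sentences(page_text, phrases, k=2):
        print(sentence)
```

In this sketch the quality of the selected sentences depends entirely on the key phrase list passed in, which is the dependency the study examines when it compares extraction methods.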

URL: http://www.scopus.com/inward/record.url?eid=2-s2.0-70350686085&partnerID=40&md5=1e62cb16df24be6167a1dbde14d4281c

Collaborative Partner

Institute of Electronic and Information Technology (IEIT)
