Title | A refined methodology for automatic keyphrase assignment to digital documents |
Publication Type | Journal Article |
Year of Publication | 2011 |
Authors | Khan, S, Fatima, I, Irfan, R, Latif, K |
Journal | Journal of Digital Information Management |
Volume | 9 |
Issue | 2 |
Pagination | 55 - 63 |
Date Published | 2011 |
Keywords | Automatic indexing, Keyphrase assignment, Vocabulary |
Abstract | Keyphrases precisely express the primary topics and themes of documents and are valuable for cataloging and classification. Manually assigning keyphrases to existing documents is a tedious task; therefore, automatic keyphrase generation has been used extensively to classify digital documents. Existing automatic keyphrase generation algorithms are limited in their ability to assign semantically relevant keyphrases to documents. In this paper we propose a methodology to refine the result set of keyphrases automatically generated by the Keyphrase Extraction Algorithm (KEA++), so that the keyphrases accurately and precisely represent the content of the document. Our approach is an additional layer on top of KEA++ that exploits the semantic relationships and hierarchical structure of the controlled vocabulary to filter out irrelevant keyphrases from the result set generated by KEA++. The methodology was applied to different sets of academic publications for evaluation. The evaluation demonstrates that the proposed refinement methodology improves the quality of the generated keyphrases. |
URL | http://www.scopus.com/inward/record.url?eid=2-s2.0-79960690550&partnerID=40&md5=aeb220f4fd01fa335a93432979341abe |
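
The abstract describes a refinement layer that uses the relationships and hierarchy of a controlled vocabulary to discard irrelevant keyphrases from a KEA++ result set. The sketch below illustrates that general idea only; it is not the authors' algorithm. The toy vocabulary `VOCAB_LINKS`, the helper names `distance` and `refine`, the `max_hops` threshold, and the rule "keep a candidate only if it lies within a few vocabulary links of another candidate" are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy vocabulary: term -> directly linked terms
# (broader, narrower, or related). Not taken from the paper.
VOCAB_LINKS = {
    "information retrieval": {"automatic indexing", "search engines"},
    "automatic indexing": {"information retrieval", "keyphrase assignment"},
    "keyphrase assignment": {"automatic indexing"},
    "search engines": {"information retrieval"},
    "soil chemistry": {"soil science"},
    "soil science": {"soil chemistry"},
}


def distance(links, start, goal, max_hops):
    """Breadth-first hop count between two terms, or None if beyond max_hops."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        term, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand past the hop limit
        for neighbour in links.get(term, ()):
            if neighbour == goal:
                return hops + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None


def refine(candidates, links, max_hops=2):
    """Keep candidates that are vocabulary-close to at least one other candidate.

    Assumed filtering rule for illustration, not the published one.
    """
    kept = []
    for term in candidates:
        others = (c for c in candidates if c != term)
        if any(distance(links, term, o, max_hops) is not None for o in others):
            kept.append(term)
    return kept


# Stand-in for a keyphrase result set produced by an extractor such as KEA++.
candidates = [
    "automatic indexing",
    "keyphrase assignment",
    "search engines",
    "soil chemistry",
]
refined = refine(candidates, VOCAB_LINKS)
print(refined)
```

In this toy run, "soil chemistry" is dropped because no other candidate is reachable from it in the vocabulary graph, while the three indexing-related terms support one another and are kept. A real controlled vocabulary would distinguish broader, narrower, and related links rather than treating them as one undirected edge set.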