Retrieving and Processing Images from the Pages of a Historical Newspaper and Modeling the Text Topics

Publication Type: Journal Article
Year of Publication: 2021
Authors: Sá, GJ de A, Maia, JEB
Journal: Journal of Digital Information Management
Volume: 19
Issue: 2
Start Page: 41
Pagination: 41-46
Date Published: 06/2021
Type of Article: Research
Abstract

Historical newspapers are a research source for the humanities and social sciences. However, these image collections are difficult to process by machine because of low print quality, non-standardized page layouts, and the poor quality of the photographs in some files. This paper presents the processing model of a topic navigation system for historical newspaper page images. The overall procedure consists of four modules: segmentation of text sub-images and text extraction; preprocessing and representation; induced topic extraction and representation; and a document viewing and retrieval interface. The algorithmic and technological approaches of each module are described, and initial test results on a collection spanning 28 years are presented.

URL: http://www.dline.info/download.php?sn=3258
DOI: 10.6025/jdim/2021/19/2/41-46
Refereed Designation: Refereed
