Intelligent Data Mining of Archival Documents Using BP Neural Networks and Genetic Algorithms

Xiaojuan Chen
Journal of E-Technology, Volume 17, Number 1, 2026
https://doi.org/10.6025/jet/2026/17/1/19-27
https://www.dline.info/jet/fulltext/v17n1/jetv17n1_3.pdf

With the rapid growth of unstructured data, particularly in archival and document management systems, traditional data processing methods have become inadequate for extracting meaningful knowledge. This paper addresses the challenge by integrating backpropagation (BP) neural networks and genetic algorithms to enhance data mining on heterogeneous, text-based archival records. The authors propose a data warehouse architecture that uses a star schema to organize document metadata, including filing year, unit, type, and content, while tackling pervasive data quality issues such as missing values, inconsistent cataloging standards, and formatting errors. Data cleaning is performed with SQL Server Integration Services (SSIS), and text features are represented using TF-IDF so that they can serve as input to the neural network. The study highlights that a significant portion of archival data is underutilized because poor standardization and incomplete metadata hamper effective mining. Preliminary results suggest that intelligent data preprocessing combined with optimized neural networks can uncover latent patterns in document usage and improve archival decision-making. The work lays a foundation for future research on automated, intelligent archival systems, though challenges remain in data integration, standardization, and model refinement.
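The TF-IDF representation mentioned above can be illustrated with a minimal sketch. The abstract does not state which TF-IDF variant the paper uses, so this example assumes the common form: term frequency is the count divided by document length, and inverse document frequency is the natural log of N / df. The function name `tfidf` and the sample records are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant (not confirmed by the paper):
    tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical archival records, already tokenized.
records = [
    ["personnel", "file", "transfer"],
    ["financial", "file", "audit"],
    ["personnel", "file", "appointment"],
]
vectors = tfidf(records)
```

Under this variant, a term that occurs in every record (here "file") receives a weight of zero, while terms confined to fewer records receive larger weights. That is the property that makes the resulting vectors more informative as network input than raw counts.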