<?xml version="1.0" encoding="UTF-8"?>
<record>
  <title>Intelligent Data Mining of Archival Documents Using BP Neural Networks and Genetic Algorithms</title>
  <journal>Journal of E-Technology</journal>
  <author>Xiaojuan Chen</author>
  <volume>17</volume>
  <issue>1</issue>
  <year>2026</year>
  <doi>https://doi.org/10.6025/jet/2026/17/1/19-27</doi>
  <url>https://www.dline.info/jet/fulltext/v17n1/jetv17n1_3.pdf</url>
  <abstract>With the rapid growth of unstructured data, particularly in archival and document management systems,
traditional data processing methods have become inadequate for extracting meaningful knowledge. This
paper addresses the challenge by integrating backpropagation (BP) neural networks and genetic algorithms to enhance data
mining capabilities on heterogeneous, text-based archival records. The authors propose a data warehouse
architecture using a star schema to organize document metadata, including filing year, unit, type, and
content, while tackling pervasive data quality issues such as missing values, inconsistent cataloging
standards, and formatting errors. Data cleaning is performed using SQL Server Integration Services (SSIS),
and text features are encoded with TF-IDF to provide better input to the neural network. The study
highlights that a significant portion of archival data is underutilized due to poor standardization and
incomplete metadata, which hampers effective mining. Preliminary results suggest that intelligent data
preprocessing combined with optimized neural networks can uncover latent patterns in document usage
and improve archival decision-making. The work lays the foundation for future research in automated,
intelligent archival systems, though challenges remain in data integration, standardization, and model
refinement.</abstract>
</record>
