<?xml version="1.0" encoding="UTF-8"?>
<record>
  <title>Predicting Student Academic Performance Through Behavioral Engagement Metrics: A Random Forest-Based Machine Learning Approach</title>
  <journal>Journal of E-Technology</journal>
  <author>Hathairat Ketmaneechairat</author>
  <volume>17</volume>
  <issue>2</issue>
  <year>2026</year>
  <doi>https://doi.org/10.6025/jet/2026/17/2/33-58</doi>
  <url>https://www.dline.info/jet/fulltext/v17n2/jetv17n2_1.pdf</url>
  <abstract>This study presents a Random Forest-based machine learning framework for predicting student academic
performance using behavioral engagement metrics within e-learning environments. Analyzing a dataset of
14,003 student records encompassing academic, behavioral, and demographic attributes, we evaluated
three predictive models (Linear Regression, Random Forest, and Multilayer Perceptron neural networks) to
forecast final academic grades categorized into four ordinal levels. A critical methodological contribution of
this research is the identification and mitigation of data leakage: initial experiments revealed that including
ExamScore as a predictor artificially inflated performance metrics (R² = 1.000), as FinalGrade is
deterministically derived from examination scores. After excluding this variable, the hyperparameter-tuned
Random Forest classifier emerged as the superior model, achieving an R² score of 0.792, a classification
accuracy of 84.2%, and a weighted F1-score of 0.842, significantly outperforming baseline approaches.
Feature importance analysis demonstrated that behavioral engagement indicators, specifically
AssignmentCompletion (18.3%), Attendance (18.0%), and StudyHours (16.5%), were the most influential
predictors, whereas demographic variables such as Gender (3.0%) exhibited minimal predictive power.
These findings suggest that modifiable learning behaviors, rather than static demographic characteristics,
drive academic outcomes. The study provides actionable insights for educational institutions to develop
early intervention systems that monitor engagement metrics and deliver equitable, personalized support.
Limitations include the cross-sectional design and institutional specificity, warranting future research on
temporal modeling, explainable AI integration, and cross-institutional validation to enhance generalizability
and ethical deployment of predictive learning analytics.</abstract>
</record>
