Predicting Student Academic Performance Through Behavioral Engagement Metrics: A Random Forest-Based Machine Learning Approach

Hathairat Ketmaneechairat. Journal of E-Technology, Volume 17, Number 2, 2026. https://doi.org/10.6025/jet/2026/17/2/33-58 https://www.dline.info/jet/fulltext/v17n2/jetv17n2_1.pdf

This study presents a Random Forest-based machine learning framework for predicting student academic performance from behavioral engagement metrics in e-learning environments. Using a dataset of 14,003 student records covering academic, behavioral, and demographic attributes, we evaluated three predictive models (Linear Regression, Random Forest, and a Multilayer Perceptron neural network) for forecasting final academic grades categorized into four ordinal levels. A key methodological contribution of this research is the identification and mitigation of data leakage: initial experiments showed that including ExamScore as a predictor artificially inflated the performance metrics (R² = 1.000), because FinalGrade is deterministically derived from examination scores. After this variable was excluded, the hyperparameter-tuned Random Forest classifier emerged as the best model, achieving an R² score of 0.792, a classification accuracy of 84.2%, and a weighted F1-score of 0.842, significantly outperforming the baseline approaches. Feature importance analysis showed that behavioral engagement indicators, specifically AssignmentCompletion (18.3%), Attendance (18.0%), and StudyHours (16.5%), were the most influential predictors, whereas demographic variables such as Gender (3.0%) had minimal predictive power. These findings suggest that modifiable learning behaviors, rather than static demographic characteristics, drive academic outcomes. The study provides actionable insights for educational institutions developing early intervention systems that monitor engagement metrics and deliver equitable, personalized support.
Limitations include the cross-sectional design and institutional specificity, warranting future research on temporal modeling, explainable AI integration, and cross-institutional validation to improve the generalizability and ethical deployment of predictive learning analytics.
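
The data leakage described in the abstract arises because FinalGrade can be reconstructed exactly from ExamScore. The sketch below illustrates one simple way such a predictor might be flagged before modeling: it checks whether every distinct value of a feature maps to exactly one target value. It is not the study's procedure. The grade cut-offs, the helper names (`grade_from_score`, `determines_target`, `make_records`), and the synthetic records are assumptions made for illustration; only the column names ExamScore, Attendance, and FinalGrade come from the abstract.

```python
import random
from collections import defaultdict

# Hypothetical grade cut-offs; the abstract does not state the real thresholds.
GRADE_CUTOFFS = [(80, "A"), (70, "B"), (60, "C")]


def grade_from_score(exam_score):
    """Map an exam score to one of four ordinal grades (assumed rule)."""
    for cutoff, grade in GRADE_CUTOFFS:
        if exam_score >= cutoff:
            return grade
    return "D"


def determines_target(rows, feature, target):
    """Return True if each distinct feature value maps to exactly one target value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[feature]].add(row[target])
    return all(len(targets) == 1 for targets in seen.values())


def make_records(n, seed=0):
    """Build synthetic records in which FinalGrade is derived from ExamScore."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        exam = rng.randint(30, 100)
        rows.append({
            "ExamScore": exam,
            "Attendance": rng.randint(50, 100),  # independent of the grade
            "FinalGrade": grade_from_score(exam),
        })
    return rows


records = make_records(2000)

# Flag every candidate predictor that fully determines the target.
leaky = [
    feature
    for feature in ("ExamScore", "Attendance")
    if determines_target(records, feature, "FinalGrade")
]
print(leaky)  # ['ExamScore']
```

This check is only meaningful for features whose values repeat across records. A continuous feature in which every value is unique would pass trivially, so in practice such a column would need to be binned first, or screened another way, for example by noting a suspiciously perfect validation score as the abstract reports.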