Predictive Model for the classification of university students at risk of academic loss

María Cristina Gamboa Mora; Felix Vivián Mohr; Vicky Ahumada De La Rosa; Sulma Paola Vera-Monroy; Alexander Mejía-Camacho

doi:10.17081/eduhum.26.47.6379

Published Jun 18, 2024

María Cristina Gamboa Mora

Universidad Nacional Abierta y a Distancia, Bogotá, Colombia

Felix Vivián Mohr

Universidad de La Sabana

https://orcid.org/0000-0002-9293-2424

Vicky Ahumada De La Rosa

Universidad Nacional Abierta y a Distancia (UNAD)

https://orcid.org/0000-0002-8797-331X

Sulma Paola Vera-Monroy

Universidad de La Sabana

https://orcid.org/0000-0002-7573-4151

Alexander Mejía-Camacho

Universidad de Cundinamarca

https://orcid.org/0000-0003-4949-2045

Abstract

For higher education institutions, predicting the risk of academic loss is a priority because of the resources invested by the institutions, the students, and the academic community at large. Objective: this research aimed to propose a suitable model for predicting which students are at risk of academic loss in a chemistry course. Methodology: this quasi-experimental, predictive, longitudinal study used data from 103 students at four Colombian universities. Five algorithms were compared to build the model. The data were processed with Jupyter-Python. Results: the logistic regression (LR) model was built on the students' results in the Saber 11 test (the Colombian national university admission exam); penalizing false positives with weights different from those applied to false negatives improved the model's performance. Conclusions: LR is substantially better than a greedy or guessing approach, and it was also shown to outperform a neural network model.
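The idea of penalizing the two kinds of error differently can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the features, the weights, or the library used, so the single standardized score, the weight of 3.0, the function names, and the synthetic data below are all assumptions made for illustration only.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weighted_lr(xs, ys, w_pos=1.0, w_neg=1.0, step=0.1, epochs=2000):
    """Fit p(at risk | x) = sigmoid(a*x + b) by gradient descent on a
    weighted cross-entropy loss.

    w_pos scales the loss on at-risk students (y = 1), so raising it makes
    false negatives more costly; w_neg does the same for false positives.
    """
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(a * x + b) - y
            w = w_pos if y == 1 else w_neg
            grad_a += w * err * x
            grad_b += w * err
        a -= step * grad_a / n
        b -= step * grad_b / n
    return a, b


def error_counts(xs, ys, a, b):
    """Return (false_positives, false_negatives) at a 0.5 probability cut-off."""
    fp = fn = 0
    for x, y in zip(xs, ys):
        flagged = sigmoid(a * x + b) >= 0.5
        if flagged and y == 0:
            fp += 1
        elif not flagged and y == 1:
            fn += 1
    return fp, fn


# Synthetic data only: a hypothetical standardized admission-test score,
# with lower scores made more likely to be labelled "at risk" (y = 1).
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(103)]
ys = [1 if random.random() < sigmoid(-1.5 * x - 0.5) else 0 for x in xs]

a_eq, b_eq = fit_weighted_lr(xs, ys)            # equal error weights
a_w, b_w = fit_weighted_lr(xs, ys, w_pos=3.0)   # missed at-risk students cost 3x

fp_eq, fn_eq = error_counts(xs, ys, a_eq, b_eq)
fp_w, fn_w = error_counts(xs, ys, a_w, b_w)
print("equal weights:    FP =", fp_eq, " FN =", fn_eq)
print("weighted (3x FN): FP =", fp_w, " FN =", fn_w)
```

In this sketch, raising the weight on at-risk students moves the decision boundary so that more students are flagged, trading extra false positives for fewer missed at-risk students. Which direction and ratio suit a real course depends on the relative cost of each error, which the abstract does not specify.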


