
Predicting workplace absenteeism using machine learning: a pilot study

Article

Llamas Blázquez, Pablo

Journal of Occupational Medicine and Toxicology

2025

20

38


absenteeism ; occupational health ; artificial intelligence ; risk assessment

Brazil

Personnel management

https://doi.org/10.1186/s12995-025-00482-5

English

Bibliogr.

"Abstract:
Background
Workplace absenteeism represents a significant challenge for organizations and occupational health practitioners, with substantial implications for productivity, healthcare costs, and employee well-being. Traditional approaches to absenteeism management remain largely reactive, highlighting the need for predictive models that enable proactive interventions.
Objective
To develop and validate machine learning models for predicting workplace absenteeism patterns and identifying risk factors associated with prolonged absence in a pilot study framework, thereby demonstrating feasibility for evidence-based occupational health interventions.
Methods
This pilot study employed machine learning algorithms on a publicly available workplace absenteeism dataset from a Brazilian company (2007–2010) obtained from the UCI Machine Learning Repository. The dataset comprised 740 instances with 19 variables including demographic characteristics, clinical indicators (BMI, ICD-10 coded absence reasons), and occupational factors. Random Forest and Gradient Boosting algorithms were implemented for both classification of prolonged absences and regression of absence duration. Statistical outliers (> 30 h, 3.8% of cases) were excluded to focus on typical absence patterns.
Results
The developed models demonstrated feasibility for workplace absenteeism prediction within this pilot framework. The Random Forest classification model achieved 84% accuracy (AUC = 0.89) for distinguishing between typical and prolonged absences. For duration prediction of typical absences (≤ 30 h), the Random Forest regression model yielded R² = 0.13, RMSE = 3.93 h, and MAE = 2.37 h. Key predictors included absence reason (ICD-10 classification), body mass index, and workload metrics, with notable interactions between workload intensity and specific absence categories.
Conclusions
This pilot study demonstrates the feasibility of machine learning approaches for occupational health management by enabling identification of employees at risk for prolonged absenteeism. While showing promise for supporting personalized health interventions and resource allocation, implementation requires external validation across multiple organizations and careful consideration of ethical implications regarding employee privacy and algorithmic fairness."
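
The regression metrics quoted in the abstract (R², RMSE, MAE) and the 30 h exclusion rule can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the toy absence durations are hypothetical, and a simple mean predictor stands in for the Random Forest model purely to show how the metrics behave.

```python
import math

def exclude_outliers(hours, cutoff=30.0):
    """Keep absences of at most `cutoff` hours (the abstract's <= 30 h rule)."""
    return [h for h in hours if h <= cutoff]

def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and MAE for paired observed/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
    }

# Toy absence durations in hours (illustrative only, NOT study data).
observed = [2.0, 8.0, 4.0, 40.0, 16.0, 1.0, 120.0, 3.0]
typical = exclude_outliers(observed)

# Baseline that always predicts the mean duration; by construction R^2 = 0.
baseline = [sum(typical) / len(typical)] * len(typical)
metrics = regression_metrics(typical, baseline)
print(metrics)
```

Read against this baseline, the reported R² = 0.13 means the regression model explains about 13% of the variance in typical absence duration beyond what always predicting the mean would achieve. RMSE is never smaller than MAE, so the gap between the reported 3.93 h and 2.37 h suggests that a minority of absences are predicted with comparatively large errors.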

This work is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Digital


