

Mar 2025 - Jun 2025
Employee Absenteeism Prediction Model
This project uses a logistic regression model to predict employee absenteeism, identifying high-risk individuals and uncovering key factors influencing absentee behavior. The model is packaged into a reusable Python module, with insights visualized in an interactive Power BI dashboard for HR teams.
Overview
This project builds a logistic regression model to tackle a common HR challenge: predicting and understanding absenteeism. The model has three main objectives:
Identify high-risk employees likely to take unscheduled leaves.
Predict future absenteeism probability based on employee attributes.
Uncover key factors that influence absenteeism behavior.
The model draws on features such as medical conditions, family size, and commute costs. The result? A reusable Python module that can score new employee data, plus a Power BI dashboard for clear, interactive visualizations of predicted absenteeism probabilities.
Approach
Before building the model, I focused on data preparation. I started by engineering a binary classification target to define absenteeism, then applied custom scaling to the selected features that were most relevant. With the data prepped, I split it into training and test sets.
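A minimal sketch of these preparation steps, under stated assumptions: the data is synthetic, every column name is hypothetical, the median cutoff for the binary target is an assumed choice, and scikit-learn's ColumnTransformer stands in for the custom scaler. The sketch fits the scaler after the split so that test rows do not influence the scaling statistics; that ordering is a choice of this sketch, not something the project description specifies.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; every column name here is hypothetical.
rng = np.random.default_rng(0)
n_rows = 200
df = pd.DataFrame({
    "commute_cost": rng.uniform(100, 400, n_rows),
    "age": rng.integers(25, 60, n_rows),
    "children": rng.integers(0, 4, n_rows),
    "medical_reason": rng.integers(0, 2, n_rows),  # already a 0/1 flag
    "absence_hours": rng.integers(0, 40, n_rows),
})

# Binary target: 1 when absence exceeds the median (an assumed cutoff), else 0.
cutoff = df["absence_hours"].median()
df["excessive_absence"] = (df["absence_hours"] > cutoff).astype(int)

X = df.drop(columns=["absence_hours", "excessive_absence"])
y = df["excessive_absence"]

# Hold out 20% of the rows, keeping the class balance similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Standardize only the numeric columns; the 0/1 flag passes through unchanged.
columns_to_scale = ["commute_cost", "age", "children"]
scaler = ColumnTransformer(
    [("scale", StandardScaler(), columns_to_scale)],
    remainder="passthrough",
)
X_train_scaled = scaler.fit_transform(X_train)  # fit on training rows only
X_test_scaled = scaler.transform(X_test)
```

Leaving 0/1 indicator columns unscaled keeps their coefficients readable as the effect of the flag being present rather than of a one-standard-deviation change.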
Using logistic regression, I analyzed the feature weights and odds ratios to pinpoint the factors that most significantly drove absenteeism. I refined the model with statistical techniques such as backward elimination to improve its accuracy. After validating the model with performance metrics and a confusion matrix, I serialized it with pickle so it can be reloaded to make predictions on new data.
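The modeling steps can be sketched as follows. This is an illustration on synthetic data with hypothetical feature names, not the project's actual code; backward elimination is omitted for brevity. The odds ratio for a feature is the exponential of its logistic regression coefficient, so a value above 1 means the feature raises the odds of absenteeism and a value below 1 lowers them.

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, already-scaled features; the names are hypothetical.
rng = np.random.default_rng(1)
feature_names = ["commute_cost", "age", "medical_reason"]
X = rng.normal(size=(300, 3))

# Synthetic target driven mostly by the third feature.
signal = 0.2 * X[:, 0] - 0.1 * X[:, 1] + 1.5 * X[:, 2]
y = (signal + rng.normal(scale=0.5, size=300) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression().fit(X_train, y_train)

# Odds ratio = exp(coefficient); list features by absolute weight.
weights = model.coef_[0]
odds_ratios = np.exp(weights)
ranked = sorted(zip(feature_names, weights, odds_ratios), key=lambda row: -abs(row[1]))
for name, weight, ratio in ranked:
    print(f"{name:>15}  weight={weight:+.3f}  odds_ratio={ratio:.3f}")

# Validate on the held-out rows.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: actual class, columns: predicted class
print("accuracy:", round(accuracy, 3))
print(cm)

# Serialize and restore the fitted model (pickle.dump / pickle.load do the same with files).
blob = pickle.dumps(model)
restored = pickle.loads(blob)
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.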
Integration & Visualization
I built the model to be reusable across different datasets. I packaged it into a Python module that automates the entire preprocessing and prediction workflow for new employee records. To make the insights actionable, I created an intuitive Power BI dashboard that visualizes absenteeism patterns across attributes such as medical conditions, commute costs, and age groups.
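One way such a module could be organized is sketched below. The class name AbsenteeismPredictor, its methods, and the column names are all hypothetical, and a scikit-learn pipeline trained on synthetic data stands in for the real pickled model. The idea is that a single object loads the serialized model, checks and prepares incoming records, and returns them with a predicted probability attached.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class AbsenteeismPredictor:
    """Hypothetical wrapper: loads a pickled pipeline and scores new records."""

    def __init__(self, pickled_pipeline: bytes, feature_columns):
        self.pipeline = pickle.loads(pickled_pipeline)
        self.feature_columns = list(feature_columns)

    def preprocess(self, records: pd.DataFrame) -> pd.DataFrame:
        # Fail early if the new data lacks a column the model was trained on.
        missing = set(self.feature_columns) - set(records.columns)
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        return records[self.feature_columns].astype(float)

    def predict(self, records: pd.DataFrame) -> pd.DataFrame:
        features = self.preprocess(records)
        scored = records.copy()
        # Column 1 of predict_proba is the probability of class 1 (absent).
        scored["absence_probability"] = self.pipeline.predict_proba(features)[:, 1]
        scored["predicted_absence"] = self.pipeline.predict(features)
        return scored


# Stand-in training step so the sketch runs end to end (synthetic data).
rng = np.random.default_rng(2)
columns = ["commute_cost", "age", "children"]  # hypothetical names
train = pd.DataFrame(rng.normal(size=(200, 3)), columns=columns)
target = (train["commute_cost"] + rng.normal(scale=0.5, size=200) > 0).astype(int)
pipeline = make_pipeline(StandardScaler(), LogisticRegression()).fit(train, target)

predictor = AbsenteeismPredictor(pickle.dumps(pipeline), columns)
new_records = pd.DataFrame(rng.normal(size=(5, 3)), columns=columns)
scored = predictor.predict(new_records)
print(scored)
```

A scored table like this could then be exported, for example with scored.to_csv, and used as a data source for a dashboard.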



