Applying Shifted-Beta-Geometric and Beta-Discrete-Weibull Models for Employee Retention Curve Projection

Research Article

Evgeny A. Antipov, Anastasia Gagarskaya, Yulia Trofimova, Elena Pokryshevskaya

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-4765185/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 19 Feb, 2025 in Operations Research Forum. Version 1.

Abstract

Employees are vital assets to any organization, and their departure can result in reduced human capital and operational disruptions. To mitigate this, companies employ predictive analysis to forecast potential employee churn. Probability-based modeling for projecting employee churn is an underexplored area in HR analytics.
This paper tests the applicability of the shifted-beta-geometric (sBG) and beta-discrete-Weibull (BdW) models within the context of employee survival projection. Using data from three cohorts of employees, we compare the results of these models with each other as well as with linear and logarithmic regressions. Our key finding is the superior performance of the BdW model, which can capture differences in churn rates between employees and within employees over time. The beta distribution captures the heterogeneous employee loyalty, while the Weibull distribution effectively captures retention rate changes over time. Our research demonstrates that parsimonious probabilistic models, which require minimal data and have so far been used only in customer analytics, can be applied in HR analytics for projecting employee retention curves.

Keywords: employee churn, churn prediction, probabilistic models, beta distribution, geometric distribution, Weibull distribution

1. Introduction

Churn prediction is a significant focus of customer analytics research (Yiğit & Shourabizadeh, 2017), but employee churn, which incurs substantial losses for companies (Saridakis & Cooper, 2016), receives less attention (Saradhi & Palshikar, 2011; Yiğit & Shourabizadeh, 2017). Predicting employee churn involves evaluating turnover rates over specific periods (Gentek, 2022). Employee churn represents a loss of intellectual assets for a company (Musanga & Chibaya, 2023), leading to operational disruptions, knowledge loss, and increased hiring costs (De Winne et al., 2018). High turnover requires constant recruitment, but the irreplaceability of experienced employees exacerbates the issue (Al-Suraihi et al., 2021). The significant expenses of adaptation, training, and salaries contribute to these losses (Naz et al., 2022). Thus, organizations seek to predict turnover and enhance retention strategies (Musanga & Chibaya, 2023).
Accurate predictions enable companies to improve retention strategies and maintain a stable workforce. Predictive models for employee churn can help prevent workforce loss and provide a competitive advantage. At the same time, many businesses are unfamiliar with effective churn projection approaches (Musanga & Chibaya, 2023). Job quitting and customer churn in contractual settings are similar stochastic processes. Both involve individuals leaving their group at the end of a period (Alamsyah & Salma, 2018 ; Gentek, 2022 ). Key metrics in these processes include retention and churn rates. The retention rate measures the proportion of individuals who remain active throughout the period, while the churn rate measures the proportion who leave by the end (Fader & Hardie, 2007 ). Existing literature categorizes churn prediction models into two broad groups: machine learning approaches and probabilistic models based on statistical principles (Gentek, 2022 ). Various studies compare these models, with machine learning and predictive analytics being commonly used for their ability to process large datasets and capture complex relationships (Fader & Hardie, 2018). Commonly used machine learning models include traditional classification algorithms (e.g., Logistic Regression, Random Forest, Naive Bayes, SVM, Decision Trees) (Kim et al., 2008 ; Alamsyah & Salma, 2018 ; Singh et al., 2019 ; Nestor et al., 2019 ; Gentek, 2022 ), deep learning models (Mena et al., 2019 ; Ozcan & Ozmen, 2021), and predictive analysis with time series models (e.g., ARIMA, ETS, XGB, LightGBM) (Jung, 2011; Javed & Azhar, 2017 ; Gregory, 2018 ). These models require large, representative datasets and often struggle with non-linear relationships and changing behaviors (Naz et al., 2022 ; Nestor et al., 2019 ; Kang & Oh, 2023). In many cases, limited data makes traditional predictive models impractical. 
Therefore, this research focuses on probabilistic models for churn or retention projection that do not require extensive data but can be applied to short time series of employee cohort sizes over time. Probabilistic models, such as time series analysis, Bayesian models, survival analysis, logistic regression, SDEs, Poisson models, and exponential distributions, use probability distributions to project individual behaviors into the future (Tamaddoni et al., 2016). Probabilistic models offer advantages over machine learning approaches, including ease of implementation, transparency, and stability with limited data (Chakraborty et al., 2015 ). Most studies on probabilistic models focus on customers, medical science, finance, demographics, politics, weather, and sports (Gandy, 2012 ; Berry et al., 2020 ; Zhang & Thomas, 2012; Jun et al., 2016 ; Kolasa & Rubaszek, 2012; Zhou et al., 2022; Alho, 2014 ; Levene & Fenner, 2021 ; Iversen et al., 2016 ; Boshnakov et al., 2017 ). Fader and Hardie’s probabilistic models, primarily used for customer churn, retention, or lifetime value prediction, provide a theoretical framework based on probabilistic and statistical principles (Fader & Hardie, 2018). These models offer valuable insights for understanding customer behavior and can be adapted for employee churn/retention forecasting. This study contributes to the literature by exploring the application of Fader and Hardie’s shifted-beta geometric (sBG) and Beta-discrete Weibull (BdW) models (originally developed for customer analytics) to the case of employee churn prediction. We compare the out-of-sample performance of these probabilistic models using a logarithmic trend model as a benchmark. The results offer practical insights for HR analytics, presenting an alternative solution for employee churn prediction that is easy to implement without machine learning. The remainder of this study is organized as follows. The next section describes the chosen datasets. 
Section 3 presents the models applied in the study and compares them using multiple training/testing splits. Sections 4 and 5 discuss the results, findings, and conclusions, providing academic and managerial insights and proposing the most suitable model for specific cases.

2. Data

Our research leverages two US open data sources of a kind that is rarely made public, offering an unusual opportunity to study the dynamics of employee cohort size. The first dataset provides information on regular hire employees within the County of Marin government organization, covering various departments from 2012 to 2020. The second dataset includes Baton Rouge's City-Parish employees from 2005 to 2017, encompassing employment status until the transition to a new payroll system. This dataset spans multiple departments, enabling a more comprehensive analysis. We examined the dynamics of three cohorts by tracking the number of employees remaining in each cohort over time:

Dataset 1: This dataset contains the monthly dynamics of a cohort from the County of Marin's Public Works Department, defined as of December 2012 (initial size: 1275 employees) and tracked from December 2012 to December 2018. A noteworthy feature of this dataset is that the December 2012 cohort includes not only those hired in December 2012 itself but also those hired in previous months who were still active in December 2012. While this cohort definition is non-standard, it is practical for employers who rarely have large monthly cohorts but wish to project the monthly dynamics of employees active as of a specific date.

Dataset 2: This dataset details the monthly dynamics of a cohort from the Baton Rouge Police Department, hired in December 2005 (initial size: 53 employees) and tracked from December 2005 to December 2017. This cohort uses a standard definition, consisting of individuals hired within the same month.
Dataset 3: Similar to Dataset 2, this dataset includes the monthly dynamics of a cohort from the Baton Rouge Public Works Department, hired in December 2005 (initial size: 45 employees) and tracked from December 2005 to December 2017, also using a standard cohort definition.

By analyzing these datasets, we aim to gain insights into the retention and turnover trends within different government departments, thereby contributing valuable knowledge to workforce management and planning.

3. Methods

3.1 sBG Model by Fader & Hardie (2007)

The shifted-beta-geometric (sBG) distribution, introduced by Fader & Hardie (2007), is a model designed to predict customer retention rates based on single cohort data. This model operates within a hypothetical contractual framework, examining annual retention rates for cohorts of individuals. As a discrete-time model for contract duration, the sBG forecasts future churn rates within a cohort by analyzing past churn data (Fader & Hardie, 2009). Unlike other probability models such as Bayesian approaches or survival analysis methods like Kaplan-Meier estimation or the Cox proportional hazards model, the sBG model requires limited data and involves only a few calculations, making it implementable with standard Microsoft Excel functions. Fitting the sBG model means finding the α and β parameters of the beta distribution that maximize the log-likelihood of the observed cohort data; the fitted parameters then determine the estimated churn rates. The model integrates the Beta and Geometric distributions. The Beta distribution accounts for individual heterogeneity (e.g., differences in retention rates and propensities to stay or leave the company) and shifts with each change in time, reflecting the dynamic nature of employee behavior over time. The Geometric distribution models the probability of churning after a period, projecting the number of periods it takes for the event (e.g., employee termination) to occur (Ahn et al., 2020).
As the authors of the model state (Fader & Hardie, 2007), the sBG model is based on two main concepts. The first one is expressed by the Beta distribution, by which heterogeneity in churn probability θ is presented as:

$$f(\theta \mid \alpha, \beta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha, \beta)}, \quad \alpha, \beta > 0,$$

where: α and β are the parameters that shape the beta distribution and capture the heterogeneity of observed individuals' behavior; θ is the probability of the event happening in a given period (e.g., a customer churns, an employee leaves a company); f(θ | α, β) is the probability density function of the beta distribution for θ with shape parameters α and β; B(α, β) is the beta function, the normalization constant that guarantees that the area under the probability density curve equals 1.

The second one involves the Geometric distribution: in each period an employee does not churn and stays in the company with constant retention probability 1 − θ. The duration of the relationship between an employee and a company is characterized by the shifted geometric distribution, which can be expressed as follows:

$$P(T=t \mid \theta) = \theta(1-\theta)^{t-1}, \quad t = 1, 2, 3, \dots$$

$$S(t \mid \theta) = (1-\theta)^{t}, \quad t = 1, 2, 3, \dots,$$

where: θ is the probability that an individual churns in a given period; 1 − θ is the probability that an individual stays active through a given period; t is the period in which the event happens; P(T = t | θ) is the probability that an individual churns in period t, that is, after t − 1 periods (e.g., months) of staying, given a churn probability of θ; S(t | θ) is the probability that an individual is still active after t periods.

The model also considers that it is impossible to use the formulas mentioned above directly since the value of θ is unknown.
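As a small illustration of the individual-level model, the sketch below evaluates the shifted-geometric probabilities for one hypothetical employee whose per-period churn probability θ is assumed to be known. The function names and the value θ = 0.1 are illustrative assumptions, not part of the original study.

```python
def geometric_pmf(t: int, theta: float) -> float:
    """P(T = t | theta): the individual leaves in period t (t = 1, 2, ...)."""
    return theta * (1.0 - theta) ** (t - 1)


def geometric_survivor(t: int, theta: float) -> float:
    """S(t | theta): the individual is still active after t periods."""
    return (1.0 - theta) ** t


if __name__ == "__main__":
    theta = 0.1  # hypothetical monthly churn probability (assumption)
    for t in (1, 6, 12):
        churned_by_t = sum(geometric_pmf(i, theta) for i in range(1, t + 1))
        # The survivor function equals one minus the cumulative churn probability.
        assert abs(geometric_survivor(t, theta) - (1.0 - churned_by_t)) < 1e-12
        print(t, round(geometric_pmf(t, theta), 4), round(geometric_survivor(t, theta), 4))
```

With a fixed θ, the period-to-period retention rate S(t)/S(t − 1) always equals 1 − θ, so a single geometric distribution cannot reproduce cohort-level retention rates that change over time.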
The θ parameter is the probability of an employee dropping out during a given period. It is used in the shifted-geometric distribution to model the likelihood of churn. To deal with θ being unknown, Fader and Hardie take the expectation of the shifted-geometric formulas over the beta distribution, which characterizes the cross-sectional heterogeneity. In other words, since θ is assumed to vary across the population, the overall probability is computed as the expected value over a beta distribution for θ. This gives the corresponding result for a randomly selected person. The probability mass function and survivor function of the sBG model can be expressed as follows (Fader & Hardie, 2007):

$$P(T=t \mid \alpha, \beta) = \frac{B(\alpha+1, \beta+t-1)}{B(\alpha, \beta)}, \quad t = 1, 2, \dots$$

$$S(t \mid \alpha, \beta) = \frac{B(\alpha, \beta+t)}{B(\alpha, \beta)}, \quad t = 1, 2, \dots,$$

where: P(T = t | α, β) is the probability that a randomly selected individual churns in period t (e.g., an employee leaves in month t); S(t | α, β) is the probability that a randomly selected individual is still active after t periods; α and β are the parameters that shape the beta distribution and capture the heterogeneity of observed individuals' behavior.
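The two sBG expressions above can be evaluated with nothing more than the log-gamma function. The following is a minimal sketch, not the authors' spreadsheet implementation: the function names and the parameter values α = 1, β = 3 are illustrative assumptions. It also includes the forward recursion that Fader & Hardie (2007) give for the same probabilities, which follows from the identity B(a, b + 1) = B(a, b) · b / (a + b).

```python
from math import exp, lgamma
from typing import List


def log_beta(a: float, b: float) -> float:
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def sbg_pmf(t: int, alpha: float, beta: float) -> float:
    """P(T = t | alpha, beta): a random individual churns in period t."""
    return exp(log_beta(alpha + 1.0, beta + t - 1.0) - log_beta(alpha, beta))


def sbg_survivor(t: int, alpha: float, beta: float) -> float:
    """S(t | alpha, beta): a random individual is still active after t periods."""
    return exp(log_beta(alpha, beta + t) - log_beta(alpha, beta))


def sbg_pmf_recursive(t_max: int, alpha: float, beta: float) -> List[float]:
    """P(T = 1), ..., P(T = t_max) computed without any beta functions."""
    probs = [alpha / (alpha + beta)]
    for t in range(2, t_max + 1):
        probs.append(probs[-1] * (beta + t - 2.0) / (alpha + beta + t - 1.0))
    return probs


if __name__ == "__main__":
    alpha, beta = 1.0, 3.0  # hypothetical parameter values (assumption)
    for t in range(1, 6):
        # Cohort-level retention rate between consecutive periods.
        retention = sbg_survivor(t, alpha, beta) / sbg_survivor(t - 1, alpha, beta)
        print(t, round(sbg_survivor(t, alpha, beta), 4), round(retention, 4))
```

In this sketch the printed retention rate rises with t even though every individual has a constant θ: the employees most likely to leave drop out first, which is how the sBG model produces increasing cohort retention rates from heterogeneity alone.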
Additionally, the authors mention that sBG probabilities can be computed without the direct involvement of beta functions through a forward-recursion formula:

$$P(T=t \mid \alpha, \beta) = \begin{cases} \dfrac{\alpha}{\alpha+\beta}, & t = 1 \\[6pt] \dfrac{\beta+t-2}{\alpha+\beta+t-1}\, P(T=t-1 \mid \alpha, \beta), & t = 2, 3, \dots \end{cases}$$

Since working with beta functions might be complicated, the forward-recursion formula simplifies the model implementation process. It allows us to calculate the churn probability for the first period (t = 1) and then iteratively calculate probabilities for further periods. If P(T = t) is known, it is possible to use the following formula for the survivor function, which describes the probability of an individual's survival:

$$S(t \mid \alpha, \beta) = 1 - \sum_{i=1}^{t} P(T=i \mid \alpha, \beta),$$

where \(\sum_{i=1}^{t} P(T=i \mid \alpha, \beta)\) is the cumulative probability of having churned in any of the periods 1, 2, …, t.

3.2 BdW Model by Fader & Hardie (2018)

In 2018, Fader and Hardie introduced the beta-discrete-Weibull (BdW) model, an extension of the beta-geometric (BG) model. The primary distinction between the two models lies in their handling of individual-level churn probability fluctuations (Fader & Hardie, 2018). The sBG model addresses increasing cohort retention rates over time by accounting for cross-sectional heterogeneity (Fader & Hardie, 2007). However, in a continuous time setting, retention rate changes can be better explained by varying tendencies to churn among individuals rather than by increasing periods alone.
This individual diversity reflects changes in retention indicators, and the BdW model captures these individual-level retention probabilities, which may increase or decrease over time (Vijayaragunathan & Kishore, 2022). Fader and Hardie emphasized that while cohort-based ratios generally tend to increase monotonically over time, there are exceptions. For example, some cohorts experience an initial drop before the retention rate increases. The BG distribution cannot adequately model such patterns, prompting the development of the BdW model. The BdW model's flexibility allows it to accommodate non-monotonically increasing retention rates by accounting for individual-level changes in churn tendencies (Fader & Hardie, 2018). The Weibull distribution can model both increasing and decreasing churn probabilities over time for individuals within a cohort, thus providing a more nuanced understanding of retention dynamics. The BdW model's conceptual foundation lies in the discrete-Weibull (dW) distribution. Fader and Hardie (2018) reviewed existing theories and research on the dW distribution, noting that later dW models (Murthy et al., 2004; Rinne, 2009) offer simplicity and flexibility for discrete-time analysis. The BdW model combines aspects of both geometric and Weibull distributions, making it duration-dependent. However, in testing with "Regular" and "High End" datasets of contractual settings (Berry and Linoff, 2004), Fader and Hardie (2018) found that the dW distribution alone could not accurately forecast the number of surviving individuals within a cohort. Conversely, the BG model, which accounts for heterogeneity, better predicted retention rates. Their findings suggest that an effective churn prediction model must capture both the length of the period and individual heterogeneity for improved accuracy. Consequently, the beta function, integral to the beta distribution, was incorporated to enhance the model's predictive capability. 
In summary, the BdW model extends the BG model by integrating the flexibility of the Weibull distribution to handle diverse retention patterns within cohorts. This approach provides a more accurate tool for predicting churn, particularly in settings where individual-level retention probabilities vary significantly over time. The resulting parametric mixture model, the beta-discrete-Weibull, is given by:

$$P(T=t \mid \gamma, \delta, c) = S(t-1 \mid \gamma, \delta, c) - S(t \mid \gamma, \delta, c) = \frac{B\left(\gamma, \delta+(t-1)^{c}\right) - B\left(\gamma, \delta+t^{c}\right)}{B(\gamma, \delta)}, \quad t = 1, 2, 3, \dots,$$

where: S(t − 1 | γ, δ, c) and S(t | γ, δ, c) are the survivor functions at times t − 1 and t, respectively, with S(t | γ, δ, c) = B(γ, δ + t^c) / B(γ, δ); B(·, ·) is the beta function, so that B(γ, δ + (t − 1)^c) and B(γ, δ + t^c) are the unnormalized survival probabilities at times t − 1 and t, and B(γ, δ) is the normalization factor; γ and δ are the parameters of the beta distribution that captures heterogeneity; c is the shape parameter of the discrete Weibull distribution, and for c = 1 the model reduces to the sBG.

3.3 Linear and log-linear regression models

In addition to the sBG and BdW models, linear and semi-logarithmic (from now on, "logarithmic" for brevity) regressions linking the retained cohort % (y) and the period (t) are applied in this research as simple benchmarks. The following two specifications were used:

$$y = a + bt \quad \text{and} \quad y = a + b\ln(t)$$

3.5 Model comparison

To evaluate the effectiveness of the models, we predicted employee retention for periods of 1 year, 2 years, and 4 years across the three datasets. The forecasted variable was the percentage share of the cohort retained in a given month.
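To make the forecasted quantity concrete, the BdW survivor function from Section 3.2 can be turned directly into a projected retention curve. The sketch below is illustrative only: the function names and the parameter values (γ = 1, δ = 3 and the three values of c) are assumptions, not estimates from the study.

```python
from math import exp, lgamma
from typing import List


def log_beta(a: float, b: float) -> float:
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bdw_survivor(t: int, gamma: float, delta: float, c: float) -> float:
    """S(t | gamma, delta, c) = B(gamma, delta + t^c) / B(gamma, delta)."""
    return exp(log_beta(gamma, delta + t ** c) - log_beta(gamma, delta))


def bdw_pmf(t: int, gamma: float, delta: float, c: float) -> float:
    """P(T = t | gamma, delta, c) = S(t - 1) - S(t)."""
    return bdw_survivor(t - 1, gamma, delta, c) - bdw_survivor(t, gamma, delta, c)


def retained_share(t_max: int, gamma: float, delta: float, c: float) -> List[float]:
    """Projected percentage of the cohort still active after 1..t_max periods."""
    return [100.0 * bdw_survivor(t, gamma, delta, c) for t in range(1, t_max + 1)]


if __name__ == "__main__":
    gamma, delta = 1.0, 3.0  # hypothetical heterogeneity parameters (assumption)
    for c in (0.8, 1.0, 1.2):  # c = 1 gives the sBG curve
        curve = retained_share(12, gamma, delta, c)
        print(c, [round(v, 1) for v in curve[:4]], round(curve[-1], 1))
```

Setting c = 1 recovers the sBG curve with α = γ and β = δ, so comparing curves for c below and above 1 shows what the extra Weibull shape parameter adds.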
The testing sample's mean absolute error (measured in percentage points) was used to compare models. Each model was tested using multiple training/testing splits:

For dataset 1: 61 months / 12 months, 49 months / 24 months, 25 months / 48 months.
For dataset 2 and dataset 3: 133 months / 12 months, 121 months / 24 months, 97 months / 48 months.

These splits allow us to assess the models' performance across different time horizons and ensure a robust comparison of their predictive capabilities.

4. Results

We aimed to determine the most universally effective model for predicting employee survival across different scenarios, considering that the dynamics of employee numbers can vary. To achieve this, we collected and analyzed data from three distinct datasets, each representing employees within different organizational contexts. Models were evaluated based on their ability to predict employee survival over varying time horizons, using different lengths of training and testing periods (Table 1).

Table 1. Testing sample's MAE (percentage points) for the sBG, BdW, Linear, and Logarithmic models for different training/testing splits.

| Dataset | Training months | Testing months | sBG MAE | Linear MAE | BdW MAE | Logarithmic MAE |
|---|---|---|---|---|---|---|
| Dataset 1 | 61 | 12 | 5.22 | 3.47 | 1.54 | 8.31 |
| Dataset 1 | 49 | 24 | 3.58 | 2.24 | 0.40 | 8.36 |
| Dataset 1 | 25 | 48 | 6.02 | 5.20 | 2.14 | 8.28 |
| Dataset 2 | 133 | 12 | 10.80 | 7.51 | 1.85 | 1.59 |
| Dataset 2 | 121 | 24 | 12.53 | 10.32 | 1.48 | 1.59 |
| Dataset 2 | 97 | 48 | 14.59 | 19.61 | 1.86 | 3.84 |
| Dataset 3 | 133 | 12 | 4.90 | 3.42 | 1.91 | 2.37 |
| Dataset 3 | 121 | 24 | 5.18 | 3.59 | 2.61 | 2.78 |
| Dataset 3 | 97 | 48 | 5.20 | 4.94 | 5.19 | 5.04 |

The BdW model systematically performed much better than other models except for dataset 3, where the logarithmic model performed similarly. During the longest testing period, the MAE values for all models were approximately the same in the case of dataset 3. Overall, the BdW model is the most versatile and best-performing model for all datasets and forms of dynamics, exhibiting fewer errors than other models.
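The MAE values in Table 1 are computed on the held-out months only. As a rough sketch of how the two regression benchmarks and the error metric fit together, the code below fits y = a + bt and y = a + b ln(t) by ordinary least squares on a training window and scores the forecasts on the remaining months. The retention series is synthetic and invented for illustration, months are assumed to be numbered from t = 1, and all function names are hypothetical.

```python
from math import log
from typing import Callable, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute gap between observed and forecast values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def holdout_mae(retained: Sequence[float], n_train: int,
                transform: Callable[[float], float]) -> float:
    """Fit on the first n_train months, score on the rest (percentage points)."""
    xs = [transform(t) for t in range(1, len(retained) + 1)]
    a, b = fit_line(xs[:n_train], retained[:n_train])
    forecasts = [a + b * x for x in xs[n_train:]]
    return mean_absolute_error(retained[n_train:], forecasts)


if __name__ == "__main__":
    # Synthetic retention curve (% of cohort retained); NOT data from the study.
    retained = [100.0 / (1.0 + 0.03 * t) for t in range(1, 37)]
    print("linear MAE:", round(holdout_mae(retained, 24, float), 2))
    print("logarithmic MAE:", round(holdout_mae(retained, 24, log), 2))
```

Passing the transform as an argument keeps a single fitting routine for both specifications: `float` leaves t unchanged for the linear trend, and `log` gives the logarithmic one.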
The advantage of the best-performing model is most pronounced over longer forecasting horizons. This is particularly evident in the second dataset, where the BdW model consistently outperforms the other models, especially when predicting retention over 48 months. This suggests that the BdW model offers a distinct advantage in capturing and accurately predicting underlying trends and patterns within the data for longer-term forecasting needs, such as predicting employee retention over several years.

Next, to assess the reliability of the BdW model in predicting employee churn and determine the acceptable number of months required to train the model, we fixed the number of months used for testing and incrementally increased the training period by 12 months. We then calculated the MAE for predictions made 12, 24, 36, and 48 months forward. The obtained results are presented in Table 2, and plots of the dependence of MAE on training sample size for each test sample size and for each of the three datasets are shown in Fig. 1.

Table 2. The BdW model estimation results for the longest training/testing splits (BdW model's MAE, percentage points).

| Training months | Testing months | Dataset 1 | Dataset 2 | Dataset 3 |
|---|---|---|---|---|
| 12 | 12 | 0.66 | 5.61 | 1.88 |
| 24 | 12 | 0.25 | 7.23 | 1.26 |
| 36 | 12 | 0.24 | 5.33 | 0.65 |
| 48 | 12 | 0.81 | 3.55 | 0.78 |
| 60 | 12 | 0.54 | 3.64 | 0.84 |
| 12 | 24 | 1.37 | 4.62 | 1.50 |
| 24 | 24 | 1.22 | 13.37 | 0.96 |
| 36 | 24 | 0.32 | 7.93 | 1.48 |
| 48 | 24 | 0.74 | 6.28 | 1.19 |
| 60 | 24 | – | 3.79 | 0.55 |
| 12 | 36 | 2.40 | 8.07 | 1.20 |
| 24 | 36 | 1.83 | 18.45 | 1.06 |
| 36 | 36 | 0.65 | 11.13 | 2.04 |
| 48 | 36 | – | 7.60 | 1.13 |
| 60 | 36 | – | 4.57 | 0.62 |
| 12 | 48 | 3.71 | 12.21 | 1.46 |
| 24 | 48 | 2.14 | 23.33 | 1.32 |
| 36 | 48 | – | 13.32 | 2.19 |
| 48 | 48 | – | 8.99 | 1.34 |
| 60 | 48 | – | 5.86 | 1.20 |

The comparison results generally imply a tendency for larger training sets to ensure lower testing errors. For datasets 1 and 2, the most significant drop in MAE is observed when the training sample increases from 24 to 36 months.
The MAE's dependence on the training sample size is least pronounced in the case of dataset 3, which is characterized by the lowest and most homogeneous MAEs across all experiments. Our comparison shows that even using as few as 12 months of training data provides decent accuracy for 12-month ahead forecasts, confirming the ability of the chosen theoretically justified BdW model to extrapolate many periods ahead.

5. Discussion

The BdW model has shown the most uniformly good performance across several datasets and is considered a reliable tool for long-term retention projections. Employee retention projection is a complex process that involves both the duration and estimation of behavioral factors (e.g., reasons for churn, robust external factors at certain times) to achieve accurate outcomes. Therefore, the model must consider increasing periods and capture the sample's heterogeneity. According to Fader and Hardie (2018), retention rates change over time due to the persistently changing behavior of customers, which similarly applies to employees' behavior and reasons for churn. The BdW model performed the best among the tested models because it can capture individual-level churn/retention probabilities over increasing time durations.

The poorer performance of other models can be attributed to their inability to grasp the relationships between variables affecting potential retention rate changes over time. For instance, the sBG model presented by Fader and Hardie (2007) showed satisfactory results for short-term projections but may not perform robustly with other datasets over longer periods. The sBG model includes Beta and Geometric distributions, considering both sample heterogeneity and the probability of churning over time. However, it lacks the flexibility to capture long-term behavior tendencies effectively. In contrast, the BdW model offers a more flexible distribution projection for event occurrence.
Unlike the geometric distribution, the Weibull distribution captures nuances in individual behavior patterns over time (e.g., the likelihood of an employee staying or leaving a company). The linear and logarithmic regression models also fall short in long-term predictions. The linear model assumes a linear trend in cohort size, which does not hold for the complex dynamics of employee retention. Retention projection involves capturing non-linear relationships and evaluating evolving factors (e.g., external environment changes, HR market dynamics, and employees' physical and mental states). While logarithmic regression can capture nonlinear changes, it poorly corresponds to the underlying data-generating process.

Fader and Hardie (2018) highlight the BdW model's flexibility in handling inhomogeneous changes in retention rate projection. Several studies emphasize the Weibull distribution's flexibility in capturing complex survival patterns and suggest different extensions to develop more accurate predictive models. For example, Jamal and Bucklin (2006) found that the Weibull model outperformed simpler hazard modeling approaches, underscoring the importance of integrating heterogeneity for churn/retention projection. Enkhmunkh et al. (2007) highlighted the Weibull distribution's accuracy and flexibility for shape-forming and simple cumulative distribution functions. Nekoukhou et al. (2017) introduced the discrete equivalent of the beta-Weibull distribution for discrete data and survival modeling. Chanasriphum et al. (2019) developed a model based on the beta-generalized Weibull distribution for lifetime data forecasting, which outperformed other regression models.

6. Conclusion

This paper investigates the feasibility of applying the BdW model, originally developed by Fader and Hardie (2018) for customer analytics, to project employee retention curves.
We compared the accuracy of two probabilistic models—the sBG and BdW models—originally designed for customer churn prediction, to assess their suitability for employee churn prediction. Testing these models on three real-life datasets, we found that both models are effective in HR analytics for predicting employee retention, with the BdW model outperforming the sBG, the linear, and logarithmic regression models by capturing changes in attrition rate heterogeneity over time. Our analysis indicates that using 48 months of historical data can provide accurate retention rate predictions for future periods. Using probabilistic models for employee retention curve projection is addressed for the first time in academic HR analytics literature. Our research adds scientific novelty by demonstrating that such parsimonious models based on limited data can give surprisingly accurate forecasts. Although ML approaches can handle large datasets and generate individual-level predictions, they often lack interpretability, require substantial data, and apply mostly to short-term forecasting (e.g., 1–6 months ahead). On the other hand, probabilistic models offer greater transparency of underlying data-generating processes. They can be implemented using tools like Microsoft Excel, making them accessible for organizations with limited data science expertise. Firms can make projections for many months ahead and plan recruitment accordingly. While model-based estimates naturally cannot account for rare external shocks, they can serve as a useful diagnostic tool for detecting the company’s under- or over-performance from the employee retention perspective. However, probabilistic models have their limitations. Unlike customer churn, employee turnover can be voluntary or involuntary, varying by circumstances and industries. These different churn types may require distinct approaches, affecting forecast quality. 
Additionally, the presented parsimonious probabilistic models apply mostly to cohorts of relatively homogeneous employees (regarding the job function). Future research can test and evaluate the accuracy of these results across different industries and role types. Given the rapidly changing environments of recent years, it would be useful to test the models with recent datasets to determine if high degrees of uncertainty affect the quality of retention projections or if the best probabilistic models still generalize well even without explanatory variables.

Declarations

Competing Interests: None
Funding: None
Ethics approval/declarations (include appropriate approvals or waivers): Not applicable
Consent to participate: Not applicable
Consent for publication: Not applicable
Availability of data and material: The spreadsheet with data and model implementations is available in the dedicated Open Science Framework (OSF) repository.
Code availability (software application or custom code): Not applicable
Authors' contributions: E.A. conceptualized, planned, and reviewed the manuscript. A.G. developed the spreadsheet models. Y.T. prepared the data and wrote the main text. E.P. wrote the conclusion and extensively reviewed the whole paper.

References

Ahn J, Hwang J, Kim D, Choi H, Kang S (2020) A survey on churn analysis in various business domains. IEEE Access 8:220816–220839
Alamsyah A, Salma N (2018) A comparative study of employee churn prediction model. 2018 4th International Conference on Science and Technology (ICST)
Alho JM (2014) Forecasting demographic forecasts. Int J Forecast 30(4):1128–1135
Al-Suraihi WA, Samikon SA, Alsuraihi A, Ibrahim I (2021) Employee Turnover: Causes, importance and retention strategies. Eur J Bus Manage Res 6(3):1–10
Berry LR, Helman P, West M (2020) Probabilistic forecasting of heterogeneous consumer transaction–sales time series. Int J Forecast 36(2):552–569
Boshnakov GN, Kharrat T, McHale IG (2017) A bivariate Weibull count model for forecasting association football scores. Int J Forecast 33(2):458–466
Chakraborty M, Das S, Lavoie A (2015) How to show a probabilistic model is better. arXiv (Cornell University)
Chanasriphum N, Seenoi P, Srisodaphol W (2019) The Log Beta generalized Weibull regression model for lifetime data. Journal of Physics: Conference Series, 1366(1), 012121
De Winne S, Marescaux E, Sels L, Van Beveren I, Vanormelingen S (2018) The impact of employee turnover and turnover volatility on labor productivity: a flexible non-linear approach. Int J Hum Resource Manage 30(21):3049–3079
Enkhmunkh N, Kim G, Hwang K, Hyun S (2007) A parameter estimation of Weibull distribution for reliability assessment with limited failure data. International Forum on Strategic Technology, 39–42. Ulaanbaatar, Mongolia
Fader PS, Hardie BGS (2009) Probability Models for Customer-Base Analysis. J Interact Mark 23(1):61–69
Fader PS, Hardie BGS, Lee KL (2005) Counting your customers the easy way: an alternative to the Pareto/NBD model. Mark Sci 24(2):275–284
Fader PS, Hardie BGS (2007) Fitting the sBG Model to Multi-Cohort Data. https://www.brucehardie.com/notes/017/. Accessed on 15 December 2023
Fader PS, Hardie BGS (2007) How to Project Customer Retention. J Interact Mark 21:76–90
Gandy A (2012) Performance monitoring of credit portfolios using survival analysis. Int J Forecast 28(1):139–144
Gentek A (2022) Employee Churn Prediction in Healthcare Industry using Supervised Machine Learning (MA thesis). KTH Royal Institute of Technology. Retrieved from https://www.diva-portal.org/smash/get/diva2:1711505/FULLTEXT01.pdf
Gregory B (2018) Predicting Customer Churn: Extreme Gradient Boosting with Temporal Data. arXiv (Cornell University)
Iversen EB, Morales JM, Møller JK, Madsen H (2016) Short-term probabilistic forecasting of wind speed using stochastic differential equations. Int J Forecast 32(3):981–990
Jamal Z, Bucklin RE (2006) Improving the diagnosis and prediction of customer churn: A heterogeneous hazard modeling approach. J Interact Mark 20(3–4):16–29
Javed S, Azhar A (2017) Forecasting Employee Turnover for Human Resource Based on Time Series Analysis. Int J Econ Res 14(16):445–456
Jun DB, Kim K, Park M (2016) Forecasting annual lung and bronchus cancer deaths using individual survival times. Int J Forecast 32(1):168–179
Kang S, Oh H (2024) Forecasting South Korea’s presidential election via multiparty dynamic Bayesian modeling. Int J Forecast 40(1):124–141
Kim D, Lee H, Cho S (2008) Response modeling with support vector regression. Expert Syst Appl 34(2):1102–1108
Kolasa M, Rubaszek M (2015) Forecasting using DSGE models with financial frictions. Int J Forecast 31(1):1–19
Levene M, Fenner T (2021) A stochastic differential equation approach to the analysis of the 2017 and 2019 UK general election polls. Int J Forecast 37(3):1227–1234
Mahsa EG, Jafar TM (2013) Customer Lifetime Value Models: A literature Survey. Int J Industrial Eng Prod Res 24(4):317–336
Masarifoglu M, Büyüklü AH (2019) Applying survival analysis to telecom churn data. Am J Theoretical Appl Stat 8(6):261
Mena CG, De Caigny A, Coussement K, De Bock KW, Lessmann S (2019) Churn Prediction with Sequential Data and Deep Neural Networks. A Comparative Analysis. arXiv (Cornell University)
Naz K, Siddiqui IF, Koo J, Khan MA, Qureshi NMF (2022) Predictive modeling of employee churn analysis for IoT-Enabled software industry. Appl Sci 12(20):10495
Nekoukhou V, Bidram H, Roozegar R (2016) The Beta-Weibull Distribution on the Lattice of Integers. Ciência E Natura 39(1):40
Nestor B, McDermott MBA, Boag W, Berner G, Naumann T, Hughes MC, Goldenberg A, Ghassemi M (2019) Feature Robustness in Non-stationary Health Records: Caveats to Deployable Model Performance in Common Clinical Machine Learning Tasks.
Proceedings of Machine Learning Research, 106, 1–23
Saradhi VV, Palshikar GK (2011) Employee churn prediction. Expert Syst Appl 38(3):1999–2006
Saridakis G, Cooper CL (2016) Introduction: the state of employee turnover. In Edw Elgar Publishing eBooks, 1–4
Singh P, Singh SP, Singh DS (2019) An Introduction and Review on Machine Learning Application in Medicine and Healthcare. IEEE Conference on Information and Communication Technology, 1–6
Tamaddoni A, Stakhovych S, Ewing MT (2015) Comparing churn prediction techniques and assessing their performance. J Service Res 19(2):123–141
Yigit IO, Shourabizadeh H (2017) An approach for predicting employee churn by using data mining. 2017 International Artificial Intelligence and Data Processing Symposium (IDAP). Malatya, Turkey

Footnotes

1. Employees Jobs and Attributes County of Marin
2. Legacy City-Parish Employees
3. % in the context of this paper indicates percentage points. For example, if 80% of the cohort were retained in period t, while the model prediction is 78%, the MAE is two percentage points.

Additional Declarations

No competing interests reported.

Status: Published. Journal publication in Operations Research Forum, published 19 Feb, 2025.
Editorial decision: Revision requested, 30 Sep, 2024
Reviews received at journal: 25 Sep, 2024
Reviews received at journal: 25 Sep, 2024
Reviewers agreed at journal: 05 Sep, 2024
Reviewers agreed at journal: 04 Sep, 2024
Reviewers invited by journal: 28 Jul, 2024
Editor assigned by journal: 22 Jul, 2024
Submission checks completed at journal: 22 Jul, 2024
First submitted to journal: 18 Jul, 2024

{"props":{"pageProps":{"initialData":{"identity":"rs-4765185","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":335514279,"identity":"6b527a7b-4174-48aa-adb9-1f2cd7cc76dc","order_by":0,"name":"Evgeny A. Antipov","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABA0lEQVRIiWNgGAWjYHADxgYgYQNiNB4gTgcbWEsaWC+xWsDkYTCJVwu/RPrFxwUVDPLm8s1tDz62nbdb234YaEuNTTQuLZIzcoqNZ5xhMNzZxthuOLPtdvK2M4lALcfSchtwaDG4kZMmzdvGwLjhGGObNM+Z28lmB4BaGBsO49RifyMn/TfvPwZ7qJZzyWbnH+LXYiCRfoyZt4EhEaKl4oCd2Q0CtkicecMszXNMInnDscR2wxkVyQlmN4C2JODxC397+sPPPDU2thsOH3/24IOBnb3Z+fSHDz7U2ODUwsDAYwCyDMQCx0wiWGUCTuUgwP4AxgJrscereBSMglEwCkYkAAAM2mI5usy37wAAAABJRU5ErkJggg==","orcid":"","institution":"Canadian University Dubai","correspondingAuthor":true,"prefix":"","firstName":"Evgeny","middleName":"A.","lastName":"Antipov","suffix":""},{"id":335514281,"identity":"83fa06a7-0438-409b-8720-eca43b81cefc","order_by":1,"name":"Anastasia Gagarskaya","email":"","orcid":"","institution":"National Research University Higher School of Economics","correspondingAuthor":false,"prefix":"","firstName":"Anastasia","middleName":"","lastName":"Gagarskaya","suffix":""},{"id":335514283,"identity":"dcb1a9b4-7905-4b4a-9914-d7734b072510","order_by":2,"name":"Yulia 
Trofimova","email":"","orcid":"","institution":"National Research University Higher School of Economics","correspondingAuthor":false,"prefix":"","firstName":"Yulia","middleName":"","lastName":"Trofimova","suffix":""},{"id":335514284,"identity":"5bb8248c-e9a2-4221-8851-301e7e114dde","order_by":3,"name":"Elena Pokryshevskaya","email":"","orcid":"","institution":"Dubai Internet City","correspondingAuthor":false,"prefix":"","firstName":"Elena","middleName":"","lastName":"Pokryshevskaya","suffix":""}],"badges":[],"createdAt":"2024-07-18 22:23:22","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-4765185/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4765185/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1007/s43069-025-00417-0","type":"published","date":"2025-02-19T15:57:29+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":62643011,"identity":"870ca11c-18fa-4075-8183-faa3856cb296","added_by":"auto","created_at":"2024-08-16 19:39:46","extension":"jpeg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":38232,"visible":true,"origin":"","legend":"\u003cp\u003eDependence of MAE (%) on training sample size for each test sample size and for each of the three datasets\u003c/p\u003e","description":"","filename":"groupimage1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-4765185/v1/84b245ec2836e57d8b2b7034.jpeg"},{"id":77053824,"identity":"182d1e26-744f-4d93-a34a-9df4aa887539","added_by":"auto","created_at":"2025-02-24 16:30:30","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":896124,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4765185/v1/b1f9f75e-f2ff-4a9c-af5c-d06a0ab9b537.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Applying Shifted-Beta-Geometric and Beta-Discrete-Weibull Models for Employee 
Retention Curve Projection","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eChurn prediction is a significant focus of customer analytics research (Yiğit \u0026amp; Shourabizadeh, 2017), but employee churn, which incurs substantial losses for companies (Saridakis \u0026amp; Cooper, \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2016\u003c/span\u003e), receives less attention (Saradhi \u0026amp; Palshikar, \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2011\u003c/span\u003e; Yiğit \u0026amp; Shourabizadeh, 2017). Predicting employee churn involves evaluating turnover rates over specific periods (Gentek, \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Employee churn represents a loss of intellectual assets for a company (Musanga \u0026amp; Chibaya, 2023), leading to operational disruptions, knowledge loss, and increased hiring costs (Winne et al., \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). High turnover requires constant recruitment, but the irreplaceability of experienced employees exacerbates the issue (Suraihi et al., 2021). The significant expenses of adaptation, training, and salaries contribute to these losses (Naz et al., \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Thus, organizations seek to predict turnover and enhance retention strategies (Musanga \u0026amp; Chibaya, 2023). Accurate predictions enable companies to improve retention strategies and maintain a stable workforce. Predictive models for employee churn can help prevent workforce loss and provide a competitive advantage. At the same time, many businesses are unfamiliar with effective churn projection approaches (Musanga \u0026amp; Chibaya, 2023).\u003c/p\u003e \u003cp\u003eJob quitting and customer churn in contractual settings are similar stochastic processes. 
Both involve individuals leaving their group at the end of a period (Alamsyah \u0026amp; Salma, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Gentek, \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Key metrics in these processes include retention and churn rates. The retention rate measures the proportion of individuals who remain active throughout the period, while the churn rate measures the proportion who leave by the end (Fader \u0026amp; Hardie, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2007\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eExisting literature categorizes churn prediction models into two broad groups: machine learning approaches and probabilistic models based on statistical principles (Gentek, \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Various studies compare these models, with machine learning and predictive analytics being commonly used for their ability to process large datasets and capture complex relationships (Fader \u0026amp; Hardie, 2018).\u003c/p\u003e \u003cp\u003eCommonly used machine learning models include traditional classification algorithms (e.g., Logistic Regression, Random Forest, Naive Bayes, SVM, Decision Trees) (Kim et al., \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2008\u003c/span\u003e; Alamsyah \u0026amp; Salma, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Singh et al., \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Nestor et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Gentek, \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2022\u003c/span\u003e), deep learning models (Mena et al., \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Ozcan \u0026amp; Ozmen, 2021), and predictive analysis with time series models (e.g., ARIMA, ETS, XGB, 
LightGBM) (Jung, 2011; Javed \u0026amp; Azhar, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Gregory, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). These models require large, representative datasets and often struggle with non-linear relationships and changing behaviors (Naz et al., \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Nestor et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Kang \u0026amp; Oh, 2023).\u003c/p\u003e \u003cp\u003eIn many cases, limited data makes traditional predictive models impractical. Therefore, this research focuses on probabilistic models for churn or retention projection that do not require extensive data but can be applied to short time series of employee cohort sizes over time. Probabilistic models, such as time series analysis, Bayesian models, survival analysis, logistic regression, SDEs, Poisson models, and exponential distributions, use probability distributions to project individual behaviors into the future (Tamaddoni et al., 2016).\u003c/p\u003e \u003cp\u003eProbabilistic models offer advantages over machine learning approaches, including ease of implementation, transparency, and stability with limited data (Chakraborty et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2015\u003c/span\u003e). 
Most studies on probabilistic models focus on customers, medical science, finance, demographics, politics, weather, and sports (Gandy, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2012\u003c/span\u003e; Berry et al., \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Zhang \u0026amp; Thomas, 2012; Jun et al., \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; Kolasa \u0026amp; Rubaszek, 2012; Zhou et al., 2022; Alho, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2014\u003c/span\u003e; Levene \u0026amp; Fenner, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Iversen et al., \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; Boshnakov et al., \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). Fader and Hardie\u0026rsquo;s probabilistic models, primarily used for customer churn, retention, or lifetime value prediction, provide a theoretical framework based on probabilistic and statistical principles (Fader \u0026amp; Hardie, 2018). These models offer valuable insights for understanding customer behavior and can be adapted for employee churn/retention forecasting. This study contributes to the literature by exploring the application of Fader and Hardie\u0026rsquo;s shifted-beta geometric (sBG) and Beta-discrete Weibull (BdW) models (originally developed for customer analytics) to the case of employee churn prediction. We compare the out-of-sample performance of these probabilistic models using a logarithmic trend model as a benchmark. The results offer practical insights for HR analytics, presenting an alternative solution for employee churn prediction that is easy to implement without machine learning.\u003c/p\u003e \u003cp\u003eThe remainder of this study is organized as follows. The next section describes the chosen datasets. 
Section 3 presents the models applied in the study and compares them using multiple training/testing splits. Sections 4 and 5 discuss the results, findings, and conclusions, providing academic and managerial insights and proposing the most suitable model for specific cases.\u003c/p\u003e"},{"header":"2. Data","content":"\u003cp\u003eOur research leverages two unique US open data sources, typically not publicly available, offering an unparalleled opportunity to study the dynamics of employee cohort size. The first dataset provides information on regular hire employees within the County of Marin government organization, covering various departments from 2012 to 2020\u003ca class=\"FNLink\" href=\"#Fn1\" id=\"#FNLinkFn1\"\u003e\u003c/a\u003e. The second dataset includes Baton Rouge\u0026rsquo;s City-Parish employees from 2005 to 2017\u003ca class=\"FNLink\" href=\"#Fn2\" id=\"#FNLinkFn2\"\u003e\u003c/a\u003e, encompassing employment status until transitioning to a new payroll system. This dataset spans multiple departments, enabling a more comprehensive analysis.\u003c/p\u003e \u003cp\u003eWe examined the dynamics of three cohorts by tracking the number of employees remaining in each cohort over time:\u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eDataset 1\u003c/b\u003e: This dataset contains the monthly dynamics of a cohort from the County of Marin's Public Works Department, registered to be hired in December 2012 (initial size: 1275 employees) from December 2012 to December 2018. A noteworthy feature of this dataset is that the December 2012 cohort includes not only those hired in December 2012 itself but also those hired in previous months who were still active in December 2012. 
While this cohort definition is non-standard, it is practical for employers who rarely have large monthly cohorts but wish to project the monthly dynamics of employees active as of a specific date.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eDataset 2\u003c/b\u003e: This dataset details the monthly dynamics of a cohort from the Baton Rouge Police Department, hired in December 2005 (initial size: 53 employees) from December 2005 to December 2017. This cohort uses a standard definition, consisting of individuals hired within the same month.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eDataset 3\u003c/b\u003e: Similar to Dataset 2, this dataset includes the monthly dynamics of a cohort from the Baton Rouge Public Works Department, hired in December 2005 (initial size: 45 employees) from December 2005 to December 2017, also using a standard cohort definition.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003eBy analyzing these datasets, we aim to gain insights into the retention and turnover trends within different government departments, thereby contributing valuable knowledge to workforce management and planning.\u003c/p\u003e"},{"header":"3. Methods","content":"\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e3.1 sBG Model by Fader \u0026amp; Hardie (\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2007\u003c/span\u003e)\u003c/h2\u003e \u003cp\u003eThe shifted-beta-geometric (sBG) distribution, introduced by Fader \u0026amp; Hardie (\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2007\u003c/span\u003e), is a model designed to predict customer retention rates based on single cohort data. This model operates within a hypothetical contractual framework, examining annual retention rates for cohorts of individuals. 
As a discrete-time model for contract duration, the sBG forecasts future churn rates within a cohort by analyzing past churn data (Fader \u0026amp; Hardie, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2009\u003c/span\u003e). Unlike other probability models such as Bayesian approaches or survival analysis methods like Kaplan-Meier estimation or Cox proportional hazards model, the sBG model requires limited data and involves only a few calculations, making it implementable with standard Microsoft Excel functions.\u003c/p\u003e \u003cp\u003eThe sBG model aims to estimate the alpha and beta parameters of the beta distribution that maximize the log-likelihood, thus measuring the estimated churn rates. It integrates Beta and Geometric distributions. The Beta distribution accounts for individual heterogeneity (e.g., differences in retention rates and propensities to stay or leave the company) and shifts with each change in time, reflecting the dynamic nature of employee behavior over time. The Geometric distribution models the probability of churning after a period, projecting the number of periods it takes for the event (e.g., employee termination) to occur (Ahn et al., \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2020\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eAs the authors of the model state (Fader \u0026amp; Hardie, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2007\u003c/span\u003e), the sBG model is based on two main concepts. 
The first is the beta distribution, which describes the heterogeneity in the churn probability \u003cem\u003eθ\u003c/em\u003e across individuals:\u003cdiv id=\"Equa\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e\n$$f\\left(\\theta|\\alpha,\\beta\\right)=\\frac{{\\theta}^{\\alpha-1}{(1-\\theta)}^{\\beta-1}}{B(\\alpha,\\beta)},\\quad\\alpha,\\beta\u0026gt;0,$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere:\u003c/p\u003e \u003cp\u003e \u003cem\u003eα\u003c/em\u003e and \u003cem\u003eβ\u003c/em\u003e are the shape parameters of the beta distribution and capture the heterogeneity of observed individuals\u0026rsquo; behavior;\u003c/p\u003e \u003cp\u003e \u003cem\u003eθ\u003c/em\u003e is the per-period probability that the event happens (e.g., a customer churns or an employee leaves the company);\u003c/p\u003e \u003cp\u003e \u003cem\u003ef(θ|α, β)\u003c/em\u003e is the probability density function of the beta distribution for \u003cem\u003eθ\u003c/em\u003e with shape parameters \u003cem\u003eα\u003c/em\u003e and \u003cem\u003eβ\u003c/em\u003e;\u003c/p\u003e \u003cp\u003e \u003cem\u003eB(α, β)\u003c/em\u003e is the beta function, the normalization constant that guarantees that the area under the probability density curve equals 1.\u003c/p\u003e \u003cp\u003eThe second is the geometric distribution: in each period, an employee stays in the company with a constant retention probability \u003cem\u003e1\u0026thinsp;\u0026minus;\u0026thinsp;θ\u003c/em\u003e and leaves with probability \u003cem\u003eθ\u003c/em\u003e. 
The duration of the relationship between an employee and a company follows the shifted geometric distribution, whose probability mass function and survivor function are:\u003cdiv id=\"Equb\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equb\" name=\"EquationSource\"\u003e\n$$P\\left(T=t|\\theta\\right)=\\theta{(1-\\theta)}^{t-1},\\quad t=1,2,3,\\dots$$\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equc\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equc\" name=\"EquationSource\"\u003e\n$$S\\left(t|\\theta\\right)={(1-\\theta)}^{t},\\quad t=1,2,3,\\dots$$\u003c/div\u003e\u003c/div\u003e,\u003c/p\u003e \u003cp\u003ewhere:\u003c/p\u003e \u003cp\u003e \u003cem\u003eθ\u003c/em\u003e is the probability that an individual churns in a given period;\u003c/p\u003e \u003cp\u003e \u003cem\u003et\u003c/em\u003e is the number of trials (periods) until the event happens;\u003c/p\u003e \u003c/div\u003e\n\u003cdiv class=\"Heading\"\u003e\u003cem\u003e1\u0026thinsp;\u0026minus;\u0026thinsp;θ\u003c/em\u003e is the probability that an individual is retained in a given period;\u003c/div\u003e \u003cp\u003e \u003cem\u003eP(T\u0026thinsp;=\u0026thinsp;t | θ)\u003c/em\u003e is the probability that the event happens (e.g., an individual churns) in period \u003cem\u003et\u003c/em\u003e (e.g., month \u003cem\u003et\u003c/em\u003e), that is, after \u003cem\u003et\u003c/em\u003e\u0026thinsp;\u0026minus;\u0026thinsp;1 periods of being retained, given a churn probability of \u003cem\u003eθ\u003c/em\u003e; \u003cem\u003eS(t | θ)\u003c/em\u003e is the probability that the individual is still active after \u003cem\u003et\u003c/em\u003e periods.\u003c/p\u003e \u003cp\u003eThese formulas cannot be applied directly because the value of \u003cem\u003eθ\u003c/em\u003e is unknown. The \u003cem\u003eθ\u003c/em\u003e parameter is the probability of an employee dropping out during a given period. It is used in the shifted geometric distribution to model the likelihood of churn. 
To handle the unknown \u003cem\u003eθ\u003c/em\u003e, Fader and Hardie took the expectation of the shifted geometric formulas over the beta distribution, which characterizes the cross-sectional heterogeneity. In other words, since the \u003cem\u003eθ\u003c/em\u003e parameter is assumed to vary across the population, the overall probability is computed as the expected value under a beta distribution for \u003cem\u003eθ\u003c/em\u003e. This gives the corresponding probabilities for a randomly selected individual. The probability mass function and survivor function of the sBG model can be expressed as follows (Fader \u0026amp; Hardie, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2007\u003c/span\u003e):\u003cdiv id=\"Equd\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equd\" name=\"EquationSource\"\u003e\n$$P\\left(T=t|\\alpha,\\beta\\right)=\\frac{B(\\alpha+1,\\beta+t-1)}{B(\\alpha,\\beta)},\\quad t=1,2,\\dots$$\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Eque\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Eque\" name=\"EquationSource\"\u003e\n$$S\\left(t|\\alpha,\\beta\\right)=\\frac{B(\\alpha,\\beta+t)}{B(\\alpha,\\beta)},\\quad t=1,2,\\dots,$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere:\u003c/p\u003e \u003cp\u003e \u003cem\u003eP(T\u0026thinsp;=\u0026thinsp;t | α, β)\u003c/em\u003e is the probability that a randomly selected individual churns in period \u003cem\u003et\u003c/em\u003e;\u003c/p\u003e \u003cp\u003e\u003cem\u003eS(t | α, β)\u003c/em\u003e is the probability that a randomly selected individual is still active after \u003cem\u003et\u003c/em\u003e periods, that is, has not churned in any of periods 1 through \u003cem\u003et\u003c/em\u003e;\u003c/p\u003e \u003cp\u003e \u003cem\u003eα\u003c/em\u003e and \u003cem\u003eβ\u003c/em\u003e are the parameters that shape the beta distribution and capture observed individuals\u0026rsquo; behavior heterogeneity.\u003c/p\u003e 
\u003cp\u003eAdditionally, the authors note that the sBG probabilities can be computed without evaluating the beta functions directly, using the following forward recursion:\u003cdiv id=\"Equf\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equf\" name=\"EquationSource\"\u003e\n$$P\\left(T=t|\\alpha,\\beta\\right)=\\left\\{\\begin{array}{l}\\frac{\\alpha}{\\alpha+\\beta},\\quad t=1\\\\ \\frac{\\beta+t-2}{\\alpha+\\beta+t-1}P\\left(T=t-1|\\alpha,\\beta\\right),\\quad t=2,3,\\dots\\end{array}\\right.$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eSince working with beta functions might be complicated, the forward-recursion formula simplifies the model implementation process. It allows us to calculate the churn probability for the first period \u003cem\u003e(T\u0026thinsp;=\u0026thinsp;1)\u003c/em\u003e and then iteratively calculate the probabilities for further periods. 
Once the probabilities \u003cem\u003eP(T\u0026thinsp;=\u0026thinsp;i)\u003c/em\u003e are known for \u003cem\u003ei\u003c/em\u003e\u0026thinsp;=\u0026thinsp;1, \u0026hellip;, \u003cem\u003et\u003c/em\u003e, the survivor function, which gives the probability that an individual is still active after \u003cem\u003et\u003c/em\u003e periods, can be calculated as:\u003cdiv id=\"Equg\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equg\" name=\"EquationSource\"\u003e\n$$S\\left(t|\\alpha,\\beta\\right)=1-\\sum_{i=1}^{t}P(T=i|\\alpha,\\beta),$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere:\u003c/p\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\\(\\sum_{i=1}^{t}P(T=i|\\alpha,\\beta)\\)\u003c/span\u003e \u003c/span\u003e is the cumulative probability that the individual has churned in any of periods 1 through \u003cem\u003et\u003c/em\u003e.\u003c/p\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003e3.2 BdW Model by Fader \u0026amp; Hardie (2018)\u003c/h2\u003e \u003cp\u003eIn 2018, Fader and Hardie introduced the beta-discrete-Weibull (BdW) model, an extension of the beta-geometric (BG) model. The primary distinction between the two models lies in their handling of individual-level churn probability fluctuations (Fader \u0026amp; Hardie, 2018). The sBG model addresses increasing cohort retention rates over time by accounting for cross-sectional heterogeneity (Fader \u0026amp; Hardie, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2007\u003c/span\u003e). However, in a continuous time setting, retention rate changes can be better explained by varying tendencies to churn among individuals rather than by increasing periods alone. 
This individual diversity reflects changes in retention indicators, and the BdW model captures these individual-level retention probabilities, which may increase or decrease over time (Vijayaragunathan \u0026amp; Kishore, 2022).\u003c/p\u003e \u003cp\u003eFader and Hardie emphasized that while cohort-based ratios generally tend to increase monotonically over time, there are exceptions. For example, some cohorts experience an initial drop before the retention rate increases. The BG distribution cannot adequately model such patterns, prompting the development of the BdW model. The BdW model's flexibility allows it to accommodate non-monotonically increasing retention rates by accounting for individual-level changes in churn tendencies (Fader \u0026amp; Hardie, 2018). The Weibull distribution can model both increasing and decreasing churn probabilities over time for individuals within a cohort, thus providing a more nuanced understanding of retention dynamics.\u003c/p\u003e \u003cp\u003eThe BdW model's conceptual foundation lies in the discrete-Weibull (dW) distribution. Fader and Hardie (2018) reviewed existing theories and research on the dW distribution, noting that later dW models (Murthy et al., 2004; Rinne, 2009) offer simplicity and flexibility for discrete-time analysis. The BdW model combines aspects of both geometric and Weibull distributions, making it duration-dependent.\u003c/p\u003e \u003cp\u003eHowever, in testing with \"Regular\" and \"High End\" datasets of contractual settings (Berry and Linoff, 2004), Fader and Hardie (2018) found that the dW distribution alone could not accurately forecast the number of surviving individuals within a cohort. Conversely, the BG model, which accounts for heterogeneity, better predicted retention rates. Their findings suggest that an effective churn prediction model must capture both the length of the period and individual heterogeneity for improved accuracy. 
Consequently, the beta function, integral to the beta distribution, was incorporated to enhance the model's predictive capability.\u003c/p\u003e \u003cp\u003eIn summary, the BdW model extends the BG model by integrating the flexibility of the Weibull distribution to handle diverse retention patterns within cohorts. This approach provides a more accurate tool for predicting churn, particularly in settings where individual-level retention probabilities vary significantly over time.\u003c/p\u003e \u003cp\u003eThe beta-discrete-Weibull model is a parametric mixture model defined by the following equation:\u003cdiv id=\"Equh\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equh\" name=\"EquationSource\"\u003e\n$$P\\left(T=t|\\gamma,\\delta,c\\right)=S\\left(t-1|\\gamma,\\delta,c\\right)-S\\left(t|\\gamma,\\delta,c\\right)$$\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Equi\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equi\" name=\"EquationSource\"\u003e\n$$=\\frac{B\\left(\\gamma,\\delta+{\\left(t-1\\right)}^{c}\\right)-B\\left(\\gamma,\\delta+{t}^{c}\\right)}{B\\left(\\gamma,\\delta\\right)},\\quad t=1,2,3,\\dots,$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere:\u003c/p\u003e \u003cp\u003e \u003cem\u003eS(t-1 | γ, δ, c)\u003c/em\u003e and \u003cem\u003eS(t | γ, δ, c)\u003c/em\u003e are the survivor functions at times \u003cem\u003et-1\u003c/em\u003e and \u003cem\u003et\u003c/em\u003e, respectively, with \u003cem\u003eS(t | γ, δ, c) = B(γ, δ + t\u003c/em\u003e\u003csup\u003e\u003cem\u003ec\u003c/em\u003e\u003c/sup\u003e\u003cem\u003e) / B(γ, δ)\u003c/em\u003e;\u003c/p\u003e \u003cp\u003e \u003cem\u003eB(γ, δ + (t-1)\u003c/em\u003e\u003csup\u003e\u003cem\u003ec\u003c/em\u003e\u003c/sup\u003e\u003cem\u003e)\u003c/em\u003e and \u003cem\u003eB(γ, δ + t\u003c/em\u003e\u003csup\u003e\u003cem\u003ec\u003c/em\u003e\u003c/sup\u003e\u003cem\u003e)\u003c/em\u003e are values of the beta function (not cumulative distribution functions); divided by \u003cem\u003eB(γ, δ)\u003c/em\u003e, they give the probability of not having churned by times \u003cem\u003et-1\u003c/em\u003e and \u003cem\u003et\u003c/em\u003e, respectively. Here \u003cem\u003eγ\u003c/em\u003e and \u003cem\u003eδ\u003c/em\u003e are the parameters of the beta mixing distribution, and \u003cem\u003ec\u003c/em\u003e is the shape parameter of the discrete Weibull distribution (\u003cem\u003ec\u003c/em\u003e = 1 recovers the sBG model);\u003c/p\u003e \u003cp\u003e \u003cem\u003eB(γ, δ)\u003c/em\u003e is the normalizing constant, which ensures that \u003cem\u003eS(0 | γ, δ, c)\u003c/em\u003e = 1.\u003c/p\u003e 
\u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e3.3 Linear and log-linear regression models\u003c/h2\u003e \u003cp\u003eIn addition to the sBG and BdW models, linear and semi-logarithmic (hereafter \u0026ldquo;logarithmic\u0026rdquo; for brevity) regressions of the retained cohort share in % (y) on the period (t) are used in this research as simple benchmarks. The following two specifications were used:\u003cdiv id=\"Equj\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equj\" name=\"EquationSource\"\u003e\n$$y=a+bt\\quad\\text{and}\\quad y=a+b\\ln\\left(t\\right)$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003e3.5 Model comparison\u003c/h2\u003e \u003cp\u003eTo evaluate the effectiveness of the models, we predicted employee retention for periods of 1 year, 2 years, and 4 years across the three datasets. The forecasted variable was the percentage share of the cohort retained in a given month. 
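\u003c/p\u003e \u003cp\u003eThe quantities used in this comparison can be sketched in a few lines of Python. The sketch below evaluates the BdW survivor function implied by the equation in Section 3.2, S(t) = B(γ, δ + t^c) / B(γ, δ), converts it to a retention curve in percent, and computes the MAE in percentage points against observed values. It is an illustration under stated assumptions, not the authors' spreadsheet implementation: the function names and the parameter values in the usage lines are hypothetical, and parameter estimation is not shown.\u003c/p\u003e

```python
import math


def log_beta(a, b):
    # Natural log of the beta function B(a, b), via log-gamma for stability.
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def bdw_survival(t, gamma, delta, c):
    # BdW survivor function S(t) = B(gamma, delta + t**c) / B(gamma, delta).
    # At t = 0 the ratio equals 1, i.e. the whole cohort is still present.
    return math.exp(log_beta(gamma, delta + t ** c) - log_beta(gamma, delta))


def bdw_pmf(t, gamma, delta, c):
    # Probability of churning exactly in period t: S(t - 1) - S(t), t = 1, 2, ...
    return bdw_survival(t - 1, gamma, delta, c) - bdw_survival(t, gamma, delta, c)


def retention_curve_pct(n_periods, gamma, delta, c):
    # Projected share of the cohort retained after periods 1..n_periods, in %.
    return [100.0 * bdw_survival(t, gamma, delta, c)
            for t in range(1, n_periods + 1)]


def mae_pct_points(predicted, observed):
    # Mean absolute error between two retention series, in percentage points.
    if len(predicted) != len(observed):
        raise ValueError('series must have the same length')
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Hypothetical parameter values, chosen only for illustration.
# With gamma = delta = c = 1 the curve reduces to 100 / (1 + t).
curve = retention_curve_pct(12, gamma=1.0, delta=1.0, c=1.0)
# Hypothetical "observed" hold-out values: the projection shifted by one point.
holdout = [value - 1.0 for value in curve]
error = mae_pct_points(curve, holdout)
```

\u003cp\u003e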
The testing sample\u0026rsquo;s mean absolute error (measured in percentage points) was used to compare models.\u003c/p\u003e \u003cp\u003eEach model was tested using multiple training/testing splits:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eFor dataset 1: 61 months / 12 months, 49 months / 24 months, 25 months / 48 months.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eFor dataset 2 and dataset 3: 133 months / 12 months, 121 months / 24 months, 97 months / 48 months.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThese splits allow us to assess the models' performance across different time horizons and ensure a robust comparison of their predictive capabilities.\u003c/p\u003e \u003c/div\u003e"},{"header":"4. Results","content":"\u003cp\u003eWe aimed to determine the most universally effective model for predicting employee survival across different scenarios, considering that the dynamics of employee numbers can vary. To achieve this, we collected and analyzed data from three distinct datasets, each representing employees within different organizational contexts. 
Models were evaluated based on their ability to predict employee survival over varying time horizons, using different lengths of training and testing periods (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eTesting sample\u0026rsquo;s MAE for the sBG, BdW, Linear, and Logarithmic models for different training/testing splits.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eThe number of months used for training the model\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eThe number of months used for testing the model\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003esBG model\u0026rsquo;s MAE, %\u003ca class=\"FNLink\" href=\"#Fn3\" id=\"#FNLinkFn3\"\u003e\u003c/a\u003e\u003c/p\u003e 
\u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eLinear model\u0026rsquo;s MAE, %\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eBdW model\u0026rsquo;s MAE, %\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eLogarithmic model\u0026rsquo;s MAE, %\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eDataset 1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e61\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e5.22\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e3.47\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e1.54\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e8.31\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e49\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e3.58\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e2.24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.40\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e8.36\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e25\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e48\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e6.02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e5.20\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e2.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e8.28\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eDataset 2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e133\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e10.80\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e7.51\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e1.85\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e1.59\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e121\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e12.53\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e10.32\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e1.48\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e1.59\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"char\" 
char=\".\" colname=\"c2\"\u003e \u003cp\u003e97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e48\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e14.59\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e19.61\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e1.86\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e3.84\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eDataset 3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e133\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e4.90\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e3.42\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e1.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e2.37\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e121\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e5.18\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e3.59\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e2.61\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e 
\u003cp\u003e2.78\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e48\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e5.20\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e4.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e5.19\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e5.04\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe BdW model had the lowest MAE in most splits, often by a wide margin. The logarithmic model performed similarly on dataset 3 and, at the 12- and 24-month horizons, on dataset 2. For the longest testing period on dataset 3, the MAE values of all models were approximately the same (about 5 percentage points).\u003c/p\u003e \u003cp\u003eOverall, the BdW model is the most versatile model across datasets and forms of dynamics, exhibiting lower errors than the other models in most cases. Its advantage is most pronounced over longer forecasting horizons. This is particularly evident in the second dataset, where the BdW model's MAE for the 48-month horizon (1.86) is roughly half that of the logarithmic model (3.84) and far below those of the sBG (14.59) and linear (19.61) models. 
This suggests that the BdW model offers a distinct advantage in capturing and accurately predicting underlying trends and patterns within the data for longer-term forecasting needs, such as predicting employee retention over several years.\u003c/p\u003e \u003cp\u003eNext, to assess the reliability of the BdW model in predicting employee churn and determine the acceptable number of months required to train the model, we fixed the number of months used for testing and incrementally increased the training period by 12 months. We then calculated the MAE for predictions made 12, 24, 36, and 48 months forward. The obtained results are presented in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, and plots of the dependence of MAE on training sample size for each test sample size and for each of the three datasets are shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eThe BdW model estimation results for the longest training/testing splits.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e 
\u003cp\u003eThe number of months used for training the model\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eThe number of months used for testing the model\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c5\" namest=\"c3\"\u003e \u003cp\u003eBdW model's MAE, %\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDataset 1\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eDataset 2\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eDataset 3\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.66\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e5.61\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.88\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e7.23\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.26\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e 
\u003cp\u003e12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e5.33\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.65\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e48\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e3.55\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.78\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e60\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.54\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e3.64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.84\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.37\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e4.62\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.50\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003e24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.22\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e13.37\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.32\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e7.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.48\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e48\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.74\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e6.28\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.19\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e60\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026ndash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e3.79\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e 
\u003cp\u003e0.55\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2.40\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e8.07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.20\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e18.45\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.06\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.65\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e11.13\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e2.04\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e48\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026ndash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c4\"\u003e \u003cp\u003e7.60\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.13\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e60\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026ndash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e4.57\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.62\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e48\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e3.71\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e12.21\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.46\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e48\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e23.33\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.32\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e48\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026ndash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e13.32\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e2.19\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e48\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e48\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026ndash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e8.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.34\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e60\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e48\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026ndash;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e5.86\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.20\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe comparison results generally imply a tendency for larger training sets to ensure lower testing errors. For datasets 1 and 2, the most significant drop in MAE is observed when the training sample increases from 24 to 36 months. The MAE's dependence on the training sample size is least pronounced in the case of dataset 3, which is characterized by the lowest and most homogeneous MAEs across all experiments. 
Our comparison shows that even 12 months of training data give reasonable accuracy for 12-month-ahead forecasts (MAE of 0.66, 5.61, and 1.88 percentage points for datasets 1, 2, and 3, respectively), supporting the ability of the theoretically justified BdW model to extrapolate many periods ahead.\u003c/p\u003e"},{"header":"5. Discussion","content":"\u003cp\u003eThe BdW model has shown the most uniformly good performance across several datasets and can be considered a reliable tool for long-term retention projections. Employee retention projection is a complex process in which both duration and behavioral factors (e.g., reasons for churn, strong external factors at certain times) must be accounted for to achieve accurate outcomes. Therefore, the model must account for the passage of time and capture the sample's heterogeneity.\u003c/p\u003e \u003cp\u003eAccording to Fader and Hardie (2018), retention rates change over time due to the persistently changing behavior of customers, which similarly applies to employees' behavior and reasons for churn. The BdW model performed the best among the tested models because it can capture individual-level churn/retention probabilities that change with duration.\u003c/p\u003e \u003cp\u003eThe poorer performance of other models can be attributed to their inability to capture the relationships between variables affecting retention rate changes over time. For instance, the sBG model presented by Fader and Hardie (\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2007\u003c/span\u003e) showed satisfactory results for short-term projections but may not perform robustly with other datasets over longer periods. The sBG model combines the beta and geometric distributions, considering both sample heterogeneity and the probability of churning over time. However, it lacks the flexibility to capture long-term behavior tendencies effectively. In contrast, the BdW model offers a more flexible distribution for the timing of the event. 
Unlike the geometric distribution, under which an individual's churn probability is constant over time, the Weibull distribution captures changes in individual behavior over time (e.g., a rising or falling likelihood of an employee leaving a company).\u003c/p\u003e \u003cp\u003eThe linear and logarithmic regression models also fall short in long-term prediction. The linear model assumes a linear trend in cohort size, which does not hold for the complex dynamics of employee retention. Retention projection involves capturing nonlinear relationships and evaluating evolving factors (e.g., external environment changes, HR market dynamics, and employees' physical and mental states). While logarithmic regression can capture nonlinear changes, it corresponds poorly to the underlying data-generating process.\u003c/p\u003e \u003cp\u003eFader and Hardie (2018) highlight the BdW model's flexibility in handling inhomogeneous changes in retention rate projection. Several studies emphasize the Weibull distribution's flexibility in capturing complex survival patterns and suggest different extensions to develop more accurate predictive models. For example, Jamal and Bucklin (2006) found that the Weibull model outperformed simpler hazard modeling approaches, underscoring the importance of integrating heterogeneity for churn/retention projection. Enkhmunkh et al. (\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2007\u003c/span\u003e) highlighted the Weibull distribution's accuracy, its flexible shape, and its simple cumulative distribution function. Nekoukhou et al. (2017) introduced the discrete equivalent of the beta-Weibull distribution for discrete data and survival modeling. Chanasriphum et al. (\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2019\u003c/span\u003e) developed a model based on the beta-generalized Weibull distribution for lifetime data forecasting, which outperformed other regression models.\u003c/p\u003e"},{"header":"6. 
Conclusion","content":"\u003cp\u003eThis paper investigates the feasibility of applying the BdW model, originally developed by Fader and Hardie (2018) for customer analytics, to project employee retention curves. We compared the accuracy of two probabilistic models (the sBG and BdW models), originally designed for customer churn prediction, to assess their suitability for employee churn prediction. Testing these models on three real-life datasets, we found that both models are effective in HR analytics for predicting employee retention, with the BdW model outperforming the sBG model and the linear and logarithmic regression models by capturing changes in attrition rate heterogeneity over time. Our analysis indicates that using 48 months of historical data can provide accurate retention rate predictions for future periods.\u003c/p\u003e \u003cp\u003eThe use of probabilistic models for employee retention curve projection is addressed here for the first time in the academic HR analytics literature. Our research demonstrates that such parsimonious models based on limited data can give surprisingly accurate forecasts. Although ML approaches can handle large datasets and generate individual-level predictions, they often lack interpretability, require substantial data, and apply mostly to short-term forecasting (e.g., 1\u0026ndash;6 months ahead). On the other hand, probabilistic models offer greater transparency of underlying data-generating processes. They can be implemented using tools like Microsoft Excel, making them accessible for organizations with limited data science expertise. Firms can make projections for many months ahead and plan recruitment accordingly. 
While model-based estimates naturally cannot account for rare external shocks, they can serve as a useful diagnostic tool for detecting the company\u0026rsquo;s under- or over-performance from the employee retention perspective.\u003c/p\u003e \u003cp\u003eHowever, probabilistic models have their limitations. Unlike customer churn, employee turnover can be voluntary or involuntary, varying by circumstances and industries. These different churn types may require distinct approaches, affecting forecast quality. Additionally, the presented parsimonious probabilistic models apply mostly to cohorts of relatively homogeneous employees (regarding the job function).\u003c/p\u003e \u003cp\u003eFuture research can test and evaluate the accuracy of these results across different industries and role types. Given the rapidly changing environments of recent years, it would be useful to test the models with recent datasets to determine if high degrees of uncertainty affect the quality of retention projections or if the best probabilistic models still generalize well even without explanatory variables.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eCompeting Interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eNone\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eNone\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics approval/declarations (include appropriate approvals or waivers)\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent to participate\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAvailability of data and 
material\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe spreadsheet with data and model implementations is available in the dedicated Open Science Framework (OSF) repository.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCode availability (software application or custom code)\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026apos; contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eE.A. conceptualized, planned, and reviewed the manuscript. A.G. developed the spreadsheet models. Y.T. prepared the data and wrote the main text. E.P. wrote the conclusion and extensively reviewed the whole paper.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAhn J, Hwang J, Kim D, Choi H, Kang S (2020) A survey on churn analysis in various business domains. IEEE Access 8:220816\u0026ndash;220839\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlamsyah A, Salma N (2018) A comparative study of employee churn Prediction model. 2018 4th International Conference on Science and Technology (ICST)\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlho JM (2014) Forecasting demographic forecasts. Int J Forecast 30(4):1128\u0026ndash;1135\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAl-Suraihi WA, Samikon SA, Alsuraihi A, Ibrahim I (2021) Employee Turnover: Causes, importance and retention strategies. Eur J Bus Manage Res 6(3):1\u0026ndash;10\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBerry LR, Helman P, West M (2020) Probabilistic forecasting of heterogeneous consumer transaction\u0026ndash;sales time series. Int J Forecast 36(2):552\u0026ndash;569\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBoshnakov GN, Kharrat T, McHale IG (2017) A bivariate Weibull count model for forecasting association football scores. 
Int J Forecast 33(2):458\u0026ndash;466\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChakraborty M, Das S, Lavoie A (2015) How to show a probabilistic model is better. arXiv (Cornell University)\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChanasriphum N, Seenoi P, Srisodaphol W (2019) The Log Beta generalized Weibull regression model for lifetime data. J Phys Conf Ser 1366(1):012121\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDe Winne S, Marescaux E, Sels L, Van Beveren I, Vanormelingen S (2018) The impact of employee turnover and turnover volatility on labor productivity: a flexible non-linear approach. Int J Hum Resource Manage 30(21):3049\u0026ndash;3079\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEnkhmunkh N, Kim G, Hwang K, Hyun S (2007) A parameter estimation of Weibull distribution for reliability assessment with limited failure data. International Forum on Strategic Technology, 39\u0026ndash;42. Ulaanbaatar, Mongolia\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFader PS, Hardie BGS (2009) Probability Models for Customer-Base Analysis. J Interact Mark 23(1):61\u0026ndash;69\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFader PS, Hardie BGS, Lee KL (2005) Counting your customers the easy way: an alternative to the Pareto/NBD model. Mark Sci 24(2):275\u0026ndash;284\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFader PS, Hardie BGS (2007) Fitting the sBG Model to Multi-Cohort Data. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.brucehardie.com/notes/017/\u003c/span\u003e\u003cspan address=\"https://www.brucehardie.com/notes/017/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed on 15 December 2023\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFader PS, Hardie BGS (2007) How to Project Customer Retention. 
J Interact Mark 21:76\u0026ndash;90\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGandy A (2012) Performance monitoring of credit portfolios using survival analysis. Int J Forecast 28(1):139\u0026ndash;144\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGentek A (2022) Employee Churn Prediction in Healthcare Industry using Supervised Machine Learning (MA thesis). KTH Royal Institute of Technology. Retrieved from \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.diva-portal.org/smash/get/diva2:1711505/FULLTEXT01.pdf\u003c/span\u003e\u003cspan address=\"https://www.diva-portal.org/smash/get/diva2:1711505/FULLTEXT01.pdf\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGregory B (2018) Predicting Customer Churn: Extreme Gradient Boosting with Temporal Data. arXiv (Cornell University)\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIversen EB, Morales JM, M\u0026oslash;ller JK, Madsen H (2016) Short-term probabilistic forecasting of wind speed using stochastic differential equations. Int J Forecast 32(3):981\u0026ndash;990\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJamal Z, Bucklin RE (2006) Improving the diagnosis and prediction of customer churn: A heterogeneous hazard modeling approach. J Interact Mark 20(3\u0026ndash;4):16\u0026ndash;29\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJaved S, Azhar A (2017) Forecasting Employee Turnover for Human Resource Based on Time Series Analysis. Int J Econ Res 14(16):445\u0026ndash;456\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJun DB, Kim K, Park M (2016) Forecasting annual lung and bronchus cancer deaths using individual survival times. 
Int J Forecast 32(1):168\u0026ndash;179\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKang S, Oh H (2024) Forecasting South Korea\u0026rsquo;s presidential election via multiparty dynamic Bayesian modeling. Int J Forecast 40(1):124\u0026ndash;141\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKim D, Lee H, Cho S (2008) Response modeling with support vector regression. Expert Syst Appl 34(2):1102\u0026ndash;1108\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKolasa M, Rubaszek M (2015) Forecasting using DSGE models with financial frictions. Int J Forecast 31(1):1\u0026ndash;19\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLevene M, Fenner T (2021) A stochastic differential equation approach to the analysis of the 2017 and 2019 UK general election polls. Int J Forecast 37(3):1227\u0026ndash;1234\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMahsa EG, Jafar TM (2013) Customer Lifetime Value Models: A literature Survey. Int J Industrial Eng Prod Res 24(4):317\u0026ndash;336\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMasarifoglu M, B\u0026uuml;y\u0026uuml;kl\u0026uuml; AH (2019) Applying survival analysis to telecom churn data. Am J Theoretical Appl Stat 8(6):261\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMena CG, De Caigny A, Coussement K, De Bock KW, Lessmann S (2019) Churn Prediction with Sequential Data and Deep Neural Networks. A Comparative Analysis. arXiv (Cornell University)\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNaz K, Siddiqui IF, Koo J, Khan MA, Qureshi NMF (2022) Predictive modeling of employee churn analysis for IoT-Enabled software industry. Appl Sci 12(20):10495\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNekoukhou V, Bidram H, Roozegar R (2017) The Beta-Weibull Distribution on the Lattice of Integers. 
Ci\u0026ecirc;ncia E Natura 39(1):40\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNestor B, McDermott MBA, Boag W, Berner G, Naumann T, Hughes MC, Goldenberg A, Ghassemi M (2019) Feature Robustness in Non-stationary Health Records: Caveats to Deployable Model Performance in Common Clinical Machine Learning Tasks. Proceedings of Machine Learning Research, 106, 1\u0026ndash;23\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSaradhi VV, Palshikar GK (2011) Employee churn prediction. Expert Syst Appl 38(3):1999\u0026ndash;2006\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSaridakis G, Cooper CL (2016) Introduction: the state of employee turnover. In Edw Elgar Publishing eBooks, 1\u0026ndash;4\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSingh P, Singh SP, Singh DS (2019) An Introduction and Review on Machine Learning Application in Medicine and Healthcare. IEEE Conference on Information and Communication Technology, 1\u0026ndash;6\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTamaddoni A, Stakhovych S, Ewing MT (2015) Comparing churn prediction techniques and assessing their performance. J Service Res 19(2):123\u0026ndash;141\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYigit IO, Shourabizadeh H (2017) An approach for predicting employee churn by using data mining. 2017 International Artificial Intelligence and Data Processing Symposium (IDAP). Malatya, Turkey\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"},{"header":"Footnotes","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003e Employees Jobs and Attributes County of Marin\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e Legacy City-Parish Employees\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e % in the context of this paper indicates percentage points. 
For example, if 80% of the cohort were retained in period \u003cem\u003et\u003c/em\u003e, while the model prediction is 78%, the MAE is two percentage points.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"operations-research-forum","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"Learn more about [Operations Research Forum](https://link.springer.com/journal/43069)","snPcode":"43069","submissionUrl":"https://submission.nature.com/new-submission/43069/3","title":"Operations Research Forum","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"employee churn, churn prediction, probabilistic models, beta distribution, geometric distribution, Weibull distribution","lastPublishedDoi":"10.21203/rs.3.rs-4765185/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4765185/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eEmployees are vital assets to any organization, and their departure can result in reduced human capital and operational disruptions. To mitigate this, companies employ predictive analysis to forecast potential employee churn. Probability-based modeling for projecting employee churn is an underexplored area in HR analytics. 
This paper tests the applicability of the shifted-beta-geometric (sBG) and beta-discrete-Weibull (BdW) models within the context of employee survival projection. Using data from three cohorts of employees, we compare the results of these models with each other as well as with linear and logarithmic regressions. Our key finding is the superior performance of the BdW model, which can capture differences in churn rates between employees and within employees over time. The beta distribution captures the heterogeneous employee loyalty, while the Weibull distribution effectively captures retention rate changes over time. Our research demonstrates that parsimonious probabilistic models, which require minimal data and have so far been used only in customer analytics, can be applied in HR analytics for projecting employee retention curves.\u003c/p\u003e","manuscriptTitle":"Applying Shifted-Beta-Geometric and Beta-Discrete-Weibull Models for Employee Retention Curve Projection","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-08-16 19:39:41","doi":"10.21203/rs.3.rs-4765185/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision 
requested","date":"2024-09-30T15:22:32+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2024-09-26T01:56:15+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2024-09-25T22:07:32+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"330837298869244829692904298443179982359","date":"2024-09-05T04:24:51+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"6251070867218966041689001812844060758","date":"2024-09-04T12:10:22+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2024-07-28T06:04:42+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2024-07-22T22:25:48+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2024-07-22T22:24:53+00:00","index":"","fulltext":""},{"type":"submitted","content":"Operations Research Forum","date":"2024-07-18T22:11:46+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"operations-research-forum","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"Learn more about [Operations Research Forum](https://link.springer.com/journal/43069)","snPcode":"43069","submissionUrl":"https://submission.nature.com/new-submission/43069/3","title":"Operations Research Forum","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"ad042120-7aac-46b0-a4c3-4044aced9c6f","owner":[],"postedDate":"August 16th, 
2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2025-02-24T16:24:25+00:00","versionOfRecord":{"articleIdentity":"rs-4765185","link":"https://doi.org/10.1007/s43069-025-00417-0","journal":{"identity":"operations-research-forum","isVorOnly":false,"title":"Operations Research Forum"},"publishedOn":"2025-02-19 15:57:29","publishedOnDateReadable":"February 19th, 2025"},"versionCreatedAt":"2024-08-16 19:39:41","video":"","vorDoi":"10.1007/s43069-025-00417-0","vorDoiUrl":"https://doi.org/10.1007/s43069-025-00417-0","workflowStages":[]},"version":"v1","identity":"rs-4765185","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4765185","identity":"rs-4765185","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
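The abstract and conclusion describe the two models only in words: a beta distribution for differences in churn propensity between employees, and a Weibull-type term for changes within employees over time. The following is a minimal sketch of the resulting retention curves, assuming the standard BdW survival function S(t) = B(alpha, beta + t^c) / B(alpha, beta), which reduces to the sBG curve when c = 1. The function names and the parameter values are illustrative assumptions, not the authors' spreadsheet implementation or their fitted estimates.

```python
from math import exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Natural log of the beta function B(a, b), computed via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bdw_survival(t: int, alpha: float, beta: float, c: float = 1.0) -> float:
    """Share of a cohort still employed after t periods.

    Assumed form: S(t) = B(alpha, beta + t**c) / B(alpha, beta).
    With c = 1 this is the sBG survival curve.
    """
    if t < 0:
        raise ValueError("t must be a non-negative number of periods")
    return exp(log_beta(alpha, beta + t ** c) - log_beta(alpha, beta))


def retention_rates(survival: list[float]) -> list[float]:
    """Period-to-period retention r(t) = S(t) / S(t - 1) for t = 1, 2, ..."""
    return [survival[t] / survival[t - 1] for t in range(1, len(survival))]


if __name__ == "__main__":
    # Hypothetical parameter values chosen only to show the curve shapes.
    alpha, beta = 1.0, 4.0
    for c in (1.0, 1.3):
        curve = [bdw_survival(t, alpha, beta, c) for t in range(13)]
        label = "sBG (c = 1)" if c == 1.0 else f"BdW (c = {c})"
        print(label, [round(s, 3) for s in curve])
```

In practice alpha, beta, and c would be estimated from observed cohort counts (for example by maximum likelihood) before the curve is extended to future periods; the sketch covers only the projection step.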


License: CC-BY-4.0