Development and Validation of Survival Scores and the Assessment of Spatial Trends in End-Stage Kidney Disease Outcomes

doi:10.21203/rs.3.rs-6523746/v1


2025 · doi:10.21203/rs.3.rs-6523746/v1

preprint OA: closed



Nathan Meyer, Hossein Moradi Rekabdarkolaee, Brandon M. Varilek, and 4 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-6523746/v1. This work is licensed under a CC BY 4.0 License. Version 1 posted.

Abstract

Background: There is a need to create new mortality prediction models for end-stage kidney disease (ESKD). This study aimed to develop and validate survival scores for patients with ESKD using a mixture cure model (MCM) and to assess spatial trends in ESKD outcomes.

Methods: This study used a United States Renal Data System (USRDS) dataset that contains 2,228,693 people with incident ESKD from 2000 through 2020, including those on dialysis and those who had received at least one transplant.
Many variables, including demographic and comorbid factors, were included in the MCM. The MCM was used to develop seven survival scores that were then summarized geographically. These survival scores are shown on maps of the United States and validated using the clinical measurements found within the USRDS dataset.

Results: Many spatial survival trends across the United States were observed that could be validated using the USRDS data and the current literature. Individuals in the Appalachian and Great Plains regions of the United States mostly had lower survivability. Conversely, individuals residing in Southern California, in the Southeast, and around the Texas-Mexico border had higher survivability. Most of these findings aligned with previous studies. Furthermore, many of the trends could be explained both by the coefficient estimates of the MCM and by the characteristics of the individuals living in each region. For example, the MCM coefficient estimates indicated that Hispanic individuals had higher survivability than their non-Hispanic counterparts, which aligned with the predominantly Hispanic area along the Texas-Mexico border. Lastly, serum creatinine, a USRDS variable not used within the MCM, was found to have a moderately positive, linear relationship with the survival scores developed.

Conclusions: The survival scores developed and validated may help practitioners and policy-makers address ESKD disparities more effectively.

Keywords: ESKD, mixture cure model, prognostic model, risk score, USRDS, survival score

Background

In 2022, an average of 360 people in the United States (U.S.) initiated treatment for end-stage kidney disease (ESKD) each day [1]. In the same year, the adjusted incidence rate for all-cause mortality in adult ESKD patients was approximately 146 deaths per 1,000 person-years. The U.S.
Renal Data System (USRDS) 2024 Annual Data Report presents these findings and also describes large differences in mortality by modality, age, race, ethnicity, and other factors [1]. For example, a study points out that American Indian / Alaska Native (AI/AN) persons disproportionately suffer from ESKD compared with non-Hispanic White persons [2, 3]. This study also states that the disparity is especially pronounced in remote places, such as the AI/AN tribal lands of South Dakota, where receiving a transplant, the treatment option of choice, is unachievable for many. Furthermore, ESKD patients face difficult therapeutic decisions [4]. The consequences of their choice of medical care vary from patient to patient because of the heterogeneity of treatment outcomes [5]. Mortality risk scores for ESKD patients would aid in making treatment decisions [4]. A survival score is the complement of a mortality risk score and arises naturally from a survival model. In ESKD, developing and validating accurate prediction models for assessing patient risk is extremely important, as the resulting risk scores may improve both patient outcomes and clinical practice [6]. Mortality risk scores may also enable targeted population screening [7]. In other words, we may stratify the population (e.g., into low, medium, and high risk) such that each person's risk score is automatically identified from their patient profile. This stratification of individual risk may also provide information on ESKD complications to aid in prevention [8]. Further, incorporating these risk scores into practical tools may assist in prognosis, recognition of high-risk patients, and the application of different therapies [9]. Existing mortality prediction models for ESKD patients either raise concerns about applicability to the clinical setting or carry a risk of bias for various reasons, including the study population selected.
3 Hence, it was concluded that contemporary mortality prediction models still need to be developed and validated, and that external validation is essential before the mortality risk scores developed can be applied in a clinical setting [9]. In other words, the risk scores should be tested on data from a different sample of individuals rather than on the data used during model development [10]. Mortality risk scores have been developed for ESKD patients, but they have focused on older patients [11, 12] or on patients in the early stages of dialysis [13], or they are other kinds of risk scores (i.e., not mortality risk scores) that attempt to predict the risk of progressive chronic kidney disease [14] or kidney transplant failure [15]. Factors known to accelerate chronic kidney disease, such as demographic variables (e.g., age, sex, ethnicity), nephrotoxins (e.g., smoking, alcohol, drugs), and medical factors (e.g., cardiovascular disease, diabetes), are also important for assessing ESKD mortality risk; furthermore, a large prospective study with considerable observation time may give more accurate (or more useful) risk scores [7]. The survival scores we develop in this manuscript are based on a mixture cure model (MCM) fitted to a large population of ESKD patients with a long follow-up time. This dataset also has many covariates, including the examples listed above. These survival scores are validated using not only these same variables but also variables not used when creating the survival scores. This validation step includes summarizing the survival scores geographically to better understand the underlying trends that emerge. Our primary objective is to develop and validate survival scores based on an MCM using the variables found within the USRDS dataset. In addition, these survival scores may be used to develop an easy-to-use tool that gives survival score predictions.
This would enable both persons with ESKD and stakeholders, such as practitioners or policy-makers, to make data-informed decisions.

Methods

Dataset

The survival scores developed are based on a survival model fitted to a USRDS dataset [16, 17]. This dataset contains individuals with incident ESKD from 2000 through 2020, including both those on dialysis and those who had at least one transplant. Time and event status were the outcome variables used in the survival analysis, with all-cause mortality as the event of interest. Independent variables used within the model were age group, sex, race, Hispanic, primary disease, Liu comorbidity index, inability to ambulate, inability to transfer, needs assistance with daily activities, institutionalized, alcohol dependence, tobacco use, drug (illicit) dependence, amputation, toxic nephropathy, modality, transplant, employment, insurance, rurality, and region. Individuals with missing covariate values and individuals recorded as over 108 years old were excluded from the original dataset. Furthermore, this study focuses on individuals within the 50 U.S. states and the District of Columbia. Because mortality differs widely between those on dialysis and those who receive a transplant [18], an MCM was used to capture this heterogeneity in the data.

Mixture Cure Model Methodology

Let $t$ be an observed time until the event of interest or time of right-censoring, $\mathbf{x}$ and $\mathbf{z}$ each be a vector of covariates, $\mathbf{b}$ and $\boldsymbol{\beta}$ each be a vector of coefficients, and $S_0(t)$ be the baseline uncured survival function.
The MCM is then given as

$$S(t;\mathbf{x},\mathbf{z}) = \pi(\mathbf{z})\,S(t;\mathbf{x}) + 1 - \pi(\mathbf{z})$$

such that

$$\pi(\mathbf{z}) = \frac{e^{\mathbf{b}^T\mathbf{z}}}{1+e^{\mathbf{b}^T\mathbf{z}}} \quad \text{and} \quad S(t;\mathbf{x}) = \left(S_0(t)\right)^{\exp\{\boldsymbol{\beta}^T\mathbf{x}\}}$$

represent the incidence (the probability of being uncured) and the latency (the survival function of the uncured), respectively [19]. The expectation-maximization algorithm is used to find estimates for $\mathbf{b}$ and $\boldsymbol{\beta}$ [20]. Further, bootstrap sampling is used to find standard errors for these parameter estimates [21]. An MCM was first fitted to the USRDS dataset, and the corresponding survival score summaries were extracted. Further details of applying an MCM to the USRDS dataset were presented previously [22].

Concordance

The concordance statistic was used to compare different methods of fitting the MCM. For any two observations, concordance estimates the probability that the predictions are ordered in the same direction as the observed values [23]. The concordance statistic is useful for showing how well a given model orders individual survival levels [24]. Let $y_i$ and $y_j$ be two observations with corresponding predictions $\hat{y}_i$ and $\hat{y}_j$ for $i,j \in \{1,\dots,n\}$. Then the concordance statistic estimates $P(\hat{y}_i > \hat{y}_j \mid y_i > y_j)$. In other words, it estimates the probability that the predictions are correctly ordered given the actual values. One possible estimate is

$$C = \frac{c + t_{\hat{y}}/2}{c + d + t_{\hat{y}}}$$

where $c$, $d$, and $t_{\hat{y}}$ are the numbers of concordant pairs, discordant pairs, and pairs tied in the predicted values, respectively [25].
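The estimate $C$ can be computed by brute-force pair counting. The Python sketch below is illustrative only (the study's analyses were run in R); it assumes right-censored data and applies the comparability rule for censored times discussed in the next paragraph: a pair counts only when the shorter time is an observed death, and the pair is concordant when that earlier death had the higher predicted risk. Skipping pairs with tied times is a simplifying assumption of this sketch, not the rule of any particular software.

```python
from itertools import combinations


def concordance(times, events, risks):
    """Estimate C = (c + t/2) / (c + d + t) for right-censored survival data.

    times  : observed survival or censoring times
    events : 1 if the time is an observed death, 0 if censored
    risks  : predicted risks (e.g., linear predictors); higher = worse
    """
    c = d = t = 0  # concordant, discordant, tied-in-prediction pairs
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the member of the pair with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the shorter time is an observed death;
        # pairs with tied times are skipped (a simplifying assumption).
        if times[a] == times[b] or not events[a]:
            continue
        if risks[a] > risks[b]:
            c += 1  # the earlier death had the higher predicted risk
        elif risks[a] < risks[b]:
            d += 1
        else:
            t += 1
    total = c + d + t
    return (c + t / 2) / total if total else float("nan")


# The text's example: death at 2 years with risk 4 vs. 3 years with risk 1.
print(concordance([2.0, 3.0], [1, 1], [4.0, 1.0]))
```

The loop visits all $n(n-1)/2$ pairs, which is fine for illustration but far too slow for millions of patients; production implementations sort by time instead.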
A higher concordance value is therefore better. A concordance value of 0.5 indicates that the model is no more useful than flipping a coin when ordering individual risk levels. Lastly, a concordance value below 0.5 means that the model systematically orders individual risk levels incorrectly (it is worse than flipping a coin when comparing individuals). In a survival analysis setting, censoring must also be considered. The observed values $y_i$ are then pairs of survival times and status indicators, so censored times must be compared against other times. For example, a censored time of two years is larger than an observed (uncensored) time of one year, whereas an observed time of three years is not necessarily greater than a censored time of two years. Nor is a censored time of two years necessarily equal to another censored time of two years. This is because censoring indicates only that the given individual survived beyond the recorded time. Furthermore, the $\hat{y}_i$ are risks computed from linear predictors, that is, the product of the vector of coefficient estimates and the observed covariate values. A concordant pair is therefore one in which the observed time and the predicted risk move in opposite directions, whereas a discordant pair is one in which they move in the same direction. For instance, an individual who died at two years with a risk of four and another individual with an observed time of three years and a risk of one form a concordant pair. The infinitesimal jackknife method is used to find standard error estimates for each concordance estimate [25]. The ordinary jackknife method assumes that $\hat{\theta}$ is an estimate of some parameter $\theta$.
Further, let $\hat{\theta}_{(i)}$ be the estimate when the $i$th observation is removed, for $i \in \{1,\dots,n\}$ [26]. Then

$$\text{var}(\hat{\theta}) = \frac{n-1}{n}\sum_{i=1}^{n}\left(\hat{\theta}_{(i)} - \bar{\theta}\right)^2$$

where $\bar{\theta} = \frac{1}{n}\sum_{i=1}^{n}\hat{\theta}_{(i)}$ is the sample mean of the leave-one-out estimates. In context, the true concordance is represented by $\theta$. Note that this method does not require refitting the model on a new dataset each time; rather, only the concordance estimate is recomputed each time, based on the results of a single model fit. The infinitesimal jackknife method instead assigns weights to each observation. The ordinary jackknife is equivalent to assigning a weight of one to every observation except the one being left out, which receives a weight of zero, whereas the infinitesimal jackknife assigns weights close to zero to "leave out" an observation. This method is not discussed further, except to note that its results are extremely similar to those of the ordinary jackknife for "moderate to large data", a category into which the USRDS dataset falls [25]. Another way to estimate the standard error is to fit each model multiple times using Monte Carlo cross-validation (MCCV). MCCV consists of randomly partitioning the data into training and testing sets, calculating the desired statistic, and repeating these two steps multiple times. More specifically, each model is refitted on the training set, and the concordance is estimated on the testing set. Because a new random partition is drawn each time, each partition is drawn independently of all the others. In context, the concordance estimate is the desired statistic.
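One way to organize the MCCV loop is sketched below in Python. It assumes a hypothetical `fit_and_score(train_idx, test_idx)` callable that refits the survival model on the training rows and returns the concordance on the testing rows; that function is not part of the source. The split is stratified by event status so every partition keeps the original proportion of deaths, and the defaults (80% training data, 100 repetitions) are the settings reported in the Results.

```python
import random


def stratified_split(events, train_frac=0.8, rng=None):
    """Randomly split row indices into training and testing sets while keeping
    the proportion of deaths (events == 1) the same in both sets."""
    rng = rng or random.Random()
    train, test = [], []
    for status in (0, 1):
        idx = [i for i, e in enumerate(events) if e == status]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


def mccv(events, fit_and_score, n_rep=100, train_frac=0.8, seed=1):
    """Monte Carlo cross-validation: draw a fresh random split per repetition,
    refit on the training set, and score on the testing set.

    fit_and_score(train_idx, test_idx) is a hypothetical user-supplied function
    that refits the survival model and returns the test-set concordance.
    """
    rng = random.Random(seed)
    return [fit_and_score(*stratified_split(events, train_frac, rng))
            for _ in range(n_rep)]
```

The spread of the returned concordances (for example, `statistics.stdev(scores)`) is the starting point for the standard error calculation that follows.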
Since these are survival data, each partition must contain an adequate number of individuals who were not censored. Hence, each partition is created so that the proportion of events matches that of the original dataset every time. The estimated standard error of the concordance is then the standard deviation of the cross-validation concordance estimates divided by the square root of the sample size.

Development of Survival Scores

The MCM is used to extract several survival scores at the individual and geographic levels. Algorithm 1 shows an algorithm for developing the survival scores. After the method for fitting the model is selected, survival scores may be developed using both the USRDS dataset and the MCM results based on the chosen method. As indicated by the algorithm, these are the inputs, and the output is the set of seven survival scores across counties. These seven survival scores relate to the survivability of an individual and are listed as follows:

Score 1 - probability of surviving beyond 2 years
Score 2 - probability of surviving beyond 5 years
Score 3 - probability of surviving beyond 10 years
Score 4 - probability of surviving beyond 15 years
Score 5 - time at 25% survival probability
Score 6 - time at 50% survival probability
Score 7 - time at 75% survival probability

The last three survival scores represent the quartiles of the survival distribution. The time variable within the dataset ranges from zero to 21 years, with most events occurring early; hence, we used two, five, ten, and 15 years for the first four survival scores. Figure 1 displays a Kaplan-Meier survival curve with the seven survival scores indicated as points on the curve. This figure indicates that the survival scores adequately describe the entire survival curve; for example, the first four survival scores are roughly equally spaced along the curve. The confidence intervals displayed are tight, mostly because of the size of the dataset.
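To make the link between the MCM and the seven scores concrete, the sketch below evaluates $S(t;\mathbf{x},\mathbf{z}) = \pi(\mathbf{z})\,S_0(t)^{\exp\{\boldsymbol{\beta}^T\mathbf{x}\}} + 1 - \pi(\mathbf{z})$ for one covariate profile and reads off Scores 1 to 7. The exponential baseline, the coefficients, the 0.01-year time grid, and the rule of reporting the end of follow-up (about 21.5 years) when a survival level is never reached are all illustrative assumptions; in the study, $S_0(t)$ and the coefficients come from the fitted MCM.

```python
import math


def mcm_survival(t, x, z, beta, b, baseline):
    """Mixture cure survival S(t; x, z) = pi(z) * S0(t)**exp(beta'x) + 1 - pi(z)."""
    # Incidence: logistic probability of being uncured.
    pi = 1.0 / (1.0 + math.exp(-sum(bk * zk for bk, zk in zip(b, z))))
    # Latency: proportional-hazards survival function of the uncured.
    latency = baseline(t) ** math.exp(sum(bk * xk for bk, xk in zip(beta, x)))
    return pi * latency + 1.0 - pi


def survival_scores(surv, t_max=21.5, step=0.01):
    """Seven survival scores from a survival function surv(t).

    Scores 1-4: survival probability at 2, 5, 10 and 15 years.
    Scores 5-7: first grid time at which surv(t) <= 0.25, 0.50, 0.75,
    or t_max when that level is not reached within follow-up.
    """
    scores = [surv(t) for t in (2.0, 5.0, 10.0, 15.0)]
    grid = [k * step for k in range(round(t_max / step) + 1)]
    curve = [surv(t) for t in grid]
    for p in (0.25, 0.50, 0.75):
        scores.append(next((t for t, s in zip(grid, curve) if s <= p), t_max))
    return scores


def baseline(t):
    """Hypothetical exponential baseline survival S0(t) = exp(-0.1 t)."""
    return math.exp(-0.1 * t)


def profile_survival(t):
    """Survival for one hypothetical covariate profile (all covariates zero)."""
    return mcm_survival(t, x=[0.0], z=[0.0], beta=[0.5], b=[1.0], baseline=baseline)


print(survival_scores(profile_survival))
```

With every covariate at zero, $\pi = 0.5$ and the curve levels off at a cured fraction of 0.5, so Scores 5 and 6 fall back to the 21.5-year end of follow-up, while Score 7 lands just above $10\ln 2 \approx 6.93$ years.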
Finally, the dataset must also be summarized by county when mapping these seven survival scores across the U.S. Three ways of summarizing the data across counties are explored. The first option, the univariate mode, is the most intuitive and straightforward: the USRDS data are summarized across counties by finding the mode of each covariate used within the model. The seven survival scores are then calculated from these summarized data and the MCM results. More specifically, the MCM provides estimates of the model coefficients, which allow the survival function to be estimated at the values found for each covariate (i.e., the county's mode). This survival function gives the survival probability, that is, the probability of surviving past a given time, at any time. Each of the seven survival scores may then be computed from this function, giving the desired output of the algorithm. The second option, the multivariate mode, also relies on a mode; however, the USRDS data are now summarized across counties by finding a simultaneous mode of multiple covariates. In particular, the most important predictors according to the MCM form the set of covariates for the multivariate mode. To find the most important predictors, the magnitudes of the z-values of the two coefficients associated with each covariate level (one in the latency and one in the incidence portion of the MCM) are summed. A larger summed value indicates greater importance. An estimated cumulative distribution function of these importance values may be examined to find a clearer cut-off for which predictors are most important. After the multivariate mode is calculated, the univariate mode of each remaining covariate is found, as this information is needed to complete the calculation of the seven survival scores.
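The first two summaries can be sketched in a few lines of Python. This is a hypothetical illustration rather than the authors' R code: records are plain dictionaries, and "no multivariate mode" is interpreted as there being no single most frequent combination (every combination unique, or a tie for first place). Following the fallback rule reported in the Results, the least important variable is dropped until a joint mode exists; the univariate mode would then fill in the remaining covariates.

```python
from collections import Counter, defaultdict


def univariate_mode(records, covariates, county_key="county"):
    """For each county, the most common value of each covariate on its own."""
    by_county = defaultdict(list)
    for r in records:
        by_county[r[county_key]].append(r)
    return {
        county: {v: Counter(r[v] for r in rows).most_common(1)[0][0] for v in covariates}
        for county, rows in by_county.items()
    }


def multivariate_mode(rows, ranked_vars):
    """Most common joint combination of ranked_vars (most important first).

    If no single combination is most frequent (a tie, or every combination
    unique), drop the least important variable and retry.
    """
    for k in range(len(ranked_vars), 0, -1):
        keys = ranked_vars[:k]
        top = Counter(tuple(r[v] for v in keys) for r in rows).most_common(2)
        if top and (len(top) == 1 or top[0][1] > top[1][1]):
            return dict(zip(keys, top[0][0]))
    return {}  # no mode even for the single most important variable
```

In this toy setup, one county's rows would be passed to `multivariate_mode` with, for example, `["transplant", "race", "hispanic", "primary_disease"]` (hypothetical field names), ordered as in the most-important-predictor list.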
The third option, the multiple mode, calculates the survival scores before summarizing the data. This contrasts with the first two options, in which the data were summarized first. Hence, the mode of the survival scores is found rather than the mode of the covariates. The mode is used rather than the mean because the distribution of survival scores is bimodal and highly right-skewed; the mean, unlike the mode, may not accurately represent such data. Since the survival scores are continuous, the estimated density was used to find the multiple mode. The word "multiple" refers to finding both the first and second modes (the first and second peaks of the distribution of survival scores). The second mode was considered because it may capture information about a different group of individuals. This gives two different sets of seven survival scores for this third option. As indicated by Algorithm 1, for the third option each of the seven survival scores is first calculated for every individual within the USRDS dataset. Then, a kernel density estimate of each survival score is computed within each county from these individual scores. The default settings of the "density" function in R were used, which include a normal kernel with the bandwidth given by Silverman's rule of thumb [27, 28]. Any of these methods for developing survival scores could also be applied at a different spatial level, such as zip codes. R Statistical Software version 4.3.2 was used for all statistical analyses [28].

Results

Of the original 2,429,942 people with incident ESKD from 2000 through 2020, 2,228,693 met the inclusion criteria and were included in this study.
White was the most common race in 85.1% of counties, followed by Black (12.9%), AI/AN (1.7%), Asian (0.2%), and Native Hawaiian / Pacific Islander and other races (less than 0.1% each). In 96.5% of counties the most common ethnicity was non-Hispanic, leaving 3.5% of counties where Hispanic was most common. Lastly, diabetes was the most common primary disease in 89.8% of counties, followed by hypertension (7.8%), other diseases (1.8%), and glomerulonephritis / cystic kidney disease (0.6%).

The Results section is structured as follows. First, an entire-based model is compared with region-based models. This is followed by the determination of the most important predictors. Then, the survival scores are applied using maps of the United States. Finally, model deployment is briefly discussed.

Model Comparison

Before the survival scores are applied, the candidate models are compared. We first considered whether to use one MCM or multiple MCMs partitioned by geographical region, with the aim of obtaining models that fit the data better and therefore more accurate survival scores. The U.S. Department of Health and Human Services (HHS) partitions the U.S. into ten regions such that each state is assigned to exactly one region [29]. The concordance index was used to compare fitting the MCM to the entire dataset against fitting separate MCMs to each of the ten regions. The Indian Health Service (IHS) Great Plains region, which consists of North Dakota, South Dakota, Nebraska, and Iowa, was considered in addition to the HHS regions [30]. One difference from the full model is that the "region" covariate (which partitions the U.S. into four regions) was not used when constructing the region-based MCMs.
Finding an appropriate estimate of the concordance statistic for the MCM itself is not straightforward [31], and the MCM is computationally expensive to fit because of the expectation-maximization algorithm [32]. Hence, Cox regression models (a simpler survival model without a cure fraction) were used instead for model comparison. The estimated concordance for each region and the corresponding 95% normal-based confidence intervals (CIs) using the infinitesimal jackknife method are shown in Fig. 2A. Figure 2B shows the same information with each region's sample size, relative to the full dataset, superimposed. The concordance shown for the "entire" region is the same throughout, since it is calculated using the same model either way. When performing MCCV, the training set consists of 80% of the data and the testing set consists of the remaining 20%. Figure 2C shows the results of 100 MCCV samples, and Fig. 2D depicts how these MCCV samples are used to construct 95% normal-based CIs for the original concordance estimates. Except for region three, the concordance estimates for the region-based models are consistently greater than the corresponding values for the entire-based model. This indicates that the region-based models improve the ordering of individual survival levels and are thus more accurate. This conclusion about the Cox regression models is assumed to carry over to the MCMs. While the jackknife CIs in Fig. 2A overlap when the region- and entire-based models are compared, some MCCV CIs in Fig. 2D do not overlap and are much tighter. On the other hand, the CIs tend not to overlap across regions in both Figs. 2A and 2D. These differing concordance values across regions may be explained by the number of observations within each region: Figure 2B indicates that sample size differences among regions seem to be associated with concordance.
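The ordinary jackknife from the Methods can be sketched as follows, reusing a compact version of the pair-counting concordance. The risk scores stay fixed (no refitting), each leave-one-out concordance is recomputed, and the variance formula from the Methods is applied; the 95% normal-based interval is then $\hat{C} \pm 1.96\,\mathrm{SE}$. This is the ordinary rather than the infinitesimal jackknife used for Fig. 2A (the Methods note that the two give extremely similar results for large data), and the brute-force loop is only practical for small illustrative inputs.

```python
import math
from itertools import combinations


def concordance(times, events, risks):
    """Pairwise C = (c + t/2) / (c + d + t); a pair counts only if the shorter
    time is an observed death (pairs with tied times are skipped)."""
    c = d = t = 0
    for i, j in combinations(range(len(times)), 2):
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue
        if risks[a] > risks[b]:
            c += 1
        elif risks[a] < risks[b]:
            d += 1
        else:
            t += 1
    return (c + t / 2) / (c + d + t)


def jackknife_ci(times, events, risks, z=1.96):
    """Ordinary jackknife SE of C (model not refit) and a normal-based CI."""
    n = len(times)
    c_hat = concordance(times, events, risks)
    loo = []
    for k in range(n):
        keep = [i for i in range(n) if i != k]  # leave observation k out
        loo.append(concordance([times[i] for i in keep],
                               [events[i] for i in keep],
                               [risks[i] for i in keep]))
    mean = sum(loo) / n
    var = (n - 1) / n * sum((v - mean) ** 2 for v in loo)
    se = math.sqrt(var)
    return c_hat, se, (c_hat - z * se, c_hat + z * se)
```

For a perfectly ordered sample every leave-one-out concordance equals 1, so the standard error is 0; mixed data give a positive standard error.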
Overall, the concordance indicates that the region-based models are slightly more accurate; thus, the results of the region-based MCMs, rather than those of the entire-based MCM, were used to develop the survival scores.

Most Important Predictors

Using the previously described procedure for finding the most important predictors according to the MCM, the 15 covariate factors with the highest importance values are:

1. Transplant - At least one transplant
2. Liu comorbidity index
3. Race - Black
4. Hispanic - Yes
5. Primary disease - Glomerulonephritis or Cystic kidney disease
6. Primary disease - Other
7. Race - Asian
8. Institutionalized - Nursing home
9. Age group - 80 and older
10. Primary disease - Hypertension
11. Employment - Employed
12. Age group - 70 to 79
13. Insurance - Medicare and Medicaid
14. Race - Native Hawaiian / Pacific Islander
15. Region - South

When fitting the MCM, indicator variables must be created for categorical variables, so several z-values are obtained for some covariates (one per indicator variable). Despite this limitation, the top five items all correspond to different covariates. As expected, transplant status was the top variable. These top variables were used in further analyses to help indicate which covariates may have influenced the survival scores. They were also used as the subset of variables for the multivariate mode when summarizing the data across counties. However, finding the simultaneous mode of all top five covariates would be problematic, because the Liu comorbidity index is a continuous variable and a joint mode would almost never exist. Thus, the Liu comorbidity index was not used when finding the simultaneous mode for the multivariate mode option. Further justification for using the top five predictor variables in subsequent analyses is given in the supplemental materials.
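As a worked illustration of the importance measure (the sum of |z| = |estimate / s.e.| over the latency and incidence coefficients of a covariate level), the sketch below applies it to the entire-based estimates in Table 1. Only the seven levels reported in Table 1 are included and those inputs are rounded, so the values are approximate; the resulting order (Black, Hispanic, G/CKD, Asian, hypertension, NH/PI, AI/AN) agrees with the relative order of these levels in the list above, where AI/AN does not appear in the top 15.

```python
# (latency estimate, latency s.e., incidence estimate, incidence s.e.) from Table 1
table1 = {
    "Race - Black": (-0.340, 0.002, 0.302, 0.016),
    "Race - Asian": (-0.399, 0.006, -0.677, 0.030),
    "Race - AI/AN": (-0.234, 0.008, 0.467, 0.061),
    "Race - NH/PI": (-0.379, 0.010, -0.315, 0.065),
    "Hispanic - Yes": (-0.344, 0.003, -0.530, 0.018),
    "Primary disease - Hypertension": (-0.022, 0.002, -1.299, 0.026),
    "Primary disease - G/CKD": (-0.205, 0.004, -1.309, 0.023),
}


def importance(est_lat, se_lat, est_inc, se_inc):
    """Sum of |z| over the latency and incidence coefficients of one level."""
    return abs(est_lat / se_lat) + abs(est_inc / se_inc)


ranking = sorted(table1, key=lambda k: importance(*table1[k]), reverse=True)
for name in ranking:
    print(f"{name}: {importance(*table1[name]):.1f}")
```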
Survival Score Trends Across the United States

With the region-based MCMs and the most important predictors selected, the survival scores may be mapped across the U.S. and validated. Figure 3 shows maps of the U.S. with each survival score computed across counties using the region-based MCMs. In these maps, a larger survival score corresponds to greater survivability. The few counties colored white contain no persons within the USRDS dataset. Lastly, according to Centers for Medicare & Medicaid Services reporting rules, values representing one to ten individuals may not be reported or derived from reported work [33]. Thus, neighboring counties were used to perform imputation when summarizing the dataset. More specifically, a nearest-neighbors algorithm was used for the imputation, computing a mode or an average, whichever was more appropriate. The number of neighbors varied by county, and all neighbors of each county were used in the imputation. Counties with imputed information have a gray border on the survival score maps. Figure 3 shows similar trends for each survival score. For example, the Appalachian regions of the U.S. appear to have lower survivability than surrounding areas. This trend is faint for every survival score but is most prominent in the map of the sixth survival score. A stronger conclusion can be drawn about the lower survivability of individuals living in the Great Plains region of the U.S. (the area east of the Rocky Mountains). Conversely, those living in the Southeast, around the Texas-Mexico border, and in Southern California tend to have higher survivability. Recall the top five most important predictors: transplant, Liu comorbidity index, race, Hispanic, and primary disease.
Examining how these variables vary across regions may help explain, from within the model, the survival scores shown across counties. Figures 4A through 4C display the most frequent race, Hispanic status, and primary disease across counties, respectively. These three variables appear to be highly correlated with the trends in the survival score maps of Fig. 3. For example, race and primary disease appear to be the main drivers of the trend in the Southeastern U.S. More specifically, Black individuals and those with hypertension appear most frequently within the Southeast. Comparing Figs. 3 and 4A, this indicates that these individuals have higher survivability than White persons or those with diabetes. Further, the trend around the Texas-Mexico border seems to be driven largely by Hispanic status; Figs. 3 and 4B show that Hispanic individuals appear to have higher survivability based on the survival scores developed. Lastly, there appears to be a connection between locations in Fig. 4A where AIs/ANs are most frequent (i.e., AI/AN tribal lands) and places in Fig. 3 that indicate higher survivability. The other two most important variables previously mentioned were the Liu comorbidity index and transplant status. Their maps showed nearly the same value across the entire U.S. (i.e., zero for the index and a mode of no transplant). To further explain the trends, Fig. 5 visualizes the MCM split into its two parts. The latency portion of the MCM gives the survival probabilities for individuals who are uncured, and the incidence portion gives the probability of being uncured for any individual. Because the latency relates to the survival function, it may be summarized using the seven survival scores as discussed.
Figure 5A displays only the first survival score. The map of this survival score for the latency portion alone (Fig. 5A) appears extremely similar to the survival score maps in Fig. 3, which implies that the survival scores of the MCM are dominated by the latency portion. On the other hand, the incidence map in Fig. 5B shows fewer individuals with a higher probability of being cured within the aforementioned regions (e.g., the Southeast). Furthermore, Table 1 shows the coefficient estimates from the entire-based MCM for the selected covariates race, Hispanic, and primary disease. More specifically, the table reports the latency and incidence coefficients, which are log hazard ratios and log odds ratios (of being uncured), respectively. The coefficient estimates in this table further support the survival score trends discussed. See the supplemental materials for an extended version of Table 1 that includes all covariates. Overall, analyzing the profile maps of selected variables used within the MCM and separating the two portions of the MCM appear to help validate the survival scores calculated using the univariate mode.

Table 1. Entire-based MCM coefficient estimates and standard errors for selected variables (all results reported here were significant at the 0.05 level).

| Characteristic | Latency estimate (s.e.) | Incidence estimate (s.e.) |
| --- | --- | --- |
| Race: White (ref) | | |
| Race: Black | -0.340 (0.002) | 0.302 (0.016) |
| Race: Asian | -0.399 (0.006) | -0.677 (0.030) |
| Race: AI/AN | -0.234 (0.008) | 0.467 (0.061) |
| Race: NH/PI | -0.379 (0.010) | -0.315 (0.065) |
| Hispanic, yes | -0.344 (0.003) | -0.530 (0.018) |
| Primary disease: Diabetes (ref) | | |
| Primary disease: Hypertension | -0.022 (0.002) | -1.299 (0.026) |
| Primary disease: G/CKD | -0.205 (0.004) | -1.309 (0.023) |

Abbreviations: s.e., standard error; AI/AN, American Indian / Alaska Native; NH/PI, Native Hawaiian / Pacific Islander; G/CKD, Glomerulonephritis / Cystic kidney disease.

The remaining two options for summarizing survival scores shown in Algorithm 1 may reveal further trends.
As mentioned, the multivariate mode is the simultaneous (joint) mode of transplant status, race, Hispanic status, and primary disease for each county. If no such mode is found, the number of variables is reduced. For example, a multivariate mode of transplant, race, and Hispanic status (size three) is considered if a multivariate mode of size four (including primary disease) is not found. Variables are removed according to the ranking of most important variables, starting with the least important. Across counties, the univariate and multivariate modes produce highly correlated survival scores, with correlations ranging from about 80% to 96% across the seven scores. In other words, the multivariate mode yields survival scores similar to those of the univariate mode. Most counties had a multivariate mode of size four (the maximum). A map of each survival score computed using the multivariate mode shows trends similar to the corresponding univariate-mode map in Fig. 3.

The multiple-mode method summarizes survival scores using either the first or the second mode of the estimated density. The first multiple-mode is moderately to highly correlated with the univariate mode, with correlations ranging from 65% to 72% across survival scores. As expected, the maps of survival scores two through seven calculated using the first multiple-mode (Fig. 6) show trends similar to the univariate-mode maps in Fig. 3. In contrast, the second multiple-mode is largely uncorrelated with the univariate mode, with correlations ranging from −5% to 28% across survival scores. This lack of correlation is intuitive, as the second multiple-mode focuses on a different group of individuals. As seen in Fig. 7, the maps of survival scores two through seven calculated using the second multiple-mode contain additional missing counties, because a second mode could not be identified from the density in those counties. Furthermore, Fig.
7 shows that survival scores five and six consist mainly of high values. Since the second mode often represents individuals with higher survivability, survival score maps five and six indicate that many of these individuals do not reach a 25% or 50% survival probability within the approximately 21.5 years of follow-up. In other words, the largest follow-up time (about 21.5 years) is the time value closest to these survival probabilities. The seventh survival score map does indicate that more of these individuals reach a 75% survival probability. Lastly, the second multiple-mode maps in Fig. 7 do not show the same trends as the univariate-mode maps in Fig. 3, matching the correlation results above.

Because the data are right-skewed, the second mode will often be greater than the first mode, which implies that the second mode should capture individuals with higher survivability. To help explain the differing trends of these two modes, we now compare the characteristics of individuals in the first-mode group with those in the second-mode group. For each county and each mode, an individual with a survival score similar to the reported county score is identified, and that individual's characteristics are then mapped across counties. For example, Figs. 8A and 8B (survival score six) indicate that transplant status is the main driver of the difference between the first and second modes: a person aligned with the first mode is most likely without a transplant, whereas the second mode consists mainly of individuals with a transplant.

Additional clinical variables from the USRDS dataset may be used to further validate the survival scores. These clinical variables include height; weight; body mass index (BMI); the lipid profile (total cholesterol [TC], triglycerides [TG], high-density lipoprotein [HDL], and low-density lipoprotein [LDL]); serum creatinine; blood urea nitrogen (BUN); hemoglobin; and hemoglobin A1C. Outliers of these clinical variables were removed prior to further analyses.
Each variable was then compared with each of the seven survival scores at the individual, zip code, and county levels (with the mean and the median of each clinical variable considered separately when summarizing geographically), and the correlation between each pair was calculated. Of all the clinical variables available in the dataset, serum creatinine had the highest correlation with all seven survival scores. The remaining variables had little to no correlation at the individual and zip code levels. More specifically, serum creatinine showed a moderately positive correlation with the various survival scores across counties, ranging from 45% to 50% (i.e., survivability increases as serum creatinine increases). See the supplemental materials for more details.

Model Deployment with a User Interface

A shiny application was developed to provide an easy-to-use interface for interacting with both the USRDS dataset and the results of the MCM [34]. The application contained three parts. First, interactive barplots describing the dataset, including the covariates used within the MCM, were provided; different types of barplots may be explored, including ones partitioned by transplant status. Second, individual survival curves, partitioned by transplant status, were shown based on a selection of the variables used to fit the MCM. Interpretations of these survival curves were also provided, including the predicted probability of being cured for a person with the selected covariates, and the survival scores discussed in this paper were interpreted and shown on these curves. Lastly, selected survival score maps, based on the univariate mode calculated across counties, were presented interactively. The shiny application may be accessed online [35].

Discussion

In this work, an MCM is fitted to a USRDS dataset to develop and validate survival scores for ESKD patients across the U.S.
The survival scores were calculated from region-based MCMs, as these models provided more accurate results according to the concordance statistic. Seven survival scores were developed and summarized geographically using three different methods. The simplest approach, the univariate mode, appeared to be sufficient for summarizing the data. Survival scores varied according to individual characteristics, allowing interesting trends to emerge when the survival scores were mapped across the U.S.

The most important predictors from the MCM aided in the exploration and validation of the survival scores. The top five most important predictors were transplant status, Liu comorbidity index, race, Hispanic status, and primary disease, and these variables were used to explain some of the trends in the survival scores. The Southeastern U.S. shows higher survivability, which appears to coincide with the region where there is a substantial proportion of both Black individuals and persons with hypertension. This is intuitive, since the model showed a lower hazard for Black individuals in the latency portion. Further, the model showed both a lower hazard and lower odds of being uncured for persons with hypertension. Similarly, the Texas-Mexico border shows higher survivability, which appears to coincide with the area where there is a substantial proportion of Hispanic individuals. Again, this is intuitive, as the model showed both a lower hazard and lower odds of being uncured for Hispanic individuals. These relationships with race and ethnicity are further supported by the literature, as both Hispanic and Black individuals are often reported to have a lower risk of mortality, which is usually attributed to their quicker progression of kidney disease [36–39]. Lower survivability for individuals living in the Great Plains region of the U.S. may be attributed to the presence of many agricultural communities and rural areas.
For example, rural residents face significant hardships in receiving ESKD treatment, such as dialysis care, resulting in higher mortality rates [40, 41]. Furthermore, individuals living in agricultural communities throughout the U.S. have a higher risk of developing chronic kidney disease due to many factors, including geo-environmental and agro-environmental influences [42]. Another study reports a connection between the risk of ESKD and pesticide exposure [43]. We also found lower survivability in ESKD patients around the Appalachian region, which aligns with the results of a recent study that used a USRDS dataset to explore ESKD mortality [44]. That study also found lower age-standardized mortality rates near Southern California, matching our finding of higher survivability there. Additionally, that study found that lower ESKD survivability among counties was significantly associated with a lower percentage of Black residents, which corresponds to the heavy influence of the race variable in the Southeastern region.

The variables used within the model explain the survival scores developed. In addition, serum creatinine, a clinical variable not included in model development, was correlated with the survival scores, which provides some validation from outside the model. We found that serum creatinine had a positive linear relationship with the survival scores developed, which was moderately strong when summarized across counties. Counterintuitively, this relationship indicates that a higher serum creatinine level is associated with lower mortality. The same relationship was found in a previous retrospective study of incident ESKD patients living in Maryland and Virginia [45], which offered a few explanations for it. For example, serum creatinine as an indicator of overall health may supersede its use as a measure of kidney function specifically.
More specifically, muscle mass and nutritional status may confound the relationship between ESKD and serum creatinine.

Conclusions

The main limitation of this study is that external validation is required before the developed survival scores can be generalized and used in the clinical setting [10]. This study did use variables outside the MCM covariates to validate the survival scores; however, these outside variables originated from the same USRDS dataset used to fit the MCM. Thus, different observations are required to externally validate the survival scores developed. Overall, this study developed and internally validated survival scores based on MCMs for patients with ESKD using a large USRDS dataset. The spatial trends shown may enable future stakeholders to make data-informed decisions.

Abbreviations

AI/AN: American Indian / Alaska Native
BMI: Body mass index
BUN: Blood urea nitrogen
CI: Confidence interval
ESKD: End-stage kidney disease
G/CKD: Glomerulonephritis / Cystic kidney disease
HHS: Health and Human Services
IHS: Indian Health Service
MCCV: Monte Carlo cross-validation
MCM: Mixture cure model
NH/PI: Native Hawaiian / Pacific Islander
s.e.: standard error
U.S.: United States
USRDS: United States Renal Data System

Declarations

Ethics approval and consent to participate
The authors confirm that all methods were carried out in accordance with relevant guidelines and regulations.

Consent for publication
Not applicable.

Availability of data and materials
Access to USRDS data is limited to researchers and institutions with approved Data Use Agreements and will not be released. The relevant code used to generate the results presented within this paper will be posted to GitHub.

Competing interests
The authors declare that they have no competing interests.

Funding
This research was, in part, funded by the National Institutes of Health (NIH) Agreement No. 1OT2OD032581.
The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the NIH.

Authors' contributions
All authors made substantial contributions to the conception, design, and interpretation of this work. NM conducted the primary analysis of the data and drafted the manuscript. SM led the conceptualization and design of the work. All authors substantively revised the work, approved the submitted version, and agreed to be accountable for all aspects of the work.

Acknowledgments
The research reported in this work was supported by South Dakota State University, AIM-AHEAD Coordinating Center, award number OTA-21-017, and was, in part, funded by the National Institutes of Health Agreement No. 1OT2OD032581. The work is solely the responsibility of the authors and does not necessarily represent the official view of AIM-AHEAD or the National Institutes of Health. The data reported here have been supplied by the United States Renal Data System (USRDS). The interpretation and reporting of these data are the responsibility of the author(s) and in no way should be seen as an official policy or interpretation of the U.S. government.

References

1. National Institutes of Health. USRDS Annual Data Report: Epidemiology of kidney disease in the United States. 2024.
2. Rekabdarkolaee HM, Longacre LE, Isaacson MJ, Varilek BM. Hospice Referral Rate Disparities of American Indian/Alaska Native Kidney Transplant Recipients with End-Stage Kidney Disease: A Retrospective Cohort Analysis. American Journal of Hospice and Palliative Medicine®. 2025;0(0):1-8.
3. Varilek BM, Isaacson MJ, Moradi Rekabdarkolaee H. Evaluating disparities in end-stage kidney disease survival among American Indian/Alaska native persons with diabetes. Journal of Racial and Ethnic Health Disparities. 2024:1-11.
4. Jarrar F, Pasternak M, Harrison TG, James MT, Quinn RR, Lam NN, et al. Mortality Risk Prediction Models for People With Kidney Failure: A Systematic Review. JAMA Network Open. 2025;8(1):e2453190.
5. Taal MW, Brenner BM. Renal risk scores: progress and prospects. Kidney International. 2008;73(11):1216-9.
6. Li Q, Li P, Xu Z, Lu Z, Yang C, Ning J. Association of diabetes with cardiovascular calcification and all-cause mortality in end-stage renal disease in the early stages of hemodialysis: a retrospective cohort study. Cardiovascular Diabetology. 2024;23(1):259.
7. Taal MW, Brenner BM. Predicting initiation and progression of chronic kidney disease: developing renal risk scores. Kidney International. 2006;70(10):1694-705.
8. O'Seaghdha CM, Lyass A, Massaro JM, Meigs JB, Coresh J, D'Agostino Sr RB, et al. A risk score for chronic kidney disease in the general population. The American Journal of Medicine. 2012;125(3):270-7.
9. Ramspek CL, Voskamp PWM, Van Ittersum FJ, Krediet RT, Dekker FW, Van Diepen M. Prediction models for the mortality risk in chronic dialysis patients: a systematic review and independent external validation study. Clinical Epidemiology. 2017:451-64.
10. Ramspek CL, Jager KJ, Dekker FW, Zoccali C, van Diepen M. External validation of prognostic models: what, why, how, when and where? Clinical Kidney Journal. 2021;14(1):49-58.
11. Thamer M, Kaufman JS, Zhang Y, Zhang Q, Cotter DJ, Bang H. Predicting early death among elderly dialysis patients: development and validation of a risk score to assist shared decision making for dialysis initiation. American Journal of Kidney Diseases. 2015;66(6):1024-32.
12. Ramspek CL, Verberne WR, van Buren M, Dekker FW, Bos WJW, van Diepen M. Predicting mortality risk on dialysis and conservative care: development and internal validation of a prediction tool for older patients with advanced chronic kidney disease. Clinical Kidney Journal. 2021;14(1):189-96.
13. Ivory SE, Polkinghorne KR, Khandakar Y, Kasza J, Zoungas S, Steenkamp R, et al. Predicting 6-month mortality risk of patients commencing dialysis treatment for end-stage kidney disease. Nephrology Dialysis Transplantation. 2017;32(9):1558-65.
14. Halbesma N, Jansen DF, Heymans MW, Stolk RP, de Jong PE, Gansevoort RT, et al. Development and validation of a general population renal risk score. Clinical Journal of the American Society of Nephrology. 2011;6(7):1731-8.
15. Moore J, He X, Shabir S, Hanvesakul R, Benavente D, Cockwell P, et al. Development and evaluation of a composite risk score to predict kidney transplant failure. American Journal of Kidney Diseases. 2011;57(5):744-51.
16. U.S. Renal Data System. 2023 USRDS Annual Data Report: Epidemiology of Kidney Disease in the United States. National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD, 2023.
17. U.S. Renal Data System. 2023 Researcher’s Guide to the USRDS Database. National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD, 2023.
18. Potluri VS, Reddy YN, Tummalapalli SL, Peng C, Huang Q, Zhao Y, et al. Early effects of the end-stage renal disease treatment choices model on kidney transplant waitlist additions. Clinical Journal of the American Society of Nephrology. 2024:10.2215.
19. Cai C, Zou Y, Peng Y, Zhang J. smcure: An R-Package for estimating semiparametric mixture cure models. Computer Methods and Programs in Biomedicine. 2012;108(3):1255-60.
20. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological). 1977;39(1):1-22.
21. Amico M, Van Keilegom I. Cure models in survival analysis. Annual Review of Statistics and Its Application. 2018;5:311-42.
22. Meyer N, Michael S, Moradi Rekabdarkolaee H, Varilek B, Ngorsuraches S, Brooks P. Proportional Hazards Mixture Cure Models for End-Stage Kidney Disease. [Poster]. In press 2024.
23. Kutner MH, Nachtsheim CJ, Neter J. Applied linear regression models. 4th ed. McGraw-Hill/Irwin; 2004.
24. Hartman N, Kim S, He K, Kalbfleisch JD. Pitfalls of the concordance index for survival outcomes. Statistics in Medicine. 2023;42(13):2179-90.
25. Therneau TM. A Package for Survival Analysis in R. Version 3.5-5. 2023.
26. Wasserman L. All of statistics: a concise course in statistical inference. Springer; 2005.
27. Silverman BW. Density Estimation for Statistics and Data Analysis. 1986.
28. R Core Team. R: A Language and Environment for Statistical Computing. Version 4.3.2. Vienna, Austria: R Foundation for Statistical Computing; 2023.
29. Office of Intergovernmental and External Affairs (IEA). HHS Regional Offices. 2024. Available from: https://www.hhs.gov/about/agencies/iea/regional-offices/index.html
30. Indian Health Service. Great Plains Area. 2024. Available from: https://www.ihs.gov/greatplains/
31. Zhang Y, Shao Y. Concordance measure and discriminatory accuracy in transformation cure models. Biostatistics. 2018;19(1):14-26.
32. Michael S, Melnykov V. An effective strategy for initializing the EM algorithm in finite mixture models. Advances in Data Analysis and Classification. 2016;10:563-83.
33. Research Data Assistance Center. CMS Cell Size Suppression Policy. 2024. Available from: https://resdac.org/articles/cms-cell-size-suppression-policy
34. Chang W, Cheng J, Allaire J, Sievert C, Schloerke B, Xie Y, et al. shiny: Web Application Framework for R. Version 1.8.0. 2023.
35. Meyer N, Michael S. ESKD Shiny App. 2025 [2025-04-18]. Available from: https://graphics.shinyapps.io/ESKD_Shiny_App/
36. Giusti S, Arrigain S, Lopez R, Pomfret E, Cervantes L, Schold JD. Evaluating Long Term Outcomes Among Hispanic Kidney Transplant Recipients. American Journal of Kidney Diseases. 2025;0(0).
37. Mour GK, Kukla A, Jaramillo A, Ramon DS, Wadei HM, Stegall MD. Renal disease and kidney transplantation in Hispanic American persons. Kidney360. 2024;5(11):1763-70.
38. Bellos I, Marinaki S, Samoli E, Boletis IN, Benetou V. Sociodemographic disparities in adults with kidney failure: a meta-analysis. Diseases. 2024;12(1):23.
39. Harding JL, Pavkov M, Wang Z, Benoit S, Burrows NR, Imperatore G, et al. Long-term mortality among kidney transplant recipients with and without diabetes: a nationwide cohort study in the USA. BMJ Open Diabetes Research and Care. 2021;9(1):e001962.
40. Crouch E, Yell N, Herbert L, Browne T, Hung P. Availability and Quality of Dialysis Care in Rural versus Urban US Counties. American Journal of Nephrology. 2024;55(3):361-8.
41. National Institutes of Health. Healthcare Disparities. 2023. Available from: https://usrds-adr.niddk.nih.gov/2023/supplements-covid-19-disparities/14-healthcare-disparities
42. Wilke RA, Qamar M, Lupu RA, Gu S, Zhao J. Chronic kidney disease in agricultural communities. The American Journal of Medicine. 2019;132(10):e727-e32.
43. Ben Khadda Z, Fagroud M, El Karmoudi Y, Ezrari S, Elhanafi L, Radu A-F, et al. Association between pesticide exposure and end-stage renal disease: A case-control study from Morocco based on the STROBE guidelines. Ecotoxicology and Environmental Safety. 2024;288:117360.
44. Snow KK, Patzer RE, Patel SA, Harding JL. County-level characteristics associated with variation in ESKD mortality in the United States, 2010–2018. Kidney360. 2022;3(5):891-9.
45. Fink JC, Burdick RA, Kurth SJ, Blahut SA, Armistead NC, Turner MS, et al. Significance of serum creatinine values in new end-stage renal disease patients. American Journal of Kidney Diseases. 1999;34(4):694-701.

Algorithm 1
Algorithm 1 is available in the Supplementary Files section.

Additional Declarations
No competing interests reported.

Supplementary Files
SupplementalMaterials.docx
Algorithm1.pdf: Algorithm 1. Algorithm for the development of survival scores.
© Research Square 2026 | ISSN 2693-5015 (online)
{"props":{"pageProps":{"initialData":{"identity":"rs-6523746","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":459938644,"identity":"df337b4d-4d82-4024-96ae-0b29e349352e","order_by":0,"name":"Nathan Meyer","email":"","orcid":"","institution":"South Dakota State University","correspondingAuthor":false,"prefix":"","firstName":"Nathan","middleName":"","lastName":"Meyer","suffix":""},{"id":459938645,"identity":"0764e109-9115-40c8-afbf-571d1d7349c1","order_by":1,"name":"Hossein Moradi Rekabdarkolaee","email":"","orcid":"","institution":"South Dakota State University","correspondingAuthor":false,"prefix":"","firstName":"Hossein","middleName":"Moradi","lastName":"Rekabdarkolaee","suffix":""},{"id":459938646,"identity":"4394e782-8d9c-4973-adfa-829b0513a8ec","order_by":2,"name":"Brandon M.
Varilek","email":"","orcid":"","institution":"University of Nebraska Medical Center","correspondingAuthor":false,"prefix":"","firstName":"Brandon","middleName":"M.","lastName":"Varilek","suffix":""},{"id":459938647,"identity":"3f7fa324-8622-4890-affe-977bfadb75f9","order_by":3,"name":"Surachat Ngorsuraches","email":"","orcid":"","institution":"Auburn University","correspondingAuthor":false,"prefix":"","firstName":"Surachat","middleName":"","lastName":"Ngorsuraches","suffix":""},{"id":459938648,"identity":"759dd11f-8b08-41be-9175-2f5377a3f7ae","order_by":4,"name":"Patti Brooks","email":"","orcid":"","institution":"Dakota State University","correspondingAuthor":false,"prefix":"","firstName":"Patti","middleName":"","lastName":"Brooks","suffix":""},{"id":459938649,"identity":"11314caa-30a2-4a22-9990-83794c67c1c0","order_by":5,"name":"Jerry Schrier","email":"","orcid":"","institution":"Avera McKennan Hospital \u0026 University Health Center, Sioux Falls","correspondingAuthor":false,"prefix":"","firstName":"Jerry","middleName":"","lastName":"Schrier","suffix":""},{"id":459938652,"identity":"7ef7afdf-bcd9-4ee5-84e2-e771006cee12","order_by":6,"name":"Semhar Michael","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABH0lEQVRIie2QMUvDQBiGPwm0y5n5QsT+ha8EgkKwf6Uh0MmI4JJBaKZ0MDjbyb/QKS4OdxR0Scx60MGC4KQQcKlYxCunWw4yOtyzfHD3Pdy9L4DB8A8ZpNDbzTADIsd5AMD+7li7gkwpF0rBidy0frd1Cihlmipl2UHpX03Wn/cQX/cr3jRYj+y65h9NAoe2GLdnyau7Yf4KcUbOIucGV+FcRBZlJXiORgERF5QwqVCCLsHVGIUFlGcQLnTK83vhbJlsjBLva4tPI6yX1oZ/w1SriP3CJUrxXVng3oJFPcpTkM+1K1jGhXvAaJiRU/84x2iXxT8qH+hwXq7b48+qwnljQXg7Kz2xSU5kY/xFJJfBwH7UfExBO5wYDAaDoTs/knVpperd3mgAAAAASUVORK5CYII=","orcid":"","institution":"South Dakota State University","correspondingAuthor":true,"prefix":"","firstName":"Semhar","middleName":"","lastName":"Michael","suffix":""}],"badges":[],"createdAt":"2025-04-24 
21:38:11","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6523746/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6523746/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":83514159,"identity":"73778359-bba2-44fc-af8d-19c22192f5c3","added_by":"auto","created_at":"2025-05-27 17:51:33","extension":"jpg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":139887,"visible":true,"origin":"","legend":"\u003cp\u003eKaplan-Meier survival plot of the USRDS dataset along with 95% confidence intervals given as black dashed-lines. Each labeled point represents one of the seven survival scores.\u003c/p\u003e","description":"","filename":"fig1.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6523746/v1/c529b78b1ff7fe053c768634.jpg"},{"id":83513838,"identity":"048d98de-c5dc-460d-b568-1bdf29dbd6a7","added_by":"auto","created_at":"2025-05-27 17:43:33","extension":"jpg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":224872,"visible":true,"origin":"","legend":"\u003cp\u003eConcordance measures for either the entire-based model (black circles / white box plots) or the region-based models (red triangles / box plots). 
Subfigure (A) contains infinitesimal jackknife CIs; (B) displays a blue “X” to indicate the sample size of each region; (C) contains MCCV samples; (D) contains MCCV constructed CIs for the original concordance estimates with an “X” indicating the mean of the cross-validation sample.\u003c/p\u003e","description":"","filename":"fig2.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6523746/v1/d46567fe61f5865950da069e.jpg"},{"id":83514162,"identity":"cdd4cc33-7c7c-466c-a2cf-26683dd24529","added_by":"auto","created_at":"2025-05-27 17:51:33","extension":"jpg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":308627,"visible":true,"origin":"","legend":"\u003cp\u003eSurvival scores two through seven calculated using the univariate mode (with a log transformation applied). Counties with gray borders indicate imputed values for suppressed data.\u003c/p\u003e","description":"","filename":"fig3.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6523746/v1/51546cf6f796d41be490ea79.jpg"},{"id":83513827,"identity":"a25ede2e-601a-4ebe-97a3-7f7c2b4f1d2f","added_by":"auto","created_at":"2025-05-27 17:43:33","extension":"jpg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":121717,"visible":true,"origin":"","legend":"\u003cp\u003eThe mode of (A) race, (B) Hispanic status, and (C) primary disease across counties. Note that AI/AN is American Indian / Alaska Native, NH/PI is Native Hawaiian / Pacific Islander, and G/CKD is Glomerulonephritis / Cystic kidney disease. 
Counties with gray borders indicate imputed values for suppressed data.\u003c/p\u003e","description":"","filename":"fig4.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6523746/v1/79a0495a93cfc5434a97b8ec.jpg"},{"id":83513829,"identity":"58f136e1-e6e0-44ad-b10a-17daea6619d6","added_by":"auto","created_at":"2025-05-27 17:43:33","extension":"jpg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":100873,"visible":true,"origin":"","legend":"\u003cp\u003eA comparison of the geographical results from the two parts of the MCM. Subfigure (A) uses only the latency portion of the MCM and displays survival score one whereas (B) displays the incidence with a logit transformation (the cure proportion is shown). Counties with gray borders indicate imputed values for suppressed data.\u003c/p\u003e","description":"","filename":"fig5.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6523746/v1/0b1a9b1ea9364a31cad95260.jpg"},{"id":83513835,"identity":"1eb5dfb3-e695-42d9-99b7-cb5d418e70d8","added_by":"auto","created_at":"2025-05-27 17:43:33","extension":"jpg","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":274305,"visible":true,"origin":"","legend":"\u003cp\u003eSurvival scores two through four (with a logit transformation applied) and five through seven (with a log transformation applied) calculated using the first multiple-mode. 
Counties with gray borders indicate imputed values for suppressed data.\u003c/p\u003e","description":"","filename":"fig6.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6523746/v1/f82a08873a1deb19050ffbbb.jpg"},{"id":83513832,"identity":"3a96321d-89bf-431d-9e6e-44b7cd049456","added_by":"auto","created_at":"2025-05-27 17:43:33","extension":"jpg","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":309445,"visible":true,"origin":"","legend":"\u003cp\u003eSurvival scores two through four (with a logit transformation applied) and five through seven (with a log transformation applied) calculated using the second multiple-mode. Counties with gray borders indicate imputed values for suppressed data.\u003c/p\u003e","description":"","filename":"fig7.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6523746/v1/8ad5a052641522010387de98.jpg"},{"id":83513831,"identity":"feb4504c-69f8-464a-be6e-f732fb91354c","added_by":"auto","created_at":"2025-05-27 17:43:33","extension":"jpg","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":80388,"visible":true,"origin":"","legend":"\u003cp\u003eTransplant status for individuals within the (A) first mode and (B) second mode for survival score six. 
Counties with gray borders indicate imputed values for suppressed data.\u003c/p\u003e","description":"","filename":"fig8.jpg","url":"https://assets-eu.researchsquare.com/files/rs-6523746/v1/cabdf463b5f3fad482cd99b8.jpg"},{"id":106972322,"identity":"1f04ef45-f047-4b64-9e0c-57e4d405e5e3","added_by":"auto","created_at":"2026-04-15 10:22:58","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2249370,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6523746/v1/19b48c79-1386-47c3-abed-46729e3c0a5e.pdf"},{"id":83514160,"identity":"9d0322e0-36b9-4d98-9502-2f295186ed83","added_by":"auto","created_at":"2025-05-27 17:51:33","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":385319,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementalMaterials.docx","url":"https://assets-eu.researchsquare.com/files/rs-6523746/v1/44faf897bef242eab63dc391.docx"},{"id":83513830,"identity":"0d1effd1-0419-4f2b-be28-496d832b1af4","added_by":"auto","created_at":"2025-05-27 17:43:33","extension":"pdf","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":112484,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAlgorithm 1.\u003c/strong\u003e Algorithm for the development of survival scores.\u003c/p\u003e","description":"","filename":"Algorithm1.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6523746/v1/29aefe6e07a7b78859ce2313.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Development and Validation of Survival Scores and the Assessment of Spatial Trends in End-Stage Kidney Disease Outcomes","fulltext":[{"header":"Background","content":"\u003cp\u003eIn 2022, an average of 360 people in the United States (U.S.) 
initiated treatment for end-stage kidney disease (ESKD) each day [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. In the same year, the adjusted all-cause mortality rate among adult ESKD patients was approximately 146 deaths per 1,000 person-years. The U.S. Renal Data System (USRDS) 2024 Annual Data Report presents these findings and also documents large differences in mortality by modality, age, race, ethnicity, and other factors [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. For example, studies point out that American Indian / Alaska Native (AI/AN) persons disproportionately suffer from ESKD compared with non-Hispanic White persons [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. They also note that this disparity is especially pronounced in remote places, such as the AI/AN tribal lands of South Dakota, where receiving a transplant, the treatment option of choice, is unachievable for many. Furthermore, ESKD patients face hard therapeutic decisions [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. The consequences of a given choice of care vary from patient to patient because treatment outcomes are heterogeneous [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. Mortality risk scores for ESKD patients could aid in making treatment decisions [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. 
A survival score is the complement of a mortality risk score and arises naturally from a survival model.\u003c/p\u003e \u003cp\u003eIn ESKD, accurate prediction models for assessing patient risk are extremely important to develop and validate, as the resulting risk scores may improve both patient outcomes and clinical practice [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. Mortality risk scores may also enable targeted population screening [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. In other words, the population may be stratified (\u003cem\u003ee.g.\u003c/em\u003e, into low, medium, and high risk) so that each person's risk score is identified automatically from the patient profile. This stratification of individual risk may also provide information on ESKD complications that helps in prevention [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. Further, embedding these risk scores in practical tools may assist with prognosis, recognition of high-risk patients, and the choice among therapies [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eExisting mortality prediction models for ESKD patients either raise concerns about applicability to the clinical setting or carry a risk of bias for various reasons, including the choice of study population.\u003csup\u003e3\u003c/sup\u003e Hence, it was concluded that contemporary mortality prediction models still need to be developed and validated, and that external validation is essential before the resulting mortality risk scores are applied in a clinical setting [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. 
In other words, the risk scores should be tested on data from a different sample of individuals rather than on the data used during model development [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. Mortality risk scores developed for ESKD patients have focused on older patients [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e, \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e] or on patients in the early stages of dialysis [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e], while other risk scores (\u003cem\u003ei.e.\u003c/em\u003e, not mortality risk scores) attempt to predict the risk of developing progressive chronic kidney disease [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e] or kidney transplant failure [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eFactors known to accelerate chronic kidney disease are also important for assessing ESKD mortality risk, such as demographic variables (\u003cem\u003ee.g.\u003c/em\u003e, age, sex, ethnicity), nephrotoxins (\u003cem\u003ee.g.\u003c/em\u003e, smoking, alcohol, drugs), and medical factors (\u003cem\u003ee.g.\u003c/em\u003e, cardiovascular disease, diabetes); furthermore, a large prospective study with a long observation period may give more accurate (or useful) risk scores [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. The survival scores developed in this manuscript are based on a mixture cure model (MCM) fitted to a large population of ESKD patients, allowing for a long follow-up time. This dataset also contains many covariates, including the examples listed above. The survival scores are validated using not only these variables but also variables not used in creating the survival scores. 
This validation step includes summarizing the survival scores geographically to better understand the underlying trends. Our primary objective is to develop and validate survival scores based on an MCM using the variables in the USRDS dataset. In addition, these survival scores could underpin an easy-to-use tool that gives survival score predictions, enabling both persons with ESKD and stakeholders, such as practitioners or policy-makers, to make data-informed decisions.\u003c/p\u003e"},{"header":"Methods","content":"\u003cp\u003eDataset\u003c/p\u003e\u003cp\u003eThe survival scores developed are based on a survival model fitted to a USRDS dataset [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. This dataset contains individuals with incident ESKD from 2000 through 2020, including both those on dialysis and those who had at least one transplant. Time and event status were the outcome variables in the survival analysis, with all-cause mortality as the event of interest. Independent variables used within the model were age group, sex, race, Hispanic, primary disease, Liu comorbidity index, inability to ambulate, inability to transfer, needs assistance with daily activities, institutionalized, alcohol dependence, tobacco use, drug (illicit) dependence, amputation, toxic nephropathy, modality, transplant, employment, insurance, rurality, and region. Individuals with missing covariate values and those older than 108 years were excluded from the original dataset. Furthermore, this study focuses on individuals within the 50 U.S. states and the District of Columbia. 
Because mortality differs widely between those on dialysis and those who receive a transplant [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e], an MCM was used to capture this heterogeneity in the data.\u003c/p\u003e\u003cp\u003eMixture Cure Model Methodology\u003c/p\u003e\u003cp\u003eLet \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:t\$\u003c/span\u003e\u003c/span\u003e be the observed time until the event of interest or until right-censoring, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:x\$\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:z\$\u003c/span\u003e\u003c/span\u003e each be a vector of covariates, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:b\$\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\beta\\:\$\u003c/span\u003e\u003c/span\u003e each be a vector of coefficients, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{S}_{0}\\left(t\\right)\$\u003c/span\u003e\u003c/span\u003e be the baseline uncured survival function. 
The MCM is then given as\u003c/p\u003e\u003cdiv id=\"Equa\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e\n$$\\:S\\left(t;\\varvec{x},\\varvec{z}\\right)=\\pi\\:\\left(\\varvec{z}\\right)S\\left(t;\\varvec{x}\\right)+1-\\pi\\:\\left(\\varvec{z}\\right)$$\u003c/div\u003e\u003c/div\u003e\u003cp\u003esuch that\u003c/p\u003e\u003cdiv id=\"Equb\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equb\" name=\"EquationSource\"\u003e\n$$\\:\\pi\\:\\left(\\varvec{z}\\right)=\\frac{{e}^{{\\varvec{b}}^{\\varvec{T}}\\varvec{z}}}{1+{e}^{{\\varvec{b}}^{\\varvec{T}}\\varvec{z}}}\\:and\\:S\\left(t;\\varvec{x}\\right)={\\left({S}_{0}\\left(t\\right)\\right)}^{\\text{exp}\\left\\{{\\varvec{\\beta\\:}}^{\\varvec{T}}\\varvec{x}\\right\\}}$$\u003c/div\u003e\u003c/div\u003e\u003cp\u003erepresent the incidence (or the proportion of uncured individuals) and latency (or the uncured survival function), respectively [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. The expectation-maximization algorithm is used to find estimates for \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:b\$\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\beta\\:\$\u003c/span\u003e\u003c/span\u003e [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]. Further, bootstrap sampling is used to find standard errors for these parameter estimates [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. An MCM was first fit to the USRDS dataset, and corresponding survival score summaries were extracted. 
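\u003c/p\u003e\u003cp\u003eTo make the two components concrete, the population survival function above can be evaluated directly once coefficient estimates are available. The following is a minimal Python sketch, not the authors' implementation: the function names are hypothetical, covariate and coefficient vectors are plain lists, and the baseline uncured survival function is passed in as a function (the exponential baseline below is made up purely for illustration; a fitted model would estimate it from the data).\u003c/p\u003e

```python
import math

def incidence(b, z):
    '''pi(z) = exp(b^T z) / (1 + exp(b^T z)): probability of being uncured.'''
    eta = sum(bk * zk for bk, zk in zip(b, z))
    return 1.0 / (1.0 + math.exp(-eta))  # algebraically equal to the logistic form

def latency(t, beta, x, baseline):
    '''S(t; x) = S0(t) ** exp(beta^T x): proportional-hazards survival of the uncured.'''
    lp = sum(bk * xk for bk, xk in zip(beta, x))
    return baseline(t) ** math.exp(lp)

def mcm_survival(t, b, z, beta, x, baseline):
    '''Population survival S(t; x, z) = pi(z) * S(t; x) + 1 - pi(z).'''
    p = incidence(b, z)
    return p * latency(t, beta, x, baseline) + 1.0 - p

def toy_baseline(t):
    # Hypothetical baseline uncured survival S0(t), for illustration only.
    return math.exp(-0.2 * t)

# Hypothetical profile: intercept plus one indicator in the incidence part,
# one indicator in the latency part (no intercept, as in a Cox-type model).
print(mcm_survival(5.0, b=[0.3, 0.8], z=[1, 1], beta=[0.6], x=[1], baseline=toy_baseline))
```

\u003cp\u003eIn this sketch, as t grows the curve levels off at 1 − π(z), the proportion of cured individuals, because the toy baseline S0(t) tends to zero.\u003c/p\u003e\u003cp\u003e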
Further details of applying an MCM to the USRDS dataset were presented previously [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eConcordance\u003c/p\u003e\u003cp\u003eThe concordance statistic was used to compare different methods of fitting the MCM. Concordance estimates the probability that, for any two observations, the predictions are ordered in the same direction as the observed values [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]. It therefore shows how well a given model orders individual survival levels [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. Let \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{y}_{i}\$\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{y}_{j}\$\u003c/span\u003e\u003c/span\u003e be two observations with corresponding predictions \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{\\widehat{y}}_{i}\$\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{\\widehat{y}}_{j}\$\u003c/span\u003e\u003c/span\u003e for \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:i,j\\in\\:\\:\\{1,\\dots\\:,n\\}\$\u003c/span\u003e\u003c/span\u003e. Then, the concordance statistic estimates \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:P\\:\\left({\\widehat{\\text{y}}}_{i}\\:\u0026gt;\\:{\\widehat{\\text{y}}}_{j}|{y}_{i}\\:\u0026gt;\\:{y}_{j}\\right)\$\u003c/span\u003e\u003c/span\u003e. In other words, it estimates the probability that the predictions order a pair correctly, given the order of the observed values. 
Then,\u003c/p\u003e\u003cdiv id=\"Equc\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equc\" name=\"EquationSource\"\u003e\n$$\\:C=\\frac{c+{t}_{\\widehat{y}}/2}{c+d+{t}_{\\widehat{y}}}$$\u003c/div\u003e\u003c/div\u003e\u003cp\u003eis one possible estimate, where \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:c\$\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:d\$\u003c/span\u003e\u003c/span\u003e, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{t}_{\\widehat{y}}\$\u003c/span\u003e\u003c/span\u003e are the numbers of concordant pairs, discordant pairs, and pairs tied in the predicted values, respectively [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]. A higher concordance value is therefore better, with 1 indicating a perfect ordering. A concordance value of 0.5 means the model is no more useful than flipping a coin when ordering individual risk levels. Lastly, a concordance value below 0.5 means that the model systematically orders individual risk levels incorrectly (it is worse than flipping a coin when comparing individuals).\u003c/p\u003e\u003cp\u003eIn a survival analysis setting, censoring must also be considered. The observed values, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{y}_{i}\$\u003c/span\u003e\u003c/span\u003e’s, are then pairs of survival times and status indicators, so censored times must be compared with other times according to specific rules. For example, a censored time of two years is considered larger than an observed event time of one year, whereas an observed time of three years is not necessarily greater than a censored time of two years. Nor is a censored time of two years necessarily equal to another censored time of two years. 
This is because censoring tells us only that the given individual survived beyond the recorded time. Furthermore, the \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{\\widehat{y}}_{i}\$\u003c/span\u003e\u003c/span\u003e’s represent risks calculated from linear predictors, that is, the product of the vector of coefficient estimates and the observed covariate values. Hence, a pair is concordant when the observed time and the predicted risk move in opposite directions, and discordant when they move in the same direction. For instance, an individual who dies at two years with a risk of four and another person with an observed time of three years and a risk of one form a concordant pair, because the shorter time has the higher risk.\u003c/p\u003e\u003cp\u003eThe infinitesimal jackknife method is used to find standard error estimates for each concordance estimate [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]. To describe it, first consider the ordinary jackknife, in which \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\widehat{{\\theta\\:}}\$\u003c/span\u003e\u003c/span\u003e is an estimate of some parameter \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{\\theta\\:}\$\u003c/span\u003e\u003c/span\u003e. 
Further, let \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:{\\widehat{{\\theta\\:}}}_{\\left(i\\right)}\$\u003c/span\u003e\u003c/span\u003e be the estimate obtained when the \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:i\$\u003c/span\u003e\u003c/span\u003e\u003csup\u003eth\u003c/sup\u003e observation is removed, for \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:i\\in\\:\\{1,\\:.\\:.\\:.\\:,\\:n\\}\$\u003c/span\u003e\u003c/span\u003e [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]. Then,\u003c/p\u003e\u003cdiv id=\"Equd\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equd\" name=\"EquationSource\"\u003e\n$$\\:\\text{var}\\left(\\widehat{{\\theta\\:}}\\right)=\\frac{n-1}{n}{\\sum\\:}_{i=1}^{n}{\\left({\\widehat{{\\theta\\:}}}_{\\left(i\\right)}-\\stackrel{-}{{\\theta\\:}}\\right)}^{2}$$\u003c/div\u003e\u003c/div\u003e\u003cp\u003ewhere \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\stackrel{-}{{\\theta\\:}}=\\frac{1}{n}{\\sum\\:}_{i=1}^{n}{\\widehat{{\\theta\\:}}}_{\\left(i\\right)}\$\u003c/span\u003e\u003c/span\u003e is the mean of the leave-one-out estimates. In context, the true concordance is represented by \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\$\\:\\theta\\:\$\u003c/span\u003e\u003c/span\u003e. Note that this method does not require refitting the model on a new dataset each time; rather, only the concordance estimate is recomputed each time, based on the results of a single model fit. The infinitesimal jackknife method instead works by assigning weights to each observation. In these terms, the ordinary jackknife assigns a weight of one to every observation except the one being left out, which is assigned a weight of zero. 
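\u003c/p\u003e\u003cp\u003eThe concordance estimate and the ordinary jackknife variance defined above can be sketched together in Python. This is an illustrative sketch, not the R implementation used in the study: the function names are hypothetical, a pair is treated as comparable only when its shorter time is an observed event, pairs with equal observed times are simply skipped (a simplification), and each leave-one-out estimate recomputes the concordance from fixed risk predictions rather than refitting the model.\u003c/p\u003e

```python
import math

def harrell_c(times, events, risks):
    '''C = (c + t/2) / (c + d + t) over comparable pairs.

    A pair is comparable when the shorter observed time ends in an event, so the
    other person is known to have survived longer. Higher risk should go with the
    shorter time. Pairs with equal observed times are skipped in this sketch.
    '''
    conc = disc = tied = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored time cannot anchor a comparison
        for j in range(n):
            if times[j] > times[i]:
                if risks[i] > risks[j]:
                    conc += 1
                elif risks[i] == risks[j]:
                    tied += 1
                else:
                    disc += 1
    total = conc + disc + tied
    return (conc + tied / 2) / total if total else float('nan')

def jackknife_var(stat, data):
    '''Ordinary jackknife: (n - 1) / n * sum over i of (theta_(i) - theta_bar) ** 2.'''
    n = len(data)
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    theta_bar = sum(loo) / n
    return (n - 1) / n * sum((th - theta_bar) ** 2 for th in loo)

def c_of_rows(rows):
    # rows are (time, event, risk) tuples; risks stay fixed, so nothing is refit
    times, events, risks = zip(*rows)
    return harrell_c(times, events, risks)

# Hypothetical toy data: (time in years, event indicator, predicted risk).
rows = [(2.0, 1, 4.0), (3.0, 1, 1.0), (5.0, 0, 0.5), (1.0, 1, 3.0), (4.0, 1, 2.0)]
c_hat = c_of_rows(rows)
se_hat = math.sqrt(jackknife_var(c_of_rows, rows))
print(c_hat, se_hat)
```

\u003cp\u003eWith the two-person example from the text (both times taken as deaths), harrell_c([2.0, 3.0], [1, 1], [4.0, 1.0]) returns 1.0, since the only comparable pair is concordant.\u003c/p\u003e\u003cp\u003e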
Instead, the infinitesimal jackknife method assigns weights close to zero to “leave out” an observation. This method is not discussed further, except to note that its results are extremely similar to those of the ordinary jackknife for “moderate to large data” [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e], a category into which the USRDS dataset falls.\u003c/p\u003e\u003cp\u003eAnother way to estimate the standard error is to refit each model multiple times using Monte Carlo cross-validation (MCCV). MCCV consists of randomly partitioning the data into training and testing sets, calculating the desired statistic, and repeating these two steps many times. Here, the desired statistic is the concordance: each model is refit on the training set, and its concordance is estimated on the testing set. Because the data are partitioned at random each time, the partitions are drawn independently of one another. Since we have survival data, each partition must contain an adequate number of uncensored individuals; hence, each partition is stratified so that the proportion of events matches that of the original dataset. The estimated standard error of the concordance is then the standard deviation of the concordance estimates across the cross-validation samples divided by the square root of the sample size.\u003c/p\u003e\u003cp\u003eDevelopment of Survival Scores\u003c/p\u003e\u003cp\u003eThe MCM is used to extract several survival scores at the individual and geographic levels. Algorithm 1 outlines the development of the survival scores. After a method for fitting the model is selected, the survival scores are developed from the USRDS dataset and the MCM results obtained with that method. As indicated by the algorithm, these two components are the input, and the output is the set of seven survival scores across counties. 
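\u003c/p\u003e\u003cp\u003eWhatever county summary is chosen, the seven scores listed next come from a single estimated survival curve: Scores 1 to 4 read the curve at fixed horizons, and Scores 5 to 7 find the time at which it falls to a given level. A minimal Python sketch of that step follows, assuming the curve has already been evaluated on an increasing time grid (for example, an MCM curve for one county profile, or a Kaplan-Meier estimate); the function names and the exponential toy curve are hypothetical.\u003c/p\u003e

```python
import math

HORIZONS = (2, 5, 10, 15)      # Scores 1 to 4: horizons in years
LEVELS = (0.25, 0.50, 0.75)    # Scores 5 to 7: survival probability levels

def surv_at(times, surv, t):
    '''Step-function value S(t): the last curve value at or before time t.'''
    s = 1.0
    for ti, si in zip(times, surv):
        if ti > t:
            break
        s = si
    return s

def time_at(times, surv, p):
    '''Earliest grid time with S(t) <= p, or None if the curve never gets that low.'''
    for ti, si in zip(times, surv):
        if si <= p:
            return ti
    return None

def seven_scores(times, surv):
    probs = [surv_at(times, surv, h) for h in HORIZONS]
    quants = [time_at(times, surv, p) for p in LEVELS]
    return probs + quants

# Hypothetical curve on a quarter-year grid from 0 to 21 years.
grid = [0.25 * k for k in range(85)]
curve = [math.exp(-0.1 * t) for t in grid]
scores = seven_scores(grid, curve)
print(scores)
```

\u003cp\u003eBecause the curve is known only on a grid, the quantile-type scores are accurate only to the grid spacing, and time_at returns None when the curve never falls to the requested level, which can happen under a mixture cure model when the cured fraction is large.\u003c/p\u003e\u003cp\u003e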
The seven survival scores describe the survivability of an individual and are as follows:\u003c/p\u003e\u003cp\u003eScore 1 - probability of surviving beyond 2 years\u003c/p\u003e\u003cp\u003eScore 2 - probability of surviving beyond 5 years\u003c/p\u003e\u003cp\u003eScore 3 - probability of surviving beyond 10 years\u003c/p\u003e\u003cp\u003eScore 4 - probability of surviving beyond 15 years\u003c/p\u003e\u003cp\u003eScore 5 - time at 25% survival probability\u003c/p\u003e\u003cp\u003eScore 6 - time at 50% survival probability\u003c/p\u003e\u003cp\u003eScore 7 - time at 75% survival probability\u003c/p\u003e\u003cp\u003eThe last three survival scores are the quartiles of the survival-time distribution. The time variable in the dataset ranges from 0 to 21 years, with most events occurring early; hence, we used 2, 5, 10, and 15 years for the first four survival scores. Figure\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e displays a Kaplan-Meier survival curve with the seven survival scores shown as points on the curve. This figure indicates that the survival scores adequately describe the entire survival curve; for example, the first four survival scores are spaced roughly evenly along the curve. The confidence intervals displayed are narrow, mostly because of the size of the dataset. Finally, the dataset must also be summarized by county to map these seven survival scores across the U.S. Three ways of summarizing the data across counties are explored.\u003c/p\u003e\u003cp\u003eThe first option, the univariate mode, is the most intuitive and straightforward. The USRDS data are summarized for each county by finding the mode of each covariate used in the model. The seven survival scores are then calculated from these summarized data and the MCM results. More specifically, an MCM provides estimates of the coefficients in the model. 
These estimates, together with each county's covariate values (\u003cem\u003ei.e.\u003c/em\u003e, the county modes), allow the survival function to be estimated. This survival function gives the survival probability, that is, the probability of surviving past a given time, at any time. Each of the seven survival scores can then be computed from it, giving the desired output of the algorithm.\u003c/p\u003e\u003cp\u003eThe second case, the multivariate mode, also uses a mode; however, the USRDS data are now summarized for each county by the simultaneous (joint) mode of multiple covariates. These covariates are the most important predictors according to the MCM. To rank predictors, the absolute z-values of the coefficients are summed for each covariate level (each level has two coefficients in the MCM, one in the incidence part and one in the latency part). A larger sum indicates greater importance. An estimated cumulative distribution function of these importance values may help to find a clearer cut-off for the most important predictors. After the multivariate mode is calculated, the univariate mode of each remaining covariate is found, as these values are needed to complete the calculation of the seven survival scores.\u003c/p\u003e\u003cp\u003eThe third case, the multiple mode, calculates the survival scores before summarizing the data, in contrast with the first two cases, where the data were summarized first. Hence, the mode is taken of the survival scores rather than of the covariates. The mode is used instead of the mean because the distribution of survival scores is bimodal and highly right-skewed, and the mean may not accurately represent such data. 
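\u003c/p\u003e\u003cp\u003eLocating the first and second modes of a continuous score from a kernel density estimate can be sketched in Python as follows. The sketch tries to mirror the R ‘density’ defaults described in this section (Gaussian kernel, Silverman’s rule-of-thumb bandwidth as in R’s bw.nrd0, density evaluated on 512 points extending three bandwidths beyond the data), but it is an approximation rather than a port of R: the function names are hypothetical, the zero-spread bandwidth fallback is simplified, and treating the first and second modes as the highest and second-highest local peaks is our assumption.\u003c/p\u003e

```python
import math
import statistics

def silverman_bw(x):
    '''0.9 * min(sd, IQR / 1.34) * n ** (-1/5), the rule used by R's bw.nrd0.'''
    q1, _, q3 = statistics.quantiles(x, n=4, method='inclusive')
    lo = min(statistics.stdev(x), (q3 - q1) / 1.34)
    lo = lo or statistics.stdev(x) or 1.0  # simplified fallback for zero spread
    return 0.9 * lo * len(x) ** -0.2

def gaussian_kde(x, grid, bw):
    '''Gaussian kernel density estimate evaluated at each grid point.'''
    norm = 1.0 / (len(x) * bw * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - xi) / bw) ** 2) for xi in x) for g in grid]

def first_two_modes(x, n_grid=512, cut=3.0):
    '''Grid locations of the highest and second-highest density peaks (None if absent).'''
    bw = silverman_bw(x)
    lo, hi = min(x) - cut * bw, max(x) + cut * bw
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + k * step for k in range(n_grid)]
    dens = gaussian_kde(x, grid, bw)
    # interior local maxima of the estimated density
    peaks = [k for k in range(1, n_grid - 1) if dens[k - 1] < dens[k] >= dens[k + 1]]
    peaks.sort(key=lambda k: dens[k], reverse=True)
    first = grid[peaks[0]] if peaks else None
    second = grid[peaks[1]] if len(peaks) > 1 else None
    return first, second

# Hypothetical bimodal scores for one county: a large group near 0, a smaller one near 10.
scores = [-0.2, -0.1, 0.0, 0.1, 0.2] * 4 + [9.9, 10.0, 10.1] * 3
print(first_two_modes(scores))
```

\u003cp\u003eThe density is evaluated directly at each grid point rather than through the binned fast Fourier transform approximation that R uses, so peak locations may differ slightly from R output.\u003c/p\u003e\u003cp\u003e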
Because the survival scores are continuous, the multiple mode was found from an estimated density. The word ‘multiple’ refers to finding both the first and second modes (the first and second peaks of the distribution of survival scores). The second mode was considered because it may capture information about a different group of individuals. This gives two sets of seven survival scores for the third case.\u003c/p\u003e\u003cp\u003eAs indicated by Algorithm 1 for the third case, each of the seven survival scores is first calculated for every individual in the USRDS dataset. Then, for each county, a kernel density estimate of each survival score is computed from these individual values. The default settings of the ‘density’ function in R were used, namely a normal (Gaussian) kernel with the bandwidth given by Silverman’s rule of thumb [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e, \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]. Any of the methods for developing survival scores could also be applied at a different spatial level, such as zip codes. R Statistical Software version 4.3.2 was used for all statistical analyses [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e].\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003eOf the 2,429,942 people with incident ESKD from 2000 through 2020, the 2,228,693 who met the inclusion criteria were included in this study. The most common race was White in 85.1% of counties, Black in 12.9%, AI/AN in 1.7%, Asian in 0.2%, and Native Hawaiian / Pacific Islander or another race in less than 0.1% each. The most common ethnicity was non-Hispanic in 96.5% of counties and Hispanic in the remaining 3.5%. 
Lastly, the most common primary disease was diabetes in 89.8% of counties, hypertension in 7.8%, other diseases in 1.8%, and glomerulonephritis / cystic kidney disease in 0.6%. The Results section is structured as follows. First, a model fitted to the entire dataset (the entire-based model) is compared with models fitted separately by region (region-based models). Next, the most important predictors are determined. Then, the survival scores are applied using maps of the United States. Finally, model deployment is briefly discussed.\u003c/p\u003e \u003cp\u003eModel Comparison\u003c/p\u003e \u003cp\u003eBefore the survival scores are applied, the models are compared. We first considered whether to use one MCM or multiple MCMs partitioned by geographical region; region-specific models could fit the data better and, in turn, yield more accurate survival scores. The U.S. Department of Health and Human Services (HHS) partitions the U.S. into ten regions such that each state is assigned to exactly one region [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e]. The concordance index was used to compare fitting the MCM to the entire dataset with fitting separate MCMs to each of the ten regions. The Indian Health Service (IHS) Great Plains region, which consists of North Dakota, South Dakota, Nebraska, and Iowa, was considered in addition to the HHS regions [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e]. Unlike the full model, the MCMs partitioned by region did not use the \u0026ldquo;region\u0026rdquo; covariate (which partitions the U.S. into four regions). 
Finding an appropriate estimate of the concordance statistic for the MCM itself is not straightforward [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e], and the MCM is computationally expensive to fit because it relies on the expectation-maximization algorithm [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e]. Hence, Cox regression models (a simpler survival model) were used instead for model comparison.\u003c/p\u003e \u003cp\u003eThe estimated concordance for each region and the corresponding 95% normal-based confidence intervals (CIs) from the infinitesimal jackknife method are shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA. Figure\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eB shows the same information with each region's sample size, relative to the full dataset, superimposed. For the \u0026lsquo;entire\u0026rsquo; region, the concordance is the same for both approaches, since it is calculated from the same model either way. For MCCV, each training set consists of 80% of the data and each testing set of the remaining 20%. Figure\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eC shows the results of 100 MCCV samples, and Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eD depicts how these samples are used to construct 95% normal-based CIs for the original concordance estimates.\u003c/p\u003e \u003cp\u003eExcept for region three, the concordance estimates for the region-based models are consistently greater than the corresponding values for the entire-based model. This indicates that the region-based models order individual survival levels better and are thus more accurate. This conclusion, drawn from the Cox regression models, is applied to the MCMs. 
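\u003c/p\u003e\u003cp\u003eThe MCCV procedure behind Figs. 2C and 2D (repeated stratified 80/20 splits, a standard error from the spread of the resulting estimates, and a normal-based 95% CI) can be sketched in Python as below. This is a hedged illustration rather than the study’s code: fit_and_score is a hypothetical callback standing in for refitting a Cox model on the training indices and computing the testing-set concordance, and dividing the standard deviation by the square root of the number of MCCV samples is our reading of the standard-error description in the Methods.\u003c/p\u003e

```python
import math
import random
import statistics

def stratified_split(events, train_frac, rng):
    '''Split indices so the training and testing sets keep the event proportion.'''
    train, test = [], []
    for flag in (True, False):
        idx = [i for i, e in enumerate(events) if bool(e) == flag]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return train, test

def mccv(events, fit_and_score, n_samples=100, train_frac=0.8, seed=2025):
    '''Repeat independent random stratified splits; collect one estimate per split.'''
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_samples):
        train, test = stratified_split(events, train_frac, rng)
        estimates.append(fit_and_score(train, test))
    return estimates

def normal_ci(center, estimates, z=1.96):
    '''95% normal-based CI with SE = sd(estimates) / sqrt(number of samples) (assumed).'''
    se = statistics.stdev(estimates) / math.sqrt(len(estimates))
    return center - z * se, center + z * se

# Hypothetical data and scorer: 60 events, 40 censored; the toy score varies by split.
events = [1] * 60 + [0] * 40

def toy_fit_and_score(train, test):
    return sum(test) / (100.0 * len(test))

est = mccv(events, toy_fit_and_score)
print(normal_ci(0.7, est))
```

\u003cp\u003eStratifying on event status keeps about 80% of the events and 80% of the censored observations in each training set, so every split retains a comparable number of uncensored individuals.\u003c/p\u003e\u003cp\u003e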
While the jackknife CIs for the region- and entire-based models overlap in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA, some of the much narrower MCCV CIs in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eD do not. In both Figs.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA and \u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eD, however, the CIs tend not to overlap across regions. These differences in concordance across regions may be explained by the number of observations in each region: Figure\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eB suggests that sample size differences among regions are associated with concordance. Overall, the concordance indicates that the region-based models are slightly more accurate; thus, the results of the region-based MCMs, rather than those of the entire-based MCM, were used to develop the survival scores.\u003c/p\u003e \u003cp\u003eMost Important Predictors\u003c/p\u003e \u003cp\u003eUsing the procedure described in the Methods for finding the most important predictors according to the MCM, the 15 covariate levels with the highest importance values are as follows:\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"No\" id=\"Taba\" border=\"1\"\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e1. Transplant - At least one transplant\u003c/p\u003e \u003cp\u003e2. Liu comorbidity index\u003c/p\u003e \u003cp\u003e3. Race - Black\u003c/p\u003e \u003cp\u003e4. Hispanic - Yes\u003c/p\u003e \u003cp\u003e5. 
Primary disease - Glomerulonephritis or Cystic kidney disease\u003c/p\u003e \u003cp\u003e6. Primary disease - Other\u003c/p\u003e \u003cp\u003e7. Race - Asian\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e8. Institutionalized - Nursing home\u003c/p\u003e \u003cp\u003e9. Age group - 80 and older\u003c/p\u003e \u003cp\u003e10. Primary disease - Hypertension\u003c/p\u003e \u003cp\u003e11. Employment - Employed\u003c/p\u003e \u003cp\u003e12. Age group - 70 to 79\u003c/p\u003e \u003cp\u003e13. Insurance - Medicare and Medicaid\u003c/p\u003e \u003cp\u003e14. Race - Native Hawaiian / Pacific Islander\u003c/p\u003e \u003cp\u003e15. Region - South\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eWhen fitting the MCM, indicator variables must be created for categorical variables, so some covariates have multiple z-values (one per indicator variable). Despite this limitation, the top five items all correspond to different covariates.\u003c/p\u003e \u003cp\u003eAs expected, transplant status was the top variable listed. These top variables were used in further analyses to help identify which covariates may have influenced the survival scores. They were also used as the subset of variables for the multivariate mode when summarizing the data across counties. However, a simultaneous mode of all top five covariates would be problematic: because the Liu comorbidity index is continuous, a well-defined mode would almost never exist. Thus, the Liu comorbidity index was not used when finding the simultaneous mode for the multivariate mode option. 
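\u003c/p\u003e\u003cp\u003eThe importance ranking and the multivariate-mode summary can be sketched in Python as follows. The sketch assumes that each covariate level has one z-value in the incidence part and one in the latency part of the MCM, scores importance as the sum of their absolute values, and takes a county’s joint mode over categorical covariates only (so a continuous covariate such as the Liu comorbidity index is left out, as described above); the remaining covariates get univariate modes. Function names, dictionary keys, and all values are hypothetical.\u003c/p\u003e

```python
from collections import Counter

def rank_importance(z_values):
    '''z_values maps level -> (z_incidence, z_latency); rank levels by |z1| + |z2|.'''
    scores = {level: abs(z1) + abs(z2) for level, (z1, z2) in z_values.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def joint_mode(rows, keys):
    '''Most frequent combination of the given categorical covariates in one county.'''
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    combo, _ = counts.most_common(1)[0]  # ties resolve to the first combination seen
    return dict(zip(keys, combo))

def county_profile(rows, joint_keys, other_keys):
    '''Multivariate mode for joint_keys, then univariate modes for the rest.'''
    profile = joint_mode(rows, joint_keys)
    for k in other_keys:
        profile[k] = Counter(row[k] for row in rows).most_common(1)[0][0]
    return profile

# Hypothetical county with three individuals.
rows = [
    {'transplant': 'No', 'race': 'Black', 'hispanic': 'No', 'sex': 'F'},
    {'transplant': 'No', 'race': 'White', 'hispanic': 'No', 'sex': 'M'},
    {'transplant': 'No', 'race': 'Black', 'hispanic': 'No', 'sex': 'M'},
]
print(county_profile(rows, ['transplant', 'race', 'hispanic'], ['sex']))
```

\u003cp\u003eThe joint mode is taken over combinations actually observed in the county, so adding a continuous covariate would make almost every combination unique, which is the problem the text avoids by dropping the Liu comorbidity index.\u003c/p\u003e\u003cp\u003e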
Further justification for using the top five predictors in subsequent analyses is given in the supplemental materials.\u003c/p\u003e \u003cp\u003eSurvival Score Trends Across the United States\u003c/p\u003e \u003cp\u003eWith the region-based MCMs and the most important predictors selected, the survival scores can be mapped across the U.S. and validated. Figure\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e maps each survival score across U.S. counties using the region-based MCMs. In these maps, a larger survival score corresponds to greater survivability, and the few counties colored white contain no persons in the USRDS dataset. In addition, under Centers for Medicare \u0026amp; Medicaid Services reporting rules, values representing one to ten individuals may not be reported or derived from reported work [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]. Thus, such values were imputed from neighboring counties when summarizing the dataset. More specifically, a nearest-neighbors approach was used in which the mode or the average of the neighboring counties, whichever was appropriate for the variable, served as the imputed value. The number of neighbors varied by county, and all neighbors of each county were used in the imputation. Counties with imputed information have a gray border in the survival score maps. Figure\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e shows similar trends for each survival score. For example, the Appalachian region of the U.S. appears to have lower survivability compared to surrounding areas. This trend is faint for every survival score but is most prominent in the map of the sixth survival score. A stronger trend is the lower survivability of individuals living in the Great Plains region of the U.S.
(the area east of the Rocky Mountains). In contrast, individuals living in the Southeast, around the Texas-Mexico border, and in Southern California tend to have higher survivability.\u003c/p\u003e \u003cp\u003eRecall the top five most important predictors: transplant status, Liu comorbidity index, race, Hispanic status, and primary disease. Examining how these variables vary across regions may help internally explain the survival scores shown across counties. Figures\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eA through \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eC display the most frequent race, Hispanic status, and primary disease across counties, respectively. These three variables appear to be strongly associated with the trends in the survival score maps of Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e. For example, race and primary disease appear to be the main drivers of the trend in the Southeastern U.S., where Black individuals and those with hypertension appear most frequently. Comparing Figs.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e, \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eA, and \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eC suggests that these individuals have higher survivability than White persons or those with diabetes. Similarly, the trend around the Texas-Mexico border appears to be driven largely by Hispanic status: together, Figs.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e and \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eB suggest that Hispanic individuals have higher survivability according to the developed survival scores.
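The county summaries behind the most-frequent-category maps, together with the small-cell imputation from neighboring counties described for the survival score maps, can be sketched as follows. This is a hypothetical illustration: the `neighbors` adjacency map, the `SUPPRESS_MAX = 10` threshold (taken from the one-to-ten reporting rule), and the choice to impute a suppressed county from the summaries of its reportable neighbors are assumptions; the paper does not give these implementation details.

```python
from collections import Counter
from statistics import mean
from typing import Dict, Hashable, List

SUPPRESS_MAX = 10  # assumed: counts of 1-10 people may not be reported


def county_mode(values: List[Hashable]) -> Hashable:
    """Most frequent value; ties go to the value encountered first."""
    return Counter(values).most_common(1)[0][0]


def summarize_counties(
    data: Dict[str, List[Hashable]],
    neighbors: Dict[str, List[str]],
    categorical: bool = True,
) -> Dict[str, Hashable]:
    """Summarize one variable per county, imputing small counties from neighbors.

    Counties with more than SUPPRESS_MAX people use their own mode (or mean).
    Counties with 1..SUPPRESS_MAX people take the mode (or mean) of the
    summaries of all reportable neighbors. Empty counties stay missing.
    """
    agg = county_mode if categorical else mean
    direct = {c: agg(v) for c, v in data.items() if len(v) > SUPPRESS_MAX}
    result = dict(direct)
    for c, v in data.items():
        if 0 < len(v) <= SUPPRESS_MAX:
            pooled = [direct[n] for n in neighbors.get(c, []) if n in direct]
            if pooled:
                result[c] = agg(pooled)
    return result
```

A suppressed county's own records are never reported in this sketch; only its neighbors' summaries are, which keeps the imputed value from being derived from the small cell itself.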
Lastly, locations in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eA where AI/AN individuals are most frequent (\u003cem\u003ei.e.\u003c/em\u003e, near AI/AN tribal communities) appear to coincide with places in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e that indicate higher survivability. The remaining two of the top five predictors, the Liu comorbidity index and transplant status, showed nearly the same value in every county (\u003cem\u003ei.e.\u003c/em\u003e, zero for the index and a mode of no transplant) and therefore are unlikely to account for these spatial differences.\u003c/p\u003e \u003cp\u003eTo further explain these trends, Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e separates the MCM into its two parts. The latency portion of the MCM gives the survival probabilities for individuals considered uncured, and the incidence portion gives each individual\u0026rsquo;s probability of being uncured. Because the latency portion is itself a survival function, it can be summarized with the same seven survival scores. Figure\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eA displays only the first survival score, computed from the latency portion alone, and this map appears extremely similar to the survival score maps in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e. This implies that the MCM survival scores are driven mainly by the latency portion.
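The split into latency and incidence follows the standard mixture cure model, in which population survival is S(t | x, z) = π(z)·S_u(t | x) + 1 − π(z), with π(z) the probability of being uncured (incidence) and S_u the survival of the uncured (latency). The sketch below evaluates that formula assuming a logistic incidence model and a proportional hazards latency, consistent with the odds and hazard ratios discussed for Table 1; the baseline survival value passed in as `s0_t` and any coefficients are made-up inputs, not estimates from the paper.

```python
import math
from typing import Sequence


def uncured_probability(gamma: Sequence[float], z: Sequence[float]) -> float:
    """Incidence: logistic model for P(uncured | z); z[0] = 1 for the intercept."""
    eta = sum(g * zi for g, zi in zip(gamma, z))
    return 1.0 / (1.0 + math.exp(-eta))


def uncured_survival(s0_t: float, beta: Sequence[float], x: Sequence[float]) -> float:
    """Latency: proportional hazards, S_u(t | x) = S_0(t) ** exp(beta'x)."""
    lp = sum(b * xi for b, xi in zip(beta, x))
    return s0_t ** math.exp(lp)


def population_survival(
    s0_t: float,
    beta: Sequence[float],
    x: Sequence[float],
    gamma: Sequence[float],
    z: Sequence[float],
) -> float:
    """Mixture cure model: S(t | x, z) = pi(z) * S_u(t | x) + 1 - pi(z)."""
    pi = uncured_probability(gamma, z)
    return pi * uncured_survival(s0_t, beta, x) + (1.0 - pi)
```

As the baseline survival falls toward zero, the population curve levels off at the cured fraction 1 − π(z): the incidence part sets the plateau, while the latency part controls how quickly the curve approaches it.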
On the other hand, the incidence map in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eB shows fewer individuals with a high probability of being cured in the aforementioned regions (\u003cem\u003ee.g.\u003c/em\u003e, the Southeast). Furthermore, Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e reports the entire-based MCM coefficient estimates for race, Hispanic status, and primary disease. The latency coefficients are on the log-hazard scale and the incidence coefficients are on the log-odds scale of being uncured, so exponentiating them gives the hazard and odds ratios, respectively. These estimates further support the survival score trends discussed above. See the supplemental materials for an extended version of Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e that includes all covariates. Overall, the profile maps of selected MCM covariates, together with the separation of the MCM into its two portions, help validate the survival scores calculated using the univariate mode.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eEntire-based MCM coefficient estimates and standard errors for selected variables (all results reported here were significant at the 0.05 level).\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e
\u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eCharacteristic\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLatency\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eIncidence\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLog hazard ratio (s.e.)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eLog odds ratio (s.e.)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eRace\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWhite\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e(ref)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e(ref)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBlack\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-0.340 (0.002)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.302 (0.016)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAsian\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-0.399 (0.006)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-0.677 (0.030)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAI/AN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c2\"\u003e \u003cp\u003e-0.234 (0.008)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.467 (0.061)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNH/PI\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-0.379 (0.010)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-0.315 (0.065)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eHispanic\u003c/b\u003e, yes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-0.344 (0.003)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-0.530 (0.018)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003ePrimary disease\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDiabetes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e(ref)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e(ref)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHypertension\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-0.022 (0.002)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-1.299 (0.026)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eG/CKD\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003e-0.205 (0.004)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-1.309 (0.023)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"3\" nameend=\"c3\" namest=\"c1\"\u003e \u003cp\u003eAbbreviations: s.e., standard error; AI/AN, American Indian / Alaska Native; NH/PI, Native Hawaiian / Pacific Islander; G/CKD, Glomerulonephritis / Cystic kidney disease\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe remaining two methods for summarizing survival scores in Algorithm 1 may reveal further trends. As mentioned, the multivariate mode is the simultaneous mode of transplant status, race, Hispanic status, and primary disease within each county. If no such mode is found, the number of variables is reduced. For example, a multivariate mode of transplant status, race, and Hispanic status (size three) is considered if a multivariate mode of size four (including primary disease) is not found. Variables are removed in reverse order of importance, so the least important remaining variable is dropped first. The correlations between the univariate- and multivariate-mode survival scores across counties are high, ranging from about 80\u0026ndash;96% across survival scores. In other words, the multivariate mode yields survival scores similar to those of the univariate mode. Most counties had a multivariate mode of length four (the maximum length), and a map of each survival score computed using the multivariate mode shows trends similar to the corresponding univariate-mode map in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eThe multiple-mode method of summarizing survival scores uses either the first or the second multiple-mode.
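The first and second multiple-modes come from a density estimate of the survival scores within a county. The sketch below shows one way to obtain them, assuming a Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth (Silverman's text is among the references, but the authors' exact kernel, bandwidth, and grid are not stated here) and assuming the "first" mode is the highest density peak and the "second" the next highest. When the density has only one peak, the second mode is `None`, which corresponds to a county left missing in a second multiple-mode map.

```python
import math
import statistics
from typing import List, Optional, Tuple


def silverman_bandwidth(x: List[float]) -> float:
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    sd = statistics.stdev(x)
    q1, _, q3 = statistics.quantiles(x, n=4)
    iqr = q3 - q1
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * len(x) ** -0.2


def kde_modes(x: List[float], grid_size: int = 512) -> Tuple[float, Optional[float]]:
    """(first, second) modes of a Gaussian KDE, ranked by peak height.

    The second mode is None when the estimated density has a single peak.
    Requires at least two distinct values in x.
    """
    h = silverman_bandwidth(x)
    lo, hi = min(x) - 3 * h, max(x) + 3 * h
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    norm = 1.0 / (len(x) * h * math.sqrt(2 * math.pi))
    dens = [norm * sum(math.exp(-0.5 * ((g - xi) / h) ** 2) for xi in x) for g in grid]
    # Interior local maxima of the density evaluated on the grid.
    peaks = [i for i in range(1, grid_size - 1) if dens[i - 1] < dens[i] >= dens[i + 1]]
    peaks.sort(key=lambda i: dens[i], reverse=True)
    first = grid[peaks[0]]
    second = grid[peaks[1]] if len(peaks) > 1 else None
    return first, second
```

Ranking peaks by height rather than by location is a design choice of this sketch; with right-skewed scores it tends to place the second mode above the first, as the text describes.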
The first multiple-mode is moderately to highly correlated with the univariate mode, with correlations ranging from 65\u0026ndash;72% across survival scores. As expected, the maps of survival scores two through seven calculated using the first multiple-mode (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e) show trends similar to the corresponding univariate-mode maps in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e. In contrast, the second multiple-mode is largely uncorrelated with the univariate mode, with correlations ranging from \u0026minus;5% to 28% across survival scores. This lack of correlation is intuitive, as the second multiple-mode represents a different group of individuals. As seen in Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e, the maps of survival scores two through seven calculated using the second multiple-mode have additional missing counties because a second mode could not be identified from the density in those counties. Furthermore, Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e shows that the maps of survival scores five and six consist mainly of high values. Since the second mode often represents individuals with higher survivability, these two maps indicate that many of these individuals do not reach a 25% or 50% survival probability within the approximately 21.5 years of follow-up. In other words, the maximum follow-up time of about 21.5 years is the time closest to these survival probabilities. The seventh survival score map does indicate that more of these individuals reach a 75% survival probability.
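The capping at about 21.5 years can be reproduced with a small sketch. It assumes, rather than quotes from the paper, that a probability-based survival score is the follow-up time at which the estimated survival curve is closest to the target probability p, with the curve given as parallel lists of times and survival values; the helper name `survival_score` is hypothetical. When a cured fraction keeps the curve above p, the closest point is the last follow-up time.

```python
from typing import Sequence


def survival_score(times: Sequence[float], surv: Sequence[float], p: float) -> float:
    """Follow-up time at which the survival curve is closest to probability p.

    `times` must be increasing, with surv[i] = S(times[i]). Ties go to the
    earlier time. If S(t) stays above p (for example, because of a cured
    fraction), the closest point is the end of follow-up, so the score is
    effectively capped at max(times).
    """
    if not times or len(times) != len(surv):
        raise ValueError("times and surv must be non-empty and of equal length")
    best = min(range(len(times)), key=lambda i: (abs(surv[i] - p), times[i]))
    return times[best]
```

For example, with a curve that plateaus at 0.65, the scores for p = 0.5 and p = 0.25 both land on the final time point, while p = 0.75 lands earlier on the curve.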
Lastly, the second multiple-mode maps in Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e do not show the same trends as the univariate-mode maps in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e, consistent with the correlation results above.\u003c/p\u003e \u003cp\u003eThe second mode is often greater than the first mode because the data are right-skewed, so the second mode should capture individuals with higher survivability. To help explain the differing trends of these two modes, we now compare the characteristics of individuals in the first-mode group with those in the second-mode group. For each county and each mode, an individual with a survival score similar to the reported value is identified, and the characteristics of these individuals are then mapped across counties. For example, Figs.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003eA and \u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e8\u003c/span\u003eB indicate that transplant status is the main driver of the difference between the first and second modes. In other words, a person aligned with the first mode most likely has not received a transplant, whereas the second mode consists mainly of individuals with a transplant.\u003c/p\u003e \u003cp\u003eAdditional clinical variables from the USRDS dataset may be used to further validate the survival scores. These clinical variables include height; weight; body mass index (BMI); the lipid profile (total cholesterol [TC], triglycerides [TG], HDL, and LDL); serum creatinine; blood urea nitrogen (BUN); hemoglobin; and hemoglobin A1C. Outliers in these clinical variables were removed prior to further analyses.
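The county-level clinical check can be sketched as follows. The paper does not state how outliers were defined or which correlation coefficient was used, so this sketch assumes Tukey's 1.5 × IQR fences applied to the pooled individual values and a Pearson correlation; each county is summarized by the median (or, optionally, the mean) of its remaining values and correlated with an already-computed county survival score. The function names and input layout are hypothetical.

```python
import math
import statistics
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def county_correlation(
    clinical: List[Tuple[str, float]],
    county_scores: Dict[str, float],
    summary: Callable[[List[float]], float] = statistics.median,
    k: float = 1.5,
) -> float:
    """Correlate a county summary of a clinical variable with county survival scores.

    `clinical` holds one (county, value) pair per person. Values outside
    Tukey's fences [Q1 - k*IQR, Q3 + k*IQR] are dropped before summarizing.
    """
    q1, _, q3 = statistics.quantiles([v for _, v in clinical], n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    by_county: Dict[str, List[float]] = defaultdict(list)
    for county, value in clinical:
        if lo <= value <= hi:
            by_county[county].append(value)
    counties = sorted(c for c in by_county if c in county_scores)
    xs = [summary(by_county[c]) for c in counties]
    ys = [county_scores[c] for c in counties]
    return pearson(xs, ys)
```

Passing `summary=statistics.fmean` instead of the default median covers the mean-based county summary that the text also considers.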
Each variable was then correlated with each of the seven survival scores at the individual, zip code, and county levels (with the mean and the median of each clinical variable considered separately when summarizing geographically). Of all clinical variables available in the dataset, serum creatinine had the highest correlation with all seven survival scores; the remaining variables had little to no correlation at the individual and zip code levels. At the county level, serum creatinine showed a moderate positive correlation with the various survival scores, ranging from 45\u0026ndash;50% (\u003cem\u003ei.e.\u003c/em\u003e, survivability increases as serum creatinine increases). See the supplemental materials for more details.\u003c/p\u003e \u003cp\u003eModel Deployment with a User Interface\u003c/p\u003e \u003cp\u003eA Shiny application was developed to provide an easy-to-use interface for exploring both the USRDS dataset and the results of the MCM [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e]. The application contains three parts. First, interactive bar plots describe the dataset, including the covariates used in the MCM; different types of bar plots may be explored, including ones partitioned by transplant status. Second, individual survival curves, partitioned by transplant status, are shown for a user-selected combination of the variables used to fit the MCM. Interpretations of these survival curves are also provided, including the predicted probability of being cured for a person with the selected covariates, and the survival scores discussed in this paper are interpreted and shown on these curves. Lastly, selected survival score maps, based on the univariate mode calculated across counties, are presented interactively.
This Shiny application may be accessed online [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e].\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eIn this work, an MCM was fitted to a USRDS dataset to develop and validate survival scores for ESKD patients across the U.S. The survival scores were calculated from region-based MCMs because these models were more accurate according to the concordance statistic. Seven survival scores were developed and summarized geographically using three different methods. The simplest approach, the univariate mode, appeared sufficient for summarizing the data. Survival scores varied with individual characteristics, and distinct spatial trends emerged when the survival scores were mapped across the U.S. The most important predictors from the MCM aided in the exploration and validation of the survival scores. The top five predictors were transplant status, Liu comorbidity index, race, Hispanic status, and primary disease, and these variables were used to explain some of the trends in the survival scores.\u003c/p\u003e \u003cp\u003eThe Southeastern U.S. shows higher survivability, coinciding with a region that has a large proportion of both Black individuals and persons with hypertension. This is intuitive since the model showed a lower hazard for Black individuals in the latency portion. Further, the model showed both a lower hazard and lower odds of being uncured for persons with hypertension. Similarly, the Texas-Mexico border shows higher survivability, coinciding with an area that has a large proportion of Hispanic individuals. Again, this is intuitive, as the model showed both a lower hazard and lower odds of being uncured for Hispanic individuals.
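Because Table 1 reports coefficients on the log scale, the hazard and odds ratios behind these statements follow by exponentiation. The sketch below converts an estimate and its standard error into a ratio with a Wald-type 95% confidence interval, exp(b ± 1.96 × s.e.); the Wald form and the 1.96 multiplier are standard conventions assumed here, not something the paper states, and `ratio_with_ci` is a hypothetical helper.

```python
import math
from typing import Tuple


def ratio_with_ci(estimate: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Exponentiate a log-scale coefficient: (ratio, lower, upper) with a Wald CI."""
    return (
        math.exp(estimate),
        math.exp(estimate - z * se),
        math.exp(estimate + z * se),
    )


# Hispanic (yes) estimates from Table 1 of the entire-based MCM.
hr_hispanic = ratio_with_ci(-0.344, 0.003)  # latency: hazard ratio among the uncured
or_hispanic = ratio_with_ci(-0.530, 0.018)  # incidence: odds ratio of being uncured
```

Both ratios are below one (about 0.71 and 0.59), matching the lower hazard and lower odds of being uncured described for Hispanic individuals; the same call applies to the hypertension row (-0.022 and -1.299).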
These race- and ethnicity-related relationships are further supported by the literature, as Hispanic and Black individuals are often reported to have a lower risk of mortality, which is usually attributed to a quicker progression of kidney disease in these groups [\u003cspan additionalcitationids=\"CR37 CR38\" citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eLower survivability for individuals living in the Great Plains region of the U.S. may be attributed to the many agricultural communities and rural areas in this region. For example, rural residents face significant hardships in receiving ESKD treatment, such as dialysis care, resulting in higher mortality rates [\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e, \u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e]. Furthermore, individuals living in agricultural communities throughout the U.S. have a higher risk of developing chronic kidney disease due to many factors, including geo-environmental and agro-environmental influences [\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e]. Another study reports an association between ESKD risk and pesticide exposure [\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e]. We also found lower survivability among ESKD patients in the Appalachian region, which aligns with the results of a recent study that used a USRDS dataset to explore ESKD mortality [\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e]. That study also found lower age-standardized mortality rates near Southern California, matching our finding of higher survivability there. Additionally, it found that lower ESKD survivability among counties was significantly associated with a lower percentage of Black residents.
This result is consistent with the strong influence of race on survival scores in the Southeastern region.\u003c/p\u003e \u003cp\u003eThe variables used within the model help explain the developed survival scores. In addition, serum creatinine, a clinical variable not included in model development, was correlated with the survival scores, which provides some validation from outside the model. We found that serum creatinine had a positive linear relationship with the survival scores, which was moderately strong when summarized across counties. Counterintuitively, this relationship indicates that a higher serum creatinine level is associated with lower mortality. The same relationship was reported in a previous retrospective study of incident ESKD patients living in Maryland and Virginia [\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e], which offered several explanations. For example, serum creatinine may reflect overall health more than kidney function specifically; muscle mass and nutritional status, for instance, may confound the relationship between serum creatinine and ESKD outcomes.\u003c/p\u003e"},{"header":"Conclusions","content":"\u003cp\u003eThe main limitation of this study is that the survival scores require external validation before they can be generalized and used in the clinical setting [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. Although this study used variables outside the MCM covariates to validate the survival scores, these variables originated from the same USRDS dataset used to fit the MCM; thus, independent observations are required to externally validate the developed survival scores. Overall, this study developed and internally validated survival scores based on MCMs for patients with ESKD using a large USRDS dataset.
The spatial trends shown may enable future stakeholders to make data-informed decisions.\u003c/p\u003e"},{"header":"Abbreviations","content":"\u003cdiv class=\"DefinitionList\"\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eAI/AN\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eAmerican Indian / Alaska Native\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eBMI\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eBody mass index\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eBUN\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eBlood urea nitrogen\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eCI\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eConfidence interval\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eESKD\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eEnd-stage kidney disease\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eG/CKD\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eGlomerulonephritis / Cystic kidney disease\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eHHS\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eHealth and Human Services\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eIHS\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eIndian Health Service\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e 
\u003cdiv class=\"Term\"\u003eMCCV\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eMonte Carlo cross-validation\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eMCM\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eMixture cure model\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eNH/PI\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eNative Hawaiian / Pacific Islander\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003es.e.\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003estandard error\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eU.S.\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eUnited States\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eUSRDS\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eUnited States Renal Data System.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"Declarations","content":"\u003ch3\u003eEthics approval and consent to participate\u003c/h3\u003e\n\u003cp\u003eThe authors confirm that all methods were carried out in accordance with relevant guidelines and regulations.\u003c/p\u003e\n\u003ch3\u003eConsent for publication\u003c/h3\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003ch3\u003eAvailability of data and materials\u003c/h3\u003e\n\u003cp\u003eAccess to USRDS data is limited to researchers and institutions with approved Data Use Agreements and will not be released. 
The relevant code used to generate the results presented within this paper will be posted to GitHub.

Competing interests
The authors declare that they have no competing interests.

Funding
This research was, in part, funded by the National Institutes of Health (NIH) Agreement No. 1OT2OD032581. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the NIH.

Authors' contributions
All authors made substantial contributions to the conception, design, and interpretation of this work. NM conducted the primary analysis of the data and drafted the manuscript. SM led the conceptualization and design of the work. All authors substantively revised the work, approved the submitted version, and agreed to be accountable for all aspects of the work.

Acknowledgments
The research reported in this work was supported by South Dakota State University, AIM-AHEAD Coordinating Center, award number OTA-21-017, and was, in part, funded by the National Institutes of Health Agreement No. 1OT2OD032581. The work is solely the responsibility of the authors and does not necessarily represent the official view of AIM-AHEAD or the National Institutes of Health. The data reported here have been supplied by the United States Renal Data System (USRDS). The interpretation and reporting of these data are the responsibility of the author(s) and in no way should be seen as an official policy or interpretation of the U.S. government.

References
1. National Institutes of Health. USRDS Annual Data Report: Epidemiology of kidney disease in the United States. 2024.
2. Rekabdarkolaee HM, Longacre LE, Isaacson MJ, Varilek BM. Hospice Referral Rate Disparities of American Indian/Alaska Native Kidney Transplant Recipients with End-Stage Kidney Disease: A Retrospective Cohort Analysis. American Journal of Hospice and Palliative Medicine®. 2025;0(0):1-8.
3. Varilek BM, Isaacson MJ, Moradi Rekabdarkolaee H. Evaluating disparities in end-stage kidney disease survival among American Indian/Alaska native persons with diabetes. Journal of Racial and Ethnic Health Disparities. 2024:1-11.
4. Jarrar F, Pasternak M, Harrison TG, James MT, Quinn RR, Lam NN, et al. Mortality Risk Prediction Models for People With Kidney Failure: A Systematic Review. JAMA Network Open. 2025;8(1):e2453190.
5. Taal MW, Brenner BM. Renal risk scores: progress and prospects. Kidney International. 2008;73(11):1216-9.
6. Li Q, Li P, Xu Z, Lu Z, Yang C, Ning J. Association of diabetes with cardiovascular calcification and all-cause mortality in end-stage renal disease in the early stages of hemodialysis: a retrospective cohort study. Cardiovascular Diabetology. 2024;23(1):259.
7. Taal MW, Brenner BM. Predicting initiation and progression of chronic kidney disease: developing renal risk scores. Kidney International. 2006;70(10):1694-705.
8. O'Seaghdha CM, Lyass A, Massaro JM, Meigs JB, Coresh J, D'Agostino Sr RB, et al. A risk score for chronic kidney disease in the general population. The American Journal of Medicine. 2012;125(3):270-7.
9. Ramspek CL, Voskamp PWM, Van Ittersum FJ, Krediet RT, Dekker FW, Van Diepen M. Prediction models for the mortality risk in chronic dialysis patients: a systematic review and independent external validation study. Clinical Epidemiology. 2017:451-64.
10. Ramspek CL, Jager KJ, Dekker FW, Zoccali C, van Diepen M. External validation of prognostic models: what, why, how, when and where? Clinical Kidney Journal. 2021;14(1):49-58.
11. Thamer M, Kaufman JS, Zhang Y, Zhang Q, Cotter DJ, Bang H. Predicting early death among elderly dialysis patients: development and validation of a risk score to assist shared decision making for dialysis initiation. American Journal of Kidney Diseases. 2015;66(6):1024-32.
12. Ramspek CL, Verberne WR, van Buren M, Dekker FW, Bos WJW, van Diepen M. Predicting mortality risk on dialysis and conservative care: development and internal validation of a prediction tool for older patients with advanced chronic kidney disease. Clinical Kidney Journal. 2021;14(1):189-96.
13. Ivory SE, Polkinghorne KR, Khandakar Y, Kasza J, Zoungas S, Steenkamp R, et al. Predicting 6-month mortality risk of patients commencing dialysis treatment for end-stage kidney disease. Nephrology Dialysis Transplantation. 2017;32(9):1558-65.
14. Halbesma N, Jansen DF, Heymans MW, Stolk RP, de Jong PE, Gansevoort RT, et al. Development and validation of a general population renal risk score. Clinical Journal of the American Society of Nephrology. 2011;6(7):1731-8.
15. Moore J, He X, Shabir S, Hanvesakul R, Benavente D, Cockwell P, et al. Development and evaluation of a composite risk score to predict kidney transplant failure. American Journal of Kidney Diseases. 2011;57(5):744-51.
16. U.S. Renal Data System. 2023 USRDS Annual Data Report: Epidemiology of Kidney Disease in the United States. National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD, 2023.
17. U.S. Renal Data System. 2023 Researcher's Guide to the USRDS Database. National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD, 2023.
18. Potluri VS, Reddy YN, Tummalapalli SL, Peng C, Huang Q, Zhao Y, et al. Early effects of the end-stage renal disease treatment choices model on kidney transplant waitlist additions. Clinical Journal of the American Society of Nephrology. 2024:10.2215.
19. Cai C, Zou Y, Peng Y, Zhang J. smcure: An R-Package for estimating semiparametric mixture cure models. Computer Methods and Programs in Biomedicine. 2012;108(3):1255-60.
20. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological). 1977;39(1):1-22.
21. Amico M, Van Keilegom I. Cure models in survival analysis. Annual Review of Statistics and Its Application. 2018;5:311-42.
22. Meyer N, Michael S, Moradi Rekabdarkolaee H, Varilek B, Ngorsuraches S, Brooks P. Proportional Hazards Mixture Cure Models for End-Stage Kidney Disease [Poster]. In press 2024.
23. Kutner MH, Nachtsheim CJ, Neter J. Applied Linear Regression Models. 4th ed. McGraw-Hill/Irwin; 2004.
24. Hartman N, Kim S, He K, Kalbfleisch JD. Pitfalls of the concordance index for survival outcomes. Statistics in Medicine. 2023;42(13):2179-90.
25. Therneau TM. A Package for Survival Analysis in R. Version 3.5-5; 2023.
26. Wasserman L. All of Statistics: A Concise Course in Statistical Inference. Springer; 2005.
27. Silverman BW. Density Estimation for Statistics and Data Analysis. 1986.
28. R Core Team. R: A Language and Environment for Statistical Computing. Version 4.3.2. Vienna, Austria: R Foundation for Statistical Computing; 2023.
29. Office of Intergovernmental and External Affairs (IEA). HHS Regional Offices. 2024. Available from: https://www.hhs.gov/about/agencies/iea/regional-offices/index.html
30. Indian Health Service. Great Plains Area. 2024. Available from: https://www.ihs.gov/greatplains/
31. Zhang Y, Shao Y. Concordance measure and discriminatory accuracy in transformation cure models. Biostatistics. 2018;19(1):14-26.
32. Michael S, Melnykov V. An effective strategy for initializing the EM algorithm in finite mixture models. Advances in Data Analysis and Classification. 2016;10:563-83.
33. Research Data Assistance Center. CMS Cell Size Suppression Policy. 2024. Available from: https://resdac.org/articles/cms-cell-size-suppression-policy
34. Chang W, Cheng J, Allaire J, Sievert C, Schloerke B, Xie Y, et al. shiny: Web Application Framework for R. Version 1.8.0; 2023.
35. Meyer N, Michael S. ESKD Shiny App. 2025 [2025-04-18]. Available from: https://graphics.shinyapps.io/ESKD_Shiny_App/
36. Giusti S, Arrigain S, Lopez R, Pomfret E, Cervantes L, Schold JD. Evaluating Long Term Outcomes Among Hispanic Kidney Transplant Recipients. American Journal of Kidney Diseases. 2025;0(0).
37. Mour GK, Kukla A, Jaramillo A, Ramon DS, Wadei HM, Stegall MD. Renal disease and kidney transplantation in Hispanic American persons. Kidney360. 2024;5(11):1763-70.
38. Bellos I, Marinaki S, Samoli E, Boletis IN, Benetou V. Sociodemographic disparities in adults with kidney failure: a meta-analysis. Diseases. 2024;12(1):23.
39. Harding JL, Pavkov M, Wang Z, Benoit S, Burrows NR, Imperatore G, et al. Long-term mortality among kidney transplant recipients with and without diabetes: a nationwide cohort study in the USA. BMJ Open Diabetes Research and Care. 2021;9(1):e001962.
40. Crouch E, Yell N, Herbert L, Browne T, Hung P. Availability and Quality of Dialysis Care in Rural versus Urban US Counties. American Journal of Nephrology. 2024;55(3):361-8.
41. National Institutes of Health. Healthcare Disparities. 2023. Available from: https://usrds-adr.niddk.nih.gov/2023/supplements-covid-19-disparities/14-healthcare-disparities
42. Wilke RA, Qamar M, Lupu RA, Gu S, Zhao J. Chronic kidney disease in agricultural communities. The American Journal of Medicine. 2019;132(10):e727-e32.
43. Ben Khadda Z, Fagroud M, El Karmoudi Y, Ezrari S, Elhanafi L, Radu A-F, et al. Association between pesticide exposure and end-stage renal disease: A case-control study from Morocco based on the STROBE guidelines. Ecotoxicology and Environmental Safety. 2024;288:117360.
44. Snow KK, Patzer RE, Patel SA, Harding JL. County-level characteristics associated with variation in ESKD mortality in the United States, 2010–2018. Kidney360. 2022;3(5):891-9.
45. Fink JC, Burdick RA, Kurth SJ, Blahut SA, Armistead NC, Turner MS, et al. Significance of serum creatinine values in new end-stage renal disease patients. American Journal of Kidney Diseases. 1999;34(4):694-701.

Algorithm 1
Algorithm 1 is available in the Supplementary Files section.

Keywords: ESKD, mixture cure model, prognostic model, risk score, USRDS, survival score

Version 1 posted May 27th, 2025.
