Classifications for Radiographic Evaluation of Radiolucent Bone Lesions have Poor Inter- and Intra-observer Agreement

doi:10.21203/rs.3.rs-4301904/v1

2024 · doi:10.21203/rs.3.rs-4301904/v1

preprint OA: closed


Research Article

Taylor J. Willenbring, Sarah M. Papa, Kenneth A. Mann, Salvatore Cavallaro, and Timothy A. Damron

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-4301904/v1. This work is licensed under a CC BY 4.0 License. Status: Version 1 posted.

Abstract

Background

Radiolucent bone lesions are encountered in all orthopedic specialties, and concise description is essential to inform evaluation and treatment. We studied the interobserver reliability and intra-observer reproducibility of three classification systems of radiographic radiolucent lesions: (1) original Lodwick classification, (2) modified Lodwick classification, and (3) Enneking classification for benign tumors.
We hypothesized that intra-observer reproducibility would be good and would improve with training level, that interobserver reliability would be poor, and that reproducibility would be highest for the Enneking classification.

Methods

Forty-eight case sets of de-identified radiographs of radiolucent osseous lesions were selected from an orthopedic oncology practice. Each set included two orthogonal views of the lesion from initial presentation. Twenty participants (one third-year medical student, 18 residents, one orthopedic oncologist) classified each case twice, with a minimum two-week gap between sessions, according to the Lodwick classification, modified Lodwick classification, and Enneking classification. Interobserver reliability and intra-observer reproducibility were calculated using Fleiss’ kappa and Krippendorff’s alpha, treating the classifications as nominal and ordinal rankings, respectively. Linear regression models were used to determine the effect of training level on reproducibility. Contingency tables were used to assess the accuracy of correctly identifying benign versus malignant lesions against their known diagnoses.

Results

Interobserver reliability was poor, with agreement of 39% (κ = 0.23; α = 0.54), 39% (κ = 0.25; α = 0.48), and 53% (κ = 0.28; α = 0.45) for the Lodwick, modified Lodwick, and Enneking classifications, respectively. Intra-observer reproducibility also lacked strong agreement (κ = 0.42–0.45). Training level had no effect on reproducibility (R² < 0.05 for all classifications). Krippendorff’s alpha for intra-observer reproducibility was 0.72 for the Lodwick, 0.69 for the modified Lodwick, and 0.63 for the Enneking classification. Self-agreement for individuals ranged from 39–78%. Lesions graded 3 were correctly classified as malignant in 73.3%, 59.0%, and 62.0% of cases for the three classification systems, respectively.
Conclusions

Our data demonstrate that three common classifications for osseous radiolucent lesions are neither reliable nor reproducible. Consistency of classification varied depending on lesion characteristics, with the strongest reproducibility demonstrated for the highest and lowest grades of the classification systems. There was no association between orthopedic experience and intra-observer reproducibility. These deficiencies may be improved with AI applications.

Keywords: orthopedic oncology, radiolucent bone lesions, intra-observer reproducibility, interobserver reliability, Lodwick, modified Lodwick, Enneking

Figures

Figure 1. Classification reference sheet provided to survey participants for Lodwick, Modified Lodwick, and Enneking classifications.
Figure 2. Example case from survey.
Figure 3. Overall reproducibility using Krippendorff’s alpha, categorized by training level (1=Student, 2=PGY-1 … 6=PGY-5, 7=attending).

BACKGROUND

Radiolucent lesions of bone are encountered within all orthopedic specialties, and therefore, concise methods of describing them are essential for effective communication between practitioners and to inform decision-making regarding further evaluation, need for biopsy, and treatment. Radiographic evaluation of bone lesions is the key first step in assessing tumor aggressiveness, and therefore in making determinations regarding the need for biopsy versus close follow up [1].

In 1980, Lodwick developed a classification system for grading the aggressiveness of radiolucent bone lesions based on four characteristics: pattern of bone destruction, presence or absence of cortical penetration, presence or absence of a sclerotic rim, and presence or absence of an expanded cortical shell [2]. Lesions were thus divided into five categories (IA, IB, IC, II, or III) from least to most aggressive [2]. In attempts to validate the reproducibility of the Lodwick system, one study demonstrated high agreement between raters for both grades IA (likely benign) and III (likely malignant), but more variable agreement for the remaining grades [3].
Since its initial publication, efforts have been made to simplify the Lodwick classification system, including using a decision tree approach for ease of application of the system in the clinical setting [4]. A year after the publication of the Lodwick classification, the Modified Lodwick-Madewell grading system was proposed to decrease the complexity and increase the reproducibility of its predecessor [5]. This system incorporates additional lesion characteristics, including changing margins and occult presentation on radiographs, to divide lesions into six categories (IA, IB, II, IIIA, IIIB, IIIC) that are modifications of or additions to the original Lodwick grades [5]. When applying these criteria to a cohort of 183 patients, the authors reported a high clinical predictive capacity when compared to biopsy, with 94% of grade I lesions being classified correctly as benign and 81% of grade III lesions being classified correctly as malignant [5]. However, the reliability and reproducibility of this grading system have not been assessed.

In addition to the Lodwick-based classifications, other systems have aimed to simplify the classification of radiolucent bone lesions, including the classification system for benign bone lesions created by Enneking [6]. This system was formally adopted by the American Joint Committee on Cancer (AJCC) in 1980 with the goal of providing management guidance as well as prognostic value [7]. The Enneking classification uses three grades (latent, active, and aggressive) based on the characteristics of the tumor margins on radiographs [6]. A study by Drumond in 2010 found 95.2% agreement between Enneking grading and ultimate treatment course, demonstrating its utility for benign bone tumors [8]. However, a separate analysis demonstrated only fair interobserver reliability (Fleiss kappa 0.26–0.38), moderate intra-observer reliability (Cohen kappa 0.48), and low intra-observer agreement (67%) [9].
In that analysis, inter- and intra-observer reliability were higher among orthopedic tumor surgeons and fellows than among senior orthopedic residents.

Additional characteristics often considered when assessing radiolucent bone lesions include patient age, symptoms, size and number of lesions, anatomic location, presence of a soft tissue component, presence and type of periosteal reaction, and presence and characteristics of matrix mineralization [1, 10]. While these characteristics are not explicitly included in the aforementioned classification systems, they do play a role in ultimate decision making.

We aimed to study these three classification systems of radiolucent bone lesions, which rely heavily, if not entirely, on radiographic imaging. We used previously validated reproducibility statistics to assess interobserver reliability and intra-observer reproducibility. Given the complexity of the current radiographic classification systems for assessing bone tumors, we hypothesized that (1) intra-observer reproducibility of each of these systems would be good and would improve with training level, (2) interobserver reliability would be poor, and (3) the Enneking classification would have the highest intra-observer reproducibility.

METHODS

This single institution study was deemed institutional review board exempt (IRB 2191652-1). Forty-eight case sets of de-identified radiographs of radiolucent osseous lesions with biopsy-proven (30 case sets) or clinically confirmed (17 case sets) diagnoses were selected from the orthopedic oncology practice of the senior author (TAD). A single case with unknown diagnosis due to non-diagnostic biopsy was also included. These cases were selected to provide a broad array of anticipated margins across the spectrum of the three classification systems, anatomic sites, benign and malignant lesions, and specific diagnoses.
Selected lesions included both benign (33 case sets; 16 biopsy-proven, 17 clinically confirmed) and malignant (14 case sets, all biopsy-proven) conditions as well as both primary bone (41 case sets; 24 biopsy-proven, 17 clinically confirmed) and metastatic (6 case sets, all biopsy-proven) processes. The single unknown case was suspected to be a metastatic malignant lesion; however, biopsy was non-diagnostic. Malignant lesions were included to provide appropriate examples of aggressive lesions to assess each classification system's ability to correctly identify aggressive lesions and indicate the need for biopsy. An additional file with description of the diagnoses has been included [See Additional File 1]. Clinically confirmed diagnoses were made based on imaging characteristics and stability through a period of clinical and radiographic follow up. Each set included at least 2 orthogonal views of the lesion from initial presentation without prior biopsy or oncologic treatments. As the aim of this study was to evaluate the classification systems purely on their radiographic features, no information regarding clinical history, context of lesion discovery, presence or absence of pain, past medical history, or physical exam findings was provided.

After obtaining informed consent, twenty individuals classified each lesion according to the Lodwick classification [2], the modified Lodwick classification [5], and the Enneking classification [6], via an online survey. The twenty participants included one third-year medical student, four PGY1 residents, three PGY2 residents, three PGY3 residents, four PGY4 residents, four PGY5 residents, and one orthopedic oncologist. All participants classified each lesion twice, with a minimum two-week gap between sessions, to assess intra-observer reproducibility. Participants were given a reference sheet explaining each classification system that they could consult throughout the survey (Fig. 1).
Participants were told that they could zoom in on images to better view the lesion characteristics and that the survey would take approximately one hour to complete. An example case is shown in Fig. 2.

Interobserver reliability was calculated using both Fleiss’ kappa and Krippendorff’s alpha, treating the classification grades as nominal and ordinal rankings, respectively. Intra-observer reproducibility was also calculated using Fleiss’ kappa and Krippendorff’s alpha. Linear regression models were then used to determine the effect of training level on the ability to apply the classification systems consistently. Intra-observer reproducibility was compared between classifications using paired t-tests on the respective Krippendorff’s alpha values, with correction for multiple comparisons. Statistical analysis was performed using SPSS version 29 (Armonk, NY) with the associated plugin for Krippendorff’s alpha calculations [11]. JMP Pro version 17 (Cary, NC) was used for the remainder of the analyses.

Contingency tables were calculated to assess the accuracy of each classification system in identifying lesions as benign or malignant when compared to their known diagnoses. This was first performed using only lesions with biopsy-proven diagnoses, followed by an analysis with all lesions, including those with clinical diagnoses. Within the Lodwick and Modified Lodwick classifications, lesions given grades 1 and 2 (and their sub-types) were considered benign while those given grade 3 were considered to have high suspicion for malignancy. Likewise, the first two grades of the Enneking classification were considered benign. The positive and negative predictive values for the extreme grades (Grade 1, most likely to be benign, and Grade 3, most likely to be malignant) in each classification were also calculated.
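To make the nominal agreement statistic concrete, the following is a minimal sketch of Fleiss' kappa in plain Python. It is not the authors' SPSS/JMP workflow: the function name `fleiss_kappa`, the input layout (one list of labels per case, one label per rater), and the toy ratings are assumptions made purely for illustration.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for nominal labels.

    ratings: list of cases; each case is a list with one label per rater
             (every case must have the same number of raters).
    categories: list of all possible labels.
    Note: undefined (division by zero) if every rating is the same category.
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])

    # counts[i][j] = number of raters who assigned case i to category j
    counts = [[Counter(case)[c] for c in categories] for case in ratings]

    # Observed agreement per case, then averaged over cases
    p_i = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_cases

    # Chance agreement from the overall proportion of each category
    p_j = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_j)

    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy data: 4 cases, 3 raters, grades "1"/"2"/"3" (not study data)
toy_ratings = [
    ["1", "1", "2"],
    ["2", "2", "2"],
    ["1", "3", "3"],
    ["3", "3", "3"],
]
toy_kappa = fleiss_kappa(toy_ratings, ["1", "2", "3"])
print(f"Fleiss' kappa on toy data: {toy_kappa:.3f}")
```

Because kappa subtracts chance agreement, it can be far lower than raw percent agreement when a few grades dominate the ratings.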
RESULTS

Interobserver reliability was poor for all three classifications, as demonstrated by agreement of only 39% (κ = 0.23), 39% (κ = 0.25), and 53% (κ = 0.28) for the Lodwick, modified Lodwick, and Enneking classifications, respectively (Table 1).

Table 1. Agreement kappa and alpha values for each classification.

| Classification | Inter-rater reliability (kappa) | Inter-rater reliability (alpha) | Intra-rater reproducibility (kappa) | Intra-rater reproducibility (alpha) |
| Lodwick | 0.23 | 0.54 | 0.45 | 0.71 |
| Modified Lodwick | 0.25 | 0.48 | 0.44 | 0.69 |
| Enneking | 0.28 | 0.45 | 0.43 | 0.63 |

The three classifications achieved only “fair agreement” by the Landis interpretation [12] and “minimal agreement” by the McHugh interpretation [13], below the respective cutoffs of κ > 0.41 and κ > 0.60. When the classifications were treated as ordinal data, there was no improvement in the overall agreement (α = 0.54, 0.48, 0.45, respectively). Traditionally, ordinal data is considered reliable when Krippendorff’s α > 0.8. In higher risk scenarios, acceptability cutoffs become more stringent, requiring higher α values. Systems are considered grossly inconsistent with α < 0.676.

When assessing the reliability between raters in relation to the classification grades, the highest and lowest grades for the Lodwick and Enneking classifications demonstrated the highest reliability. This pattern was also present in the modified Lodwick classification, except for the IIIC grade, which demonstrated low reliability (Table 2).

Table 2. Agreement kappa values for each grade within the respective classification, demonstrating the most inconsistency for intermediate grades.

| Grade | Lodwick (kappa) | Modified Lodwick (kappa) | Enneking (kappa) |
| 1(a) | 0.3999 | 0.4067 | 0.3504 |
| 1b | 0.284 | 0.2882 | - |
| 1c | 0.1232 | - | - |
| 2 | 0.0955 | 0.064 | 0.1226 |
| 3(a) | 0.2879 | 0.0843 | 0.3827 |
| 3b | - | 0.3891 | - |
| 3c | - | 0.0591 | - |

Intra-observer reproducibility achieved only weak agreement according to McHugh [13] and moderate agreement according to Landis [12], although kappa values were improved (κ = 0.42–0.45) relative to the interobserver measures (Table 3).
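The qualitative labels quoted above come from published kappa interpretation bands. As a rough sketch, the helper functions below (hypothetical names, not part of the study's analysis) map a kappa value to the Landis and Koch [12] and McHugh [13] labels. How values that fall between two published bands are handled is an assumption of this sketch.

```python
def landis_koch_label(kappa):
    """Label per Landis and Koch (1977); upper bound of each band is inclusive."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


def mchugh_label(kappa):
    """Label per McHugh (2012); values between bands go to the lower band."""
    if kappa <= 0.20:
        return "none"
    if kappa < 0.40:
        return "minimal"
    if kappa < 0.60:
        return "weak"
    if kappa < 0.80:
        return "moderate"
    if kappa <= 0.90:
        return "strong"
    return "almost perfect"


# Interobserver (0.23) and intra-observer (0.45) kappas for the Lodwick system
for k in (0.23, 0.45):
    print(k, landis_koch_label(k), mchugh_label(k))
```

The two scales differ mainly in how demanding they are: the same kappa of 0.45 reads as "moderate" on one scale and "weak" on the other, which is why the text reports both.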
Table 3. Agreement kappa and alpha values for each classification.

| Classification | Inter-rater reliability (kappa) | Inter-rater reliability (alpha) | Intra-rater reproducibility (kappa) | Intra-rater reproducibility (alpha) |
| Lodwick | 0.23 | 0.54 | 0.45 | 0.71 |
| Modified Lodwick | 0.25 | 0.48 | 0.44 | 0.69 |
| Enneking | 0.28 | 0.45 | 0.43 | 0.63 |

Training level had no effect on the ability to reproducibly classify lesions using any of the three classifications (R² < 0.05 for all classifications, Fig. 3). Krippendorff’s alpha for intra-observer reproducibility was 0.72 for the Lodwick, 0.69 for the modified Lodwick, and 0.63 for the Enneking classification. Self-agreement for individuals was quite variable, ranging from 39–78%.

The overall ability to distinguish benign (grade 1 or 2) versus malignant (grade 3) lesions using the Enneking classification was low, with a sensitivity of 0.662 and a specificity of 0.692 for the first classification attempt (Table 4).

Table 4. 2 x 2 contingency tables for all scorers compared to clinical diagnosis using the Enneking classification. Method of diagnosis confirmation included biopsy, clinical, and imaging findings.

First scoring session

| Enneking classification | Malignant | Benign |
| 1 or 2 score | 115 | 429 |
| 3 score | 225 | 191 |

Accuracy: 0.681, Sensitivity: 0.662, Specificity: 0.692.

Second scoring session

| Enneking classification | Malignant | Benign |
| 1 or 2 score | 121 | 455 |
| 3 score | 219 | 165 |

Accuracy: 0.702, Sensitivity: 0.644, Specificity: 0.732.

Results were similar during the second classification attempt, with a sensitivity of 0.644 and a specificity of 0.732.

The agreement between classification and biopsy-proven diagnosis was highest for grade 1 lesions, with 76.3%, 80.9%, and 77.2% of grade 1 lesions correctly classified as benign using the Lodwick, modified Lodwick, and Enneking classification, respectively.
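The summary statistics reported with Table 4 follow directly from its cell counts. The sketch below recomputes them for the first scoring session, treating a grade 3 score as a positive (malignant) call. The function and variable names are illustrative assumptions, not the authors' code.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, and specificity from a 2 x 2 contingency table.

    tp: malignant lesions scored 3      fn: malignant lesions scored 1 or 2
    fp: benign lesions scored 3         tn: benign lesions scored 1 or 2
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Cell counts copied from Table 4, first scoring session
first_session = diagnostic_metrics(tp=225, fn=115, fp=191, tn=429)

for name, value in first_session.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity use only the column totals of the table, so they do not depend on how many benign versus malignant cases were selected for the case set, whereas predictive values do.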
Agreement was much more variable for grade 3 lesions, with 73.3%, 59.0%, and 62.0% correctly classified as malignant using the Lodwick, modified Lodwick, and Enneking classification, respectively (Table 5).

Table 5. Scoring system predictions for correctly identifying benign and malignant lesions. Only cases with biopsy results reported were used here (N = 640 observations). Mean percent is shown with mean scores from the first and second scoring sessions in parentheses.

| Scoring system | Grade 1 lesions (% benign), negative predictive value (NPV) | Grade 3 lesions (% malignant), positive predictive value (PPV) |
| Lodwick | 76.3 (76.1, 76.5) | 73.3 (74.3, 72.3) |
| Modified Lodwick | 80.9 (80.1, 81.6) | 59.0 (59.4, 58.6) |
| Enneking | 77.2 (76.0, 78.4) | 62.0 (61.7, 62.2) |

When including lesions with both biopsy-proven and clinically confirmed diagnoses, negative predictive value in classifying benign lesions improved, with 80.2%, 83.6%, and 81.9% of grade 1 lesions correctly classified as benign using the Lodwick, modified Lodwick, and Enneking classification, respectively. However, positive predictive value in classifying malignant lesions decreased, with 67.4%, 51.1%, and 55.6% of grade 3 lesions correctly classified as malignant using the Lodwick, modified Lodwick, and Enneking classification, respectively (Table 6).

Table 6. Scoring system predictions for correctly identifying benign and malignant lesions. Lesions with both biopsy and clinically proven diagnosis included (N = 960 observations). Mean percent is shown with mean scores from the first and second scoring sessions in parentheses.

| Scoring system | Grade 1 lesions (% benign), negative predictive value (NPV) | Grade 3 lesions (% malignant), positive predictive value (PPV) |
| Lodwick | 80.2 (80.5, 80.0) | 67.4 (69.0, 65.9) |
| Modified Lodwick | 83.6 (82.8, 84.3) | 51.1 (51.2, 50.9) |
| Enneking | 81.9 (80.1, 83.6) | 55.6 (54.1, 57.0) |

DISCUSSION

Our data demonstrate that three commonly used classifications for osseous radiolucent lesions on radiographs were neither reliable nor reproducible using our specific study set of cases in a group of 20 predominantly resident participants. This is consistent with previous data regarding the Enneking classification demonstrating only fair interobserver reliability and moderate intra-observer reproducibility [9].

The reliability of classification was highly variable depending on the individual lesions’ characteristics. Consistency was highest for both the highest and lowest grades of the Lodwick and Enneking classifications, with greater variability for intermediate grades. This was also true for the modified Lodwick classification system, except for the IIIC grade, possibly due to the paucity of IIIC lesions included in the case set.

Overall sensitivity and specificity for distinguishing benign versus malignant lesions were low for all classification systems. However, the ability to correctly classify lesions as benign versus malignant was highest for grade 1 lesions across the classification systems. The ability to correctly classify grade 3 lesions as malignant was much more variable and was highest for the Lodwick classification system. This is partially consistent with previous studies demonstrating that accuracy for application of both the Lodwick and modified Lodwick classifications is high for grade IA (likely benign) and grade III (likely malignant) lesions but becomes more variable for lesions not at either extreme of these classification systems [3, 5]. However, our study demonstrated much lower accuracy in applying the classification systems to grade 3 lesions.
This suggests that these classifications may be useful for certain lesions but cannot be reliably applied across a wide array of lesion types. Interestingly, there was no association between orthopedic experience and intra-observer reproducibility, particularly within the 5-year span of orthopedic residency training, supporting the notion that the descriptions themselves are poorly applied to some lesions.

Contingency calculations of classification results compared to known diagnoses demonstrated overall poor ability to distinguish benign from malignant lesions. However, when considering only the lowest Grade 1 classes, there were more acceptable negative predictive values ranging from 80.2–83.6%. While the overall sensitivity and consistency of these classifications may be limited, Grade 1 classifications demonstrate reasonable predictive value.

The deficiencies noted in this study may be improved with artificial intelligence (AI) applications in the future, and the results emphasize the need for better techniques. A recent study by Park et al. found that an AI system trained to classify radiographic proximal femoral lesions as benign, malignant, or no tumor demonstrated higher accuracy compared to four physicians collectively (two general orthopedic surgeons and two orthopedic oncologists) as well as individually [14]. Similarly, He et al. describe high accuracy of AI in classifying radiographic primary bone tumors as benign vs not-benign, malignant vs not-malignant, and benign vs intermediate vs malignant. The accuracy of their AI system was equivalent to that of two radiologists with musculoskeletal subspecialty training, and superior to that of 3 junior radiologists [15]. As this field continues to expand, it is likely that these systems will become increasingly accurate, and thus may become superior to current classification systems.

This study does have some limitations. One is the small number of case sets included in the study.
Although smaller than some previous studies with 233 [3] or 183 [5] cases, it is similar to the 65-case study size of Alpuerto et al. [9]. The 48 cases included in the current study also represent a wide breadth of tumor diagnoses and radiographic appearances.

This study is also limited by the narrow breadth of orthopedic experience evaluated, with most participants coming from the five-year span of orthopedic residency training. However, the focus of our study was to evaluate the reliability of those initially encountering and describing these lesions, namely orthopedic residents soon to become community surgeons. The inclusion of one medical student and one fellowship-trained orthopedic oncologist was meant to serve as outlying anchors on the extremes of the experience spectrum.

Malignant lesions were also included in our case set in addition to benign lesions. While the Lodwick and modified Lodwick classification systems have been designed for use with both benign and malignant lesions [3, 5], the Enneking classification for benign tumors was not designed for use with malignant lesions [6]. The inclusion of malignant lesions was deemed appropriate to provide examples of aggressive lesions to assess the Enneking classification’s ability to correctly identify aggressive lesions and inform the need for biopsy.

Of the included cases, not all were biopsy-proven diagnoses. However, those not histologically assessed were diagnosed clinically using imaging characteristics and stability through clinical and radiographic follow-up.

Finally, this study was limited to participants in the field of orthopedic surgery, and did not include radiologists or radiology trainees, who also often encounter radiolucent bone lesions, sometimes as the initial provider discovering the lesion. Inclusion of this group in future studies would allow investigation of the degree of reliability and reproducibility of these classification systems when applied by a non-orthopedic specialty.
This would also yield a comparison of the metrics between orthopedic surgeons and radiologists.

CONCLUSIONS

This study demonstrates the poor interobserver reliability and intra-observer reproducibility of three classifications used to radiographically describe radiolucent lesions of bone. As these classifications currently stand, there is no reliable method to classify these lesions and communicate about them effectively. Although AI protocols may improve reliability, further work is needed to train machine learning models on larger data sets.

Abbreviations

AI: Artificial intelligence
AJCC: American Joint Committee on Cancer
PGY: Post graduate year
SPSS: Statistical Package for the Social Sciences

Declarations

Ethics approval and consent to participate: This study was not subject to IRB approval. Participants were limited to voluntary orthopedic trainees and faculty. All included images were deidentified.

Consent for publication: Not applicable

Availability of data and materials: The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Competing interests: One author (TAD) has received funding from:
BMC Musculoskeletal Disorders: Editorial or governing board
Bone Support (Cerament): Paid consultant
BoneSupport (Cerament): Research support
Clinical Orthopaedics and Related Research: Editorial or governing board
eMedicine: Editorial or governing board; Publishing royalties, financial or material support
Journal of Orthopaedic Research: Editorial or governing board
Journal of Surgical Oncology: Editorial or governing board
Journal of the American Academy of Orthopaedic Surgeons: Publishing royalties, financial or material support
Medicina: Editorial or governing board
Open Journal of Orthopedics: Editorial or governing board
OREF: Research support
PLOS One: Editorial or governing board
Stryker: Research support
Up To Date: Publishing royalties, financial or material support
Wolters Kluwer Health - Lippincott Williams & Wilkins: Publishing royalties, financial or material support
Wright Medical Technology, Inc.: Research support
The remaining authors (TJW, SMP, KAM, SC) do not have any disclosures.

Funding: This study was internally funded.

Authors’ Contributions:
TAD and SC: conception and study design
TJW and SMP: data collection
TJW, KAM, and TAD: data analysis and interpretation
TJW and SMP: manuscript preparation
TAD, KAM, and SC: manuscript editing and revision
All authors read and approved the final manuscript.

Acknowledgements: Not applicable

Supplementary Files: AdditionalFile.docx

References

1. Costelloe CM, Madewell JE. Radiography in the initial diagnosis of primary bone tumors. AJR Am J Roentgenol. 2013;200(1):3-7. doi:10.2214/AJR.12.8488
2. Lodwick GS, Wilson AJ, Farrell C, Virtama P, Dittrich F. Determining growth rates of focal lesions of bone from radiographs. Radiology. 1980;134(3):577-583. doi:10.1148/RADIOLOGY.134.3.6928321
3. Lodwick GS, Wilson AJ, Farrell C, Virtama P, Smeltzer FM, Dittrich F. Estimating rate of growth in bone lesions: observer performance and error. Radiology. 1980;134(3):585-590. doi:10.1148/RADIOLOGY.134.3.6986621
4. Benndorf M, Bamberg F, Jungmann PM. The Lodwick classification for grading growth rate of lytic bone tumors: a decision tree approach. Skeletal Radiol. 2022;51(4):737-745. doi:10.1007/S00256-021-03868-8
5. Caracciolo JT, Temple HT, Letson GD, Kransdorf MJ. A Modified Lodwick-Madewell Grading System for the Evaluation of Lytic Bone Lesions. AJR Am J Roentgenol. 2016;207(1):150-156. doi:10.2214/AJR.15.14368
6. Enneking WF. A System of Staging Musculoskeletal Neoplasms. Clin Orthop Relat Res. 1986;(204):9-24.
7. Jawad MU, Scully SP. In Brief: Classifications in Brief: Enneking Classification: Benign and Malignant Tumors of the Musculoskeletal System. Clin Orthop Relat Res. 2010;468(7):2000. doi:10.1007/S11999-010-1315-7
8. Drumond JMN. Efficacy of the Enneking Staging System in Relation to Treating Benign Bone Tumors and Tumor-like Bone Lesions. Rev Bras Ortop (Sao Paulo). 2010;45(1):46. doi:10.1016/S2255-4971(15)30216-0
9. Alpuerto BB, Wang EHM. Interobserver and Intra-observer Reliability of the Enneking Classification in Plain Radiographic Staging of Benign Bone Tumors of the Extremities in Patients Seen at the Philippine General Hospital. Acta Med Philipp. 2021;55(3):341-348. doi:10.47895/AMP.VI0.1750
10. Miller TT. Bone tumors and tumorlike conditions: analysis with conventional radiography. Radiology. 2008;246(3):662-674. doi:10.1148/RADIOL.2463061038
11. Hayes AF, Krippendorff K. Answering the Call for a Standard Reliability Measure for Coding Data. Commun Methods Meas. 2007;1(1):77-89. doi:10.1080/19312450709336664
12. Landis JR, Koch GG. The Measurement of Observer Agreement for Categorical Data. Biometrics. 1977;33(1):159. doi:10.2307/2529310
13. McHugh ML. Interrater reliability: the kappa statistic. Biochem Med (Zagreb). 2012;22(3):276. doi:10.11613/bm.2012.031
14. Park CW, Oh SJ, Kim KS, et al. Artificial intelligence-based classification of bone tumors in the proximal femur on plain radiographs: System development and validation. PLoS One. 2022;17(2). doi:10.1371/JOURNAL.PONE.0264140
15. He Y, Pan I, Bao B, et al. Deep learning-based classification of primary bone tumors on radiographs: A preliminary study. EBioMedicine. 2020;62. doi:10.1016/J.EBIOM.2020.103121
02:12:32","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":14058,"visible":true,"origin":"","legend":"","description":"","filename":"AdditionalFile.docx","url":"https://assets-eu.researchsquare.com/files/rs-4301904/v1/f9458464696c678186151b54.docx"}],"financialInterests":"Competing interest reported. One author (TAD) has received funding from:\n\nBMC Musculoskeletal Disorders: Editorial or governing board\nBone Support (Cerament): Paid consultant\nBoneSupport (Cerament): Research support\nClinical Orthopaedics and Related Research: Editorial or governing board\neMedicine: Editorial or governing board; Publishing royalties, financial or material support\nJournal of Orthopaedic Research: Editorial or governing board\nJournal of Surgical Oncology: Editorial or governing board\nJournal of the American Academy of Orthopaedic Surgeons: Publishing royalties, financial or material support\nMedicina: Editorial or governing board\nOpen Journal of Orthopedics: Editorial or governing board\nOREF: Research support\nPLOS One: Editorial or governing board\nStryker: Research support\nUp To Date: Publishing royalties, financial or material support\nWolters Kluwer Health - Lippincott Williams \u0026 Wilkins: Publishing royalties, financial or material support\nWright Medical Technology, Inc.: Research support\n\nThe remaining authors (TJW, SMP, KAM, SC) do not have any disclosures.","formattedTitle":"Classifications for Radiographic Evaluation of Radiolucent Bone Lesions have Poor Inter- and Intra-observer Agreement","fulltext":[{"header":"BACKGROUND","content":"\u003cp\u003eRadiolucent lesions of bone are encountered within all orthopedic specialties, and therefore, concise methods of describing them are essential for effective communication between practitioners and to inform decision-making regarding further evaluation, need for biopsy, and treatment. 
Radiographic evaluation of bone lesions is the key first step in assessing tumor aggressiveness, and therefore in making determinations regarding the need for biopsy versus close follow-up [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eIn 1980, Lodwick developed a classification system for grading the aggressiveness of radiolucent bone lesions based on four characteristics: pattern of bone destruction, presence or absence of cortical penetration, presence or absence of a sclerotic rim, and presence or absence of an expanded cortical shell [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. Lesions were thus divided into five categories (IA, IB, IC, II, or III) from least to most aggressive [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. In attempts to validate the reproducibility of the Lodwick system, one study demonstrated high agreement between raters for both grades IA (likely benign) and III (likely malignant), but more variable agreement for the remaining grades [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. Since its initial publication, efforts have been made to simplify the Lodwick classification system, including a decision-tree approach to ease its application in the clinical setting [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eA year after the publication of the Lodwick classification, the Modified Lodwick-Madewell grading system was proposed to decrease the complexity and increase the reproducibility of its predecessor [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. 
This system incorporates additional lesion characteristics, including changing margins and occult presentation on radiographs to divide lesions into six categories (IA, IB, II, IIIA, IIIB, IIIC) that are modifications of or additions to the original Lodwick grades [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. When applying these criteria to a cohort of 183 patients, the authors reported a high clinical predictive capacity when compared to biopsy, with 94% of grade I lesions being classified correctly as benign and 81% of grade III lesions being classified correctly as malignant [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. However, the reliability and reproducibility of this grading system have not been assessed.\u003c/p\u003e \u003cp\u003eIn addition to the Lodwick-based classifications, other systems have aimed to simplify the classification of radiolucent bone lesions including the classification system for benign bone lesions created by Enneking [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. This system was formally adopted by the American Joint Committee for Cancer (AJCC) in 1980 with the goal of providing management guidance as well as prognostic value [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. The Enneking classification uses three grades, latent, active, and aggressive, based on the characteristics of the tumor margins on radiographs [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. A study by Drumond in 2010 found a 95.2% agreement between Enneking grading and ultimate treatment course demonstrating its utility for benign bone tumors [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. 
However, a separate analysis demonstrated only fair interobserver reliability (Fleiss kappa 0.26\u0026ndash;0.38), moderate intra-observer reliability (Cohen kappa 0.48), and low intra-observer agreement (67%) [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. Inter- and intra-observer reliability were increased amongst orthopedic tumor surgeons/fellows versus senior orthopedic residents.\u003c/p\u003e \u003cp\u003eAdditional characteristics often considered when assessing radiolucent bone lesions include patient age, symptoms, size and number of lesions, anatomic location, presence of a soft tissue component, presence and type of periosteal reaction, and presence and characteristics of matrix mineralization [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. While these characteristics are not explicitly included in the aforementioned classification systems, they do play a role in ultimate decision making.\u003c/p\u003e \u003cp\u003eWe aimed to study these three classification systems of radiolucent bone lesions which rely heavily, if not entirely, on radiographic imaging. We utilize pre-validated reproducibility statistics to assess the interobserver reliability and intra-rater reproducibility. Given the complexity of the current radiographic classification systems for assessing bone tumors, we hypothesized that (1) intra-observer reproducibility of each of these systems would be good and would improve with training level, (2) interobserver reliability would be poor, and (3) that the Enneking classification would have the highest intra-observer reproducibility.\u003c/p\u003e"},{"header":"METHODS","content":"\u003cp\u003eThis single institution study was deemed institutional review board exempt (IRB 2191652-1). 
Forty-eight case sets of de-identified radiographs of radiolucent osseous lesions with biopsy-proven (30 case sets) or clinically confirmed (17 case sets) diagnoses were selected from the orthopedic oncology practice of the senior author (TAD). A single case with unknown diagnosis due to non-diagnostic biopsy was also included. These cases were selected to provide a broad array of anticipated margins across the spectrum of the three classification systems, anatomic sites, benign and malignant lesions, and specific diagnoses. Selected lesions included both benign (33 case sets; 16 biopsy-proven, 17 clinically confirmed) and malignant (14 case sets, all biopsy-proven) conditions as well as both primary bone (41 case sets; 24 biopsy-proven, 17 clinically confirmed) and metastatic (6 case sets, all biopsy-proven) processes. The single unknown case was suspected to be a metastatic malignant lesion; however, biopsy was non-diagnostic. Malignant lesions were included to provide appropriate examples of aggressive lesions to assess each classification system's ability to correctly identify aggressive lesions and indicate the need for biopsy. An additional file with description of the diagnoses has been included [See Additional File 1]. Clinically confirmed diagnoses were made based on imaging characteristics and stability through a period of clinical and radiographic follow up. Each set included at least 2 orthogonal views of the lesion from initial presentation without prior biopsy or oncologic treatments. 
As the aim of this study was to evaluate the classification systems purely on their radiographic features, no information regarding clinical history, context of lesion discovery, presence or absence of pain, past medical history, or physical exam findings was provided.\u003c/p\u003e \u003cp\u003eAfter providing informed consent, twenty individuals classified each lesion according to the Lodwick classification [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e], the modified Lodwick classification [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e], and the Enneking classification [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e], via an online survey. The twenty participants included one third-year medical student, four PGY1 residents, three PGY2 residents, three PGY3 residents, four PGY4 residents, four PGY5 residents, and one orthopedic oncologist. All participants classified each lesion twice, with a minimum two-week gap between sessions, to assess intra-observer reproducibility. Participants were given a reference sheet explaining each classification system, which they could consult throughout the survey (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Participants were told that they could zoom in on images to better view lesion characteristics and that the survey would take approximately one hour to complete. An example case is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eInterobserver reliability was calculated using both Fleiss\u0026rsquo; kappa and Krippendorff\u0026rsquo;s alpha, treating the classification grades as nominal and ordinal rankings, respectively. Intra-observer reproducibility was also calculated using Fleiss\u0026rsquo; kappa and Krippendorff\u0026rsquo;s alpha. 
Linear regression models were then used to determine the effect of training level on ability to apply the classification systems consistently. Comparison of intra-rater reproducibility between classifications was performed using paired T-tests with correction for multiple comparison of the respective Krippendorff\u0026rsquo;s alpha values. Statistical analysis was performed using SPSS version 29 (Armonk, NY) with associated plugin for Krippendorff\u0026rsquo;s alpha calculations [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. JMP Pro version 17 (Cary, NC) was utilized for the remainder of analyses.\u003c/p\u003e \u003cp\u003eContingency tables were calculated to assess the accuracy of each classification system to identify lesions as benign or malignant when compared to their known diagnoses. This was first performed using only lesions with biopsy-proven diagnoses followed by an analysis with all lesions, including those with clinical diagnoses. Within the Lodwick and Modified Lodwick classifications, lesions given grades 1 and 2 (and their sub-types) were considered benign while those with grade 3 were considered to have high suspicion for malignancy. Likewise, the first two grades of the Enneking classification were considered benign. 
The positive and negative predictive values for the extreme grades (Grade 1, most likely to be benign and Grade 3, most likely to be malignant) in each classification were also calculated.\u003c/p\u003e"},{"header":"RESULTS","content":"\u003cp\u003eInterobserver reliability was poor for all three classifications as demonstrated by agreement of only 39% (κ\u0026thinsp;=\u0026thinsp;0.23), 39% (κ\u0026thinsp;=\u0026thinsp;0.25), and 53% (κ\u0026thinsp;=\u0026thinsp;0.28) for the Lodwick, modified Lodwick, and Enneking classifications, respectively (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eAgreement kappa and alpha values for each classification.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClassification\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e \u003cp\u003eInter-rater Reliability\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c5\" namest=\"c4\"\u003e \u003cp\u003eIntra-rater Reproducibility\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e 
\u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eKappa\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cem\u003eAlpha\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003eKappa\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cem\u003eAlpha\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eLodwick\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.23\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.54\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.45\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.71\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eModified Lodwick\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.48\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.44\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.69\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eEnneking\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.28\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.45\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.43\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.63\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e\n\u003ch3\u003e\u003c/h3\u003e\n\u003cp\u003eThe three classifications only achieved \u0026ldquo;fair agreement\u0026rdquo; and \u0026ldquo;minimal agreement\u0026rdquo; levels for reliability according to kappa interpretations by Landis (k\u0026thinsp;\u0026gt;\u0026thinsp;0.41) [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e] and McHugh (k\u0026thinsp;\u0026gt;\u0026thinsp;0.60) [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e], respectively. When the classifications were treated as ordinal data, there was no improvement in the overall agreement (α\u0026thinsp;=\u0026thinsp;0.54, 0.48, 0.45, respectively). Traditionally, ordinal data is considered reliable when Krippendorff\u0026rsquo;s α\u0026thinsp;\u0026gt;\u0026thinsp;0.8. In higher risk scenarios, acceptability cutoffs become more stringent, requiring higher α values. Systems are considered grossly inconsistent with α\u0026thinsp;\u0026lt;\u0026thinsp;0.676. When assessing the reliability between raters in relation to the classification grades, the highest and lowest grades for the Lodwick and Enneking classification demonstrated the highest reliability. 
This pattern was present in the modified Lodwick classification, except for the IIIC grade which demonstrated a low reliability (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e)\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eAgreement kappa values for each grade within the respective classification demonstrating the most inconsistency for intermediate grades.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGrade\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLodwick\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eModified Lodwick\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eEnneking\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e \u003cp\u003e\u003cem\u003eKappa\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003e1(a)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003e0.3999\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.4067\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.3504\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003e1b\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.284\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.2882\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003e1c\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.1232\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003e2\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.0955\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.064\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.1226\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003e3(a)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.2879\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.0843\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.3827\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd 
align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003e3b\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.3891\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003e3c\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.0591\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eIntra-observer reproducibility achieved only weak agreement according to McHugh [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e] and moderate agreement according to Landis [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e], although kappa values were improved (κ\u0026thinsp;=\u0026thinsp;0.42\u0026ndash;0.45) relative to the interobserver measures (Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eAgreement kappa and alpha values for each classification.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" 
colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClassification\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e \u003cp\u003eInter-rater Reliability\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c5\" namest=\"c4\"\u003e \u003cp\u003eIntra-rater Reproducibility\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eKappa\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cem\u003eAlpha\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003eKappa\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cem\u003eAlpha\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eLodwick\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.23\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.54\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.45\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.71\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003e\u003cb\u003eModified Lodwick\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.48\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.44\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.69\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eEnneking\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.28\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.45\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.43\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.63\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eTraining level had no significant effect on the ability to reproducibly classify lesions using any of the three classifications (R\u003csup\u003e2\u003c/sup\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.2, p\u0026thinsp;\u0026gt;\u0026thinsp;0.05 for all classifications, Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). Intra-observer reproducibility, measured with Krippendorff\u0026rsquo;s alpha, was highest for the Lodwick classification (α\u0026thinsp;=\u0026thinsp;0.72), followed by the modified Lodwick (α\u0026thinsp;=\u0026thinsp;0.69) and Enneking (α\u0026thinsp;=\u0026thinsp;0.63) classifications. 
Self-agreement for individuals was quite variable, ranging from 39\u0026ndash;78%.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe overall ability to distinguish benign (grade 1 or 2) versus malignant (grade 3) lesions using the Enneking classification was low with a sensitivity of 0.662 and a specificity of 0.692 for the first classification attempt (Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003e2 x 2 Contingency Tables for all scorers compared to clinical diagnosis using Enneking Classification. Method of diagnosis confirmation included biopsy, clinical, and imaging findings. \u003cb\u003eFirst Scoring Session\u003c/b\u003e\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEnneking Classification\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMalignant\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eBenign\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e1 or 2 score\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e115\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c3\"\u003e \u003cp\u003e429\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e3 score\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e225\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e191\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eAccuracy: 0.681, Sensitivity: 0.662, Specificity 0.692.\u003c/p\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003eSecond Scoring Session\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"No\" id=\"Taba\" border=\"1\"\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEnneking Classification\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMalignant\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eBenign\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e1 or 2 score\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e121\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e455\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e3 score\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" 
char=\".\" colname=\"c2\"\u003e \u003cp\u003e219\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e165\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eAccuracy: 0.702, Sensitivity: 0.644, Specificity 0.732.\u003c/p\u003e \u003cp\u003eResults were similar during the second classification attempt with a sensitivity of 0.644 and a specificity of 0.732. The agreement between classification and biopsy proven diagnosis was highest for grade 1 lesions, with 76.3%, 80.9%, and 77.2% of grade 1 lesions correctly classified as benign using the Lodwick, modified Lodwick, and Enneking classification, respectively. Agreement was much more variable for grade 3 lesions, with 73.3%, 59.0%, and 62.0% correctly classified as malignant using the Lodwick, modified Lodwick and Enneking classification, respectively (Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eScoring system predictions for correctly identifying benign and malignant lesions. Only cases with biopsy results reported were used here (N\u0026thinsp;=\u0026thinsp;640 observations). 
Mean percent is shown with mean scores from the first and second scoring sessions in parentheses.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eScoring System\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eGrade 1 lesions (% benign)\u003c/p\u003e \u003cp\u003eNegative Predictive Value (NPV)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eGrade 3 lesions (% malignant)\u003c/p\u003e \u003cp\u003ePositive Predictive Value (PPV)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLodwick\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e76.3 (76.1, 76.5)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e73.3 (74.3, 72.3)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModified Lodwick\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e80.9 (80.1, 81.6)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e59.0 (59.4, 58.6)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEnneking\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e77.2 (76.0, 78.4)\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e62.0 (61.7, 62.2)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eWhen including lesions with both biopsy and clinically proven diagnosis, negative predictive value in classifying benign lesions was improved, with 80.2%, 83.6%, and 81.9% of grade 1 lesions correctly classified as benign using the Lodwick, modified Lodwick, and Enneking classification, respectively. However, positive predictive value in classifying malignant lesions decreased, with 67.4%, 51.1%, and 55.6% of grade 3 lesions correctly classified as malignant using the Lodwick, modified Lodwick, and Enneking classification, respectively (Table\u0026nbsp;\u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab6\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 6\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eScoring system predictions for correctly identifying benign and malignant lesions. Lesions with both biopsy and clinically proven diagnosis included (N\u0026thinsp;=\u0026thinsp;960 observations). 
Mean percent is shown with mean scores from the first and second scoring sessions in parentheses.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eScoring System\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eGrade 1 lesions (% benign)\u003c/p\u003e \u003cp\u003eNegative Predictive Value (NPV)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eGrade 3 lesions (% malignant)\u003c/p\u003e \u003cp\u003ePositive Predictive Value (PPV)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLodwick\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e80.2 (80.5, 80.0)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e67.4 (69.0, 65.9)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModified Lodwick\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e83.6 (82.8, 84.3)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e51.1 (51.2, 50.9)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEnneking\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e81.9 (80.1, 83.6)\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e55.6 (54.1, 57.0)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"DISCUSSION","content":"\u003cp\u003eOur data demonstrate that three commonly used classifications for osseous radiolucent lesions on radiographs were neither reliable nor reproducible when applied to our specific study set of cases by a group of 20 predominantly resident participants. This is consistent with previous data regarding the Enneking classification demonstrating only fair interobserver reliability and moderate intra-observer reproducibility [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. The reliability of classification was highly variable depending on the individual lesions\u0026rsquo; characteristics. Consistency was highest for both the highest and lowest grades of the Lodwick and Enneking classifications, with greater variability for intermediate grades. This was also true for the modified Lodwick classification system, except for the IIIC grade, possibly due to the paucity of IIIC lesions included in the case set. Overall sensitivity and specificity for distinguishing benign versus malignant lesions were low for all classification systems. However, the ability to correctly classify lesions as benign versus malignant was highest for grade 1 lesions across the classification systems. The ability to correctly classify grade 3 lesions as malignant was much more variable and was highest for the Lodwick classification system. 
This is partially consistent with previous studies demonstrating that accuracy for application of both the Lodwick and modified Lodwick classifications is high for grade IA (likely benign) and grade III (likely malignant) lesions but becomes more variable for lesions not at either extreme of these classification systems [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. However, our study demonstrated much lower accuracy in applying the classification systems to grade 3 lesions. This suggests that these classifications may be useful for certain lesions but cannot be applied reliably across a wide array of lesion types. Interestingly, there was no association between orthopedic experience and intra-observer reproducibility, particularly within the five-year span of orthopedic residency training, supporting the notion that the descriptions themselves are poorly applied to some lesions.\u003c/p\u003e \u003cp\u003eContingency calculations of classification results compared to known diagnoses demonstrated overall poor ability to distinguish benign from malignant lesions. However, when considering only the lowest (grade 1) classes, negative predictive values were more acceptable, ranging from 80.2% to 83.6%. While the overall sensitivity and consistency of these classifications may be limited, grade 1 classifications demonstrate reasonable negative predictive value.\u003c/p\u003e \u003cp\u003eThe deficiencies noted in this study may be addressed with artificial intelligence (AI) applications in the future, and the results emphasize the need for better techniques. A recent study by Park et al. 
found that an AI system trained to classify radiographic proximal femoral lesions as benign, malignant, or no tumor demonstrated higher accuracy than four physicians (two general orthopedic surgeons and two orthopedic oncologists), both collectively and individually [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. Similarly, He et al. describe high accuracy of AI in classifying radiographic primary bone tumors as benign vs not-benign, malignant vs not-malignant, and benign vs intermediate vs malignant. The accuracy of their AI system was equivalent to that of two radiologists with musculoskeletal subspecialty training, and superior to that of three junior radiologists [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. As this field continues to expand, it is likely that these systems will become increasingly accurate, and thus may become superior to current classification systems.\u003c/p\u003e \u003cp\u003eThis study has some limitations. One is the small number of case sets included in the study. Although our case set is smaller than those of some previous studies with 233 [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e] or 183 [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e] cases, it is similar in size to the 65-case study of Alpuerto et al. [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. The 48 cases included in the current study also represent a wide breadth of tumor diagnoses and radiographic appearances. This study is also limited by the narrow breadth of orthopedic experience evaluated, with most participants coming from the five-year span of orthopedic residency training. However, the focus of our study was to evaluate the reliability of those initially encountering and describing these lesions, namely orthopedic residents soon to become community surgeons. 
The inclusion of one medical student and one fellowship-trained orthopedic oncologist was meant to serve as outlying anchors on the extremes of the experience spectrum. Malignant lesions were also included in our case set in addition to benign lesions. While the Lodwick and modified Lodwick classification systems have been designed for use with both benign and malignant lesions [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e], the Enneking classification for benign tumors was not designed for use with malignant lesions [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. The inclusion of malignant lesions was deemed appropriate to provide examples of aggressive lesions to assess the Enneking classification\u0026rsquo;s ability to correctly identify aggressive lesions and inform the need for biopsy. Of the included cases, not all were biopsy-proven diagnoses. However, those not histologically assessed were diagnosed clinically using imaging characteristics and stability through clinical and radiographic follow-up. Finally, this study was limited to participants in the field of orthopedic surgery, and did not include radiologists or radiology trainees, who also often encounter radiolucent bone lesions, sometimes as the initial provider discovering the lesion. Inclusion of this group in future studies would allow investigation of the degree of reliability and reproducibility of these classification systems when applied by a non-orthopedic specialty. This would also yield a comparison of the metrics between orthopedic surgeons and radiologists.\u003c/p\u003e"},{"header":"CONCLUSIONS","content":"\u003cp\u003eThis study demonstrates the poor interobserver reliability and intra-observer reproducibility of three classifications used to radiographically describe radiolucent lesions of bone. 
As these classifications currently stand, none provides a reliable method to classify these lesions and communicate about them effectively. Although AI protocols may improve reliability, further work using machine learning trained on larger data sets is needed.\u003c/p\u003e"},{"header":"Abbreviations","content":"\u003cdiv class=\"DefinitionList\"\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eAI\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eArtificial intelligence\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eAJCC\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eAmerican Joint Committee on Cancer\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003ePGY\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003ePostgraduate year\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003eSPSS\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eStatistical Package for the Social Sciences\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e: This study was not subject to IRB approval. Participants were limited to volunteer orthopedic trainees and faculty. 
All included images were deidentified.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e: Not applicable\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAvailability of data and materials\u003c/strong\u003e: The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests:\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eOne author (TAD) has received funding from:\u003c/p\u003e\n\u003cp\u003eBMC Musculoskeletal Disorders: Editorial or governing board\u003cbr\u003e\u0026nbsp;Bone Support (Cerament): Paid consultant\u003cbr\u003e\u0026nbsp;BoneSupport (Cerament): Research support\u003cbr\u003e\u0026nbsp;Clinical Orthopaedics and Related Research: Editorial or governing board\u003cbr\u003e\u0026nbsp;eMedicine: Editorial or governing board; Publishing royalties, financial or material support\u003cbr\u003e\u0026nbsp;Journal of Orthopaedic Research: Editorial or governing board\u003cbr\u003e\u0026nbsp;Journal of Surgical Oncology: Editorial or governing board\u003cbr\u003e\u0026nbsp;Journal of the American Academy of Orthopaedic Surgeons: Publishing royalties, financial or material support\u003cbr\u003e\u0026nbsp;Medicina: Editorial or governing board\u003cbr\u003e\u0026nbsp;Open Journal of Orthopedics: Editorial or governing board\u003cbr\u003e\u0026nbsp;OREF: Research support\u003cbr\u003e\u0026nbsp;PLOS One: Editorial or governing board\u003cbr\u003e\u0026nbsp;Stryker: Research support\u003cbr\u003e\u0026nbsp;Up To Date: Publishing royalties, financial or material support\u003cbr\u003e\u0026nbsp;Wolters Kluwer Health - Lippincott Williams \u0026amp; Wilkins: Publishing royalties, financial or material support\u003cbr\u003e\u0026nbsp;Wright Medical Technology, Inc.: Research support\u003c/p\u003e\n\u003cp\u003eThe remaining authors (TJW, SMP, KAM, SC) do not have any 
disclosures.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e:\u0026nbsp;This study was internally funded.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026rsquo; Contributions\u003c/strong\u003e:\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTAD and SC: conception and study design\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTJW and SMP: data collection\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTJW, KAM, and TAD: data analysis and interpretation\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTJW and SMP: manuscript preparation\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTAD, KAM, and SC: manuscript editing and revision\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAll authors read and approved the final manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e: Not applicable\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eCostelloe CM, Madewell JE. Radiography in the initial diagnosis of primary bone tumors. \u003cem\u003eAJR Am J Roentgenol\u003c/em\u003e. 2013;200(1):3-7. doi:10.2214/AJR.12.8488 \u003c/li\u003e\n\u003cli\u003eLodwick GS, Wilson AJ, Farrell C, Virtama P, Dittrich F. Determining growth rates of focal lesions of bone from radiographs. \u003cem\u003eRadiology\u003c/em\u003e. 1980;134(3):577-583. doi:10.1148/RADIOLOGY.134.3.6928321 \u003c/li\u003e\n\u003cli\u003eLodwick GS, Wilson AJ, Farrell C, Virtama P, Smeltzer FM, Dittrich F. Estimating rate of growth in bone lesions: observer performance and error. \u003cem\u003eRadiology\u003c/em\u003e. 1980;134(3):585-590. doi:10.1148/RADIOLOGY.134.3.6986621 \u003c/li\u003e\n\u003cli\u003eBenndorf M, Bamberg F, Jungmann PM. The Lodwick classification for grading growth rate of lytic bone tumors: a decision tree approach. \u003cem\u003eSkeletal Radiol\u003c/em\u003e. 2022;51(4):737-745. 
doi:10.1007/S00256-021-03868-8 \u003c/li\u003e\n\u003cli\u003eCaracciolo JT, Temple HT, Letson GD, Kransdorf MJ. A Modified Lodwick-Madewell Grading System for the Evaluation of Lytic Bone Lesions. \u003cem\u003eAJR Am J Roentgenol\u003c/em\u003e. 2016;207(1):150-156. doi:10.2214/AJR.15.14368 \u003c/li\u003e\n\u003cli\u003eEnneking WF. A System of Staging Musculoskeletal Neoplasms. \u003cem\u003eClin Orthop Relat Res\u003c/em\u003e. 1986;(204):9-24. \u003c/li\u003e\n\u003cli\u003eJawad MU, Scully SP. In Brief: Classifications in Brief: Enneking Classification: Benign and Malignant Tumors of the Musculoskeletal System. \u003cem\u003eClin Orthop Relat Res\u003c/em\u003e. 2010;468(7):2000. doi:10.1007/S11999-010-1315-7 \u003c/li\u003e\n\u003cli\u003eDrumond JMN. EFFICACY OF THE ENNEKING STAGING SYSTEM IN RELATION TO TREATING BENIGN BONE TUMORS AND TUMOR-LIKE BONE LESIONS. \u003cem\u003eRev Bras Ortop (Sao Paulo)\u003c/em\u003e. 2010;45(1):46. doi:10.1016/S2255-4971(15)30216-0\u003c/li\u003e\n\u003cli\u003eAlpuerto BB, Wang EHM. Interobserver and Intra-observer Reliability of the Enneking Classification in Plain Radiographic Staging of Benign Bone Tumors of the Extremities in Patients Seen at the Philippine General Hospital. \u003cem\u003eActa Med Philipp\u003c/em\u003e. 2021;55(3):341-348. doi:10.47895/AMP.VI0.1750 \u003c/li\u003e\n\u003cli\u003eMiller TT. Bone tumors and tumorlike conditions: analysis with conventional radiography. \u003cem\u003eRadiology\u003c/em\u003e. 2008;246(3):662-674. doi:10.1148/RADIOL.2463061038\u003c/li\u003e\n\u003cli\u003eHayes AF, Krippendorff K. Answering the Call for a Standard Reliability Measure for Coding Data. \u003cem\u003eCommun Methods Meas\u003c/em\u003e. 2007;1(1):77-89. doi:10.1080/19312450709336664 \u003c/li\u003e\n\u003cli\u003eLandis JR, Koch GG. The Measurement of Observer Agreement for Categorical Data. \u003cem\u003eBiometrics\u003c/em\u003e. 1977;33(1):159. doi:10.2307/2529310 \u003c/li\u003e\n\u003cli\u003eMcHugh ML. 
Interrater reliability: the kappa statistic. \u003cem\u003eBiochem Med (Zagreb)\u003c/em\u003e. 2012;22(3):276. doi:10.11613/bm.2012.031 \u003c/li\u003e\n\u003cli\u003ePark CW, Oh SJ, Kim KS, et al. Artificial intelligence-based classification of bone tumors in the proximal femur on plain radiographs: System development and validation. \u003cem\u003ePLoS One\u003c/em\u003e. 2022;17(2). doi:10.1371/JOURNAL.PONE.0264140 \u003c/li\u003e\n\u003cli\u003eHe Y, Pan I, Bao B, et al. Deep learning-based classification of primary bone tumors on radiographs: A preliminary study. \u003cem\u003eEBioMedicine\u003c/em\u003e. 2020;62. doi:10.1016/J.EBIOM.2020.103121 \u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"orthopedic oncology, radiolucent bone lesions, intra-observer reproducibility, interobserver reliability, Lodwick, modified Lodwick, Enneking","lastPublishedDoi":"10.21203/rs.3.rs-4301904/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4301904/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground\u003c/h2\u003e \u003cp\u003eRadiolucent 
bone lesions are encountered in all orthopedic specialties, and concise description is essential to inform evaluation and treatment. We studied the interobserver reliability and intra-observer reproducibility of three classification systems of radiographic radiolucent lesions: (1) original Lodwick classification, (2) modified Lodwick classification, and (3) Enneking classification for benign tumors. We hypothesized that intra-observer reproducibility would be good but interobserver reliability would be poor, improving with training level, and highest for the Enneking classification.\u003c/p\u003e\u003ch2\u003eMethods\u003c/h2\u003e \u003cp\u003eForty-eight case sets of de-identified radiographs of radiolucent osseous lesions were selected from an orthopedic oncology practice. Each set included two orthogonal views of the lesion from initial presentation. Twenty participants (one third-year medical student, 18 residents, one orthopedic oncologist) classified each case twice, with a minimum two-week gap between sessions, according to the Lodwick classification, modified Lodwick classification, and Enneking classification. Interobserver reliability and intra-observer reproducibility were calculated using Fleiss\u0026rsquo; kappa and Krippendorff\u0026rsquo;s alpha, treating the classifications as nominal and ordinal rankings, respectively. Linear regression models were used to determine the effect of training level on reproducibility. 
Contingency tables were used to assess the accuracy of correctly identifying benign versus malignant lesions against their known diagnoses.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e \u003cp\u003eInterobserver reliability was poor, as demonstrated by agreement of 39% (κ\u0026thinsp;=\u0026thinsp;0.23; α\u0026thinsp;=\u0026thinsp;0.54), 39% (κ\u0026thinsp;=\u0026thinsp;0.25; α\u0026thinsp;=\u0026thinsp;0.48), and 53% (κ\u0026thinsp;=\u0026thinsp;0.28; α\u0026thinsp;=\u0026thinsp;0.45) for the Lodwick, modified Lodwick, and Enneking classifications, respectively. Intra-observer reproducibility also lacked strong agreement (κ\u0026thinsp;=\u0026thinsp;0.42\u0026ndash;0.45). Training level had no effect on reproducibility (R\u003csup\u003e2\u003c/sup\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.2, p\u0026thinsp;\u0026gt;\u0026thinsp;0.05 for all classifications). Comparison of intra-observer reproducibility showed Krippendorff\u0026rsquo;s alpha for the Lodwick (α\u0026thinsp;=\u0026thinsp;0.72), modified Lodwick (α\u0026thinsp;=\u0026thinsp;0.69), and Enneking classification (α\u0026thinsp;=\u0026thinsp;0.63). Self-agreement for individuals ranged from 39\u0026ndash;78%. Lesions were correctly classified as malignant for 73.3%, 59.0%, and 62% of cases for the three classification systems, respectively.\u003c/p\u003e\u003ch2\u003eConclusions\u003c/h2\u003e \u003cp\u003eOur data demonstrate that three common classifications for osseous radiolucent lesions are neither reliable nor reproducible. Consistency of classification varied depending on lesion characteristics, with the strongest reproducibility demonstrated for the highest and lowest grades of the classification systems. There was no association between orthopedic experience and intra-observer reproducibility. 
These deficiencies may be improved with AI applications.\u003c/p\u003e","manuscriptTitle":"Classifications for Radiographic Evaluation of Radiolucent Bone Lesions have Poor Inter- and Intra-observer Agreement","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-07-13 02:12:06","doi":"10.21203/rs.3.rs-4301904/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"e0043b11-fca1-44ae-b279-a1489993a6de","owner":[],"postedDate":"July 13th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2024-10-22T10:53:36+00:00","versionOfRecord":[],"versionCreatedAt":"2024-07-13 02:12:06","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-4301904","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4301904","identity":"rs-4301904","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
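The accuracy, sensitivity, and specificity quoted beneath the Enneking contingency tables (Table 4) follow directly from the 2 x 2 counts. The following is a minimal sketch, assuming an Enneking score of 3 is treated as a positive (malignant) call and the clinical diagnosis as the reference standard; the function name `diagnostic_metrics` and the dictionary layout are illustrative and are not taken from the authors' analysis.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Return accuracy, sensitivity, specificity, and PPV for a 2 x 2 table.

    tp: malignant lesions scored 3        fn: malignant lesions scored 1 or 2
    fp: benign lesions scored 3           tn: benign lesions scored 1 or 2
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


# Counts copied from the first and second scoring sessions in Table 4.
first = diagnostic_metrics(tp=225, fn=115, fp=191, tn=429)
second = diagnostic_metrics(tp=219, fn=121, fp=165, tn=455)

for label, metrics in (("first session", first), ("second session", second)):
    print(label, {name: round(value, 3) for name, value in metrics.items()})
```

With the tabulated counts, the first-session values reproduce the reported 0.681, 0.662, and 0.692. For the second session, accuracy and sensitivity match the reported 0.702 and 0.644, while specificity computes to 455/620, about 0.734, which differs slightly from the reported 0.732. The positive predictive values of about 54.1% and 57.0% agree with the Enneking grade 3 entries in Table 6.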
