Semantic Entropy in Digital Health: When Ontological Mapping Reaches Its Computational Limits

doi:10.21203/rs.3.rs-9441335/v1

2026 · doi:10.21203/rs.3.rs-9441335/v1

Research Article

Florian Odi Stummer, Franz Leisch

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-9441335/v1

This work is licensed under a CC BY 4.0 License. Status: Posted. Version 1.

Abstract

Health data interoperability depends on shared ontologies, principally SNOMED CT, ICD-11, and the UMLS Metathesaurus, to provide a common semantic foundation for clinical information exchange. These systems were designed for a world in which new clinical concepts emerged slowly and could be curated through manual or semi-automated processes. The rapid proliferation of artificial intelligence and machine learning terminology in clinical practice may violate this assumption.
This paper presents a structural analysis of three converging limits in existing health ontology infrastructure: (1) an expressiveness ceiling imposed by SNOMED CT's description logic (EL++), which cannot represent probabilistic, conditional, or provenance-bearing semantics characteristic of AI clinical tools; (2) a curation scalability boundary, evidenced by the UMLS Metathesaurus at 214 source vocabularies and 15.5 million atoms, where even 90% alignment precision produces thousands of false positives; and (3) cascade fragility in inter-terminology mappings, where changes in one knowledge organisation system propagate unpredictably through dependent systems. Using a shared dataset of 1,194 terminology instances (70 unique terms) from Austrian and Swedish primary care, we identify a dual gap: 78.6% of terms are used without explicit definition in the source documents (a practice gap), while a separate ontological coverage analysis of the 44 eligible English-language health technology terms finds that 50.0% (n=22) are absent from all three reference ontologies (SNOMED CT, ICD-11, MeSH), with only 31.8% having an exact match in at least one system. Direct browser verification of eight AI-specific clinical workflow terms confirms that none have formal representation in SNOMED CT or ICD-11. We propose the concept of "semantic entropy" as a directional measure of definitional disorder in terminology systems approaching their governance capacity. We argue that Cimino's "graceful evolution" desideratum, articulated in 1998, remains unmet after 28 years, and that the AI terminology explosion may render it unachievable within current ontology paradigms, absent a qualitative shift in ontology design that accommodates rapid, AI-driven concept emergence. 
Subject areas: Medical Informatics; Artificial Intelligence and Machine Learning

Keywords: semantic interoperability, health ontology, SNOMED CT, ICD-11, UMLS, EL++ description logic, AI clinical terminology, semantic entropy, terminology governance, ontological coverage

1. Introduction

The promise of digital health rests on a deceptively simple assumption: that clinical information recorded in one system can be meaningfully interpreted in another. This assumption, known as semantic interoperability, depends on shared terminological foundations. Reference ontologies such as SNOMED CT, ICD-11, and the UMLS Metathesaurus provide the common vocabulary through which diagnoses, procedures, and clinical observations are exchanged across institutional, national, and international boundaries [1, 2].

For established clinical domains, this infrastructure functions adequately. Diagnoses, procedures, and medications have been encoded, curated, and mapped over decades of sustained standards development. The system, though imperfect, is mature enough to support electronic health record interoperability, clinical decision support, and population health surveillance for traditional clinical concepts [3].

The emergence of artificial intelligence and machine learning in clinical practice introduces a category of concepts for which this infrastructure was not designed. Terms such as "AI-assisted diagnosis," "algorithmic triage score," "large language model consultation," and "ambient clinical documentation" describe clinical activities that are already occurring in practice but that have no formal representation in any reference ontology [4]. This is not merely a gap waiting to be filled. It reflects a structural mismatch between the pace of technological terminology generation and the capacity of ontology governance systems to absorb new concepts. This paper argues that the mismatch is not a resourcing problem amenable to additional curation staff or faster release cycles.
Rather, it arises from three design-level constraints that converge to create a structural ceiling on ontological absorption capacity: (a) an expressiveness ceiling in the description logic underpinning SNOMED CT, which cannot represent the semantic relationships characteristic of AI clinical tools; (b) a scalability boundary in the semi-automated alignment methods used to maintain the UMLS Metathesaurus, where precision degrades at the scale of 214 source vocabularies; and (c) cascade fragility in inter-terminology mappings, where changes in one system propagate through dependent systems in ways that cannot be predicted from the change operation alone. Research question: At what point does the rate of new health technology terminology exceed the structural capacity of existing ontology infrastructure, and what are the consequences for semantic interoperability? Positioning against prior work. Several studies have examined aspects of this problem in isolation. Amar, April, and Abran [5] conducted a systematic mapping review of FHIR semantic interoperability approaches, identifying six categories of semantic mapping strategies across 70 studies, but their analysis focused on implementation challenges rather than structural limits of the underlying ontologies. Da Silveira, Dos Reis, and Pruski [6] reviewed the management of dynamic biomedical terminologies and identified open challenges in change propagation, but found no formal mathematical model of terminology drift rate. Cimino [7, 8] articulated the foundational desiderata for controlled medical vocabularies, including "graceful evolution," but did not assess whether these desiderata remain achievable at current ontology scale. This paper's contribution lies in connecting the structural limits of existing ontology systems (EL++ expressiveness, UMLS scalability, cascade fragility) with the AI terminology explosion to argue that the gap between implemented vocabulary and reference ontology is widening, not closing. 
Scope. This is a structural analysis of health ontology infrastructure limits, using EU/DACH empirical data and direct terminology browser verification. The public health implications of terminology proliferation are addressed in a companion paper (Stummer 2026, forthcoming in Health Informatics Journal); the economic and market-structure dimensions are addressed in a separate analysis (Stummer 2026, forthcoming in Health Policy). This paper is the second of three pre-studies in a terminology governance series examining the same empirical phenomenon through complementary disciplinary lenses. Pre-Study 1 addresses the public health dimension ("terminology tipping point"), the present analysis examines ontological and computational limits ("semantic entropy"), and Pre-Study 3 analyses market failure mechanisms ("terminology tragedy of the commons"). A synthesis capstone integrating all three perspectives is forthcoming.

2. Background: The Architecture of Health Terminology Systems

2.1 SNOMED CT and Description Logic

SNOMED CT is the most comprehensive clinical terminology system in use, containing approximately 350,000 active concepts in its International Edition [9]. Its formal foundation rests on EL++, a tractable fragment of the OWL 2 description logic family, chosen specifically because it permits polynomial-time reasoning [10]. This design choice represents a deliberate trade-off: tractability is purchased at the cost of expressiveness.

EL++ supports concept conjunction, existential restriction, role hierarchy, and domain and range constraints. It does not support negation, disjunction, universal restriction, or cardinality constraints [10]. For traditional clinical concepts (diseases, anatomical structures, procedures, substances), this expressiveness is largely sufficient. The inability to represent negation or disjunction rarely imposes a practical barrier when encoding "type 2 diabetes mellitus" or "laparoscopic cholecystectomy."
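The supported and unsupported constructs listed above can be turned into a simple screening check. The following is a minimal sketch, not a reasoner: the construct labels, the function name `unsupported_constructs`, and the construct sets assigned to the two example concepts are illustrative assumptions introduced here, and only the supported/unsupported split itself is taken from the text.

```python
# Constructs the text lists as supported by EL++.
EL_PLUS_PLUS_SUPPORTED = frozenset({
    "conjunction",
    "existential_restriction",
    "role_hierarchy",
    "domain_range_constraint",
})

# Constructs the text lists as not supported by EL++.
EL_PLUS_PLUS_UNSUPPORTED = frozenset({
    "negation",
    "disjunction",
    "universal_restriction",
    "cardinality_constraint",
})


def unsupported_constructs(required):
    """Return, sorted, the required constructs that fall outside EL++."""
    required = set(required)
    unknown = required - EL_PLUS_PLUS_SUPPORTED - EL_PLUS_PLUS_UNSUPPORTED
    if unknown:
        raise ValueError(f"unrecognised construct labels: {sorted(unknown)}")
    return sorted(required & EL_PLUS_PLUS_UNSUPPORTED)


# Hypothetical concept profiles (the construct assignments are assumptions).
traditional_concept = {"conjunction", "existential_restriction"}
criteria_based_concept = {"conjunction", "cardinality_constraint", "negation"}

print(unsupported_constructs(traditional_concept))
print(unsupported_constructs(criteria_based_concept))
```

An empty result means the concept, as profiled, stays within the fragment; any returned label marks a semantic requirement that would need either a workaround or a more expressive logic.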
SNOMED CT releases biannually for the International Edition, with national extensions released on varying schedules. The Content Request System (CRS) accepts proposals for new concepts, but no published study reports the backlog size, average processing time, or rejection rate for new term requests [9].

2.2 ICD-11 and the Proposal Platform

ICD-11, the most recent revision of the International Classification of Diseases, introduced a Foundation layer and linearisation architecture designed to support multiple use cases from a single underlying knowledge base [11]. The WHO maintains an open feedback proposal platform through which the global health community can suggest revisions. Ibrahim et al. [12] reported that over 15,000 proposals were submitted between July 2014 and July 2024, originating from 72 countries and processed through a multi-step WHO committee review. Critically, the study does not report acceptance rates, average time-to-incorporation, or rejection reasons, leaving the actual throughput of the governance mechanism unmeasured.

ICD-11 classifies diseases and health conditions. By scope, it does not represent interventions, technologies, or the mode by which a clinical assessment was generated. Any representation of AI assistance in the diagnostic process must therefore reside outside ICD-11 proper [13]. This separation complicates using ICD alone as a semantic backbone for AI-mediated care but does not preclude external provenance models that supplement ICD classification.

2.3 UMLS Metathesaurus

The UMLS Metathesaurus integrates 214 source vocabularies into a single, cross-mapped knowledge resource containing approximately 15.5 million atoms [14]. Its construction is semi-automated: lexical and semantic matching algorithms propose alignments, which human curators then review.
Nguyen, Yip, and Bodenreider [14] demonstrated that a deep learning approach to biomedical vocabulary alignment achieved F1 scores of 89 to 95%, outperforming rule-based methods by 14.1 percentage points. However, even at 90% precision, the scale of the Metathesaurus means that thousands of false positive alignments are generated per release cycle, each requiring human adjudication. The authors characterise UMLS Metathesaurus construction as "costly, time-consuming, and error-prone" [14].

2.4 HL7 FHIR and Semantic Binding

HL7 FHIR (Fast Healthcare Interoperability Resources) provides a resource-based standard for health data exchange, focused primarily on syntactic interoperability [1]. Semantic interoperability requires that FHIR resources are bound to controlled terminologies such as SNOMED CT, ICD, and LOINC. Amar et al. [5] found that across 70 studies, semantic approaches in FHIR implementations fell into six categories: mapping (24.6%), terminology services (14.3%), RDF/OWL (19.0%), annotation (14.3%), ML/NLP (15.9%), and ontology-based methods (11.9%). The review concluded that techniques to automate annotation and ontology comparison are urgently needed because human curation cannot sustain the pace of FHIR implementation guide proliferation.

Domain-specific FHIR implementations further multiply the terminology surface area. Mantri et al. [15] reported that a single oncology screening implementation in India required 25 custom FHIR profiles and 50 standardised terminology value sets, illustrating how each new clinical domain creates additional mapping and maintenance burden.

2.5 Cimino's Desiderata

In 1998, Cimino [7] articulated twelve desiderata for controlled medical vocabularies, establishing a normative framework that has guided terminology development for nearly three decades. Among these, "graceful evolution" stipulates that a terminology system must accommodate change without invalidating existing content, codes, or mappings.
Eight years later, Cimino [8] defended and extended the desiderata, arguing that concepts and universals must coexist in controlled terminologies and that the original requirements needed expansion to address purpose, not merely structure. The need for this defence paper itself suggests that the field had not converged on fundamental design principles. That several desiderata, notably graceful evolution and recognised redundancy, remain unmet after 28 years is evidence of the problem's persistent intractability [7, 8].

2.6 Positioning Against Prior Work

The three bodies of work closest to the present analysis are Amar et al. [5], Da Silveira et al. [6], and Cimino [7, 8]. Each addresses a facet of the terminology governance challenge, but none combines the elements that constitute this paper's specific contribution.

Da Silveira et al. [6] identified change-propagation challenges in dynamic biomedical terminologies but proposed no measure of drift rate or disorder severity; their analysis remained descriptive rather than structural. Cimino [7, 8] articulated governance desiderata that remain the normative benchmark for the field but did not assess their achievability at current AI-driven scale or under the pressures of a terminology category that requires richer semantics than the existing description logic can express. Amar et al. [5] catalogued FHIR semantic strategies across 70 studies but treated the underlying ontology infrastructure as given, without examining the structural limits that constrain what those strategies can achieve.
This paper's contribution is the combination of three elements: (1) structural limits analysis at the level of origin (expressiveness ceiling in the description logic, curation scalability boundary in alignment methods, cascade fragility in mapping graph topology), (2) empirical AI terminology coverage gap measurement using direct browser verification against reference ontologies, and (3) the "semantic entropy" concept as a named, directional measure of ontological governance failure. No prior work has unified these three elements.

3. Methods

3.1 Data

The empirical basis for this analysis is a dataset of 1,194 terminology instances across 70 unique terms extracted from Austrian and Swedish primary care documentation: 24 stakeholder interviews and theoretical memos (209,239 characters) from Swedish telemedicine implementation research, and 32 academic and grey literature sources on Austrian telehealth terminology (59,302 characters). The dataset covers telemedicine adoption (2020-2023) and was extended to AI terminology (2023-2025) using parallel extraction methods.

Of the 70 unique terms identified, 78.6% (n=55) were used without any explicit definition in the source documents. This figure measures source-document definitional practice, not ontological coverage, and should be interpreted as evidence of how researchers and practitioners deploy terminology without anchoring it to formal definitions. A separate ontological coverage analysis of all 44 eligible English-language health technology terms (excluding language variants, commercial platform names, and theoretical constructs) found that 50.0% (n=22) are absent from all three reference ontologies (SNOMED CT, ICD-11, MeSH), with only 31.8% (n=14) having an exact match in at least one system (Supplementary Table in Stummer 2026, forthcoming in Health Informatics Journal). Both figures should be interpreted as order-of-magnitude evidence of the scale of terminology fragmentation, not as precision estimates.
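The headline percentages can be recomputed from the reported counts as a consistency check. This is a minimal sketch: the counts are those stated in this paper (the exclusion breakdown is the one reported with the results in Section 5.1), while the variable names and the helper `pct` are illustrative assumptions.

```python
def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


total_terms = 70       # unique terms in the shared dataset
undefined_terms = 55   # terms used without explicit definition

# Exclusions applied before the ontological coverage analysis (Section 5.1).
excluded = {
    "language variants": 7,
    "commercial platform names": 8,
    "theoretical constructs": 9,
    "non-health generic terms": 2,
}
eligible_terms = total_terms - sum(excluded.values())

absent_from_all = 22   # absent from SNOMED CT, ICD-11, and MeSH
exact_in_any = 14      # exact match in at least one system

print("eligible terms:", eligible_terms)
print("practice gap (%):", pct(undefined_terms, total_terms))
print("absent from all three (%):", pct(absent_from_all, eligible_terms))
print("exact match in at least one (%):", pct(exact_in_any, eligible_terms))
```

The recomputed values agree with the reported 44 eligible terms and the 78.6%, 50.0%, and 31.8% figures.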
The full dataset is described in Stummer (2026, forthcoming in Health Informatics Journal); the present paper analyses this dataset through the lens of ontological coverage and semantic interoperability. Additionally, direct browser verification was performed against the SNOMED CT International Edition (January 2026 release) [4] and the ICD-11 Coding Tool (2025 Update, February 2025) [13] for a targeted set of AI-specific clinical workflow terms.

3.2 Analytical Framework

The analysis assesses three structural limits of existing ontology infrastructure:

(a) Expressiveness ceiling. The formal specification of EL++ [10] was examined to determine which semantic constructs are and are not supported. AI-specific clinical concepts were then characterised by their semantic requirements (probabilistic outputs, conditional recommendations, model provenance) and assessed for representability within EL++.

(b) Curation scalability. Published evidence on alignment precision at UMLS scale [14], ontology matching benchmark performance [16], and ICD-11 proposal throughput [12] was synthesised to assess the relationship between vocabulary count, concept count, and curation feasibility.

(c) Cascade fragility. Evidence from the DyKOSMap framework [17, 18] on inter-terminology mapping change propagation was used to assess the structural fragility introduced by each additional vocabulary integrated into a cross-mapped system.

3.3 Coverage Gap Analysis

Direct verification was performed for the following AI-specific clinical workflow terms in both the SNOMED CT browser and ICD-11 Coding Tool: "AI-assisted diagnosis," "algorithmic triage," "clinical decision support output," "machine learning prediction," "large language model consultation," "automated clinical documentation," "AI-generated differential diagnosis," and "ambient clinical intelligence."
Search results were recorded as present (exact or near-exact match), partial (related but semantically distinct concept exists), or absent (no relevant concept).

3.4 Threats to Validity

Five threats to the validity of this analysis are identified here, before results are presented, to enable readers to calibrate their interpretation of the findings.

T1. Expressiveness analysis is theoretical. No empirical study has attempted to encode AI clinical concepts in EL++ and measured failure rates. The assessment is based on formal properties of the description logic, not implementation testing. It is possible that practical workarounds (post-coordination, extension mechanisms) could accommodate some AI concepts within EL++ constraints.

T2. Scalability extrapolation. The evidence for curation scalability limits derives from published benchmarks [14, 16] that may not generalise to operational UMLS curation environments. Real-world degradation curves may differ from those observed in controlled experimental settings.

T3. Coverage gap is point-in-time. SNOMED CT and ICD-11 could add AI-specific terms in future releases. This analysis captures the state of affairs as of late 2025 and early 2026. The identified gaps may narrow over time, though the structural arguments about absorption capacity would remain relevant regardless of incremental additions.

T4. Dataset scope. The empirical dataset is drawn from Austrian and Swedish primary care. Other clinical domains (radiology, pathology, intensive care) and other geographies may exhibit different terminology dynamics. Primary care was selected because it represents the broadest clinical interface with emerging health technologies, but the findings should not be generalised without domain-specific verification.

T5. "Semantic entropy" is proposed, not validated. The concept is introduced here for the first time as a directional framework. It has not been formalised mathematically or validated empirically.
Its utility depends on future work to operationalise measurable indicators and establish thresholds.

4. Computational Pre-Study: Semantic Entropy in Synthetic Clinical Data

To move the concept of semantic entropy from a purely theoretical proposal toward empirical grounding, a computational pre-study was conducted using the Synthea synthetic patient generator [24]. This analysis provides a first quantitative estimate of semantic entropy across clinical domains, using Shannon entropy applied to the distribution of SNOMED CT coded conditions.

4.1 Dataset and Method

A Synthea-generated dataset of 11,475 synthetic patients was analysed, containing 309 unique SNOMED CT coded conditions. Conditions were grouped into clinical domains (oncology, cardiovascular, respiratory, mental health, musculoskeletal, endocrine, and others) based on SNOMED CT hierarchy classification. For each domain, Shannon entropy $H$ was computed over the distribution of condition frequencies:

\[ H = -\sum_{i=1}^{n} p_i \log_2 p_i \]

where $p_i$ is the relative frequency of condition $i$ within the domain and $n$ is the number of unique conditions in the domain. Normalised entropy, $H / H_{\max}$ with $H_{\max} = \log_2 n$, was computed to enable cross-domain comparison independent of domain size.

4.2 Results

The overall Shannon entropy across the full condition vocabulary was 5.3547 bits, against a maximum possible entropy of 8.2715 bits ($\log_2 309$), yielding a normalised entropy of 0.6474. This indicates moderate overall coding concentration: neither a uniform distribution (which would indicate maximum uncertainty) nor a highly concentrated distribution (which would indicate a small number of dominant codes).

Domain-level analysis revealed substantial variation:

- Oncology exhibited the highest normalised entropy (0.903), indicating near-maximum coding uncertainty within the domain. The large number of distinct cancer types, each with relatively similar prevalence in the synthetic dataset, produces a distribution approaching uniformity. This finding is consistent with the clinical reality that oncology generates the broadest range of diagnostic codes and, correspondingly, the greatest semantic challenge for terminology governance.
- Mental health domains showed comparatively low entropy, reflecting a smaller number of high-frequency condition codes (e.g., major depressive disorder, generalised anxiety disorder) with more concentrated distributions.
- One domain exceeded a normalised entropy of 0.8 (oncology). Three domains fell below a normalised entropy of 0.5, indicating concentrated coding distributions where a small number of conditions dominate.

4.3 Interpretation

The entropy analysis provides a first formal quantification of semantic entropy across clinical domains, operationalising the concept proposed in this paper using information-theoretic measures. The finding that oncology exhibits the highest normalised entropy is consistent with the hypothesis that domains with greater terminological complexity produce greater coding uncertainty, and is relevant to the semantic entropy framework for two reasons.

First, it demonstrates that the concept of semantic entropy, while proposed here as a theoretical framework, can be operationalised using standard information-theoretic measures applied to clinical coding data. The Shannon entropy calculation is reproducible, domain-comparable, and interpretable.

Second, the domain-level variation suggests that semantic entropy is not a uniform property of a terminology system but varies substantially across clinical subdomains. This has implications for governance: domains with high normalised entropy may require more granular terminology governance than domains with concentrated coding distributions. If AI and digital health terminology were to be incorporated into SNOMED CT, the entropy analysis predicts that it would likely exhibit high normalised entropy (similar to oncology) given the large number of novel, relatively equiprobable concepts.
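The entropy measure defined in Section 4.1 can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the code used for the pre-study: the function names and the toy frequency counts are invented for the example, and a domain with a single distinct code is assigned a normalised entropy of 0.0 by convention, since $H_{\max} = \log_2 1 = 0$ leaves the ratio undefined.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy in bits of a frequency distribution given as counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Zero counts contribute nothing to the sum and are skipped.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def normalised_entropy(counts):
    """H / H_max, where H_max = log2(n) for n distinct observed codes."""
    n = sum(1 for c in counts if c > 0)
    if n <= 1:
        return 0.0  # assumed convention: a single code carries no uncertainty
    return shannon_entropy(counts) / math.log2(n)


# Toy distributions (hypothetical counts, not Synthea output).
uniform_domain = [5, 5, 5, 5]        # four equally frequent codes
concentrated_domain = [97, 1, 1, 1]  # one dominant code

print(normalised_entropy(uniform_domain))
print(normalised_entropy(concentrated_domain))

# Consistency check against the overall figures reported in Section 4.2.
print(round(math.log2(309), 4), round(5.3547 / math.log2(309), 4))
```

The uniform toy domain reaches the maximum of 1.0, the concentrated one falls well below it, and the last line reproduces the reported 8.2715 bits and 0.6474.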
The overall normalised entropy of 0.6474 (Section 4.2) suggests that the current SNOMED CT condition vocabulary operates at approximately two-thirds of its maximum possible coding uncertainty, a level consistent with a system that is functional but not optimally organised for retrieval and comparison.

4.4 Limitations of the Computational Pre-Study

Four limitations constrain the interpretation of these findings. First, Synthea generates synthetic data calibrated to U.S. healthcare delivery patterns; European clinical datasets with different disease prevalence profiles and coding practices may yield different entropy distributions. Second, the entropy analysis measures coding distribution, not semantic coherence: a domain could have high entropy (many distinct codes) without any semantic ambiguity if all codes are precisely defined. The analysis captures one dimension of the semantic entropy concept (distributional disorder) but not others (definitional ambiguity, mapping inconsistency). Third, the domain groupings used for the analysis are approximate and were derived from SNOMED CT hierarchy rather than a validated clinical domain taxonomy. Fourth, the analysis is limited to condition-level codes; procedure codes, medication codes, and observation codes would provide a more complete picture of semantic entropy across the full clinical vocabulary. Replication on European clinical datasets, ideally real-world data where ethically and legally permissible, would be required to confirm these findings beyond the synthetic U.S. context.

5. Results

5.1 Coverage Gap Findings

The coverage gap analysis yields two complementary findings: a targeted AI-term verification and a broader ontological coverage assessment.

AI-specific term verification. Direct browser verification against SNOMED CT (January 2026 International Edition) and ICD-11 (2025 Update, February 2025) revealed a consistent pattern of absence.
None of the eight AI-specific clinical workflow terms searched returned an exact or near-exact match in either terminology system.

Table 1 AI-Specific Clinical Term Coverage in Reference Ontologies

| Term | SNOMED CT | ICD-11 | MeSH | Status |
|---|---|---|---|---|
| AI-assisted diagnosis | Absent | Absent | Partial* | No formal definition |
| Algorithmic triage | Absent | Absent | Absent | No formal definition |
| Clinical decision support output | Partial** | Absent | Partial* | No AI-specific concept |
| Machine learning prediction | Absent | Absent | Partial* | No clinical workflow concept |
| LLM consultation | Absent | Absent | Absent | No formal definition |
| Automated clinical documentation | Absent | Absent | Absent | No formal definition |
| AI-generated differential | Absent | Absent | Absent | No formal definition |
| Ambient clinical intelligence | Absent | Absent | Absent | No formal definition |

Note: SNOMED CT International Edition, January 2026 release; ICD-11 2025 Update (February 2025); MeSH 2025/2026 edition.
* MeSH contains general AI/ML terms but not clinical workflow concepts.
** SNOMED CT contains abstract decision support concepts but no AI/ML-specific outputs.

The complete absence of exact or near-exact matches for AI-specific terms in SNOMED CT and ICD-11 (0 of 8) represents the sharpest edge of a broader ontological coverage failure. This targeted verification is complemented by a systematic coverage analysis of the full shared dataset.

Broader ontological coverage analysis. Of the 70 unique terms in the shared dataset, 44 are eligible English-language health technology terms (after excluding 7 language variants, 8 commercial platform names, 9 theoretical constructs, and 2 non-health generic terms; see Supplementary Table in Stummer 2026, forthcoming in Health Informatics Journal, for classification rationale). Of these 44 terms, 22 (50.0%) are absent from all three reference ontologies. Only 14 (31.8%) have an exact match in at least one system.
ICD-11 has effectively zero coverage of digital health terminology (0% exact match among the 44 terms), while MeSH performs best (27.3% exact match), largely through the Telemedicine descriptor (D017216) and its extensive entry terms. SNOMED CT covers 15.9% through core encounter-type concepts but lacks vocabulary for service delivery models, digital health infrastructure, and video-specific encounters.

The coverage gap is not uniformly distributed. Blended and hybrid care models (blended care, hybrid care, digi-physical, mixed care) are entirely absent from all three ontologies, as is the "digital" family (digital health, digital care, digital platform). Video-specific terms (video visit, video meeting, video call), despite describing the dominant pandemic-era modality, have no formal coding in any reference system. The gap deepens further for AI-specific terminology: the 0-of-8 match rate for AI terms in Table 1 (that is, 100% absence from SNOMED CT and ICD-11) is worse than the already concerning 50.0% general absence rate, suggesting that ontological coverage degrades as terminology moves from established telehealth concepts toward newer AI-mediated workflow terms.

The dual gap. These findings reveal two distinct but compounding problems. First, a practice gap: 78.6% of the 70 terms are used without explicit definition in the source documents, indicating that researchers and practitioners deploy terminology without anchoring it to formal definitions. Second, an ontology gap: even when researchers seek formal definitions, half the eligible health technology terms (50.0%) have no representation in reference ontologies to anchor to. The combination creates a terminology environment in which neither practice nor infrastructure provides definitional clarity.

ICD-11, by scope, does not represent the mode of diagnosis generation; any representation of AI assistance must therefore reside outside ICD-11 proper [13].
This separation complicates using ICD alone as a semantic backbone for AI-mediated care but does not preclude external provenance models that supplement ICD classification. The absence of AI workflow concepts from ICD-11 is not an oversight in its design; it is a consequence of its classification scope, which was defined before AI-generated clinical outputs became a practical reality.

5.2 Expressiveness Ceiling Analysis

The formal properties of EL++ were assessed against the semantic requirements of AI clinical concepts. Table 2 summarises the expressiveness boundary.

Table 2 EL++ Expressiveness: Supported and Unsupported Constructs

| Construct | EL++ Support | Clinical Relevance |
|---|---|---|
| Concept conjunction (A AND B) | Yes | Combining clinical features |
| Existential restriction (some R.C) | Yes | "Has finding site: lung" |
| Role hierarchy | Yes | Subsumption of relationships |
| Domain/range constraints | Yes | Type checking |
| Negation (NOT A) | No | "Not contraindicated" |
| Disjunction (A OR B) | No | "Either condition X or Y" |
| Universal restriction (all R.C) | No | "All results above threshold" |
| Cardinality constraints | No | "At least 3 of 5 criteria met" |

AI clinical concepts characteristically require semantics that fall outside EL++ support:

- Probabilistic outputs. An AI diagnostic tool that reports "confidence score 0.87 for pneumonia" requires a numeric probability bound to a clinical finding, a construct that EL++ cannot represent natively.
- Conditional recommendations. "If the patient meets criteria set X and the model confidence exceeds threshold Y, suggest intervention Z" involves conditional logic, cardinality, and numerical thresholds simultaneously.
- Model provenance. "Diagnosis generated by model version 3.2, trained on dataset comprising 50,000 chest radiographs from institution A" requires provenance metadata that extends beyond clinical description into data lineage, a domain for which EL++ was not designed.

These are not exotic or unusual requirements.
They describe the routine operational characteristics of AI tools already deployed in clinical settings. The expressiveness ceiling is not a future concern; it describes a present-day mismatch between what AI tools produce and what the dominant clinical ontology can formally represent.

To illustrate how this ceiling operates in practice, consider a concrete post-coordination attempt. The concept "AI-assisted diagnostic suggestion with confidence score above 0.85, derived from a convolutional neural network trained on dermatoscopic images from a European population" contains several semantic layers. SNOMED CT can represent the diagnosis (e.g., melanoma) and the body site, but cannot represent the mode of generation (AI-assisted), the confidence threshold (0.85), the model architecture (CNN), or the training population constraint (European). Post-coordination using SNOMED CT's defining relationships cannot bridge this gap because the required relationship types (hasConfidenceScore, hasModelArchitecture, hasTrainingPopulation) do not exist in the concept model. This is not a content gap that future releases could fill within EL++; it is a structural expressiveness limitation. While future extensions to the SNOMED CT concept model could in principle add relationships such as hasConfidenceScore or hasModelArchitecture, doing so would move the system beyond the current EL++ fragment and its polynomial-time tractability guarantees.

5.3 Curation Scalability Evidence

Table 3 Scalability Indicators Across Health Terminology Systems

| System | Vocabularies | Scale | Curation Method | Known Limits |
| --- | --- | --- | --- | --- |
| SNOMED CT | 1 (+ ext.) | ~350,000 | Manual + editorial | CRS backlog unmeasured |
| ICD-11 | 1 | ~55,000 | Committee review | 15K proposals/10 years; throughput unreported |
| UMLS | 214 | 15.5M atoms | Semi-automated + human | 90% precision = thousands FP |
| OAEI Biomed. | 3–28 | Variable | Automated matching | Degrades with ontology size |

This table warrants explicit attention to EMoT Rule 5: honest acknowledgment of where automated methods outperform manual curation. The evidence is clear that for large-scale vocabulary alignment, automated and semi-automated methods are not merely helpful but necessary. Nguyen et al. [14] demonstrated that deep learning approaches outperformed rule-based methods by 14.1 F1 percentage points at UMLS scale. Faria et al. [16] found that hash-based algorithmic matching was essential for tractable processing of large biomedical ontologies, achieving approximately 50% runtime reduction over naive pairwise comparison. Bada [23] framed automated concept mapping as a precondition for scalable biomedical literature analysis.

However, the same evidence shows that automation does not eliminate the problem. At 90% precision across 15.5 million atoms, the absolute number of errors remains large enough to require human adjudication for every release cycle [14]. Automation shifts the bottleneck from initial alignment to error review, but does not remove it. The fundamental issue is that the number of pairwise comparisons grows quadratically with vocabulary count, and each new source vocabulary added to the UMLS creates O(n) new potential mapping conflicts with existing vocabularies.

The ICD-11 proposal platform [12] presents a different scalability challenge. With 15,000 proposals over 10 years and no published throughput data, it is impossible to estimate the system's absorption capacity. If even a fraction of the AI-specific clinical terms now entering practice were submitted as proposals, the queue time for incorporation could exceed the innovation cycle of the technologies being described.

5.4 Cascade Fragility Evidence

Dos Reis et al.
[17] demonstrated that changes within one knowledge organisation system cascade through inter-terminology mappings in ways that cannot be predicted from the change operation alone. Their analysis of SNOMED CT and ICD-9-CM found that the semantic structure of concepts, the information used to define mappings, and the change operations must all be considered simultaneously to assess the impact of a single modification.

The DyKOSMap framework [18] proposed heuristics for semi-automatic mapping adaptation, but the authors acknowledged that human review remains essential at each step. The framework "facilitates" rather than eliminates maintenance burden.

Each new vocabulary integrated into the UMLS Metathesaurus introduces O(n) new potential mapping conflicts with existing vocabularies. The O(n) theoretical bound represents a graph-theoretic worst case for mapping conflict growth; the empirically observed impact reported by Dos Reis et al. [17], while substantial, has not been quantified at this scale. The distinction between theoretical bound and observed impact should be maintained when interpreting these findings. At 214 vocabularies, a single concept change in one source can theoretically affect alignments with all other sources. Historical precedent confirms the severity of this fragility: the transition from ICD-10 to ICD-11 required complete remapping rather than incremental update, invalidating years of accumulated cross-walk tables.

Rodrigues, Schulz, et al. [22] found significant modelling issues in more than one third of cases when examining SNOMED CT concept model quality, where concept model instances contradicted the intuitive meaning of Fully Specified Names. This suggests that semantic drift occurs not only between terminologies but within a single terminology over time.

5.5 The Semantic Entropy Argument

The three structural limits described above, expressiveness ceiling, curation scalability, and cascade fragility, do not operate independently.
They interact to create a compounding effect. When a new AI clinical concept cannot be represented in EL++ (expressiveness ceiling), it either remains outside SNOMED CT or is approximated through post-coordination or extension mechanisms. Approximation introduces semantic imprecision. When that imprecise concept is then mapped to ICD-11 or other vocabularies through the UMLS (curation scalability), the imprecision propagates. When a subsequent release of any involved terminology modifies related concepts (cascade fragility), the accumulated imprecision may amplify in unpredictable ways.

We propose the term "semantic entropy" to describe this directional phenomenon: the progressive increase in definitional disorder within a terminology system as the rate of new concept introduction exceeds the system's capacity for precise incorporation and maintenance.

Semantic entropy is not a formal mathematical quantity at this stage. It is a conceptual framework identifying measurable indicators:

Coverage gap rate: the proportion of terms in active clinical use that lack formal ontological representation. This indicator has two operationalisations in our data. The practice gap (78.6%) measures terms used without explicit definition in source documents; the ontology gap (50.0%) measures terms absent from all three reference ontologies (SNOMED CT, ICD-11, MeSH). Both are valid indicators of coverage failure, but they measure different dimensions: the former reflects researcher and practitioner behaviour, the latter reflects the structural limits of the ontological infrastructure itself. For AI-specific terms, the ontology gap is starker still: 0 of 8 tested terms have any formal representation.
Mapping failure rate: the proportion of terms that cannot be unambiguously mapped to a reference terminology.

Curation backlog growth rate: the rate at which unprocessed term requests accumulate relative to processing capacity.

When all three indicators trend upward simultaneously, the system is experiencing increasing semantic entropy. Our data suggest that this is the current trajectory for AI-related health terminology. Figure 1 summarises the convergence model and the three measurable indicators derived from this analysis.

6. Discussion

6.1 Defining Semantic Entropy

The concept of semantic entropy, as proposed here, draws an analogy from thermodynamics: just as physical entropy measures the degree of disorder in a system, semantic entropy measures the degree of definitional disorder in a terminology system. The analogy is imperfect and should be treated as heuristic rather than formal. Unlike physical entropy, semantic entropy is not (yet) measurable in standardised units, and its behaviour under different governance interventions has not been modelled.

What the concept offers is a vocabulary for discussing a directional phenomenon that the literature has described qualitatively but not named. Da Silveira et al. [6] identified that domain knowledge evolution "directly impacts terminologies and generates inconsistencies in underlying biomedical information systems," but stopped short of proposing a measure for the rate or severity of this impact.

The dual gap identified in this analysis, where 78.6% of terms are used without definition in practice and 50.0% of eligible health technology terms lack representation in reference ontologies, suggests that disorder is present at both levels simultaneously. The practice gap indicates that the research community does not anchor its terminology use to formal definitions; the ontology gap indicates that, even where the intent to formalise exists, the infrastructure cannot accommodate half of the relevant vocabulary.
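The three indicators and the "all trending upward" criterion described in Section 5.5 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' analysis code. The raw counts (55 of 70, 22 of 44) are assumptions back-calculated from the reported percentages, and the two snapshots passed to `entropy_increasing` are hypothetical values chosen for demonstration; the names `IndicatorSnapshot`, `proportion`, and `entropy_increasing` are likewise invented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndicatorSnapshot:
    """One measurement of the three indicators named in Section 5.5."""
    coverage_gap_rate: float     # share of in-use terms lacking ontology representation
    mapping_failure_rate: float  # share of terms without an unambiguous mapping
    backlog_growth_rate: float   # term requests received per request processed


def proportion(part: int, whole: int) -> float:
    """Return part/whole, rejecting counts that cannot form a proportion."""
    if whole <= 0 or not 0 <= part <= whole:
        raise ValueError("need whole > 0 and 0 <= part <= whole")
    return part / whole


def entropy_increasing(earlier: IndicatorSnapshot, later: IndicatorSnapshot) -> bool:
    """True only when all three indicators rose between the two snapshots."""
    return (
        later.coverage_gap_rate > earlier.coverage_gap_rate
        and later.mapping_failure_rate > earlier.mapping_failure_rate
        and later.backlog_growth_rate > earlier.backlog_growth_rate
    )


# ASSUMPTION: counts back-calculated from the percentages reported in the text.
practice_gap = proportion(55, 70)  # terms used without explicit definition
ontology_gap = proportion(22, 44)  # eligible terms absent from all three ontologies
ai_gap = proportion(8, 8)          # AI-specific terms absent (0 of 8 covered)

# HYPOTHETICAL snapshots, for demonstration only; not measured values.
earlier = IndicatorSnapshot(0.40, 0.55, 1.1)
later = IndicatorSnapshot(ontology_gap, 0.60, 1.3)

if __name__ == "__main__":
    print(f"practice gap: {practice_gap:.1%}")
    print(f"ontology gap: {ontology_gap:.1%}")
    print(f"AI-specific gap: {ai_gap:.1%}")
    print("entropy increasing:", entropy_increasing(earlier, later))
```

Requiring all three indicators to rise is a deliberately strict reading of the text; a monitoring system might instead weight the indicators or tolerate one of them remaining flat.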
The scalability evidence (90% precision at 15.5M atoms still producing massive error volumes) and the cascade fragility evidence (unpredictable change propagation across mappings) suggest that the system is moving toward greater disorder, not less.

The "coverage gap rate" operationalised here corresponds to what the companion public health analysis (Stummer 2026, forthcoming in Health Informatics Journal) terms the "definitional gap rate": the proportion of terms lacking formal definitions in reference ontologies. The same empirical measurement thus functions simultaneously as a public health indicator (evidence synthesis feasibility), an informatics indicator (ontological coverage), and a market indicator (proportion of the market operating outside standardised vocabulary). In the present analysis, both the practice-level operationalisation (78.6%) and the ontology-level operationalisation (50.0%) are relevant: they represent complementary dimensions of the same underlying governance failure. In the companion public health analysis this dual gap appears as the indicator for public health risk (evidence synthesis and surveillance); in the companion business administration analysis (Stummer 2026, forthcoming in Health Policy) it appears as a market-structure indicator of the share of vocabulary operating outside standardised frameworks.

A candidate formalisation might define semantic entropy H(T) for a terminology system T as H(T) = -sum(p_i * log(p_i)) over the distribution of mapping outcomes for terms in T, where p_i represents the probability of each mapping state (unique match, ambiguous match, no match, deprecated match). A system approaching maximum entropy would show a near-uniform distribution across these states, indicating that mapping any given term is no more predictable than chance. Formal development of this metric is beyond the scope of the present paper but would provide a quantitative basis for monitoring ontological governance capacity over time.
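To make the candidate formula concrete, the following minimal Python sketch computes H(T) from a list of observed mapping outcomes. It assumes a base-2 logarithm (the text does not specify a base), estimates each p_i as a simple relative frequency, and uses function names (`mapping_entropy`, `normalised_entropy`) invented for this illustration. The example inputs are toy data, not results from the study.

```python
import math
from collections import Counter

# The four mapping states named in the text.
MAPPING_STATES = ("unique match", "ambiguous match", "no match", "deprecated match")


def mapping_entropy(outcomes):
    """Shannon entropy in bits of the observed distribution of mapping states."""
    counts = Counter(outcomes)
    unknown = set(counts) - set(MAPPING_STATES)
    if unknown:
        raise ValueError(f"unknown mapping state(s): {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one mapping outcome is required")
    entropy = 0.0
    # States that never occur are simply absent from `counts`, which matches
    # the convention that p * log(p) tends to 0 as p tends to 0.
    for n in counts.values():
        p = n / total
        entropy -= p * math.log2(p)
    return entropy


def normalised_entropy(outcomes):
    """H(T) divided by its maximum (log2 of the number of states), in [0, 1]."""
    return mapping_entropy(outcomes) / math.log2(len(MAPPING_STATES))


if __name__ == "__main__":
    # Toy inputs (hypothetical): every term maps uniquely vs. outcomes spread evenly.
    ordered = ["unique match"] * 20
    disordered = list(MAPPING_STATES) * 5
    print("ordered:", mapping_entropy(ordered))
    print("disordered:", mapping_entropy(disordered))
```

With four states the maximum is 2 bits, reached when all states are equally likely; the normalised variant is included only because a value between 0 and 1 is easier to compare across systems that define different numbers of mapping states.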
This sketch is provided only to illustrate that information-theoretic formalisation is feasible; this paper does not compute or validate H(T) empirically. Other candidate approaches might include information-theoretic measures applied to terminology version diffs, network-based measures of mapping graph instability, or growth-rate ratios comparing concept introduction to curation throughput. Each would require dedicated methodological development and empirical validation.

6.2 The 28-Year Gap

Cimino's desiderata [7] were articulated in 1998, at a time when the primary challenge was unifying fragmented clinical vocabularies into comprehensive, principled systems. The "graceful evolution" desideratum assumed that change would be incremental: new diseases identified, new procedures developed, existing concepts refined. The biannual release cycle of SNOMED CT and the decade-long revision cycle of ICD were designed for this tempo.

The AI terminology explosion violates the assumption of incremental change. When hundreds of AI-based clinical tools receive regulatory clearance within a few years, each introducing novel workflow concepts, the rate of semantic change exceeds what any committee-based or semi-automated governance process can absorb. This is not a critique of the standards bodies' competence or resources. It is an observation that the design parameters of the governance infrastructure presuppose a rate of terminology change that no longer obtains.

After 28 years, "graceful evolution" remains an aspiration rather than an operational reality. The AI terminology explosion may convert it from an unmet desideratum to an unachievable one, at least within the current ontological architecture.

6.3 When Existing Ontology Infrastructure Is Sufficient

It would be misleading to suggest that existing health ontology infrastructure is failing broadly.
For established clinical domains, SNOMED CT, ICD-11, and the UMLS perform their intended functions with reasonable effectiveness. Diagnoses, procedures, medications, anatomical structures, and laboratory observations are well-represented in reference ontologies, supported by decades of curation, and connected through mature cross-terminology mappings. The expressiveness ceiling of EL++ is rarely constraining for these concept types. The curation scalability of the UMLS, while stressed, remains functional for the existing vocabulary base. Cascade fragility, while real, is manageable when terminology changes occur at the historical pace.

The structural limits identified in this paper become salient specifically when novel concept categories emerge that require richer semantics than EL++ provides, that arrive faster than governance processes can absorb, and that create mapping dependencies with existing vocabularies that amplify the cascade fragility problem. AI/ML clinical terminology is arguably the first major concept category to satisfy all three conditions simultaneously. For traditional clinical terminology, the current architecture may remain adequate for the foreseeable future.

This concession does not diminish the urgency of the problem. If AI clinical tools continue to proliferate at current rates, the proportion of clinical activity that falls outside formal ontological representation will grow. A system that works well for 80% of clinical vocabulary but cannot represent the fastest-growing 20% is not failing; it is becoming progressively less complete, and the incompleteness is concentrated precisely in the domain where semantic precision matters most for patient safety and regulatory compliance.

6.4 The Historical Parallel

The current situation has a structural precedent. Before the emergence of HL7, DICOM, and related interoperability standards in the 1990s, electronic health record terminology was fragmented across proprietary vendor systems.
Hammond [20] documented the pre-standardisation era in which hospital information systems created "islands of incompatible data." Hayrinen, Saranto, and Nykanen [21] found in their systematic review that "only very few papers offered descriptions of the structure of EHRs or the terminologies used," reflecting a period in which the concept of an electronic health record "comprised a wide range of information systems."

Resolution required three concurrent developments: binding standards that defined common data structures, certification requirements that incentivised adoption, and regulatory mandates that made compliance non-optional [1]. The process took approximately 15 years from initial HL7 development to widespread adoption, and even then, Benson and Grieve [1] note that "persistent inconsistencies and divergent interpretations" remain.

The AI terminology situation has the same tripartite structure (fragmented vocabularies, no binding standards, no certification requirements for terminology use) but operates on a compressed timescale with greater semantic complexity. Where the pre-HL7 fragmentation involved hundreds of proprietary systems generating thousands of non-standard terms over two decades, the AI terminology explosion involves hundreds of AI tools generating novel clinical concepts within a few years. The governance response that took 15 years for EHR interoperability may not be available as a timeline for AI terminology governance.

6.5 Implications for EHDS and EU AI Act

Both the European Health Data Space (EHDS) regulation and the EU AI Act assume semantic interoperability as a prerequisite for their policy objectives, but neither addresses the ontology infrastructure gap identified in this analysis.

EHDS Article 7 mandates data quality standards for electronic health data, including requirements for semantic interoperability across member states.
However, the regulation does not specify how AI tool terminology should map to existing reference ontologies, nor does it address the governance mechanisms needed to maintain such mappings as AI terminology evolves.

The EU AI Act classifies AI systems by risk level and imposes transparency and documentation requirements for high-risk systems, a category that includes many clinical AI applications. These requirements implicitly assume that the outputs and processes of AI systems can be described in standardised terminology. If the terminology for describing AI clinical outputs does not exist in reference ontologies, the documentation requirements cannot be met in an interoperable manner.

This regulatory gap creates a practical problem. As member states implement EHDS data quality standards and AI Act documentation requirements, the absence of standardised AI clinical terminology may force ad hoc solutions, precisely the kind of proprietary fragmentation that EHDS was designed to prevent.

7. Limitations

L1. Expressiveness analysis is theoretical. The assessment of EL++ limitations is based on the formal specification of the description logic, not on empirical attempts to encode AI clinical concepts within SNOMED CT's authoring environment. It is possible that practical workarounds, including post-coordination, reference sets, or extension mechanisms, could accommodate some AI concepts within existing constraints. A concrete next step would be to attempt post-coordinated encoding of 20 or more AI clinical workflow concepts in EL++ and measure the failure rate, providing the first empirical estimate of the expressiveness ceiling's practical impact.

L2. Scalability extrapolation from benchmarks. The evidence for curation scalability limits derives primarily from the UMLS alignment study by Nguyen et al. [14] and the OAEI biomedical track benchmarks analysed by Faria et al. [16].
These controlled experimental settings may not reflect the full complexity of operational UMLS curation, where institutional knowledge, editorial guidelines, and iterative review processes may partially mitigate the precision limitations observed in benchmarks.

L3. Coverage gap is point-in-time. The SNOMED CT and ICD-11 browser verification was performed against releases current as of late 2025 and early 2026. Both systems undergo regular updates, and future releases may incorporate some or all of the AI-specific terms found absent in this analysis. The structural arguments about absorption capacity and governance latency remain relevant regardless of incremental additions, but the specific coverage gap figures reported here should be understood as a snapshot, not a permanent state.

L4. Semantic entropy is conceptual, not formalised. The concept of semantic entropy is proposed as a directional framework with identified measurable indicators. It has not been formalised as a mathematical quantity, validated against historical terminology evolution data, or tested for predictive utility. Its current value is heuristic: it names a phenomenon and identifies dimensions along which it might be measured. Whether it can be operationalised into a rigorous metric is an open question for future work.

L5. Geographic and domain scope. The empirical dataset is drawn from Austrian and Swedish primary care, covering telemedicine (2020 to 2023) and AI integration (2023 to 2025). Other clinical specialties, particularly those with higher AI adoption rates (radiology, pathology, dermatology), may exhibit different terminology dynamics. Other geographic and regulatory contexts (US FDA framework, UK MHRA, Asian regulatory environments) may impose different constraints on terminology governance. The findings should be considered as indicative rather than universally generalisable.

L6. Health informatics framing only.
This analysis examines terminology proliferation exclusively through the lens of ontological infrastructure and semantic interoperability. The public health consequences (disrupted surveillance, impaired evidence synthesis) and the economic dimensions (market failure, vendor lock-in, standardisation economics) of the informatics limits identified here are addressed in companion papers within this series, which examine evidence synthesis barriers and market failure mechanisms respectively. Readers seeking a comprehensive governance framework should consult the capstone synthesis. This paper is designed to stand alone as an independent contribution; familiarity with the companion pre-studies is not required for interpretation of the findings.

L7. Synthetic data limitations in the computational pre-study. The Synthea-based entropy analysis (Section 4) uses synthetic data calibrated to U.S. healthcare delivery patterns and does not incorporate European disease prevalence profiles, billing incentive structures, or coding practices. The condition vocabulary reflects Synthea's implemented disease modules rather than the full SNOMED CT hierarchy. Shannon entropy measures distributional disorder, not semantic ambiguity; a domain could have high entropy without any terminological confusion if all codes are precisely defined. Replication on European clinical datasets, ideally real-world data where ethically and legally permissible, would be required to confirm that the entropy patterns generalise beyond the synthetic U.S. context.

8. Conclusion

This paper has identified three structural limits in existing health ontology infrastructure that converge to prevent the absorption of AI-specific clinical terminology: an expressiveness ceiling in SNOMED CT's description logic (EL++), a curation scalability boundary at UMLS Metathesaurus scale, and cascade fragility in inter-terminology mappings.
Direct verification confirms that none of the eight AI-specific clinical workflow terms tested have formal representation in SNOMED CT or ICD-11 as of early 2026. The broader ontological coverage analysis reveals a dual gap: 78.6% of digital health terms are used without explicit definition in research practice, and 50.0% of eligible health technology terms lack any representation in SNOMED CT, ICD-11, or MeSH. Only 31.8% have an exact match in at least one reference ontology. The AI-specific result (0 of 8 terms covered) is starker still, suggesting that ontological coverage degrades as terminology moves toward newer, AI-mediated concepts.

The concept of "semantic entropy," proposed here as a directional framework for measuring definitional disorder in terminology systems, identifies three measurable indicators (coverage gap rate, mapping failure rate, curation backlog growth rate) that collectively suggest the system is moving toward greater disorder. The coverage gap rate itself has two complementary operationalisations: the practice gap (78.6%, measuring definitional behaviour in source documents) and the ontology gap (50.0%, measuring structural absence from reference ontologies). Cimino's "graceful evolution" desideratum, unmet after 28 years, may be unachievable within current ontology paradigms, absent a qualitative shift in ontology design that accommodates rapid, AI-driven concept emergence.

Scope limitation. These findings address the health informatics dimension of a phenomenon that spans public health, informatics, and economics. The structural analysis is based on EU/DACH empirical data and direct browser verification; generalisability to other geographies and clinical domains requires dedicated investigation.
Four directions for future work emerge from this analysis: (1) empirical expressiveness testing, in which researchers attempt to encode AI clinical concepts in EL++ and measure the failure rate; (2) standards body latency measurement, quantifying the time from term request to incorporation in SNOMED CT and ICD-11; (3) formal semantic entropy development, operationalising the proposed concept into a mathematically rigorous metric with validated thresholds; and (4) EHDS implementation guidance addressing the ontology infrastructure gap that currently undermines the regulation's semantic interoperability objectives.

Declarations

Funding: This research received no external funding.

Conflicts of interest: The author declares no conflicts of interest.

Ethics approval: Not applicable. This study analyses publicly available terminology systems and a previously collected dataset; no human subjects were involved.

Data availability: The dataset of 1,194 terminology instances across 70 unique terms is described in the companion public health analysis (Stummer 2026, forthcoming in Health Informatics Journal). The complete term list, coverage matrix, and extraction protocol are available as supplementary material accompanying that publication. The Synthea synthetic patient data used in the computational pre-study (Section 4) were generated using the open-source Synthea Patient Generator (https://github.com/synthetichealth/synthea); the entropy analysis scripts and generated dataset are available from the corresponding author upon reasonable request.

AI disclosure: Claude (Anthropic) was used for language editing and reference formatting. All analytical decisions, interpretations, and claims are the author's own.

References

1. Benson T, Grieve G. Principles of Health Interoperability: FHIR, HL7 and SNOMED CT. 4th ed. Springer; 2021. DOI: 10.1007/978-3-030-56883-2
2. Sass J, Essenwanger A, Luijten S, Vom Felde Genannt Imbusch P, Thun S. Standardizing Germany's electronic disease management program for bronchial asthma. Stud Health Technol Inform. 2019;267:81-85. DOI: 10.3233/SHTI190809
3. Oniki TA, Coyle JF, Parker CG, Huff SM. Lessons learned in detailed clinical modeling at Intermountain Healthcare. J Am Med Inform Assoc. 2014;21(6):1076-1081. DOI: 10.1136/amiajnl-2014-002875
4. SNOMED International. SNOMED CT Browser (January 2026 International Edition). Available from: https://browser.ihtsdotools.org/
5. Amar J, April A, Abran A. Electronic Health Record and semantic issues using Fast Healthcare Interoperability Resources: systematic mapping review. J Med Internet Res. 2024;26:e45209. DOI: 10.2196/45209
6. Da Silveira M, Dos Reis JC, Pruski C. Management of dynamic biomedical terminologies: current status and future challenges. Yearb Med Inform. 2015;10(1):125-133. DOI: 10.15265/IY-2015-002
7. Cimino JJ. Desiderata for controlled medical vocabularies in the twenty-first century. Methods Inf Med. 1998;37(4-5):394-403. PubMed: 9865037
8. Cimino JJ. In defense of the desiderata. J Biomed Inform. 2006;39(3):299-306. DOI: 10.1016/j.jbi.2005.11.008
9. SNOMED International. Release notes (2020-2025). Available from: https://www.snomed.org/releases
10. Rector AL, Brandt S. Why do it the hard way? The case for an expressive description logic for SNOMED. J Am Med Inform Assoc. 2008;15(6):744-751. DOI: 10.1197/jamia.M2797
11. WHO. WHO releases 2025 update to the International Classification of Diseases (ICD-11). 14 February 2025. Available from: https://www.who.int/news/item/14-02-2025-who-releases-2025-update-to-the-international-classification-of-diseases-(icd-11)
12. Ibrahim H, Southern D, Zhang M, Macpherson A, Alsokhn C, Krpelanova N, Kostanjsek N, Jakob R. ICD-11 'by the people for the people': the open feedback proposal platform. Health Inf Manag J. 2025. DOI: 10.1177/18333583251366915
13. WHO. ICD-11 Coding Tool. 2025. Available from: https://icd.who.int/ct11
14. Nguyen V, Yip HY, Bodenreider O. Biomedical vocabulary alignment at scale in the UMLS Metathesaurus. In: Proceedings of the Web Conference 2021 (WWW'21). ACM; 2021. DOI: 10.1145/3442381.3450128
15. Mantri S, Satokar KR, Tambe SB, Bhutad S. FHIR standard-based oncology data model for cancer screening. JMIR Cancer. 2025;11:e79011. DOI: 10.2196/79011
16. Faria D, Pesquita C, Mott I, Martins C, Couto FM, Cruz IF. Tackling the challenges of matching biomedical ontologies. J Biomed Semantics. 2018;9(1):4. DOI: 10.1186/s13326-017-0170-9
17. Dos Reis JC, Pruski C, Da Silveira M, Reynaud-Delaitre C. Characterizing semantic mappings adaptation via biomedical KOS evolution: a case study investigating SNOMED CT and ICD. AMIA Annu Symp Proc. 2013;2013:333-342. PubMed: 24551341
18. Dos Reis JC, Pruski C, Da Silveira M, Reynaud-Delaitre C. DyKOSMap: a framework for mapping adaptation between biomedical knowledge organization systems. J Biomed Inform. 2015;55:153-173. DOI: 10.1016/j.jbi.2015.04.001
19. Schulz S, Case JT, Hendler P, et al. SNOMED CT and Basic Formal Ontology: convergence or contradiction between standards? Appl Ontol. 2023;18(3):207-237. DOI: 10.3233/AO-230018
20. Hammond WE. eHealth interoperability. Stud Health Technol Inform. 2008;134:245-253. PubMed: 18376051
21. Hayrinen K, Saranto K, Nykanen P. Definition, structure, content, use and impacts of electronic health records: a review of the research literature. Int J Med Inform. 2008;77(5):291-304. DOI: 10.1016/j.ijmedinf.2007.09.001
22. Rodrigues JM, Schulz S, Mizen B, Rector A, Serir S. Is the application of SNOMED CT concept model sufficiently quality assured? AMIA Annu Symp Proc. 2018;2017:1488-1497. PubMed: 29854218
23. Bada M. Mapping of biomedical text to concepts of lexicons, terminologies, and ontologies. Methods Mol Biol. 2014;1159:33-45. DOI: 10.1007/978-1-4939-0709-0_3
24. Walonoski J, Kramer M, Nichols J, Quina A, Moesel C, Hall D, Duffett C, Dube K, Gallagher T, McLachlan S. Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record. J Am Med Inform Assoc. 2018;25(3):230-238. DOI: 10.1093/jamia/ocx079

Additional Declarations

The authors declare no competing interests.
Stummer","email":"","orcid":"https://orcid.org/0000-0002-6273-8014","institution":"Martin Luther University Halle-Wittenberg","correspondingAuthor":true,"prefix":"","firstName":"Florian","middleName":"Odi","lastName":"Stummer","suffix":""},{"id":624507670,"identity":"a0344b23-a3c2-449f-9e22-4e6f92c0afba","order_by":1,"name":"Franz Leisch","email":"","orcid":"","institution":"FH Wiener Neustadt","correspondingAuthor":false,"prefix":"","firstName":"Franz","middleName":"","lastName":"Leisch","suffix":""}],"badges":[],"createdAt":"2026-04-16 18:26:51","currentVersionCode":1,"declarations":{"humanSubjects":false,"vertebrateSubjects":false,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":false,"humanSubjectConsent":false,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-9441335/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-9441335/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":107280776,"identity":"c5bd34ea-3569-4405-b318-688534115e92","added_by":"auto","created_at":"2026-04-20 02:03:52","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":360184,"visible":true,"origin":"","legend":"\u003cp\u003eSemantic entropy convergence model. 
Three structural limits in health ontology infrastructure (expressiveness ceiling, curation scalability boundary, cascade fragility) compound to produce increasing definitional disorder. Three measurable indicators track the trajectory: coverage gap rate (practice: 78.6%, ontology: 50.0%, AI-specific: 0/8), mapping failure rate (only 31.8% exact match in any reference ontology), and curation backlog growth (15,000 proposals over 10 years, throughput unreported). Data: 1,194 terminology instances, 70 unique terms, Austrian and Swedish primary care.\u003c/p\u003e","description":"","filename":"S2figure1semanticentropy.png","url":"https://assets-eu.researchsquare.com/files/rs-9441335/v1/e68c6c01c6e537a9d32ac311.png"},{"id":107485439,"identity":"617c06cd-da9a-4c95-8efe-395ac9203e4f","added_by":"auto","created_at":"2026-04-22 02:34:49","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":964711,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-9441335/v1/63c6af60-9af7-439b-add4-776b1abc9c44.pdf"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003eSemantic Entropy in Digital Health: When Ontological Mapping Reaches Its Computational Limits\u003c/p\u003e","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eThe promise of digital health rests on a deceptively simple assumption: that clinical information recorded in one system can be meaningfully interpreted in another. This assumption, known as semantic interoperability, depends on shared terminological foundations. 
Reference ontologies such as SNOMED CT, ICD-11, and the UMLS Metathesaurus provide the common vocabulary through which diagnoses, procedures, and clinical observations are exchanged across institutional, national, and international boundaries [1, 2].\u003c/p\u003e\n\u003cp\u003eFor established clinical domains, this infrastructure functions adequately. Diagnoses, procedures, and medications have been encoded, curated, and mapped over decades of sustained standards development. The system, though imperfect, is mature enough to support electronic health record interoperability, clinical decision support, and population health surveillance for traditional clinical concepts [3].\u003c/p\u003e\n\u003cp\u003eThe emergence of artificial intelligence and machine learning in clinical practice introduces a category of concepts for which this infrastructure was not designed. Terms such as \u0026quot;AI-assisted diagnosis,\u0026quot; \u0026quot;algorithmic triage score,\u0026quot; \u0026quot;large language model consultation,\u0026quot; and \u0026quot;ambient clinical documentation\u0026quot; describe clinical activities that are already occurring in practice but that have no formal representation in any reference ontology [4]. This is not merely a gap waiting to be filled. It reflects a structural mismatch between the pace of technological terminology generation and the capacity of ontology governance systems to absorb new concepts.\u003c/p\u003e\n\u003cp\u003eThis paper argues that the mismatch is not a resourcing problem amenable to additional curation staff or faster release cycles. 
Rather, it arises from three design-level constraints that converge to create a structural ceiling on ontological absorption capacity: (a) an expressiveness ceiling in the description logic underpinning SNOMED CT, which cannot represent the semantic relationships characteristic of AI clinical tools; (b) a scalability boundary in the semi-automated alignment methods used to maintain the UMLS Metathesaurus, where precision degrades at the scale of 214 source vocabularies; and (c) cascade fragility in inter-terminology mappings, where changes in one system propagate through dependent systems in ways that cannot be predicted from the change operation alone.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eResearch question:\u0026nbsp;\u003c/strong\u003eAt what point does the rate of new health technology terminology exceed the structural capacity of existing ontology infrastructure, and what are the consequences for semantic interoperability?\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ePositioning against prior work.\u0026nbsp;\u003c/strong\u003eSeveral studies have examined aspects of this problem in isolation. Amar, April, and Abran [5] conducted a systematic mapping review of FHIR semantic interoperability approaches, identifying six categories of semantic mapping strategies across 70 studies, but their analysis focused on implementation challenges rather than structural limits of the underlying ontologies. Da Silveira, Dos Reis, and Pruski [6] reviewed the management of dynamic biomedical terminologies and identified open challenges in change propagation, but found no formal mathematical model of terminology drift rate. Cimino [7, 8] articulated the foundational desiderata for controlled medical vocabularies, including \u0026quot;graceful evolution,\u0026quot; but did not assess whether these desiderata remain achievable at current ontology scale. 
This paper\u0026apos;s contribution lies in connecting the structural limits of existing ontology systems (EL++ expressiveness, UMLS scalability, cascade fragility) with the AI terminology explosion to argue that the gap between implemented vocabulary and reference ontology is widening, not closing.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eScope.\u0026nbsp;\u003c/strong\u003eThis is a structural analysis of health ontology infrastructure limits, using EU/DACH empirical data and direct terminology browser verification. The public health implications of terminology proliferation are addressed in a companion paper (Stummer 2026, forthcoming in Health Informatics Journal); the economic and market-structure dimensions are addressed in a separate analysis (Stummer 2026, forthcoming in Health Policy).\u003c/p\u003e\n\u003cp\u003eThis paper is the second of three pre-studies in a terminology governance series examining the same empirical phenomenon through complementary disciplinary lenses. Pre-Study 1 addresses the public health dimension (\u0026quot;terminology tipping point\u0026quot;), the present analysis examines ontological and computational limits (\u0026quot;semantic entropy\u0026quot;), and Pre-Study 3 analyses market failure mechanisms (\u0026quot;terminology tragedy of the commons\u0026quot;). A synthesis capstone integrating all three perspectives is forthcoming.\u003c/p\u003e"},{"header":"2. Background: The Architecture of Health Terminology Systems","content":"\u003cdiv id=\"Sec2\" class=\"Section2\"\u003e \u003ch2\u003e2.1 SNOMED CT and Description Logic\u003c/h2\u003e \u003cp\u003eSNOMED CT is the most comprehensive clinical terminology system in use, containing approximately 350,000 active concepts in its International Edition [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. 
Its formal foundation rests on EL++, a tractable fragment of the OWL 2 description logic family, chosen specifically because it permits polynomial-time reasoning [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. This design choice represents a deliberate trade-off: tractability is purchased at the cost of expressiveness. EL++ supports concept conjunction, existential restriction, role hierarchy, and domain and range constraints. It does not support negation, disjunction, universal restriction, or cardinality constraints [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eFor traditional clinical concepts (diseases, anatomical structures, procedures, substances), this expressiveness is largely sufficient. The inability to represent negation or disjunction rarely imposes a practical barrier when encoding \"type 2 diabetes mellitus\" or \"laparoscopic cholecystectomy.\" SNOMED CT releases biannually for the International Edition, with national extensions released on varying schedules. The Content Request System (CRS) accepts proposals for new concepts, but no published study reports the backlog size, average processing time, or rejection rate for new term requests [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.2 ICD-11 and the Proposal Platform\u003c/h2\u003e \u003cp\u003eICD-11, the most recent revision of the International Classification of Diseases, introduced a Foundation layer and linearisation architecture designed to support multiple use cases from a single underlying knowledge base [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. The WHO maintains an open feedback proposal platform through which the global health community can suggest revisions. Ibrahim et al. 
[\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e] reported that over 15,000 proposals were submitted between July 2014 and July 2024, originating from 72 countries and processed through a multi-step WHO committee review. Critically, the study does not report acceptance rates, average time-to-incorporation, or rejection reasons, leaving the actual throughput of the governance mechanism unmeasured.\u003c/p\u003e \u003cp\u003eICD-11 classifies diseases and health conditions. By scope, it does not represent interventions, technologies, or the mode by which a clinical assessment was generated. Any representation of AI assistance in the diagnostic process must therefore reside outside ICD-11 proper [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. This separation complicates using ICD alone as a semantic backbone for AI-mediated care but does not preclude external provenance models that supplement ICD classification.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.3 UMLS Metathesaurus\u003c/h2\u003e \u003cp\u003eThe UMLS Metathesaurus integrates 214 source vocabularies into a single, cross-mapped knowledge resource containing approximately 15.5\u0026nbsp;million atoms [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. Its construction is semi-automated: lexical and semantic matching algorithms propose alignments, which human curators then review. Nguyen, Yip, and Bodenreider [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e] demonstrated that a deep learning approach to biomedical vocabulary alignment achieved F1 scores of 89 to 95%, outperforming rule-based methods by 14.1 percentage points. However, even at 90% precision, the scale of the Metathesaurus means that thousands of false positive alignments are generated per release cycle, each requiring human adjudication. 
The authors characterise UMLS Metathesaurus construction as \"costly, time-consuming, and error-prone\" [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.4 HL7 FHIR and Semantic Binding\u003c/h2\u003e \u003cp\u003eHL7 FHIR (Fast Healthcare Interoperability Resources) provides a resource-based standard for health data exchange, focused primarily on syntactic interoperability [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. Semantic interoperability requires that FHIR resources are bound to controlled terminologies such as SNOMED CT, ICD, and LOINC. Amar et al. [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e] found that across 70 studies, semantic approaches in FHIR implementations fell into six categories: mapping (24.6%), terminology services (14.3%), RDF/OWL (19.0%), annotation (14.3%), ML/NLP (15.9%), and ontology-based methods (11.9%). The review concluded that techniques to automate annotation and ontology comparison are urgently needed because human curation cannot sustain the pace of FHIR implementation guide proliferation.\u003c/p\u003e \u003cp\u003eDomain-specific FHIR implementations further multiply the terminology surface area. Mantri et al. 
[\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e] reported that a single oncology screening implementation in India required 25 custom FHIR profiles and 50 standardised terminology value sets, illustrating how each new clinical domain creates additional mapping and maintenance burden.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003e2.5 Cimino's Desiderata\u003c/h2\u003e \u003cp\u003eIn 1998, Cimino [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e] articulated twelve desiderata for controlled medical vocabularies, establishing a normative framework that has guided terminology development for nearly three decades. Among these, \"graceful evolution\" stipulates that a terminology system must accommodate change without invalidating existing content, codes, or mappings. Eight years later, Cimino [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e] defended and extended the desiderata, arguing that concepts and universals must coexist in controlled terminologies and that the original requirements needed expansion to address purpose, not merely structure. The need for this defence paper itself suggests that the field had not converged on fundamental design principles. That several desiderata, notably graceful evolution and recognised redundancy, remain unmet after 28 years is evidence of the problem's persistent intractability [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e2.6 Positioning Against Prior Work\u003c/h2\u003e \u003cp\u003eThe three bodies of work closest to the present analysis are Amar et al. [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e], Da Silveira et al. 
[\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e], and Cimino [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. Each addresses a facet of the terminology governance challenge, but none combines the elements that constitute this paper's specific contribution. Da Silveira et al. [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e] identified change-propagation challenges in dynamic biomedical terminologies but proposed no measure of drift rate or disorder severity; their analysis remained descriptive rather than structural. Cimino [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e] articulated governance desiderata that remain the normative benchmark for the field but did not assess their achievability at current AI-driven scale or under the pressures of a terminology category that requires richer semantics than the existing description logic can express. Amar et al. [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e] catalogued FHIR semantic strategies across 70 studies but treated the underlying ontology infrastructure as given, without examining the structural limits that constrain what those strategies can achieve. This paper's contribution is the combination of three elements: (1) structural limits analysis at the level of origin (expressiveness ceiling in the description logic, curation scalability boundary in alignment methods, cascade fragility in mapping graph topology), (2) empirical AI terminology coverage gap measurement using direct browser verification against reference ontologies, and (3) the \"semantic entropy\" concept as a named, directional measure of ontological governance failure. 
No prior work has unified these three elements.\u003c/p\u003e \u003c/div\u003e"},{"header":"3. Methods","content":"\u003ch3\u003e3.1 Data\u003c/h3\u003e\n\u003cp\u003eThe empirical basis for this analysis is a dataset of 1,194 terminology instances across 70 unique terms extracted from Austrian and Swedish primary care documentation: 24 stakeholder interviews and theoretical memos (209,239 characters) from Swedish telemedicine implementation research, and 32 academic and grey literature sources on Austrian telehealth terminology (59,302 characters). The dataset covers telemedicine adoption (2020-2023) and was extended to AI terminology (2023-2025) using parallel extraction methods. Of the 70 unique terms identified, 78.6% (n=55) were used without any explicit definition in the source documents. This figure measures source-document definitional practice, not ontological coverage, and should be interpreted as evidence of how researchers and practitioners deploy terminology without anchoring it to formal definitions. A separate ontological coverage analysis of all 44 eligible English-language health technology terms (excluding language variants, commercial platform names, and theoretical constructs) found that 50.0% (n=22) are absent from all three reference ontologies (SNOMED CT, ICD-11, MeSH), with only 31.8% (n=14) having an exact match in at least one system (Supplementary Table in Stummer 2026, forthcoming in Health Informatics Journal). Both figures should be interpreted as order-of-magnitude evidence of the scale of terminology fragmentation, not as precision estimates. 
The full dataset is described in Stummer (2026, forthcoming in Health Informatics Journal); the present paper analyses this dataset through the lens of ontological coverage and semantic interoperability.\u003c/p\u003e\n\u003cp\u003eAdditionally, direct browser verification was performed against the SNOMED CT International Edition (January 2026 release) [4] and the ICD-11 Coding Tool (2025 Update, February 2025) [13] for a targeted set of AI-specific clinical workflow terms.\u003c/p\u003e\n\u003ch3\u003e3.2 Analytical Framework\u003c/h3\u003e\n\u003cp\u003eThe analysis assesses three structural limits of existing ontology infrastructure:\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e(a) Expressiveness ceiling.\u0026nbsp;\u003c/strong\u003eThe formal specification of EL++ [10] was examined to determine which semantic constructs are and are not supported. AI-specific clinical concepts were then characterised by their semantic requirements (probabilistic outputs, conditional recommendations, model provenance) and assessed for representability within EL++.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e(b) Curation scalability.\u0026nbsp;\u003c/strong\u003ePublished evidence on alignment precision at UMLS scale [14], ontology matching benchmark performance [16], and ICD-11 proposal throughput [12] was synthesised to assess the relationship between vocabulary count, concept count, and curation feasibility.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e(c) Cascade fragility.\u0026nbsp;\u003c/strong\u003eEvidence from the DyKOSMap framework [17, 18] on inter-terminology mapping change propagation was used to assess the structural fragility introduced by each additional vocabulary integrated into a cross-mapped system.\u003c/p\u003e\n\u003ch3\u003e3.3 Coverage Gap Analysis\u003c/h3\u003e\n\u003cp\u003eDirect verification was performed for the following AI-specific clinical workflow terms in both the SNOMED CT browser and ICD-11 Coding Tool: \u0026quot;AI-assisted 
diagnosis,\u0026quot; \u0026quot;algorithmic triage,\u0026quot; \u0026quot;clinical decision support output,\u0026quot; \u0026quot;machine learning prediction,\u0026quot; \u0026quot;large language model consultation,\u0026quot; \u0026quot;automated clinical documentation,\u0026quot; \u0026quot;AI-generated differential diagnosis,\u0026quot; and \u0026quot;ambient clinical intelligence.\u0026quot; Search results were recorded as present (exact or near-exact match), partial (related but semantically distinct concept exists), or absent (no relevant concept).\u003c/p\u003e\n\u003ch3\u003e3.4 Threats to Validity\u003c/h3\u003e\n\u003cp\u003eFive threats to the validity of this analysis are identified here, before results are presented, to enable readers to calibrate their interpretation of the findings.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eT1. Expressiveness analysis is theoretical.\u0026nbsp;\u003c/strong\u003eNo empirical study has attempted to encode AI clinical concepts in EL++ and measured failure rates. The assessment is based on formal properties of the description logic, not implementation testing. It is possible that practical workarounds (post-coordination, extension mechanisms) could accommodate some AI concepts within EL++ constraints.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eT2. Scalability extrapolation.\u0026nbsp;\u003c/strong\u003eThe evidence for curation scalability limits derives from published benchmarks [14, 16] that may not generalise to operational UMLS curation environments. Real-world degradation curves may differ from those observed in controlled experimental settings.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eT3. Coverage gap is point-in-time.\u0026nbsp;\u003c/strong\u003eSNOMED CT and ICD-11 could add AI-specific terms in future releases. This analysis captures the state of affairs as of late 2025 and early 2026. 
The identified gaps may narrow over time, though the structural arguments about absorption capacity would remain relevant regardless of incremental additions.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eT4. Dataset scope.\u0026nbsp;\u003c/strong\u003eThe empirical dataset is drawn from Austrian and Swedish primary care. Other clinical domains (radiology, pathology, intensive care) and other geographies may exhibit different terminology dynamics. Primary care was selected because it represents the broadest clinical interface with emerging health technologies, but the findings should not be generalised without domain-specific verification.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eT5. \u0026quot;Semantic entropy\u0026quot; is proposed, not validated.\u0026nbsp;\u003c/strong\u003eThe concept is introduced here for the first time as a directional framework. It has not been formalised mathematically or validated empirically. Its utility depends on future work to operationalise measurable indicators and establish thresholds.\u003c/p\u003e"},{"header":"4. Computational Pre-Study: Semantic Entropy in Synthetic Clinical Data","content":"\u003cp\u003eTo move the concept of semantic entropy from a purely theoretical proposal toward empirical grounding, a computational pre-study was conducted using the Synthea synthetic patient generator [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. This analysis provides a first quantitative estimate of semantic entropy across clinical domains, using Shannon entropy applied to the distribution of SNOMED-CT coded conditions.\u003c/p\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003e4.1 Dataset and Method\u003c/h2\u003e \u003cp\u003eA Synthea-generated dataset of 11,475 synthetic patients was analysed, containing 309 unique SNOMED-CT coded conditions. 
Conditions were grouped into clinical domains (oncology, cardiovascular, respiratory, mental health, musculoskeletal, endocrine, and others) based on SNOMED CT hierarchy classification. For each domain, Shannon entropy H was computed over the distribution of condition frequencies:\u003c/p\u003e \u003cp\u003e \u003cem\u003eH = -sum(p_i * log2(p_i))\u003c/em\u003e \u003c/p\u003e \u003cp\u003ewhere p_i is the relative frequency of condition i within the domain. Normalised entropy (H / H_max, where H_max\u0026thinsp;=\u0026thinsp;log2(n) for n unique conditions in the domain) was computed to enable cross-domain comparison independent of domain size.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003e4.2 Results\u003c/h2\u003e \u003cp\u003eThe overall Shannon entropy across the full condition vocabulary was 5.3547 bits, against a maximum possible entropy of 8.2715 bits (log2(309)), yielding a normalised entropy of 0.6474. This indicates moderate overall coding concentration: neither a uniform distribution (which would indicate maximum uncertainty) nor a highly concentrated distribution (which would indicate a small number of dominant codes).\u003c/p\u003e \u003cp\u003eDomain-level analysis revealed substantial variation:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eOncology\u003c/b\u003e exhibited the highest normalised entropy (0.903), indicating near-maximum coding uncertainty within the domain. The large number of distinct cancer types, each with relatively similar prevalence in the synthetic dataset, produces a distribution approaching uniformity. 
This finding is consistent with the clinical reality that oncology generates the broadest range of diagnostic codes and, correspondingly, the greatest semantic challenge for terminology governance.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eMental health\u003c/b\u003e domains showed comparatively low entropy, reflecting a smaller number of high-frequency condition codes (e.g., major depressive disorder, generalised anxiety disorder) with more concentrated distributions.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eOne domain exceeded a normalised entropy of 0.8 (oncology). Three domains fell below a normalised entropy of 0.5, indicating concentrated coding distributions where a small number of conditions dominate.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003e4.3 Interpretation\u003c/h2\u003e \u003cp\u003eThe entropy analysis provides a first formal quantification of semantic entropy across clinical domains, operationalising the concept proposed in Section \u003cspan refid=\"Sec10\" class=\"InternalRef\"\u003e3.2\u003c/span\u003e using information-theoretic measures. The finding that oncology exhibits the highest normalised entropy is consistent with the hypothesis that domains with greater terminological complexity produce greater coding uncertainty, and is relevant to the semantic entropy framework for two reasons.\u003c/p\u003e \u003cp\u003eFirst, it demonstrates that the concept of semantic entropy, while proposed here as a theoretical framework, can be operationalised using standard information-theoretic measures applied to clinical coding data. The Shannon entropy calculation is reproducible, domain-comparable, and interpretable.\u003c/p\u003e \u003cp\u003eSecond, the domain-level variation suggests that semantic entropy is not a uniform property of a terminology system but varies substantially across clinical subdomains. 
This has implications for governance: domains with high normalised entropy may require more granular terminology governance than domains with concentrated coding distributions. If AI and digital health terminology were to be incorporated into SNOMED CT, the entropy analysis predicts that it would likely exhibit high normalised entropy (similar to oncology) given the large number of novel, relatively equiprobable concepts.\u003c/p\u003e \u003cp\u003eThe overall normalised entropy of 0.6474 across the full condition vocabulary suggests that the current SNOMED-CT condition vocabulary operates at approximately two-thirds of its maximum possible coding uncertainty, a level consistent with a system that is functional but not optimally organised for retrieval and comparison.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003e4.4 Limitations of the Computational Pre-Study\u003c/h2\u003e \u003cp\u003eFour limitations constrain the interpretation of these findings. First, Synthea generates synthetic data calibrated to U.S. healthcare delivery patterns; European clinical datasets with different disease prevalence profiles and coding practices may yield different entropy distributions. Second, the entropy analysis measures coding distribution, not semantic coherence: a domain could have high entropy (many distinct codes) without any semantic ambiguity if all codes are precisely defined. The analysis captures one dimension of the semantic entropy concept (distributional disorder) but not others (definitional ambiguity, mapping inconsistency). Third, the domain groupings used for the analysis are approximate and were derived from SNOMED CT hierarchy rather than a validated clinical domain taxonomy. Fourth, the analysis is limited to condition-level codes; procedure codes, medication codes, and observation codes would provide a more complete picture of semantic entropy across the full clinical vocabulary. 
Replication on European clinical datasets, ideally real-world data where ethically and legally permissible, would be required to confirm these findings beyond the synthetic U.S. context.\u003c/p\u003e \u003c/div\u003e"},{"header":"5. Results","content":"\u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003e5.1 Coverage Gap Findings\u003c/h2\u003e \u003cp\u003eThe coverage gap analysis yields two complementary findings: a targeted AI-term verification and a broader ontological coverage assessment.\u003c/p\u003e \u003cp\u003e \u003cb\u003eAI-specific term verification.\u003c/b\u003e Direct browser verification against SNOMED CT (January 2026 International Edition) and ICD-11 (2025 Update, February 2025) revealed a consistent pattern of absence. None of the eight AI-specific clinical workflow terms searched returned an exact or near-exact match in either terminology system.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003e\u003cb\u003eAI-Specific Clinical Term Coverage in Reference Ontologies\u003c/b\u003e \u003cem\u003eNote: SNOMED CT International Edition, January 2026 release; ICD-11 2025 Update (February 2025); MeSH 2025/2026 edition.\u003c/em\u003e\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth 
align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTerm\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSNOMED CT\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eICD-11\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eMeSH\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eStatus\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAI-assisted diagnosis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAbsent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAbsent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePartial*\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNo formal definition\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAlgorithmic triage\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAbsent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAbsent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAbsent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNo formal definition\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClinical decision support output\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePartial**\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAbsent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePartial*\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNo AI-specific concept\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMachine learning prediction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAbsent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAbsent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePartial*\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNo clinical workflow concept\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLLM consultation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAbsent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAbsent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAbsent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNo formal definition\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAutomated clinical documentation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAbsent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAbsent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAbsent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNo formal definition\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAI-generated differential\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAbsent\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAbsent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAbsent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNo formal definition\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAmbient clinical intelligence\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAbsent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAbsent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAbsent\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNo formal definition\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cem\u003e* MeSH contains general AI/ML terms but not clinical workflow concepts. ** SNOMED CT contains abstract decision support concepts but no AI/ML-specific outputs.\u003c/em\u003e \u003c/p\u003e \u003cp\u003eThe complete absence of AI-specific terms from SNOMED CT and ICD-11 (0 of 8) represents the sharpest edge of a broader ontological coverage failure. This targeted verification is complemented by a systematic coverage analysis of the full shared dataset.\u003c/p\u003e \u003cp\u003e\u003cb\u003eBroader ontological coverage analysis.\u003c/b\u003e Of the 70 unique terms in the shared dataset, 44 are eligible English-language health technology terms (after excluding 7 language variants, 8 commercial platform names, 9 theoretical constructs, and 2 non-health generic terms; see Supplementary Table in Stummer 2026, forthcoming in Health Informatics Journal, for classification rationale). Of these 44 terms, 22 (50.0%) are absent from all three reference ontologies. 
Only 14 (31.8%) have an exact match in at least one system. ICD-11 has effectively zero coverage of digital health terminology (0% exact match among the 44 terms), while MeSH performs best (27.3% exact match), largely through the Telemedicine descriptor (D017216) and its extensive entry terms. SNOMED CT covers 15.9% through core encounter-type concepts but lacks vocabulary for service delivery models, digital health infrastructure, and video-specific encounters.\u003c/p\u003e \u003cp\u003eThe coverage gap is not uniformly distributed. Blended and hybrid care models (blended care, hybrid care, digi-physical, mixed care) are entirely absent from all three ontologies, as is the \"digital\" family (digital health, digital care, digital platform). Video-specific terms (video visit, video meeting, video call), despite describing the dominant pandemic-era modality, have no formal coding in any reference system. The gap deepens further for AI-specific terminology: all eight AI terms in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e lack an exact or near-exact match in SNOMED CT and ICD-11 (0 of 8 covered, a 100% absence rate), which is worse than the already concerning 50.0% general absence rate, suggesting that ontological coverage degrades as terminology moves from established telehealth concepts toward newer AI-mediated workflow terms.\u003c/p\u003e \u003cp\u003e \u003cb\u003eThe dual gap.\u003c/b\u003e These findings reveal two distinct but compounding problems. First, a practice gap: 78.6% of the 70 terms are used without explicit definition in the source documents, indicating that researchers and practitioners deploy terminology without anchoring it to formal definitions. Second, an ontology gap: even when researchers seek formal definitions, half the eligible health technology terms (50.0%) have no representation in reference ontologies to anchor to. 
The combination creates a terminology environment in which neither practice nor infrastructure provides definitional clarity.\u003c/p\u003e \u003cp\u003eICD-11, by scope, does not represent the mode of diagnosis generation; any representation of AI assistance must therefore reside outside ICD-11 proper [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. This separation complicates using ICD alone as a semantic backbone for AI-mediated care but does not preclude external provenance models that supplement ICD classification. The absence of AI workflow concepts from ICD-11 is not an oversight in its design; it is a consequence of its classification scope, which was defined before AI-generated clinical outputs became a practical reality.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec20\" class=\"Section2\"\u003e \u003ch2\u003e5.2 Expressiveness Ceiling Analysis\u003c/h2\u003e \u003cp\u003eThe formal properties of EL++ were assessed against the semantic requirements of AI clinical concepts. 
Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e summarises the expressiveness boundary.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eEL++ Expressiveness: Supported and Unsupported Constructs\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConstruct\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEL++ Support\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eClinical Relevance\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConcept conjunction (A AND B)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCombining clinical features\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eExistential restriction (some R.C)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\"Has finding site: lung\"\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRole hierarchy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSubsumption of relationships\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDomain/range constraints\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eType checking\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNegation (NOT A)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\"Not contraindicated\"\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDisjunction (A OR B)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\"Either condition X or Y\"\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUniversal restriction (all R.C)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\"All results above threshold\"\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCardinality constraints\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c3\"\u003e \u003cp\u003e\"At least 3 of 5 criteria met\"\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eAI clinical concepts characteristically require semantics that fall outside EL++ support:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eProbabilistic outputs.\u003c/b\u003e An AI diagnostic tool that reports \"confidence score 0.87 for pneumonia\" requires a numeric probability bound to a clinical finding, a construct that EL++ cannot represent natively.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eConditional recommendations.\u003c/b\u003e \"If the patient meets criteria set X and the model confidence exceeds threshold Y, suggest intervention Z\" involves conditional logic, cardinality, and numerical thresholds simultaneously.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eModel provenance.\u003c/b\u003e \"Diagnosis generated by model version 3.2, trained on dataset comprising 50,000 chest radiographs from institution A\" requires provenance metadata that extends beyond clinical description into data lineage, a domain for which EL++ was not designed.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThese are not exotic or unusual requirements. They describe the routine operational characteristics of AI tools already deployed in clinical settings. The expressiveness ceiling is not a future concern; it describes a present-day mismatch between what AI tools produce and what the dominant clinical ontology can formally represent.\u003c/p\u003e \u003cp\u003eTo illustrate how this ceiling operates in practice, consider a concrete post-coordination attempt. 
The concept \"AI-assisted diagnostic suggestion with confidence score above 0.85, derived from a convolutional neural network trained on dermatoscopic images from a European population\" contains several semantic layers. SNOMED CT can represent the diagnosis (e.g., melanoma) and the body site, but cannot represent the mode of generation (AI-assisted), the confidence threshold (0.85), the model architecture (CNN), or the training population constraint (European). Post-coordination using SNOMED CT's defining relationships cannot bridge this gap because the required relationship types (hasConfidenceScore, hasModelArchitecture, hasTrainingPopulation) do not exist in the concept model. This is not a content gap that future releases could fill within EL++; it is a structural expressiveness limitation. While future extensions to the SNOMED CT concept model could in principle add relationships such as hasConfidenceScore or hasModelArchitecture, doing so would move the system beyond the current EL++ fragment and its polynomial-time tractability guarantees.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003e5.3 Curation Scalability Evidence\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eScalability Indicators Across Health Terminology Systems\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" 
colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSystem\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eVocabularies\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eScale\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eCuration Method\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eKnown Limits\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSNOMED CT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1 (+\u0026thinsp;ext.)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e~\u0026thinsp;350,000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eManual\u0026thinsp;+\u0026thinsp;editorial\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eCRS backlog unmeasured\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eICD-11\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e~\u0026thinsp;55,000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eCommittee review\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e15K proposals/10\u0026nbsp;year; throughput unreported\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUMLS\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c2\"\u003e \u003cp\u003e214\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e15.5M atoms\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eSemi-automated\u0026thinsp;+\u0026thinsp;human\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e90% precision\u0026thinsp;=\u0026thinsp;thousands FP\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOAEI Biomed.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e3\u0026ndash;28\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eVariable\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAutomated matching\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eDegrades with ontology size\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThis table warrants explicit attention to EMoT Rule 5: honest acknowledgment of where automated methods outperform manual curation. The evidence is clear that for large-scale vocabulary alignment, automated and semi-automated methods are not merely helpful but necessary. Nguyen et al. [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e] demonstrated that deep learning approaches outperformed rule-based methods by 14.1 F1 percentage points at UMLS scale. Faria et al. [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] found that hash-based algorithmic matching was essential for tractable processing of large biomedical ontologies, achieving approximately 50% runtime reduction over naive pairwise comparison. 
Bada [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e] framed automated concept mapping as a precondition for scalable biomedical literature analysis.\u003c/p\u003e \u003cp\u003eHowever, the same evidence shows that automation does not eliminate the problem. At 90% precision across 15.5\u0026nbsp;million atoms, the absolute number of errors remains large enough to require human adjudication for every release cycle [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. Automation shifts the bottleneck from initial alignment to error review, but does not remove it. The fundamental issue is that the number of pairwise comparisons grows quadratically with vocabulary count, and each new source vocabulary added to the UMLS creates O(n) new potential mapping conflicts with existing vocabularies.\u003c/p\u003e \u003cp\u003eThe ICD-11 proposal platform [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e] presents a different scalability challenge. With 15,000 proposals over 10 years and no published throughput data, it is impossible to estimate the system's absorption capacity. If even a fraction of the AI-specific clinical terms now entering practice were submitted as proposals, the queue time for incorporation could exceed the innovation cycle of the technologies being described.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec22\" class=\"Section2\"\u003e \u003ch2\u003e5.4 Cascade Fragility Evidence\u003c/h2\u003e \u003cp\u003eDos Reis et al. [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e] demonstrated that changes within one knowledge organisation system cascade through inter-terminology mappings in ways that cannot be predicted from the change operation alone. 
Their analysis of SNOMED CT and ICD-9-CM found that the semantic structure of concepts, the information used to define mappings, and the change operations must all be considered simultaneously to assess the impact of a single modification.\u003c/p\u003e \u003cp\u003eThe DyKOSMap framework [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e] proposed heuristics for semi-automatic mapping adaptation, but the authors acknowledged that human review remains essential at each step. The framework \"facilitates\" rather than eliminates maintenance burden.\u003c/p\u003e \u003cp\u003eEach new vocabulary integrated into the UMLS Metathesaurus introduces O(n) new potential mapping conflicts with existing vocabularies. The O(n) theoretical bound represents a graph-theoretic worst case for mapping conflict growth; the empirically observed impact reported by Dos Reis et al. [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e], while substantial, has not been quantified at this scale. The distinction between theoretical bound and observed impact should be maintained when interpreting these findings. At 214 vocabularies, a single concept change in one source can theoretically affect alignments with all other sources. Historical precedent confirms the severity of this fragility: the transition from ICD-10 to ICD-11 required complete remapping rather than incremental update, invalidating years of accumulated cross-walk tables.\u003c/p\u003e \u003cp\u003eRodrigues, Schulz, et al. [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e] found significant modelling issues in more than one third of cases when examining SNOMED CT concept model quality, where concept model instances contradicted the intuitive meaning of Fully Specified Names. 
This suggests that semantic drift occurs not only between terminologies but within a single terminology over time.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec23\" class=\"Section2\"\u003e \u003ch2\u003e5.5 The Semantic Entropy Argument\u003c/h2\u003e \u003cp\u003eThe three structural limits described above, expressiveness ceiling, curation scalability, and cascade fragility, do not operate independently. They interact to create a compounding effect.\u003c/p\u003e \u003cp\u003eWhen a new AI clinical concept cannot be represented in EL++ (expressiveness ceiling), it either remains outside SNOMED CT or is approximated through post-coordination or extension mechanisms. Approximation introduces semantic imprecision. When that imprecise concept is then mapped to ICD-11 or other vocabularies through the UMLS (curation scalability), the imprecision propagates. When a subsequent release of any involved terminology modifies related concepts (cascade fragility), the accumulated imprecision may amplify in unpredictable ways.\u003c/p\u003e \u003cp\u003eWe propose the term \"semantic entropy\" to describe this directional phenomenon: the progressive increase in definitional disorder within a terminology system as the rate of new concept introduction exceeds the system's capacity for precise incorporation and maintenance. Semantic entropy is not a formal mathematical quantity at this stage. It is a conceptual framework identifying measurable indicators:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eCoverage gap rate\u003c/b\u003e: the proportion of terms in active clinical use that lack formal ontological representation. This indicator has two operationalisations in our data. The practice gap (78.6%) measures terms used without explicit definition in source documents; the ontology gap (50.0%) measures terms absent from all three reference ontologies (SNOMED CT, ICD-11, MeSH). 
Both are valid indicators of coverage failure, but they measure different dimensions: the former reflects researcher and practitioner behaviour, the latter reflects the structural limits of the ontological infrastructure itself. For AI-specific terms, the ontology gap is starker still: none of the 8 tested terms has an exact or near-exact match in SNOMED CT or ICD-11, and MeSH offers only partial, general AI/ML coverage for three of them.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eMapping failure rate\u003c/b\u003e: the proportion of terms that cannot be unambiguously mapped to a reference terminology.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eCuration backlog growth rate\u003c/b\u003e: the rate at which unprocessed term requests accumulate relative to processing capacity.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eWhen all three indicators trend upward simultaneously, the system is experiencing increasing semantic entropy. Our data suggest that this is the current trajectory for AI-related health terminology. Figure\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e summarises the convergence model and the three measurable indicators derived from this analysis.\u003c/p\u003e \u003c/div\u003e"},{"header":"6. Discussion","content":"\u003cdiv id=\"Sec25\" class=\"Section2\"\u003e \u003ch2\u003e6.1 Defining Semantic Entropy\u003c/h2\u003e \u003cp\u003eThe concept of semantic entropy, as proposed here, draws an analogy from thermodynamics: just as physical entropy measures the degree of disorder in a system, semantic entropy measures the degree of definitional disorder in a terminology system. The analogy is imperfect and should be treated as heuristic rather than formal. 
Unlike physical entropy, semantic entropy is not (yet) measurable in standardised units, and its behaviour under different governance interventions has not been modelled.\u003c/p\u003e \u003cp\u003eWhat the concept offers is a vocabulary for discussing a directional phenomenon that the literature has described qualitatively but not named. Da Silveira et al. [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e] identified that domain knowledge evolution \"directly impacts terminologies and generates inconsistencies in underlying biomedical information systems,\" but stopped short of proposing a measure for the rate or severity of this impact. The dual gap identified in this analysis, where 78.6% of terms are used without definition in practice and 50.0% of eligible health technology terms lack representation in reference ontologies, suggests that disorder is present at both levels simultaneously. The practice gap indicates that the research community does not anchor its terminology use to formal definitions; the ontology gap indicates that, even where the intent to formalise exists, the infrastructure cannot accommodate half of the relevant vocabulary. The scalability evidence (90% precision at 15.5M atoms still producing massive error volumes) and the cascade fragility evidence (unpredictable change propagation across mappings) suggest that the system is moving toward greater disorder, not less.\u003c/p\u003e \u003cp\u003eThe \"coverage gap rate\" operationalised here corresponds to what the companion public health analysis (Stummer 2026, forthcoming in Health Informatics Journal) terms the \"definitional gap rate\": the proportion of terms lacking formal definitions in reference ontologies. 
The same empirical measurement thus functions simultaneously as a public health indicator (evidence synthesis feasibility), an informatics indicator (ontological coverage), and a market indicator (proportion of the market operating outside standardised vocabulary). In the present analysis, both the practice-level operationalisation (78.6%) and the ontology-level operationalisation (50.0%) are relevant: they represent complementary dimensions of the same underlying governance failure. In the companion public health analysis this dual gap appears as the indicator for public health risk (evidence synthesis and surveillance); in the companion business administration analysis (Stummer 2026, forthcoming in Health Policy) it appears as a market-structure indicator of the share of vocabulary operating outside standardised frameworks.\u003c/p\u003e \u003cp\u003eA candidate formalisation might define semantic entropy H(T) for a terminology system T as H(T) = -sum(p_i * log(p_i)) over the distribution of mapping outcomes for terms in T, where p_i represents the probability of each mapping state (unique match, ambiguous match, no match, deprecated match). A system approaching maximum entropy would show a near-uniform distribution across these states, indicating that mapping any given term is no more predictable than chance. Formal development of this metric is beyond the scope of the present paper but would provide a quantitative basis for monitoring ontological governance capacity over time. This sketch is provided only to illustrate that information-theoretic formalisation is feasible; this paper does not compute or validate H(T) empirically.\u003c/p\u003e \u003cp\u003eFormalising semantic entropy as a mathematical quantity is explicitly beyond the scope of this paper. 
Candidate approaches might include information-theoretic measures applied to terminology version diffs, network-based measures of mapping graph instability, or growth-rate ratios comparing concept introduction to curation throughput. Each would require dedicated methodological development and empirical validation.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec26\" class=\"Section2\"\u003e \u003ch2\u003e6.2 The 28-Year Gap\u003c/h2\u003e \u003cp\u003eCimino's desiderata [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e] were articulated in 1998, at a time when the primary challenge was unifying fragmented clinical vocabularies into comprehensive, principled systems. The \"graceful evolution\" desideratum assumed that change would be incremental: new diseases identified, new procedures developed, existing concepts refined. The biannual release cycle of SNOMED CT and the decade-long revision cycle of ICD were designed for this tempo.\u003c/p\u003e \u003cp\u003eThe AI terminology explosion violates the assumption of incremental change. When hundreds of AI-based clinical tools receive regulatory clearance within a few years, each introducing novel workflow concepts, the rate of semantic change exceeds what any committee-based or semi-automated governance process can absorb. This is not a critique of the standards bodies' competence or resources. It is an observation that the design parameters of the governance infrastructure presuppose a rate of terminology change that no longer obtains.\u003c/p\u003e \u003cp\u003eAfter 28 years, \"graceful evolution\" remains an aspiration rather than an operational reality. 
The AI terminology explosion may convert it from an unmet desideratum to an unachievable one, at least within the current ontological architecture.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec27\" class=\"Section2\"\u003e \u003ch2\u003e6.3 When Existing Ontology Infrastructure Is Sufficient\u003c/h2\u003e \u003cp\u003eIt would be misleading to suggest that existing health ontology infrastructure is failing broadly. For established clinical domains, SNOMED CT, ICD-11, and the UMLS perform their intended functions with reasonable effectiveness.\u003c/p\u003e \u003cp\u003eDiagnoses, procedures, medications, anatomical structures, and laboratory observations are well-represented in reference ontologies, supported by decades of curation, and connected through mature cross-terminology mappings. The expressiveness ceiling of EL++ is rarely constraining for these concept types. The curation scalability of the UMLS, while stressed, remains functional for the existing vocabulary base. Cascade fragility, while real, is manageable when terminology changes occur at the historical pace.\u003c/p\u003e \u003cp\u003eThe structural limits identified in this paper become salient specifically when novel concept categories emerge that require richer semantics than EL++ provides, that arrive faster than governance processes can absorb, and that create mapping dependencies with existing vocabularies that amplify the cascade fragility problem. AI/ML clinical terminology is arguably the first major concept category to satisfy all three conditions simultaneously. For traditional clinical terminology, the current architecture may remain adequate for the foreseeable future.\u003c/p\u003e \u003cp\u003eThis concession does not diminish the urgency of the problem. 
If AI clinical tools continue to proliferate at current rates, the proportion of clinical activity that falls outside formal ontological representation will grow. A system that works well for 80% of clinical vocabulary but cannot represent the fastest-growing 20% is not failing; it is becoming progressively less complete, and the incompleteness is concentrated precisely in the domain where semantic precision matters most for patient safety and regulatory compliance.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec28\" class=\"Section2\"\u003e \u003ch2\u003e6.4 The Historical Parallel\u003c/h2\u003e \u003cp\u003eThe current situation has a structural precedent. Before the emergence of HL7, DICOM, and related interoperability standards in the 1990s, electronic health record terminology was fragmented across proprietary vendor systems. Hammond [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e] documented the pre-standardisation era in which hospital information systems created \"islands of incompatible data.\" Hayrinen, Saranto, and Nykanen [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e] found in their systematic review that \"only very few papers offered descriptions of the structure of EHRs or the terminologies used,\" reflecting a period in which the concept of an electronic health record \"comprised a wide range of information systems.\"\u003c/p\u003e \u003cp\u003eResolution required three concurrent developments: binding standards that defined common data structures, certification requirements that incentivised adoption, and regulatory mandates that made compliance non-optional [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. 
The process took approximately 15 years from initial HL7 development to widespread adoption, and even then, Benson and Grieve [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e] note that \"persistent inconsistencies and divergent interpretations\" remain.\u003c/p\u003e \u003cp\u003eThe AI terminology situation has the same tripartite structure (fragmented vocabularies, no binding standards, no certification requirements for terminology use) but operates on a compressed timescale with greater semantic complexity. Where the pre-HL7 fragmentation involved hundreds of proprietary systems generating thousands of non-standard terms over two decades, the AI terminology explosion involves hundreds of AI tools generating novel clinical concepts within a few years. The governance response that took 15 years for EHR interoperability may not be available as a timeline for AI terminology governance.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec29\" class=\"Section2\"\u003e \u003ch2\u003e6.5 Implications for EHDS and EU AI Act\u003c/h2\u003e \u003cp\u003eBoth the European Health Data Space (EHDS) regulation and the EU AI Act assume semantic interoperability as a prerequisite for their policy objectives but neither addresses the ontology infrastructure gap identified in this analysis.\u003c/p\u003e \u003cp\u003eEHDS Article 7 mandates data quality standards for electronic health data, including requirements for semantic interoperability across member states. However, the regulation does not specify how AI tool terminology should map to existing reference ontologies, nor does it address the governance mechanisms needed to maintain such mappings as AI terminology evolves.\u003c/p\u003e \u003cp\u003eThe EU AI Act classifies AI systems by risk level and imposes transparency and documentation requirements for high-risk systems, a category that includes many clinical AI applications. 
These requirements implicitly assume that the outputs and processes of AI systems can be described in standardised terminology. If the terminology for describing AI clinical outputs does not exist in reference ontologies, the documentation requirements cannot be met in an interoperable manner.\u003c/p\u003e \u003cp\u003eThis regulatory gap creates a practical problem. As member states implement EHDS data quality standards and AI Act documentation requirements, the absence of standardised AI clinical terminology may force ad hoc solutions, precisely the kind of proprietary fragmentation that EHDS was designed to prevent.\u003c/p\u003e \u003c/div\u003e"},{"header":"7. Limitations","content":"\u003cp\u003e \u003cb\u003eL1. Expressiveness analysis is theoretical.\u003c/b\u003e The assessment of EL++ limitations is based on the formal specification of the description logic, not on empirical attempts to encode AI clinical concepts within SNOMED CT's authoring environment. It is possible that practical workarounds, including post-coordination, reference sets, or extension mechanisms, could accommodate some AI concepts within existing constraints. A concrete next step would be to attempt post-coordinated encoding of 20 or more AI clinical workflow concepts in EL++ and measure the failure rate, providing the first empirical estimate of the expressiveness ceiling's practical impact.\u003c/p\u003e \u003cp\u003e \u003cb\u003eL2. Scalability extrapolation from benchmarks.\u003c/b\u003e The evidence for curation scalability limits derives primarily from the UMLS alignment study by Nguyen et al. [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e] and the OAEI biomedical track benchmarks analysed by Faria et al. [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. 
These controlled experimental settings may not reflect the full complexity of operational UMLS curation, where institutional knowledge, editorial guidelines, and iterative review processes may partially mitigate the precision limitations observed in benchmarks.\u003c/p\u003e \u003cp\u003e \u003cb\u003eL3. Coverage gap is point-in-time.\u003c/b\u003e The SNOMED CT and ICD-11 browser verification was performed against releases current as of late 2025 and early 2026. Both systems undergo regular updates, and future releases may incorporate some or all of the AI-specific terms found absent in this analysis. The structural arguments about absorption capacity and governance latency remain relevant regardless of incremental additions, but the specific coverage gap figures reported here should be understood as a snapshot, not a permanent state.\u003c/p\u003e \u003cp\u003e \u003cb\u003eL4. Semantic entropy is conceptual, not formalised.\u003c/b\u003e The concept of semantic entropy is proposed as a directional framework with identified measurable indicators. It has not been formalised as a mathematical quantity, validated against historical terminology evolution data, or tested for predictive utility. Its current value is heuristic: it names a phenomenon and identifies dimensions along which it might be measured. Whether it can be operationalised into a rigorous metric is an open question for future work.\u003c/p\u003e \u003cp\u003e \u003cb\u003eL5. Geographic and domain scope.\u003c/b\u003e The empirical dataset is drawn from Austrian and Swedish primary care, covering telemedicine (2020 to 2023) and AI integration (2023 to 2025). Other clinical specialties, particularly those with higher AI adoption rates (radiology, pathology, dermatology), may exhibit different terminology dynamics. Other geographic and regulatory contexts (US FDA framework, UK MHRA, Asian regulatory environments) may impose different constraints on terminology governance. 
The findings should be considered indicative rather than universally generalisable.\u003c/p\u003e \u003cp\u003e \u003cb\u003eL6. Health informatics framing only.\u003c/b\u003e This analysis examines terminology proliferation exclusively through the lens of ontological infrastructure and semantic interoperability. The public health consequences (disrupted surveillance, impaired evidence synthesis) and the economic dimensions (market failure, vendor lock-in, standardisation economics) are addressed in companion analyses within this series, which examine evidence synthesis barriers and market failure mechanisms respectively. Readers seeking a comprehensive governance framework should consult the capstone synthesis. This paper is designed to stand alone as an independent contribution; familiarity with the companion pre-studies is not required for interpretation of the findings.\u003c/p\u003e \u003cp\u003e \u003cb\u003eL7. Synthetic data limitations in the computational pre-study.\u003c/b\u003e The Synthea-based entropy analysis (Section \u003cspan refid=\"Sec13\" class=\"InternalRef\"\u003e4\u003c/span\u003e) uses synthetic data calibrated to U.S. healthcare delivery patterns and does not incorporate European disease prevalence profiles, billing incentive structures, or coding practices. The condition vocabulary reflects Synthea's implemented disease modules rather than the full SNOMED CT hierarchy. Shannon entropy measures distributional disorder, not semantic ambiguity; a domain could have high entropy without any terminological confusion if all codes are precisely defined. Replication on European clinical datasets, ideally real-world data where ethically and legally permissible, would be required to confirm that the entropy patterns generalise beyond the synthetic U.S. context.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"8. 
Conclusion","content":"\u003cp\u003eThis paper has identified three structural limits in existing health ontology infrastructure that converge to prevent the absorption of AI-specific clinical terminology: an expressiveness ceiling in SNOMED CT's description logic (EL++), a curation scalability boundary at UMLS Metathesaurus scale, and cascade fragility in inter-terminology mappings. Direct verification confirms that none of the eight AI-specific clinical workflow terms tested have formal representation in SNOMED CT or ICD-11 as of early 2026. The broader ontological coverage analysis reveals a dual gap: 78.6% of digital health terms are used without explicit definition in research practice, and 50.0% of eligible health technology terms lack any representation in SNOMED CT, ICD-11, or MeSH. Only 31.8% have an exact match in at least one reference ontology. The AI-specific coverage gap (0/8) is starker still, suggesting that ontological coverage degrades as terminology moves toward newer, AI-mediated concepts.\u003c/p\u003e \u003cp\u003eThe concept of \"semantic entropy,\" proposed here as a directional framework for measuring definitional disorder in terminology systems, identifies three measurable indicators (coverage gap rate, mapping failure rate, curation backlog growth rate) that collectively suggest the system is moving toward greater disorder. The coverage gap rate itself has two complementary operationalisations: the practice gap (78.6%, measuring definitional behaviour in source documents) and the ontology gap (50.0%, measuring structural absence from reference ontologies). 
Cimino's \"graceful evolution\" desideratum, unmet after 28 years, may be unachievable within current ontology paradigms, absent a qualitative shift in ontology design that accommodates rapid, AI-driven concept emergence.\u003c/p\u003e \u003cp\u003e \u003cb\u003eScope limitation.\u003c/b\u003e These findings address the health informatics dimension of a phenomenon that spans public health, informatics, and economics. The structural analysis is based on EU/DACH empirical data and direct browser verification; generalisability to other geographies and clinical domains requires dedicated investigation.\u003c/p\u003e \u003cp\u003eFour directions for future work emerge from this analysis: (1) empirical expressiveness testing, in which researchers attempt to encode AI clinical concepts in EL++ and measure the failure rate; (2) standards body latency measurement, quantifying the time from term request to incorporation in SNOMED CT and ICD-11; (3) formal semantic entropy development, operationalising the proposed concept into a mathematically rigorous metric with validated thresholds; and (4) EHDS implementation guidance addressing the ontology infrastructure gap that currently undermines the regulation's semantic interoperability objectives.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eFunding:\u0026nbsp;\u003c/strong\u003eThis research received no external funding.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConflicts of interest:\u0026nbsp;\u003c/strong\u003eThe author declares no conflicts of interest.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics approval:\u0026nbsp;\u003c/strong\u003eNot applicable. 
This study analyses publicly available terminology systems and a previously collected dataset; no human subjects were involved.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability:\u0026nbsp;\u003c/strong\u003eThe dataset of 1,194 terminology instances across 70 unique terms is described in the companion public health analysis (Stummer 2026, forthcoming in Health Informatics Journal). The complete term list, coverage matrix, and extraction protocol are available as supplementary material accompanying that publication. The Synthea synthetic patient data used in the computational pre-study (Section 4) were generated using the open-source Synthea Patient Generator (https://github.com/synthetichealth/synthea); the entropy analysis scripts and generated dataset are available from the corresponding author upon reasonable request.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAI disclosure:\u0026nbsp;\u003c/strong\u003eClaude (Anthropic) was used for language editing and reference formatting. All analytical decisions, interpretations, and claims are the author\u0026apos;s own.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eBenson T, Grieve G. Principles of Health Interoperability: FHIR, HL7 and SNOMED CT. 4th ed. Springer; 2021. DOI: 10.1007/978-3-030-56883-2\u003c/li\u003e\n\u003cli\u003eSass J, Essenwanger A, Luijten S, Vom Felde Genannt Imbusch P, Thun S. Standardizing Germany\u0026apos;s electronic disease management program for bronchial asthma. Stud Health Technol Inform. 2019;267:81-85. DOI: 10.3233/SHTI190809\u003c/li\u003e\n\u003cli\u003eOniki TA, Coyle JF, Parker CG, Huff SM. Lessons learned in detailed clinical modeling at Intermountain Healthcare. J Am Med Inform Assoc. 2014;21(6):1076-1081. DOI: 10.1136/amiajnl-2014-002875\u003c/li\u003e\n\u003cli\u003eSNOMED International. SNOMED CT Browser (January 2026 International Edition). 
Available from: https://browser.ihtsdotools.org/\u003c/li\u003e\n\u003cli\u003eAmar J, April A, Abran A. Electronic Health Record and semantic issues using Fast Healthcare Interoperability Resources: systematic mapping review. J Med Internet Res. 2024;26:e45209. DOI: 10.2196/45209\u003c/li\u003e\n\u003cli\u003eDa Silveira M, Dos Reis JC, Pruski C. Management of dynamic biomedical terminologies: current status and future challenges. Yearb Med Inform. 2015;10(1):125-133. DOI: 10.15265/IY-2015-002\u003c/li\u003e\n\u003cli\u003eCimino JJ. Desiderata for controlled medical vocabularies in the twenty-first century. Methods Inf Med. 1998;37(4-5):394-403. PubMed: 9865037\u003c/li\u003e\n\u003cli\u003eCimino JJ. In defense of the desiderata. J Biomed Inform. 2006;39(3):299-306. DOI: 10.1016/j.jbi.2005.11.008\u003c/li\u003e\n\u003cli\u003eSNOMED International. Release notes (2020-2025). Available from: https://www.snomed.org/releases\u003c/li\u003e\n\u003cli\u003eRector AL, Brandt S. Why do it the hard way? The case for an expressive description logic for SNOMED. J Am Med Inform Assoc. 2008;15(6):744-751. DOI: 10.1197/jamia.M2797\u003c/li\u003e\n\u003cli\u003eWHO. WHO releases 2025 update to the International Classification of Diseases (ICD-11). 14 February 2025. Available from: https://www.who.int/news/item/14-02-2025-who-releases-2025-update-to-the-international-classification-of-diseases-(icd-11)\u003c/li\u003e\n\u003cli\u003eIbrahim H, Southern D, Zhang M, Macpherson A, Alsokhn C, Krpelanova N, Kostanjsek N, Jakob R. ICD-11 \u0026apos;by the people for the people\u0026apos;: the open feedback proposal platform. Health Inf Manag J. 2025. DOI: 10.1177/18333583251366915\u003c/li\u003e\n\u003cli\u003eWHO. ICD-11 Coding Tool. 2025. Available from: https://icd.who.int/ct11\u003c/li\u003e\n\u003cli\u003eNguyen V, Yip HY, Bodenreider O. Biomedical vocabulary alignment at scale in the UMLS Metathesaurus. In: Proceedings of the Web Conference 2021 (WWW\u0026apos;21). ACM; 2021. 
DOI: 10.1145/3442381.3450128\u003c/li\u003e\n\u003cli\u003eMantri S, Satokar KR, Tambe SB, Bhutad S. FHIR standard-based oncology data model for cancer screening. JMIR Cancer. 2025;11:e79011. DOI: 10.2196/79011\u003c/li\u003e\n\u003cli\u003eFaria D, Pesquita C, Mott I, Martins C, Couto FM, Cruz IF. Tackling the challenges of matching biomedical ontologies. J Biomed Semantics. 2018;9(1):4. DOI: 10.1186/s13326-017-0170-9\u003c/li\u003e\n\u003cli\u003eDos Reis JC, Pruski C, Da Silveira M, Reynaud-Delaitre C. Characterizing semantic mappings adaptation via biomedical KOS evolution: a case study investigating SNOMED CT and ICD. AMIA Annu Symp Proc. 2013;2013:333-342. PubMed: 24551341\u003c/li\u003e\n\u003cli\u003eDos Reis JC, Pruski C, Da Silveira M, Reynaud-Delaitre C. DyKOSMap: a framework for mapping adaptation between biomedical knowledge organization systems. J Biomed Inform. 2015;55:153-173. DOI: 10.1016/j.jbi.2015.04.001\u003c/li\u003e\n\u003cli\u003eSchulz S, Case JT, Hendler P, et al. SNOMED CT and Basic Formal Ontology: convergence or contradiction between standards? Appl Ontol. 2023;18(3):207-237. DOI: 10.3233/AO-230018\u003c/li\u003e\n\u003cli\u003eHammond WE. eHealth interoperability. Stud Health Technol Inform. 2008;134:245-253. PubMed: 18376051\u003c/li\u003e\n\u003cli\u003eHayrinen K, Saranto K, Nykanen P. Definition, structure, content, use and impacts of electronic health records: a review of the research literature. Int J Med Inform. 2008;77(5):291-304. DOI: 10.1016/j.ijmedinf.2007.09.001\u003c/li\u003e\n\u003cli\u003eRodrigues JM, Schulz S, Mizen B, Rector A, Serir S. Is the application of SNOMED CT concept model sufficiently quality assured? AMIA Annu Symp Proc. 2018;2017:1488-1497. PubMed: 29854218\u003c/li\u003e\n\u003cli\u003eBada M. Mapping of biomedical text to concepts of lexicons, terminologies, and ontologies. Methods Mol Biol. 2014;1159:33-45. 
DOI: 10.1007/978-1-4939-0709-0_3\u003c/li\u003e\n\u003cli\u003eWalonoski J, Kramer M, Nichols J, Quina A, Moesel C, Hall D, Duffett C, Dube K, Gallagher T, McLachlan S. Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record. J Am Med Inform Assoc. 2018;25(3):230-238. DOI: 10.1093/jamia/ocx079\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"Martin Luther University Halle-Wittenberg","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"semantic interoperability, health ontology, SNOMED CT, ICD-11, UMLS, EL + + description logic, AI clinical terminology, semantic entropy, terminology governance, ontological coverage","lastPublishedDoi":"10.21203/rs.3.rs-9441335/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9441335/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eHealth data interoperability depends on shared ontologies, principally SNOMED CT, ICD-11, and the UMLS Metathesaurus, to provide a common semantic foundation for clinical information exchange. 
These systems were designed for a world in which new clinical concepts emerged slowly and could be curated through manual or semi-automated processes. The rapid proliferation of artificial intelligence and machine learning terminology in clinical practice may violate this assumption. This paper presents a structural analysis of three converging limits in existing health ontology infrastructure: (1) an expressiveness ceiling imposed by SNOMED CT's description logic (EL++), which cannot represent probabilistic, conditional, or provenance-bearing semantics characteristic of AI clinical tools; (2) a curation scalability boundary, evidenced by the UMLS Metathesaurus at 214 source vocabularies and 15.5 million atoms, where even 90% alignment precision produces thousands of false positives; and (3) cascade fragility in inter-terminology mappings, where changes in one knowledge organisation system propagate unpredictably through dependent systems. Using a shared dataset of 1,194 terminology instances (70 unique terms) from Austrian and Swedish primary care, we identify a dual gap: 78.6% of terms are used without explicit definition in the source documents (a practice gap), while a separate ontological coverage analysis of the 44 eligible English-language health technology terms finds that 50.0% (n=22) are absent from all three reference ontologies (SNOMED CT, ICD-11, MeSH), with only 31.8% having an exact match in at least one system. Direct browser verification of eight AI-specific clinical workflow terms confirms that none have formal representation in SNOMED CT or ICD-11. We propose the concept of \"semantic entropy\" as a directional measure of definitional disorder in terminology systems approaching their governance capacity. 
We argue that Cimino's \"graceful evolution\" desideratum, articulated in 1998, remains unmet after 28 years, and that the AI terminology explosion may render it unachievable within current ontology paradigms, absent a qualitative shift in ontology design that accommodates rapid, AI-driven concept emergence.\u003c/p\u003e","manuscriptTitle":"Semantic Entropy in Digital Health: When Ontological Mapping Reaches Its Computational Limits","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-04-20 02:03:49","doi":"10.21203/rs.3.rs-9441335/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"6c1c044f-4934-4448-a3e7-d90ba42d1dab","owner":[],"postedDate":"April 20th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":66472091,"name":"Medical Informatics"},{"id":66472092,"name":"Artificial Intelligence and Machine Learning"}],"tags":[],"updatedAt":"2026-04-20T02:03:49+00:00","versionOfRecord":[],"versionCreatedAt":"2026-04-20 02:03:49","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9441335","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9441335","identity":"rs-9441335","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
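
The candidate formalisation of semantic entropy given in the Discussion, H(T) = -sum(p_i * log(p_i)) over the four mapping states (unique match, ambiguous match, no match, deprecated match), can be illustrated with a minimal sketch. This is not the authors' implementation, since the paper states that it does not compute or validate H(T). The state labels, the function names, the use of observed frequencies as estimates of p_i, and the choice of base-2 logarithms are all assumptions made here for illustration.

```python
import math
from collections import Counter

# The four mapping states named in the text; the string labels are assumptions.
MAPPING_STATES = ("unique", "ambiguous", "none", "deprecated")


def semantic_entropy(outcomes):
    """Shannon entropy, in bits, of a list of observed mapping outcomes.

    p_i is estimated as the observed share of each mapping state
    (an assumption; the text does not say how p_i would be obtained).
    """
    counts = Counter(outcomes)
    unknown = set(counts) - set(MAPPING_STATES)
    if unknown:
        raise ValueError(f"unknown mapping states: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one mapping outcome is required")
    # States that never occur are skipped, i.e. 0 * log(0) is treated as 0.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def normalised_entropy(outcomes):
    """H(T) divided by its maximum, log2(4) = 2 bits for four states."""
    return semantic_entropy(outcomes) / math.log2(len(MAPPING_STATES))
```

Under these assumptions, a uniform spread over the four states yields the maximum of 2 bits, which corresponds to the "no more predictable than chance" case described in the text. A set of terms that all fall into a single state, such as the eight AI-specific terms that all lack representation, yields 0 bits, which is consistent with the caveat in limitation L7 that Shannon entropy measures distributional disorder rather than coverage or semantic ambiguity.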
