MAP-CARE: Enhancing Cross-Lingual Medical Intervention Terms Analysis Through LLM-supported Semantic Embeddings | Research Square

Hugo Guillen-Ramirez, Karen Triep, Christophe Gaudet-Blavignac, and 3 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6848278/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 09 Jan, 2026; read the published version in Scientific Reports.
Version 1 posted 3 (latest preprint version).

Abstract

Background: Limitations in cross-lingual information retrieval hinder the global exchange of data, because the methods used to classify, document and encode medical procedures are highly diverse. Traditional keyword-based or single-language systems cannot align data from surgical and interventional procedures, especially data from non-English healthcare systems.
This study aims to develop a pipeline for the cross-lingual retrieval and integration of medical procedure data.
Results: MAP-CARE is a novel framework that leverages large language models (LLMs) to translate medical procedures and transform them into a unified multilingual embedding space. Semantic embeddings are used to enhance retrieval accuracy and interoperability across languages and healthcare systems. MAP-CARE demonstrated high accuracy in the translation and mapping of clinical terms. Its cross-language translation performance proved robust, achieving up to 90% accuracy in translating procedure classification codes across English, German, French, and Italian when the correct term is counted among the top five retrieved results. The cross-classification mapping workflow also aligned two different national procedure classifications with high accuracy, with exact and near matches exceeding 53.8% at the most granular level.
Conclusion: MAP-CARE offers a flexible, scalable, and robust solution for the multilingual and cross-system integration of medical procedure data. Its innovative combination of LLMs with semantic embeddings sets a new standard for the accessibility and utility of multilingual medical information. The framework is designed to be easily extended from a terminology file in CSV format and is publicly available [1].

Subject areas: Biological sciences/Computational biology and bioinformatics/Classification and taxonomy; Biological sciences/Computational biology and bioinformatics/Data integration; Biological sciences/Computational biology and bioinformatics/Computational models

Keywords: semantic embedding, interoperability, LLM, cross-language mapping, medical procedures, non-English healthcare systems, terminology, classification

Introduction

Harmonizing data from electronic health records (EHRs) remains a global challenge.
Initiatives such as the Unified Medical Language System (UMLS) [2] and the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) [3] aim to standardize and integrate heterogeneous health data vocabularies and sources. These frameworks are crucial for interoperability in multilingual environments, where variations in language morphology can lead to significant inconsistencies and misinterpretations in medical data. The linguistic and structural diversity of current medical terminologies calls for robust tools that can not only translate but also semantically align concepts across languages and classification systems. While platforms such as UMLS and OHDSI's Athena tool offer comprehensive vocabularies and mappings, they focus primarily on established standardized terminologies and may lack the flexibility to accommodate country-specific or health-system-specific classifications. Significant challenges remain in aligning data on medical and surgical procedures from non-English-speaking regions.

Classifications of medical procedures encode medical interventions and are essential for monitoring, billing, quality control, and research in healthcare settings. Medical procedure codes are typically multiaxial and include abbreviations and free-text descriptors, which are often ambiguous, lack specificity, or admit multiple interpretations, making semantic mapping to a standardized code difficult. In addition, they often include complex inclusion and exclusion criteria and are context-sensitive, with modifiers such as the material used or the surgical approach further influencing their meaning. Retrieving a specific code through keyword matching alone often fails to accommodate the complexity and diversity of medical terminologies and classifications, overlooking nuances and differences that are crucial for precision.
For instance, a query for "gallbladder removal" may not retrieve records labelled "cholecystectomy", the medical term for the same procedure. Similarly, a search for "knee replacement surgery" might miss entries listed under more formal terms such as "total knee arthroplasty". Because English is widely regarded as the standard language for knowledge exchange, terminology standardization, and research, there is also a need to make such data accessible through English-language term searches that bridge linguistic and classificatory gaps.

Contextual and national procedural classifications often rely on specialized terminology, and integrating them into international standards such as the International Classification of Health Interventions (ICHI) [4] is complicated by their hierarchical structures and linguistic nuances. For example, procedural terminologies and coding systems widely used in the U.S., such as the American Medical Association's Current Procedural Terminology (CPT) [5] and the Healthcare Common Procedure Coding System (HCPCS) [6], are not always directly translatable to systems used elsewhere. Likewise, importing or aligning information from national systems such as Germany's Operations and Procedures Catalogue (OPS) [7], France's Classification Commune des Actes Médicaux (CCAM) [8], or Switzerland's Operations and Procedures Catalogue (CHOP) [9], to name a few, remains challenging in most countries worldwide.

Several systems have been developed to automate the extraction and linking of clinical terms to standardized vocabularies. Over the past 20 years, rule-based mapping has been the most commonly used method, applied across various classifications and use cases. For example, MetaMap [10] was designed to identify UMLS concepts in free-text clinical narratives. Similarly, systems such as that of Liu et al. [11], cTAKES [12], HeTOP [13], and FasTag [14] employ rule-based methods to extract structured information from clinical text and map it to international classifications, ontologies, and terminologies such as the International Classification of Diseases (ICD) [15], RxNorm [16], the Human Phenotype Ontology (HPO) [17], and the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) [18]. The design of SNOMED CT supports its role as a universal language for health care and reflects features typical of natural language, including synonymy, hierarchical structure, compositionality, and contextual adaptability [19, 20]. However, despite the effectiveness of these systems in identifying terms in free text, challenges remain in dealing with non-standardized or evolving procedural codes, particularly in non-English contexts.

Recent advances in natural language processing (NLP) have introduced powerful tools to process, interpret, and standardize language in complex domains such as healthcare. Among these, word embeddings have emerged as a particularly promising technique. By transforming words into numerical representations that capture contextual and semantic relationships, embeddings approximate a human-like understanding of language [21–24], allowing computers to interpret medical texts with a nuance approaching that of human reasoning. Various methods, such as pretrained language models (PLMs) and keyword extractors, have been compared for populating predefined procedural elements [25–28]. However, as pointed out by Chiu and Baker [29], their application in healthcare remains a work in progress because of the unique linguistic demands and conceptual complexity of biomedical terminology. The rise of large language models (LLMs) has further advanced the field, providing AI systems capable of extracting insights, summarizing complex information, and supporting clinical decision-making.
Nonetheless, effective deployment of LLMs in healthcare requires domain-specific adaptation to ensure both reliability and scalability [30]. Despite growing interest in clinical NLP, research on terminology embeddings in non-English languages remains scarce, which poses significant challenges for accurate information retrieval from multilingual EHRs.

To address these longstanding challenges, this study introduces MAP-CARE (Multilingual Approach for Procedures in Clinical and Retrieval Embeddings), a novel framework that redefines access to multilingual medical procedure data. MAP-CARE uniquely enables seamless integration of multilingual data and allows medical information documented in non-English languages to be retrieved using English as the query language. By leveraging LLMs and advanced multilingual embeddings, the framework encodes medical terminologies into a unified semantic space, capturing cross-lingual and cross-classification relationships with high precision. This approach moves beyond traditional rule-based mapping by offering a scalable, data-driven solution for semantic interoperability. The following sections outline the methods underpinning MAP-CARE and demonstrate how these concepts are operationalized into practical tools that significantly enhance the cross-lingual alignment of surgical and diagnostic procedures. To our knowledge, this is the first comprehensive work on the multilingual representation of medical procedures covering the full domain of surgical and diagnostic classifications.

Methods

Workflow overview

The MAP-CARE framework, depicted in Fig. 1, begins by processing a (non-English) medical classification, exemplified here by the Swiss Classification of Operations and Procedures (CHOP). Each term in the classification undergoes a series of transformations. First, terms are sanitised (e.g., removal of self-references and substitution of standard abbreviations).
Second, the term is translated and contextually augmented using an LLM, Gemma2 [31]. Third, the augmented description is transformed into a semantic numeric representation: the embedding model mxbai-embed [32, 33] encodes the augmented term's text into a 1024-dimensional vector that captures its semantic meaning. Finally, the vectors generated for the complete classification are stored in ChromaDB [34], a database designed for efficient vector management and retrieval.

Translation and preprocessing of CHOP and OPS codes

The Swiss Classification of Operations (CHOP) is derived from the ICD-9-CM and has been developed independently by the Swiss Federal Statistical Office since 2008. CHOP has a hierarchical, tree-like architecture that organizes medical procedures across multiple levels of specificity. Anatomical regions are indicated at the first and second levels using two-digit codes, while procedural invasiveness and methods are detailed at the third and fourth levels with four-digit codes. The hierarchy culminates in highly specific terminal codes at the six-digit level, resulting in over 10,860 distinct entries. Although a hierarchical structure is present, it is inconsistently applied, particularly where child concepts do not inherit all explicitly defined features and where levels of specificity are not systematically represented. The classification is available in three Swiss national languages: French, German, and Italian. Despite its comprehensive nature, it exhibits semantic inconsistencies and is distributed only in PDF and CSV formats. CHOP version 2023 was used for this project.

The German Operationen- und Prozedurenschlüssel (OPS) is an adaptation of the International Classification of Procedures in Medicine (ICPM) specifically tailored to the German healthcare system.
Maintained by the Federal Institute for Drugs and Medical Devices (BfArM), the OPS categorizes surgical operations and medical procedures in a hierarchical structure similar to that of the Swiss CHOP, and its 2023 version comprises approximately 33,750 distinct terminal codes [7]. OPS codes are organized from broad categories to highly specific procedures and consist of combinations of digits and letters that detail procedure types and methods. The XML release of OPS 2023 was parsed into a CSV file for further analysis.

The sanitised CSV files for CHOP and OPS were translated using Gemma2 accessed via the ollama interface [35]. Few-shot prompting was employed to optimize translation accuracy for the clinical context. The prompts ("model files") engineered for this task are described in Supplementary Table 1. After the LLM produced preliminary English translations, a manual cleaning step ensured that the translations were appropriate, refined their formatting, and removed extraneous LLM output.

The quality of the CHOP translations was validated with the OHDSI Usagi mapping tool [36]. The English translation of the German version of CHOP was uploaded into Usagi. First, the translated terminal CHOP codes (10,860 in total) were mapped automatically to corresponding SNOMED CT concepts. Subsequently, the CHOP-to-SNOMED CT mapping was validated manually for a representative subset of terms, and translation quality was assessed using the OHDSI mapping equivalence framework.

Embedding generation

Following the translation and cleaning of the CHOP codes, embeddings were generated to encapsulate the semantic relationships and nuanced meanings in the data. For this purpose, the mxbai-embed-large model was accessed via the ollama interface.
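Before embedding, each entry is flattened into a single text line with an instruction prefix; the field order and prefix wording are given in the paragraph that follows. The sketch below illustrates that preprocessing and a request to a local ollama server. It is a minimal sketch under stated assumptions, not the authors' code: it assumes ollama's `/api/embeddings` HTTP endpoint on the default port 11434, joins prefix and record with ": ", and uses hypothetical function and argument names (`format_record`, `embed`).

```python
import json
import urllib.request

# Instruction prefix quoted in the text; the ": " joiner used below is an assumption.
PREFIX = "Represent this sentence for semantic searching"


def format_record(zcode: str, title: str, title_en: str, description_en: str) -> str:
    """Serialize one CHOP entry as a single '|'-separated line with the prefix.

    Field order follows the description: code, original-language title,
    English title, English description. Argument names are hypothetical.
    """
    fields = [zcode, title, title_en, description_en]
    return f"{PREFIX}: " + "|".join(field.strip() for field in fields)


def embed(text: str, model: str = "mxbai-embed-large",
          url: str = "http://localhost:11434/api/embeddings") -> list[float]:
    """Ask a locally running ollama server for the embedding of `text`."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["embedding"]


# Placeholder titles; real values would come from the translated CHOP CSV file.
line = format_record("Z00.66.22", "<German title>", "<English title>",
                     "<English description>")
print(line)
```

`embed(line)` is not called above because it needs a running ollama server with the mxbai-embed-large model pulled; with that model, the returned list should have 1024 entries.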
The mxbai-embed-large model was chosen for its ability to produce high-dimensional vector representations while maintaining a small memory footprint. To prepare the data for embedding, each CHOP entry was encoded as a single line in which key fields were concatenated and separated by a vertical bar ('|'). These fields were the unique identifier of the CHOP code (zcode), the code's title in its original language (German, French, or Italian), the title translated into English, and a translated description of the CHOP procedure. A similar format was used for OPS. To improve search results, an instruction was prepended to each input string: "Represent this sentence for semantic searching" was used as a prefix for each row, guiding the embedding model to capture the intended semantic content more effectively. The resulting set of vectors, referred to as the "embedding space", serves as a comprehensive semantic landscape that enables advanced searches and analyses based on the contextual relationships defined by the CHOP codes.

Expert validation of CHOP-to-OPS mappings

MAP-CARE was evaluated for its ability to map Swiss CHOP codes to German OPS codes on a sample of 494 terms covering a diverse range of common and rare surgical procedures across various medical specialties, methods, materials used, and anatomical locations. The mapping strategy searches a term from one language directly in the embedding space of another language, rather than comparing texts directly (Fig. 2). Two manual expert evaluations were performed.

Evaluation 1 (conducted by KT and OE; Supplementary Table 2) assessed the fidelity of the English translations relative to the original German CHOP texts, specifically whether the meaning of the original text was preserved. To assess interrater reliability, the ratings of the two raters were compared.
Evaluation 2 (conducted by KT; Supplementary Table 3) verified whether the translated CHOP codes semantically matched their translated OPS counterparts, using the OHDSI Usagi tool for SNOMED CT-based guidance and then manually reviewing each suggested mapping for correctness and procedural specificity.

For both evaluations (1: German → English; 2: SNOMED CT concept of CHOP → SNOMED CT concept of OPS), each match was rated according to the OHDSI mapping equivalence framework:
Equal: The original meaning is fully preserved.
Equivalent: Slight inaccuracies, but the essential meaning is intact.
Wider: The mapped term is a broader parent concept.
Narrower: The mapped term is a more specific child concept.
Inexact: Partial overlap, with some meaning lost.
Unmatched: No suitable mapping was identified.
Unreviewed: Cases not validated by both reviewers.

In parallel, Evaluation 3 (Supplementary Table 4) was conducted with an LLM-as-a-judge strategy using the following rubric:
Exact Match: The retrieved term corresponds perfectly to the intended term in both meaning and context.
Near Match: The retrieved term is closely related but shows slight variations in specificity, context, or comprehensiveness.
Partial Match: The retrieved term shares certain semantic features with the target term but lacks essential details necessary for an accurate match.
Mismatch: The retrieved term has no semantic relation to the target term.

Multilingual terminology mapping evaluation

The terminal codes of the CHOP classification in German, French, and Italian were compiled. Each code, which may be linked to multiple entries because descriptions and contextual details vary across languages, was retained for further analysis. The mapping process followed the strategy depicted in Fig. 2: each entry from the CHOP classification was queried against its counterparts in the other two languages using ChromaDB.
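The query step is a nearest-neighbour search in the target language's vector store, and the accuracy measure checks whether the source entry's own CHOP code comes back among the top k hits. In the pipeline, ChromaDB performs the search (for example via `collection.query(query_embeddings=..., n_results=5)`); the stdlib-only sketch below replaces it with brute-force ranking so that the logic is visible. Assumptions: cosine similarity as the ranking measure (the distance configured in ChromaDB is not stated), and toy three-dimensional vectors with single-letter codes in place of 1024-dimensional embeddings and real CHOP codes; `top_k` and `accuracy_at_k` are hypothetical helper names.

```python
import math

# One stored entry: (CHOP code, embedding vector). A code may occur more than once.
Entry = tuple[str, list[float]]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], store: list[Entry], k: int) -> list[str]:
    """Codes of the k stored entries most similar to the query, best first."""
    ranked = sorted(store, key=lambda entry: cosine(query, entry[1]), reverse=True)
    return [code for code, _ in ranked[:k]]


def accuracy_at_k(source: list[Entry], target: list[Entry], k: int) -> float:
    """Share of source entries whose own code is among the top-k target hits."""
    hits = sum(code in top_k(vector, target, k) for code, vector in source)
    return hits / len(source)


# Toy vectors standing in for German (source) and French (target) entries.
german = [("a", [1.0, 0.0, 0.0]), ("b", [0.0, 1.0, 0.0]), ("c", [0.0, 0.0, 1.0])]
french = [("a", [0.9, 0.1, 0.0]), ("b", [0.0, 0.4, 0.6]), ("c", [0.1, 0.6, 0.5])]

print(accuracy_at_k(german, french, k=1))  # 1/3: only "a" is ranked first
print(accuracy_at_k(german, french, k=2))  # 1.0: every code is within the top 2
```

Raising k from 1 to 2 turns two near misses into hits, which illustrates in miniature why a top-5 criterion yields higher accuracy than a top-1 criterion.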
For each entry, a query was executed to retrieve the single most relevant result (the top-1 result) from the vector stores of the other languages. To broaden the scope of the semantic analysis, the query was extended to also retrieve the top-5 results for each entry. The accuracy of this mapping strategy was measured as the proportion of entries for which the original CHOP code was retrieved (as the top-1 result or among the top-5 results) from the other language's vector store.

Results

Machine translation

The quality of the CHOP translations was assessed using the similarity of terminal codes with their mapped SNOMED CT terms as a proxy for semantic equivalence. Using the OHDSI Usagi mapping tool, a total of 10,860 CHOP terminal codes were linked to corresponding SNOMED CT concepts. This linkage was validated manually for a subset of codes, in which each mapping was reviewed for similarity.

Cross-system matching

Expert evaluations

The functionality of MAP-CARE was tested through cross-system and cross-language matching tasks. First, its performance was evaluated on a dataset of 494 medical procedure mappings by manually analysing the efficacy and accuracy of semantic matching between the Swiss CHOP and German OPS coding systems (Fig. 3 and Supplementary Table 1). A set of comparative analyses was conducted to determine how consistently the two human expert evaluations (Evaluation 1 for translation and Evaluation 2 for semantic matching) aligned with each other and with semantic matching by an LLM-as-a-judge approach (Evaluation 3).

Evaluation 1 (translation, 2 raters)

The evaluation of translations (Fig. 3a) used six categories to capture varying degrees of alignment between the original text and the translated content: equal, equivalent, inexact, narrower, wider, and unmatched. The distribution of these categories in the confusion matrix is shown in Fig. 3.
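Both agreement figures can be read off a two-rater confusion matrix: percentage agreement is the share of items on the diagonal, and Cohen's kappa corrects it for the agreement expected by chance from each rater's label frequencies, \(\kappa = (p_o - p_e)/(1 - p_e)\). A minimal stdlib sketch follows; the function name is hypothetical and the 3 × 3 matrix is invented for illustration, not the study's six-label data.

```python
def agreement_and_kappa(matrix: list[list[int]]) -> tuple[float, float]:
    """Return (percentage agreement, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] counts items that rater 1 labelled i and rater 2 labelled j.
    Undefined (division by zero) when chance agreement p_e equals 1.
    """
    size = len(matrix)
    n = sum(sum(row) for row in matrix)
    p_o = sum(matrix[i][i] for i in range(size)) / n  # observed agreement
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    # Chance agreement: sum over labels of the product of both raters' marginals.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical labels: equal, equivalent, inexact (rows: rater 1, columns: rater 2).
example = [
    [40, 5, 5],
    [5, 10, 5],
    [0, 5, 25],
]
p_o, kappa = agreement_and_kappa(example)
print(f"agreement = {p_o:.1%}, kappa = {kappa:.2f}")  # agreement = 75.0%, kappa = 0.60
```

Kappa falls well below raw agreement whenever one label dominates both raters' marginals, because chance agreement p_e is then high; this is one way a high percentage agreement can coexist with a modest kappa.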
Percentage agreement between the two raters was 75.9%, and interrater reliability (Cohen's kappa) across the six labels was 0.34. The majority of translated codes were classified as equal, with both evaluators agreeing on this category in 339 cases (68.6%), suggesting that a substantial portion of the translations preserved the original clinical meaning without deviation. In 23 cases, one evaluator rated a code as equal while the other rated it as equivalent, indicating slight inaccuracies or minor wording differences that did not compromise the essential meaning. A total of 21 cases were jointly rated as equivalent (4.7%), confirming that a small subset of translations contained subtle inaccuracies yet still conveyed the core concept accurately. The inexact category accounted for 27 cases (5.5%) of agreement between evaluators; these translations showed some semantic overlap, but certain details or nuances were lost. Notably, 18 cases labelled "equal" by one evaluator were considered "inexact" by the other, suggesting that some translations perceived as precise by one reviewer were seen as incomplete or ambiguous by the other. The presence of codes classified as inexact, narrower, or wider suggests that certain clinical terms pose particular challenges in translation and that clearer guidance or criteria are needed to distinguish between these nuanced categories.

Evaluation 2 (semantic matching)

Evaluation 2 assessed the semantic accuracy of the mappings, that is, whether CHOP and OPS codes maintained conceptual alignment despite the absence of direct linguistic equivalence (Fig. 3b). It was conducted using the OHDSI Usagi tool, followed by a manual review to ensure correctness and procedural specificity.
Of the 494 SNOMED CT concept pairs, 52 (10.5%) were rated as equal (identical SNOMED CT concept identifier), confirming that the procedures described in CHOP and OPS were fundamentally the same in both meaning and medical intent, and 20 (4%) were rated as equivalent (different SNOMED CT concept identifiers), indicating minor semantic variation that did not affect clinical interpretation. These mappings typically involved synonyms, slight phrasing differences, or minor variations in procedural description that did not alter the core medical concept. A total of 221 cases (44.7%) were classified as inexact, meaning that the mapped procedures shared some commonalities but differed in key details, such as anatomical specificity or procedural approach. In 132 cases (26.7%), categorized as wider, the CHOP term encompassed a broader procedural scope than its OPS counterpart. Conversely, 64 cases (13%) were narrower, meaning that the CHOP term was more restrictive than the corresponding OPS term. Notably, no mappings were categorized as unmatched.

Evaluation 3 (LLM-based semantic matching)

In the LLM-based evaluation (Fig. 3c), the system achieved 127 exact matches (25.7%), in which procedures were aligned with high fidelity in both procedural specificity and anatomical location. For example, 'coronary angioplasty with antibody-coated balloons' was matched precisely, linking CHOP code Z00.66.22 to OPS code 8-83b.b1. Near matches were more frequent, with 144 instances (29.1%); these were correctly aligned but lacked some details. For instance, 'removal of an intracranial implant' (CHOP Z01.39.50) was matched to 'removal of a neuroprosthesis' (OPS 5-029.b): the core procedures align, but their specificities differ.
Partial matches were the most common, observed in 176 cases (35.6%), and indicated correct categorization but missing specific procedural or anatomical details. A notable example is the difference in invasiveness between 'other craniotomy for evacuation of an epidural hematoma' (CHOP Z01.25.11) and 'therapeutic percutaneous puncture of an epidural hematoma' (OPS 8-159.4): the former is an open surgery, whereas the latter is a minimally invasive procedure. Mismatches occurred in 47 instances (9.5%) and reflected fundamental errors, such as confusing 'instillation of a uterine tube' (CHOP Z66.8) with 'foetal implantation of a pacemaker' (OPS 5-755.8).

A chi-square goodness-of-fit test was conducted to assess whether the observed distribution of these categories fits a uniform distribution across the four outcomes. The chi-square statistic was 73.21 (df = 3, \(p < 8.78 \times 10^{-16}\)), indicating that the match categories are not evenly distributed, i.e., the system's outputs differ markedly in frequency across these categories.

The system's performance varied considerably across hierarchical levels (Fig. 4):
Level 3: Displayed an equal share of exact and near matches (41.67% each) but also a notable rate of mismatches (8.33%), pointing to difficulties at more abstract coding levels.
Level 4: Showed fewer exact matches (21.43%) and more frequent partial matches (35.71%), suggesting challenges in maintaining specificity amid generalization.
Level 5: Achieved 50% exact and near matches, underscoring the system's strength in detailed procedural matching.
Level 6: Exhibited a balanced distribution across all categories, with exact matches at 26.36% and partial matches at 36.68%, although mismatches persisted at 9.51%.
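The goodness-of-fit statistic reported for Evaluation 3 follows directly from the four category counts (127 exact, 144 near, 176 partial, 47 mismatches): under a uniform hypothesis each category expects 494/4 = 123.5 cases, and \(\chi^2 = \sum_i (O_i - E_i)^2 / E_i\) with 4 - 1 = 3 degrees of freedom. The stdlib-only sketch below uses the closed-form tail probability for 3 degrees of freedom; `scipy.stats.chisquare(counts)` would be the usual library call, and the helper names here are hypothetical.

```python
import math


def chi_square_uniform(observed: list[int]) -> float:
    """Pearson chi-square statistic against a uniform expected distribution."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


def chi2_sf_df3(x: float) -> float:
    """Upper-tail p-value of the chi-square distribution with 3 degrees of freedom.

    Closed form for df = 3: P(X > x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Exact, near, partial and mismatch counts from Evaluation 3 (n = 494).
counts = [127, 144, 176, 47]
stat = chi_square_uniform(counts)
p = chi2_sf_df3(stat)
print(f"chi2 = {stat:.2f}, p = {p:.2e}")  # chi2 = 73.21, p = 8.78e-16
```

The closed form is specific to four categories (df = 3); for a different number of categories, a general chi-square survival function is needed instead.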
Cross-language matching

To test the cross-language capabilities of MAP-CARE, we developed an automated strategy to assess the mapping accuracy of CHOP codes in the MAP-CARE embedding space across the German, French, and Italian versions. Each term is mapped between language pairs to verify whether the original CHOP code corresponds to the code retrieved from the other language's vector store (Fig. 2). Mapping accuracy was evaluated in two stages, top-1 match and top-5 matches, visualized in Fig. 5. The top-1 match is the single closest term retrieved and reflects the system's ability to rank the correct entry first. The top-5 criterion considers whether the correct term appears among the first five terms retrieved. This measure allows a broader assessment of the system's capability to identify relevant terms even when they are not ranked highest, which matters for applications in which several similar options may be clinically relevant.

At the top-1 level, the system achieves moderate accuracy, with at least 59% across all language pairs. When the criterion was expanded to the top five matches, performance improved notably, exceeding 70% across all language pairs. Manual review of the mismatches revealed that many involved synonymous entries within the CHOP classification. This broader matching criterion therefore enables the system to capture semantic relationships between terms more effectively, even when direct matches are not ranked highest.

Discussion

The MAP-CARE workflow facilitates the integration and accessibility of multilingual medical procedure data through the application of LLMs and vector embeddings.
In contrast to established systems such as UMLS, which align numerous terminologies, MAP-CARE introduces a dynamic approach to (i) automatically align previously unmapped terminologies, thereby reducing the costs of manual mapping, and (ii) help users navigate extensive medical classifications efficiently by ensuring that query results include all semantically relevant codes, eliminating the need to explore the entire classification exhaustively. This is particularly important for basic classifications, typically distributed as CSV or PDF files, which can often be searched only by explicit terms and lack synonym recognition. MAP-CARE's capability to handle linguistic complexity, as well as the granularity and variability of non-English and less standardized terminologies, positions it as a valuable tool where conventional keyword-based search is inadequate.

MAP-CARE effectively circumvents the linguistic barriers inherent in medical terminology by enabling precise mapping across four languages and two different classifications. The system produced only 9.5% mismatches at the most detailed hierarchical levels. This robustness is particularly relevant because these final levels describe a procedure precisely, often including the exact anatomical region, approach, method and material. These results are promising because the system supports mapping at the most granular and therefore most labour-intensive level. It is important to note that no official mapping exists between the OPS and CHOP classification systems, despite the potential for valuable knowledge exchange and collaboration between countries. This gap stems from fundamental structural differences between the two systems, even though both are available in German, and from the heavy burden of manual mapping given their complexity.
The appropriate level of aggregation can be achieved through explicit customisation and system training, and by cross-mapping to refined hierarchical terminologies such as SNOMED CT [19]. However, automatic execution of the Usagi tool did not yield SNOMED CT mappings of sufficient quality: only 273 (55.3%) concept pairs in total were rated as equal, equivalent, narrower or wider. Next steps in the project include creating embeddings of established international medical terminologies and classifications, such as SNOMED CT, ICD-10, OPS, ICHI, and the OPCS-4 classification used by the National Health Service (NHS) [37]. By exploiting the hierarchical structure of SNOMED CT, a knowledge graph representation enables more effective semantic reasoning, enhanced data interoperability, and improved clinical decision support by capturing complex relationships and contextual nuances among medical concepts [38].

In cross-language tasks, MAP-CARE showed high efficacy for French and Italian. German, by contrast, presented unique challenges that underscore the complexity of semantic interoperability. The lower accuracy of mappings from German to French and Italian (61% and 59%, respectively) suggests an underlying structural linguistic divergence, for example the German tendency to condense complex ideas into single compound words. These findings may reflect the intricate nature of German medical nomenclature and of CHOP, which often lack direct equivalents in the more Latinate vocabularies of French and Italian. This divergence not only affects the system's performance but also highlights the broader challenge of standardizing medical terminology across languages with disparate etymological roots. MAP-CARE's performance improved when the criterion was expanded to the top five matches, where accuracy exceeded 70% across all language pairs.
This improvement suggests that the system captures broader semantic relationships and that flexible matching criteria can accommodate linguistic variation. However, challenges remain, particularly in integrating specific operational markers such as '**' in OPS, which are critical for distinguishing procedural nuances such as open versus laparoscopic approaches. A further challenge arises when a nomenclature relies on mixed-term coding, in which multiple concepts are embedded within a single term, as is the case for CHOP. This lack of strict compositionality reduces semantic precision and impedes data interoperability across systems. For instance, procedural codes that do not specify methods or surgical techniques obscure essential details, as evidenced by mismatches during CHOP-to-OPS mapping. Such ambiguity hinders cross-system integration and restricts detailed procedural analyses. One possible solution is to segment procedures into granular, composable elements (e.g., device type, surgical approach, and anatomical site). Integrating this approach into MAP-CARE could improve its ability to represent and map medical procedures semantically across classifications and languages, moving beyond mixed-term coding and addressing the integration of marker codes. Finally, prospective applications of MAP-CARE include feature engineering for machine learning. For example, semantic cluster assignments could serve as a new feature that implicitly models the invasiveness of procedures, enriching predictive analytics and downstream decision-making. Furthermore, MAP-CARE's architecture can be extended to support natural language searches within the embedding space. Concept embeddings can also be used to harmonize heterogeneous datasets and mitigate local learning bias in federated learning [39].
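The segmentation into composable elements proposed in this paragraph could be prototyped with a small record type that stores each axis separately. Two procedures can then be compared axis by axis rather than as one mixed term. In this sketch the three axes come from the examples in the text (device type, surgical approach, anatomical site). The class name, the field values, and the three-way labelling rule are hypothetical illustrations and are not part of MAP-CARE, CHOP, or OPS.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class ProcedureComponents:
    """Hypothetical decomposition of a procedure term into separate axes."""
    anatomical_site: str
    surgical_approach: str
    device_type: str


def differing_axes(a: ProcedureComponents, b: ProcedureComponents) -> List[str]:
    """Names of the axes on which two decomposed procedures disagree."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


def compare(a: ProcedureComponents, b: ProcedureComponents) -> str:
    """Illustrative label: all axes agree, some agree, or none agree."""
    n_diff = len(differing_axes(a, b))
    if n_diff == 0:
        return "exact"
    if n_diff < len(fields(a)):
        return "partial"
    return "mismatch"


# Two hypothetical cholecystectomy entries that differ only in approach
# (open vs laparoscopic), the kind of nuance discussed in the text.
open_chole = ProcedureComponents("gallbladder", "open", "none")
lap_chole = ProcedureComponents("gallbladder", "laparoscopic", "none")

print(compare(open_chole, lap_chole), differing_axes(open_chole, lap_chole))
# partial ['surgical_approach']
```

Each disagreement is reported by axis name. As a result, a difference such as open versus laparoscopic surgery becomes explicit instead of being hidden inside a single free-text title.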
Natural language search would work by converting queries into their embedded vector representations, so that the system can efficiently identify and retrieve the most relevant terms, broadening its utility in diverse clinical and research settings.

Conclusion

MAP-CARE represents a novel approach in healthcare informatics, leveraging large language models and embedding techniques to address the complexity of multilingual medical terminology. Its ability to align and interpret medical procedures accurately across languages improves the accessibility and utility of procedural information across linguistic and healthcare domains. Future work will aim to extend MAP-CARE to additional medical terminologies and to refine its analytical capabilities to extract more detailed insights from complex medical datasets, thereby increasing accuracy. Better integration with other medical terminologies, frameworks, and electronic health record systems could further amplify its impact. Continued development will also optimise the system's architecture and scalability to support a broader range of clinical and administrative applications.

Declarations

Ethics approval and consent to participate: Ethical approval was not required for this study as no human or animal data were used.

Consent for publication: Not applicable.

Availability of data and materials: GitHub [40].

Competing interests: The authors have no competing interests to declare that are relevant to the content of this article.

Funding: The research leading to these results received funding from the Swiss Personalized Health Network (SPHN) under the Demonstrator Project INFRA (INFection Radar).

Authors' contributions: Conceptualization: HGR, OE, KT, GB; Methodology: HGR, OE, KT; Formal analysis and investigation: HGR, OE, KT, BP; Writing - original draft preparation: HGR; Writing - review and editing: HGR, OE, KT, GB, CGB; Funding acquisition: OE, GB.
Acknowledgements: We thank the University of Bern for kindly permitting the use of UBELIX, its central Linux high-performance computing (HPC) cluster.

References

1. Guillen, H. HugoGuillen/MAPCARE: v0.1.0-alpha. https://zenodo.org/records/15453911 . Accessed 8 Jun 2025.
2. UMLS Terminology Services. https://uts.nlm.nih.gov/uts/ . Accessed 25 Apr 2025.
3. Data Standardization – OHDSI. https://www.ohdsi.org/data-standardization/ . Accessed 25 Apr 2025.
4. International Classification of Health Interventions (ICHI). https://www.who.int/standards/classifications/international-classification-of-health-interventions . Accessed 25 Apr 2025.
5. CPT® (Current Procedural Terminology) | AMA. https://www.ama-assn.org/amaone/cpt-current-procedural-terminology . Accessed 24 Apr 2025.
6. Healthcare Common Procedure Coding System (HCPCS) | CMS. https://www.cms.gov/medicare/coding-billing/healthcare-common-procedure-system . Accessed 25 Apr 2025.
7. BfArM - OPS. https://www.bfarm.de/EN/Code-systems/Classifications/OPS-ICHI/OPS/_node.html . Accessed 25 Apr 2025.
8. CCAM en ligne - CCAM [CCAM online]. https://www.ameli.fr/accueil-de-la-ccam/index.php . Accessed 25 Apr 2025.
9. Schweizerische Operationsklassifikation (CHOP) 2023 - Systematisches Verzeichnis - CSV [Swiss Classification of Operations (CHOP) 2023, systematic index, CSV]. https://www.bfs.admin.ch/asset/de/22988091 . Accessed 25 Apr 2025.
10. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program - PubMed. https://pubmed.ncbi.nlm.nih.gov/11825149/ . Accessed 24 Apr 2025.
11. Liu, S., Ma, W., Moore, R., Ganesan, V. & Nelson, S. RxNorm: Prescription for electronic drug information exchange. IT Prof. 7, 17–23. https://doi.org/10.1109/MITP.2005.122 (2005).
12. Savova, G. K. et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): Architecture, component evaluation and applications. J. Am. Med. Inform. Assoc. 17, 507–513. https://doi.org/10.1136/JAMIA.2009.001560 (2010).
13. Grosjean, J. et al. Health Multi-Terminology Portal: A Semantic Added-value for Patient Safety. 129–138. https://doi.org/10.3233/978-1-60750-740-6-129 (2011).
14. Venkataraman, G. R. et al. FasTag: Automatic text classification of unstructured medical narratives. PLoS One 15. https://doi.org/10.1371/JOURNAL.PONE.0234647 (2020).
15. International Classification of Diseases (ICD). https://www.who.int/standards/classifications/classification-of-diseases . Accessed 25 Apr 2025.
16. RxNorm. https://www.nlm.nih.gov/research/umls/rxnorm/index.html . Accessed 25 Apr 2025.
17. Human Phenotype Ontology. https://hpo.jax.org/ . Accessed 25 Apr 2025.
18. SNOMED CT - Home. https://browser.ihtsdotools.org/? . Accessed 25 Apr 2025.
19. Arbabi, A., Adams, D. R., Fidler, S. & Brudno, M. Identifying Clinical Terms in Medical Text Using Ontology-Guided Machine Learning. JMIR Med. Inform. 7, e12596. https://doi.org/10.2196/12596 (2019).
20. Gaudet-Blavignac, C., Foufi, V., Bjelogrlic, M. & Lovis, C. Use of the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) for Processing Free Text in Health Care: Systematic Scoping Review. J. Med. Internet Res. 23, e24594. https://doi.org/10.2196/24594 (2021).
21. Li, Y. & Yang, T. Word Embedding for Understanding Natural Language: A Survey. Stud. Big Data 26, 83–104. https://doi.org/10.1007/978-3-319-53817-4_4 (2018).
22. Böhringer, D. et al. Automatic inference of ICD-10 codes from German ophthalmologic physicians’ letters using natural language processing. Sci. Rep. 14, 9035. https://doi.org/10.1038/S41598-024-59926-3 (2024).
23. Kugic, A., Pfeifer, B., Schulz, S. & Kreuzthaler, M. Embedding-based terminology expansion via secondary use of large clinical real-world datasets. J. Biomed. Inform. 147, 104497. https://doi.org/10.1016/J.JBI.2023.104497 (2023).
24. Tariq, A. et al. Contrastive diagnostic embedding (CDE) model for automated coding - A case study using emergency department encounters. Int. J. Med. Inform. 179. https://doi.org/10.1016/J.IJMEDINF.2023.105212 (2023).
25. Lee, J. et al. Automating surgical procedure extraction for society of surgeons adult cardiac surgery registry using pretrained language models. JAMIA Open 7, ooae054. https://doi.org/10.1093/jamiaopen/ooae054 (2024).
26. Tavabi, N. et al. Building large-scale registries from unstructured clinical notes using a low-resource natural language processing pipeline. Artif. Intell. Med. 151, 102847. https://doi.org/10.1016/J.ARTMED.2024.102847 (2024).
27. Percha, B., Pisapati, K., Gao, C. & Schmidt, H. Natural language inference for curation of structured clinical registries from unstructured text. J. Am. Med. Inform. Assoc. 29, 97. https://doi.org/10.1093/JAMIA/OCAB243 (2021).
28. Kim, J. S. et al. Can Natural Language Processing and Artificial Intelligence Automate The Generation of Billing Codes From Operative Note Dictations? Global Spine J. 13, 1946–1955. https://doi.org/10.1177/21925682211062831 (2023).
29. Chiu, B. & Baker, S. Word embeddings for biomedical natural language processing: A survey. Lang. Linguist. Compass 14. https://doi.org/10.1111/LNC3.12402 (2020).
30. Shah, N. H., Entwistle, D. & Pfeffer, M. A. Creation and Adoption of Large Language Models in Medicine. JAMA 330, 866–869. https://doi.org/10.1001/JAMA.2023.14217 (2023).
31. Welcome Gemma 2 - Google’s new open LLM. https://huggingface.co/blog/gemma2 . Accessed 25 Apr 2025.
32. Open Source Strikes Bread - New Fluffy Embedding Model - Mixedbread. https://www.mixedbread.com/blog/mxbai-embed-large-v1 . Accessed 25 Apr 2025.
33. Li, X. & Li, J. AoE: Angle-optimized Embeddings for Semantic Textual Similarity. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 1, 1825–1839. https://doi.org/10.18653/V1/2024.ACL-LONG.101 (2024).
34. GitHub - chroma-core/chroma: the AI-native open-source embedding database. https://github.com/chroma-core/chroma . Accessed 25 Apr 2025.
35. GitHub - ollama/ollama. Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models. https://github.com/ollama/ollama . Accessed 25 Apr 2025.
36. GitHub - OHDSI/Usagi. Usagi is an application to help create mappings between coding systems and the Vocabulary standard concepts. https://github.com/OHDSI/Usagi . Accessed 25 Apr 2025.
37. NHS Classifications OPCS-4 - TRUD. https://isd.digital.nhs.uk/trud/user/guest/group/0/pack/10 . Accessed 25 Apr 2025.
38. Chaturvedi, J., Wang, T., Velupillai, S., Stewart, R. & Roberts, A. Development of a Knowledge Graph Embeddings Model for Pain. AMIA Annu. Symp. Proc. 2023, 299 (2024).
39. Zhu, M., Yang, Q., Gao, Z., Yuan, Y. & Liu, J. FedBM: Stealing knowledge from pre-trained language models for heterogeneous federated learning. Med. Image Anal. 102. https://doi.org/10.1016/J.MEDIA.2025.103524 (2025).
40. GitHub - HugoGuillen/MAPCARE: MAP-CARE: Multilingual Approach for Procedures in Clinical and Retrieval Embeddings. https://github.com/HugoGuillen/MAPCARE . Accessed 16 May 2025.

Additional Declarations: No competing interests reported.

Supplementary Files: Supplementaryfile1.xlsx

Editorial history: First submitted to journal 10 Jun, 2025; submission checks completed at journal 10 Jun, 2025; editorial decision: revision requested 13 Jun, 2025.
{"props":{"pageProps":{"initialData":{"identity":"rs-6848278","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":469246841,"identity":"bf92e2bf-f201-4937-84fc-a1676c673de7","order_by":0,"name":"Hugo Guillen-Ramirez","email":"","orcid":"","institution":"University of Bern","correspondingAuthor":false,"prefix":"","firstName":"Hugo","middleName":"","lastName":"Guillen-Ramirez","suffix":""},{"id":469246842,"identity":"ba01c53c-4b43-43e9-bb0a-cb8c381035eb","order_by":1,"name":"Karen Triep","email":"","orcid":"","institution":"University Hospital of Bern","correspondingAuthor":false,"prefix":"","firstName":"Karen","middleName":"","lastName":"Triep","suffix":""},{"id":469246843,"identity":"70e4b199-2e32-467f-b901-bf1e29a3babb","order_by":2,"name":"Christophe Gaudet-Blavignac","email":"","orcid":"","institution":"University Hospital of Geneva","correspondingAuthor":false,"prefix":"","firstName":"Christophe","middleName":"","lastName":"Gaudet-Blavignac","suffix":""},{"id":469246844,"identity":"64d1e09a-c6f6-42d2-a83c-8d71fc07af17","order_by":3,"name":"Baljit Phull","email":"","orcid":"","institution":"University Hospital of Bern","correspondingAuthor":false,"prefix":"","firstName":"Baljit","middleName":"","lastName":"Phull","suffix":""},{"id":469246850,"identity":"47e0f1e1-5c1f-4380-af38-f5ba91e87e78","order_by":4,"name":"Guido Beldi","email":"","orcid":"","institution":"University Hospital of
Bern","correspondingAuthor":false,"prefix":"","firstName":"Guido","middleName":"","lastName":"Beldi","suffix":""},{"id":469246851,"identity":"a3196900-43b2-4dc1-a6d7-7190e674a5d0","order_by":5,"name":"Olga Endrich","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA6UlEQVRIiWNgGAWjYDACdgjF2MDMwMbwwQAiwoxXCzOSFsYZBhARIrUwMLAx8zAQoYWfmf3iB4aKO7Lb25mfPbYpuGPPz8zA+LkAjxbJZp5iCYYzz4znHGYzN84xeJY4s5mBWXoGHi0Gh3kSJBjbDifOYOZhk84xOJxgcBjmQtxakn8w/oNqsTA4bG9PWAv7MQnGBqgWIJdxAzMBLUC/sFkkHDtsPIOZzUyyxwCo9zBjszQ+Lfzs7Y9vfKg5LDuD//AziR9/Dtvztzcf/IxPCwMDjwFDAqoIKI7wAvYHBBSMglEwCkbBiAcAXDNBFEbIhioAAAAASUVORK5CYII=","orcid":"","institution":"University Hospital of Bern","correspondingAuthor":true,"prefix":"","firstName":"Olga","middleName":"","lastName":"Endrich","suffix":""}],"badges":[],"createdAt":"2025-06-08 15:23:12","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6848278/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6848278/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1038/s41598-025-34778-7","type":"published","date":"2026-01-09T15:57:14+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":84546691,"identity":"f8a20cc2-bcff-4a54-89a9-1473d0a13b37","added_by":"auto","created_at":"2025-06-13 09:23:34","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":159112,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eMAP-CARE workflow for multilingual classification integration.\u003c/strong\u003e In the MAP-CARE pipeline for integrating non-English medical classifications, input terms such as \"\u003cem\u003eDiagnostischer Ultraschall des Auges\u003c/em\u003e,\" are sanitised and translated into English using the LLM Gemma2 with contextual augmentation. 
The augmented terms, containing detailed descriptions, are then converted into a 1024-dimensional vector using the mxbai-embed model. These vectors are stored in ChromaDB for semantic search and retrieval, enabling seamless access to multilingual procedural data.\u003c/p\u003e","description":"","filename":"Figure1.png","url":"https://assets-eu.researchsquare.com/files/rs-6848278/v1/6e1d557e4b338bda13d2c43f.png"},{"id":84546770,"identity":"5ad141bc-4ccc-4e38-a6e7-266e8ccfae84","added_by":"auto","created_at":"2025-06-13 09:23:36","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":90169,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eCross-system mapping workflow.\u003c/strong\u003e For mapping terms between two classification systems (Classification A and Classification B) using MAP-CARE, terms from Classification A are represented as embeddings in a high-dimensional vector space. A similarity search is then performed against the embedding representations in Classification B, resulting in the identification of the mapped term from Classification B that is semantically closest to the input term from Classification A.\u003c/p\u003e","description":"","filename":"Figure2.png","url":"https://assets-eu.researchsquare.com/files/rs-6848278/v1/743e16ccfea3fe1f539ad95e.png"},{"id":84546706,"identity":"bb53d5b0-3a43-4d1b-ab44-dd677ed8a3ba","added_by":"auto","created_at":"2025-06-13 09:23:35","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":376012,"visible":true,"origin":"","legend":"\u003cp\u003eMapping evaluation results. \u003cstrong\u003e(a)\u003c/strong\u003e Translation evaluation conducted by two human evaluators, who categorized the CHOP to OPS translations into six categories: equal, equivalent, inexact, narrower, wider, and unmatched. The confusion matrix displays the distribution of agreed and disagreed labels across evaluators. 
\u003cstrong\u003e(b)\u003c/strong\u003e Distribution of translated CHOP to OPS mappings according to the OHDSI mapping equivalence framework, with percentages shown on the y-axis and absolute counts displayed above each bar. \u003cstrong\u003e(c)\u003c/strong\u003e Evaluation conducted using a large language model (LLM) acting as an expert, with mappings classified into four categories: Exact Match, Near Match, Partial Match, and Mismatch.\u003c/p\u003e","description":"","filename":"Figure3.png","url":"https://assets-eu.researchsquare.com/files/rs-6848278/v1/189c4bd19840447ee3e784a2.png"},{"id":84546886,"identity":"9a496852-4758-4fa8-8a50-089052e494b1","added_by":"auto","created_at":"2025-06-13 09:23:40","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":63203,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eMapping accuracy across hierarchical levels of CHOP and OPS codes. \u003c/strong\u003eThe accuracy outcomes are categorized into exact match, near match, partial match, and mismatch at four hierarchy levels.\u003c/p\u003e","description":"","filename":"Figure4.png","url":"https://assets-eu.researchsquare.com/files/rs-6848278/v1/718f9be7b950de29c2feb498.png"},{"id":84546777,"identity":"ddae1b9c-7bd7-48ac-96ef-a564edaeb696","added_by":"auto","created_at":"2025-06-13 09:23:36","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":194654,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eCross-language mapping performance for CHOP codes. \u003c/strong\u003eThe performance of MAP-CARE for mapping CHOP codes across German, French, and Italian was evaluated. The left panel shows accuracy for top-1 results, with mapping success ranging from 59% (German to Italian) to 75% (French to Italian). 
The right panel shows improved accuracy when considering the top-5 results, exceeding 70% for all pairs and peaking at 90% (French to German).\u003c/p\u003e","description":"","filename":"Figure5.png","url":"https://assets-eu.researchsquare.com/files/rs-6848278/v1/275869558d1131430f5e983b.png"},{"id":100069052,"identity":"146ed77f-acff-4884-a0d8-30787c7d5684","added_by":"auto","created_at":"2026-01-12 16:07:51","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1515243,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6848278/v1/33bf13f9-42cf-4038-b965-2fb53f089706.pdf"},{"id":84546950,"identity":"4ce19dd3-1e6d-4367-8376-d4a63835dc89","added_by":"auto","created_at":"2025-06-13 09:23:42","extension":"xlsx","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":281417,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementaryfile1.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6848278/v1/a2d33bffca86f9ab70299c2c.xlsx"}],"financialInterests":"No competing interests reported.","formattedTitle":"MAP-CARE: Enhancing Cross-Lingual Medical Intervention Terms Analysis Through LLM-supported Semantic Embeddings","fulltext":[{"header":"Introduction","content":"\u003cp\u003eHarmonizing data from electronic health records (EHRs) remains a global challenge. Initiatives such as the Unified Medical Language System (UMLS) [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e] and the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e] aim to standardize and integrate heterogeneous health data vocabularies and sources. 
These frameworks are crucial for ensuring interoperability in multilingual environments, where variations in language morphology can lead to significant inconsistencies and misinterpretations in medical data. The current linguistic and structural diversity of medical terminologies calls for robust tools that can not only translate but also semantically align concepts across languages and classification systems. While platforms like UMLS and OHDSI’s Athena tool offer comprehensive vocabularies and mappings, they primarily focus on established standardized terminologies and may lack the flexibility to accommodate country-specific or health-system-specific classifications.\u003c/p\u003e \u003cp\u003eSignificant challenges remain in aligning data on medical and surgical procedures from non-English-speaking regions. Classifications of medical procedures encode medical interventions and are essential for monitoring, billing, quality control, and research within healthcare environments. Medical procedure codes are typically multiaxial and include abbreviations and free-text descriptors, which are often ambiguous, lack specificity, or admit multiple interpretations, making semantic mapping to a standardized code difficult. Additionally, they often include complex inclusion and exclusion criteria and are context-sensitive, with modifiers such as the material used or the surgical approach further influencing their meaning. Retrieving a specific code through keyword matching alone often fails to accommodate the complexity and diversity of medical terminologies and classifications, overlooking nuances and differences crucial for precision. For instance, a “gallbladder removal” query may not retrieve records labelled as “cholecystectomy,” the medical term for the same procedure. Similarly, a “knee replacement surgery” search might miss entries listed under more formal terms such as “total knee arthroplasty”.
Given that English is widely regarded as the standard language for knowledge exchange, terminology standardization, and research, there is also a need to make data accessible through English-language term searches that bridge linguistic and classificatory gaps.\u003c/p\u003e \u003cp\u003eContextual and national procedural classifications often rely on specialized terminology, and their integration into international standards such as the International Classification of Health Interventions (ICHI) [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e] is fraught with complexities due to their hierarchical structures and linguistic nuances. For example, procedural terminologies and coding systems widely used in the U.S., such as the Current Procedural Terminology (CPT) of the American Medical Association [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e] and the Healthcare Common Procedure Coding System (HCPCS) [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e], are not always directly translatable to systems used elsewhere. As a result, importing or aligning information from national systems such as Germany's Operations and Procedures Catalogue (OPS) [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e], France's Classification Commune des Actes Médicaux (CCAM) [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e], or Switzerland’s Operations and Procedures Catalogue (CHOP) [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e], to name a few, remains challenging in most countries worldwide.\u003c/p\u003e \u003cp\u003eSeveral systems have been developed to facilitate the automated extraction and linking of clinical terms to standardized vocabularies. Over the past 20 years, rule-based mapping has been the most commonly used method, applied across various classifications and use cases.
For example, MetaMap [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e] was designed to identify UMLS concepts within free-text clinical narratives. Similarly, systems like those by Liu et al. [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e], cTAKES [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e], HeTOP [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e], and FasTag [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e] employ rule-based methods to extract structured information from clinical text and map it to international classifications, ontologies, and terminologies such as International Classification of Diseases (ICD) [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e], RxNorm [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e], the Human Phenotype Ontology (HPO) [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e], and Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]. The design of SNOMED CT supports its role as a universal language for health care and reflects features typical of natural language, including synonymy, hierarchical structure, compositionality, and contextual adaptability [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]. 
However, despite the effectiveness of these systems in identifying terms from free text, challenges remain in dealing with non-standardized or evolving procedural codes, particularly in non-English contexts.\u003c/p\u003e \u003cp\u003eRecent advances in natural language processing (NLP) have introduced powerful tools to process, interpret, and standardize language in complex domains such as healthcare. Among these, word embeddings have emerged as a particularly promising technique. By transforming words into numerical representations that capture contextual and semantic relationships, embeddings approximate human-like understanding of language [\u003cspan additionalcitationids=\"CR22 CR23\" citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e–\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. This capability enables computers to interpret medical texts with a nuance approaching that of human reasoning. Different methods, such as pretrained language models (PLMs) and keyword extractors, have been compared for populating predefined procedural elements [\u003cspan additionalcitationids=\"CR26 CR27\" citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e–\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]. However, their application in healthcare remains a work in progress, due to the unique linguistic demands and conceptual complexity of biomedical terminology, as pointed out by Chiu \u0026amp; Baker [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e]. The rise of large language models (LLMs) has further advanced the field, providing AI systems capable of extracting insights, summarizing complex information, and supporting clinical decision-making. Nonetheless, effective deployment of LLMs in healthcare requires domain-specific adaptation to ensure both reliability and scalability [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e].
Despite growing interest in clinical NLP, there remains a notable scarcity of research on terminology embeddings in non-English languages, which poses significant challenges for accurate information retrieval from multilingual electronic health records (EHRs).\u003c/p\u003e \u003cp\u003eTo address these longstanding challenges, this study introduces MAP-CARE (Multilingual Approach for Procedures in Clinical and Retrieval Embeddings)—a novel framework that redefines access to multilingual medical procedural data. MAP-CARE uniquely enables seamless integration of multilingual data and allows retrieval of medical information documented in non-English languages using English as the query language. By leveraging large language models (LLMs) and advanced multilingual embeddings, the framework encodes medical terminologies into a unified semantic space, capturing cross-lingual and cross-classification relationships with high precision. This approach moves beyond traditional rule-based mapping by offering a scalable, data-driven solution for semantic interoperability. The following sections outline the methods underpinning MAP-CARE and demonstrate how these conceptual innovations are operationalized into practical tools that significantly enhance cross-lingual alignment of surgical and diagnostic procedures. To our knowledge, this is the first comprehensive work on the multilingual representation of medical procedures within the full domain of surgical and diagnostic classifications.\u003c/p\u003e "},{"header":"Methods","content":"\u003cp\u003eWorkflow overview\u003c/p\u003e\u003cp\u003eThe MAP-CARE framework, depicted in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, begins by processing a (non-English) medical classification, exemplified by the Swiss Classification of Operations and Procedures (CHOP). Each term from the classification undergoes a series of transformations. First, terms are sanitised (e.g. 
removing self-references and substituting standard abbreviations). Second, the term is translated and contextually augmented using an LLM, \u003cem\u003eGemma2\u003c/em\u003e [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e]. Third, the augmented text is encoded into a 1024-dimensional numeric vector that captures its semantic meaning, using the embedding model \u003cem\u003emxbai-embed\u003c/em\u003e [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e, \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]. Finally, the vectors generated for the complete classification are stored in \u003cem\u003eChromaDB\u003c/em\u003e [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e], a database designed for efficient vector management and retrieval.\u003c/p\u003e\u003cp\u003e \u003c/p\u003e\u003cp\u003eTranslation and preprocessing of CHOP and OPS codes\u003c/p\u003e\u003cp\u003eThe Swiss Classification of Operations (CHOP) is derived from ICD-9-CM and has been developed independently by the Swiss Federal Statistical Office since 2008. CHOP is structured hierarchically in a tree-like architecture, organizing medical procedures across multiple levels of specificity. Anatomical regions are indicated at the first and second levels using two-digit codes, while procedural invasiveness and methods are detailed at the third and fourth levels with four-digit codes. The hierarchy culminates in highly specific terminal codes at the six-digit level, resulting in over 10,860 distinct entries. Although a hierarchical structure is present, it is inconsistently applied, particularly where child concepts do not inherit all explicitly defined features and levels of specificity are not systematically represented.
The classification is available in three Swiss national languages: French, German, and Italian. Despite its comprehensive scope, the classification contains semantic inconsistencies and is distributed only in PDF and CSV formats. CHOP version 2023 was used for this project.\u003c/p\u003e\u003cp\u003eThe German \u003cem\u003eOperationen- und Prozedurenschlüssel\u003c/em\u003e (OPS) is an adaptation of the International Classification of Procedures in Medicine (ICPM) tailored to the German healthcare system. Maintained by the Federal Institute for Drugs and Medical Devices (BfArM), the OPS categorizes surgical operations and medical procedures in a hierarchical structure similar to that of CHOP and comprises approximately 33,750 distinct terminal codes in the 2023 version [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. OPS codes progress from broad categories to highly specific procedures and combine digits and letters that encode procedural types and methods. The XML release of the 2023 OPS was parsed into a CSV file for further analysis.\u003c/p\u003e\u003cp\u003eThe sanitised CSV files for CHOP and OPS were translated with Gemma2 accessed via the ollama interface [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e]. Few-shot prompting was used to improve translation accuracy in the clinical context. The prompts (“model files”) engineered for this task are described in Supplementary Table\u0026nbsp;1. After the LLM produced preliminary English translations, these were cleaned manually to ensure that they were appropriate, to refine formatting, and to remove extraneous LLM output. The quality of the CHOP translations was validated using the OHDSI Usagi mapping tool [\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e].
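A minimal Python sketch of the few-shot translation step follows. It assumes a local Ollama server at its default address (http://localhost:11434) exposing the /api/generate endpoint and a model pulled under the tag gemma2. The two example pairs and the instruction wording are illustrative placeholders, not the prompts of Supplementary Table 1, and the clean-up function only mimics the simplest part of the manual cleaning (dropping an answer label, quotes, and trailing commentary).

```python
import json
import urllib.request

# Illustrative few-shot pairs (hypothetical, not the study's prompts).
FEW_SHOT = [
    ('Appendektomie, laparoskopisch', 'Appendectomy, laparoscopic'),
    ('Entfernung eines intrakraniellen Implantats', 'Removal of an intracranial implant'),
]


def build_prompt(term: str, examples=FEW_SHOT) -> str:
    """Assemble a few-shot prompt that ends where the model should answer."""
    lines = ['Translate the German medical procedure term into English. '
             'Reply with the translation only.', '']
    for german, english in examples:
        lines += [f'German: {german}', f'English: {english}', '']
    lines += [f'German: {term}', 'English:']
    return '\n'.join(lines)


def build_request(prompt: str, model: str = 'gemma2',
                  host: str = 'http://localhost:11434') -> urllib.request.Request:
    """Non-streaming request for the (assumed local) Ollama /api/generate endpoint."""
    payload = {'model': model, 'prompt': prompt, 'stream': False}
    return urllib.request.Request(
        f'{host}/api/generate',
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def clean_translation(raw: str) -> str:
    """Keep the first non-empty line, minus an 'English:' label and surrounding quotes."""
    for line in raw.splitlines():
        line = line.strip()
        if line.lower().startswith('english:'):
            line = line[len('english:'):]
        line = line.strip().strip('"').strip()
        if line:
            return line
    return ''


def translate(term: str, model: str = 'gemma2') -> str:
    """Send one term to the Ollama server and return the cleaned translation."""
    with urllib.request.urlopen(build_request(build_prompt(term), model)) as resp:
        return clean_translation(json.loads(resp.read())['response'])
```

Only translate() touches the network; the prompt builder and the clean-up step can be checked offline. Automatic clean-up of this kind cannot replace the manual review described above, for example for translations that are fluent but clinically wrong.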
The English translation of the German version of CHOP was uploaded to Usagi. First, the 10,860 translated terminal CHOP codes were mapped automatically to SNOMED CT concepts. The CHOP-to-SNOMED CT mappings were then validated manually for a representative subset of terms, and translation quality was assessed with the OHDSI mapping equivalence framework.\u003c/p\u003e\u003cp\u003eEmbedding generation\u003c/p\u003e\u003cp\u003eAfter translation and cleaning of the CHOP codes, embeddings were generated to capture the semantic content of each entry. For this purpose, the \u003cem\u003emxbai-embed-large\u003c/em\u003e model was accessed via the ollama interface. This model was chosen because it produces high-dimensional vector representations while keeping a small memory footprint.\u003c/p\u003e\u003cp\u003eTo prepare the data for embedding, each CHOP entry was encoded as a single line in which key fields were concatenated and separated by a vertical bar ('|'). These fields were the unique identifier of the CHOP code (zcode), the code's title in its original language (German, French, or Italian), the English translation of the title, and a translated description of the procedure. A similar format was used for OPS. To improve search results, each input string was prefixed with the instruction \"Represent this sentence for semantic searching\", which guides the embedding model towards the intended semantic content. The resulting set of vectors is referred to as the \"embedding space\".
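The input format described above can be sketched in Python as follows. The instruction text and the order of the fields come from the description. The colon between the instruction and the fields, the whitespace normalisation, and the replacement of stray '|' characters inside fields are assumptions made for illustration; the function name is hypothetical.

```python
# Instruction prefix as stated in the text; the ': ' joining it to the fields is an assumption.
PREFIX = 'Represent this sentence for semantic searching'


def format_entry(zcode: str, title_original: str, title_en: str,
                 description_en: str) -> str:
    """Build one '|'-separated input line, prefixed with the search instruction."""
    fields = [zcode, title_original, title_en, description_en]
    # Collapse internal whitespace and replace '|' inside a field so the separator stays unambiguous.
    cleaned = [' '.join(str(f).replace('|', '/').split()) for f in fields]
    return f'{PREFIX}: ' + ' | '.join(cleaned)


print(format_entry('Z00.66.22', 'Titel', 'Title', 'Description'))
```

Each such string would then be passed to the embedding model, and the returned vector stored under its zcode, so that a retrieved vector can be traced back to its code.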
This embedding space supports semantic search and analyses based on the contextual relationships between CHOP codes.\u003c/p\u003e\u003cp\u003eExpert validation of CHOP-to-OPS mappings\u003c/p\u003e\u003cp\u003eMAP-CARE was evaluated for its ability to map Swiss CHOP codes to German OPS codes on a sample of 494 terms covering common and rare surgical procedures across medical specialties, methods, materials, and anatomical locations. The mapping strategy searches each term from one classification directly in the embedding space of the other, rather than comparing texts directly (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). Two manual expert evaluations were performed. Evaluation 1 (conducted by KT and OE, Supplementary Table\u0026nbsp;2) assessed the fidelity of the English translations relative to the original German CHOP texts, specifically whether the original meaning was preserved. Interrater reliability was assessed by comparing the ratings of both raters.
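Percentage agreement and Cohen's kappa, the two interrater statistics reported in the Results, can be computed from the paired labels as in the following Python sketch. The toy labels are made up for illustration and are not study data.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of items on which both raters chose the same label."""
    if len(a) != len(b) or not a:
        raise ValueError('rater label lists must be non-empty and of equal length')
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), with p_e from each rater's label frequencies."""
    p_o = percent_agreement(a, b)
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Labels used by only one rater contribute zero to the chance agreement.
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:  # both raters used one identical label throughout
        return 1.0
    return (p_o - p_e) / (1 - p_e)


rater1 = ['equal', 'equal', 'equal', 'inexact']
rater2 = ['equal', 'equal', 'inexact', 'inexact']
print(percent_agreement(rater1, rater2), cohens_kappa(rater1, rater2))  # 0.75 0.5
```

Because kappa discounts the agreement expected by chance from each rater's label frequencies, it can be much lower than the raw agreement when one label (here, equal) dominates both raters' ratings.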
Evaluation 2 (conducted by KT, Supplementary Table\u0026nbsp;3) verified whether the translated CHOP codes semantically matched their translated OPS counterparts. The OHDSI Usagi tool provided SNOMED CT–based guidance, and each suggested mapping was then reviewed manually for correctness and procedural specificity.\u003c/p\u003e\u003cp\u003eFor both evaluations (1: German \u003cb\u003e→\u003c/b\u003e English; 2: SNOMED CT of CHOP \u003cb\u003e→\u003c/b\u003e SNOMED CT of OPS), each match was rated according to the OHDSI mapping equivalence framework:\u003c/p\u003e\u003cp\u003e \u003c/p\u003e\u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eEqual\u003c/b\u003e: Original meaning is fully preserved.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eEquivalent\u003c/b\u003e: Slight inaccuracies, but essential meaning intact.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eWider\u003c/b\u003e: Mapped term is a broader parent concept.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eNarrower\u003c/b\u003e: Mapped term is a more specific child concept.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eInexact\u003c/b\u003e: Partial overlap, with some meaning lost.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eUnmatched\u003c/b\u003e: No suitable mapping was identified.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eUnreviewed\u003c/b\u003e: Cases not validated by both reviewers.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eIn parallel, Evaluation 3 (Supplementary Table\u0026nbsp;4) was conducted with an LLM-as-a-judge strategy and the following rubric:\u003c/p\u003e\u003cp\u003e \u003c/p\u003e\u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eExact Match\u003c/b\u003e: The retrieved term perfectly corresponds with the intended term in both meaning and
context.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eNear Match\u003c/b\u003e: The retrieved term is closely related but exhibits slight variations in specificity, context, or comprehensiveness.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003ePartial Match\u003c/b\u003e: The retrieved term shares certain semantic features with the target term but lacks essential details necessary for an accurate match.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eMismatch\u003c/b\u003e: The retrieved term has no semantic relation to the target term.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e \u003c/p\u003e\u003cp\u003eMultilingual terminology mapping evaluation\u003c/p\u003e\u003cp\u003eThe terminal codes of the German, French, and Italian CHOP classifications were compiled. All codes were retained for further analysis, including those linked to multiple entries owing to variations in descriptions and contextual details across languages.\u003c/p\u003e\u003cp\u003eThe mapping process followed the strategy depicted in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. Each CHOP entry was queried against its counterparts in the other two languages using ChromaDB. For each entry, a query retrieved the single most relevant result (top-1) from the vector store of each other language. To broaden the analysis, the top-5 results for each entry were also retrieved.
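The top-1 and top-5 queries can be illustrated with a brute-force nearest-neighbour search in Python. This is a stand-in for the ChromaDB query used in the study (roughly collection.query(query_embeddings=[vector], n_results=5) in the chromadb client), with toy two-dimensional vectors and hypothetical codes in place of the 1024-dimensional mxbai-embed vectors. Ranking here uses cosine similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(query, store, k=5):
    """Return the k codes in `store` (code -> vector) most similar to `query`."""
    ranked = sorted(store, key=lambda code: cosine(query, store[code]), reverse=True)
    return ranked[:k]


# Toy vector store for one target language (hypothetical codes and vectors).
store_fr = {'Z1': [1.0, 0.0], 'Z2': [0.0, 1.0], 'Z3': [0.7, 0.7]}
query_de = [1.0, 0.1]  # embedding of a German entry whose true code is Z1

print(top_k(query_de, store_fr, k=1))  # ['Z1']
print(top_k(query_de, store_fr, k=2))  # ['Z1', 'Z3']
```

ChromaDB collections use squared L2 distance by default unless configured for cosine; for unit-length vectors both give the same ranking, because the squared L2 distance then equals 2 minus twice the cosine similarity.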
Accuracy was quantified as the frequency with which a code queried from one language’s CHOP classification was matched to the same code in the other language.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003eMachine translation\u003c/p\u003e \u003cp\u003eThe quality of the CHOP translations was assessed using the agreement between terminal codes and their mapped SNOMED CT terms as a proxy for semantic equivalence. Using the OHDSI Usagi mapping tool, all 10,860 CHOP terminal codes were linked to corresponding SNOMED CT concepts. For a subset of codes, each mapping was then reviewed manually to assess similarity.\u003c/p\u003e \u003cp\u003eCross-system matching\u003c/p\u003e \u003cp\u003eExpert evaluations\u003c/p\u003e \u003cp\u003eThe functionality of MAP-CARE was tested through cross-system and cross-language matching tasks. First, its performance was evaluated on 494 medical procedure mappings by manually analysing the accuracy of semantic matching between the Swiss CHOP and German OPS coding systems (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e and Supplementary Table\u0026nbsp;1). Comparative analyses were then conducted to determine how consistently the two human expert evaluations (Evaluation 1 for translation and Evaluation 2 for semantic matching) aligned with each other and with semantic matching using an LLM-as-a-judge approach (Evaluation 3).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eEvaluation 1 (translation, 2 raters)\u003c/p\u003e \u003cp\u003eThe translations (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ea) were evaluated using six categories capturing varying degrees of alignment between the original and the translated text: equal, equivalent, inexact, narrower, wider, and unmatched.
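A confusion matrix of the kind shown in Fig. 3a can be tallied from the two raters' labels with a few lines of Python. The label order follows the six categories listed above, and the toy ratings are illustrative only. The diagonal holds the cases on which both raters agree, so its share of the total equals the percentage agreement.

```python
from collections import Counter

LABELS = ['equal', 'equivalent', 'inexact', 'narrower', 'wider', 'unmatched']


def confusion_matrix(rater1, rater2, labels=LABELS):
    """Rows: rater 1's label; columns: rater 2's label; cells: number of items.

    Pairs involving a label outside `labels` are not counted.
    """
    counts = Counter(zip(rater1, rater2))
    return [[counts[(row, col)] for col in labels] for row in labels]


def diagonal_share(matrix):
    """Fraction of counted items on the diagonal, i.e. the percentage agreement."""
    total = sum(sum(row) for row in matrix)
    agreed = sum(matrix[i][i] for i in range(len(matrix)))
    return agreed / total if total else 0.0


r1 = ['equal', 'equal', 'inexact', 'equal']
r2 = ['equal', 'equivalent', 'inexact', 'inexact']
m = confusion_matrix(r1, r2)
for label, row in zip(LABELS, m):
    print(f'{label:>10}', row)
print(diagonal_share(m))  # 0.5
```

Off-diagonal cells such as (equal, inexact) correspond directly to the disagreement counts discussed in the following paragraph.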
The distribution of these categories is shown as a confusion matrix in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e. Across the six labels, the percentage agreement between the two raters was 75.9% and the interrater reliability (Cohen\u0026rsquo;s kappa) was 0.34.\u003c/p\u003e \u003cp\u003eThe majority of translated codes were classified as equal, with both evaluators agreeing on this category in 339 cases (68.6%). This suggests that a substantial portion of the translations preserved the original clinical meaning. In 23 cases, one evaluator rated a code as equal while the other rated it as equivalent, indicating slight inaccuracies or minor wording differences that did not compromise the essential meaning. Both evaluators rated 21 cases (4.7%) as equivalent, indicating that a small subset of translations contained subtle inaccuracies yet still conveyed the core concept. The evaluators agreed on the inexact category in 27 instances (5.5%); these translations showed some semantic overlap, but certain details or nuances were lost. Notably, 18 cases labelled \"equal\" by one evaluator were considered \"inexact\" by the other, suggesting that some translations perceived as precise by one reviewer were seen as incomplete or ambiguous by the other.
The presence of codes classified as inexact, narrower, or wider suggests that certain clinical terms are particularly challenging to translate and that clearer guidance or criteria are needed to distinguish between these nuanced categories.\u003c/p\u003e \u003cp\u003eEvaluation 2 (semantic matching)\u003c/p\u003e \u003cp\u003eEvaluation 2 assessed the semantic accuracy of the mappings, i.e., whether CHOP and OPS codes remained conceptually aligned in the absence of direct linguistic equivalence (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eb). The evaluation used the OHDSI Usagi tool, followed by a manual review of correctness and procedural specificity.\u003c/p\u003e \u003cp\u003eOf the 494 SNOMED CT concept pairs, 52 (10.5%) were rated as equal (identical SNOMED CT concept identifier), confirming that the procedures described in CHOP and OPS were the same in both meaning and medical intent, and 20 (4.0%) as equivalent (different SNOMED CT concept identifiers), indicating minor semantic variation that did not affect clinical interpretation. These mappings typically involved synonyms, slight phrasing differences, or minor variations in procedural description that did not alter the core medical concept. A total of 221 cases (44.7%) were classified as inexact: the mapped procedures shared some commonalities but differed in key details, such as anatomical specificity or procedural approach. A further 132 cases (26.7%) were categorized as wider, where the CHOP term encompassed a broader procedural scope than its OPS counterpart. Conversely, 64 cases (13.0%) were narrower, meaning that the CHOP term had a more restricted procedural scope than the corresponding OPS term.
Notably, no mappings were categorized as unmatched.\u003c/p\u003e \u003cp\u003eEvaluation 3 (LLM-based semantic matching)\u003c/p\u003e \u003cp\u003eIn the LLM-based evaluation (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ec), the system achieved 127 exact matches (25.7%), in which procedures were aligned with high fidelity in both procedural specificity and anatomical location. For example, \u0026lsquo;coronary angioplasty with antibody-coated balloons\u0026rsquo; was matched precisely, linking CHOP code Z00.66.22 to OPS code 8-83b.b1. Near matches were more frequent (144 instances, 29.1%); these were correctly aligned but lacked some details. For instance, \u0026lsquo;removal of an intracranial implant\u0026rsquo; (CHOP Z01.39.50) was matched to \u0026lsquo;removal of a neuroprosthesis\u0026rsquo; (OPS 5-029.b): the core procedures align, but their specificity differs. Partial matches were the most common (176 cases, 35.6%) and reflected correct categorization with missing procedural or anatomical details. One example is the difference in invasiveness between 'other craniotomy for evacuation of an epidural hematoma' (CHOP Z01.25.11) and 'therapeutic percutaneous puncture of an epidural hematoma' (OPS 8-159.4): the former is an open surgery, whereas the latter is a minimally invasive procedure. Mismatches occurred in 47 instances (9.5%) and reflected fundamental errors, such as confusing 'instillation of a uterine tube' (CHOP Z66.8) with 'foetal implantation of a pacemaker' (OPS 5-755.8). A chi-square goodness-of-fit test was used to assess how well the observed distribution of these four categories fits a uniform distribution.
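The goodness-of-fit test can be reproduced from the four Evaluation 3 counts reported above (127, 144, 176, 47). With four categories and a uniform expectation there are 3 degrees of freedom, for which the chi-square survival function has the closed form erfc(sqrt(x/2)) + sqrt(2x/pi) exp(-x/2), so the Python sketch below needs only the standard library. It illustrates the test; it is not the authors' analysis script.

```python
import math


def chi_square_uniform(observed):
    """Pearson chi-square statistic against equal expected counts."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


def chi2_sf_df3(x):
    """Upper-tail probability P(X >= x) for a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Exact, near, partial, and mismatch counts from Evaluation 3 (n = 494).
observed = [127, 144, 176, 47]
stat = chi_square_uniform(observed)
p_value = chi2_sf_df3(stat)
print(f'chi2 = {stat:.2f}, df = 3, p = {p_value:.3g}')  # chi2 = 73.21, df = 3, p = 8.78e-16
```

Where SciPy is available, scipy.stats.chisquare(observed), whose default expected frequencies are uniform, should give the same statistic and p-value.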
The chi-square statistic was 73.21 with 3 degrees of freedom (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\u0026lt;8.78\\times 10^{-16}\\)\u003c/span\u003e\u003c/span\u003e), indicating that the observed frequencies of the match categories are not uniformly distributed.\u003c/p\u003e \u003cp\u003ePerformance varied across hierarchical levels (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e):\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eLevel 3\u003c/b\u003e: Showed equal proportions of exact and near matches (41.67% each) and a mismatch rate of 8.33%, which may point to difficulties at more abstract coding levels.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eLevel 4\u003c/b\u003e: Showed fewer exact matches (21.43%) and more frequent partial matches (35.71%), suggesting difficulty in maintaining specificity at this intermediate level of generalization.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eLevel 5\u003c/b\u003e: Achieved 50% exact and near matches, underscoring the system's strength in detailed procedural matching.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003e \u003cb\u003eLevel 6\u003c/b\u003e: Showed a broad spread across all categories, with exact matches at 26.36%, partial matches at 36.68%, and mismatches at 9.51%.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eCross-language matching\u003c/p\u003e \u003cp\u003eTo test the cross-language capabilities of MAP-CARE, we developed an automated procedure to assess the mapping accuracy of CHOP codes within the MAP-CARE embedding space across the German, French, and Italian versions.
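The pairwise evaluation can be summarised as top-k accuracy per ordered language pair. The Python sketch below assumes that, for each pair, the ranked codes returned by the vector store have already been collected as (true code, ranked codes) tuples. The data layout, the helper names, and the toy results are hypothetical.

```python
from itertools import permutations

LANGS = ('de', 'fr', 'it')


def ordered_pairs(langs=LANGS):
    """All (source, target) language pairs, e.g. ('de', 'fr') and ('fr', 'de')."""
    return list(permutations(langs, 2))


def top_k_accuracy(rows, k):
    """rows: list of (true_code, ranked_codes); share with true_code among the first k."""
    if not rows:
        return 0.0
    hits = sum(true_code in ranked[:k] for true_code, ranked in rows)
    return hits / len(rows)


def accuracy_table(results, ks=(1, 5)):
    """results: {(src, tgt): rows} -> {(src, tgt): {k: accuracy}}."""
    return {pair: {k: top_k_accuracy(rows, k) for k in ks}
            for pair, rows in results.items()}


# Toy results for one pair (hypothetical codes): the second query ranks its true code second.
results = {('de', 'fr'): [('Z1', ['Z1', 'Z2']), ('Z2', ['Z3', 'Z2'])]}
print(len(ordered_pairs()))     # 6
print(accuracy_table(results))  # {('de', 'fr'): {1: 0.5, 5: 1.0}}
```

Keeping the source and target languages ordered matters, because retrieval from German into French need not perform like retrieval from French into German.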
Each term is mapped between language pairs to verify whether the code retrieved from the other language\u0026rsquo;s vector store is the original CHOP code (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eMapping accuracy was evaluated at two levels, top-1 and top-5 matches (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e). The top-1 match is the closest term retrieved and measures the system's ability to match an entry directly. The top-5 criterion counts a match if the correct term appears among the first five terms retrieved. This broader measure captures relevant terms even when they are not ranked highest, which matters for applications where several similar options may be clinically relevant. At the top-1 level, the system achieved moderate accuracy of at least 59% across all language pairs.\u003c/p\u003e \u003cp\u003eWhen the criterion was expanded to the top five matches, performance improved notably, exceeding 70% accuracy across all language pairs. Manual review of the mismatches showed that many involved synonymous entries within the CHOP classification. This broader criterion therefore captures semantic relationships between terms more effectively, even when the direct match is not ranked highest.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eThe MAP-CARE workflow facilitates the integration and accessibility of multilingual medical procedural data through the application of large language models (LLMs) and vector embeddings.
In contrast to established systems such as UMLS, which align numerous terminologies, MAP-CARE introduces a dynamic approach to (i) automatically align previously unmapped terminologies, thereby reducing the cost of manual mapping, and (ii) help users navigate extensive medical classifications by returning the semantically related codes for a query, reducing the need to explore the entire classification exhaustively. This is particularly important for basic classifications distributed as CSV or PDF files, which are often searchable only by explicit terms and lack synonym recognition. Because MAP-CARE handles linguistic complexity as well as the granularity and variability of non-English and less standardized terminologies, it is a valuable tool where conventional keyword-based search methods are inadequate.\u003c/p\u003e \u003cp\u003eMAP-CARE reduces the linguistic barriers inherent in medical terminology by enabling mapping across four languages and two different classifications. The system produced only 9.5% mismatches at the most detailed hierarchical level, which is particularly relevant because these terminal levels describe a procedure precisely, often including the exact anatomical region, approach, method, and material. These results are promising because the most granular level is also the most labour-intensive to map manually. Notably, no official mapping exists between the OPS and CHOP classification systems, despite the potential for valuable knowledge exchange and collaboration between the two countries.
This gap arises from fundamental structural differences between the two systems, even though both are available in German, and from the heavy burden that their complexity places on manual mapping.\u003c/p\u003e \u003cp\u003eThe level of aggregation could be controlled through explicit customisation and system training, and by cross-mapping to refined hierarchical terminologies such as SNOMED CT [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. However, automatic mapping with the Usagi tool did not yield SNOMED CT mappings of sufficient quality: only 273 (55.3%) concept pairs in total were rated as equal, equivalent, narrower, or wider. Next steps in the project include creating embeddings of established international medical terminologies such as SNOMED CT, ICD-10, OPS, ICHI, and the OPCS Classification of Interventions and Procedures (OPCS-4) of the UK National Health Service (NHS) [\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e]. By exploiting the hierarchical structure of SNOMED CT, a knowledge graph representation could enable more effective semantic reasoning, enhanced data interoperability, and improved clinical decision support by capturing complex relationships and contextual nuances among medical concepts [\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eIn cross-language tasks, MAP-CARE performed well for French and Italian, whereas German posed particular challenges that underscore the complexity of semantic interoperability. The lower accuracy of mappings from German to French and Italian (61% and 59%, respectively) suggests an underlying structural linguistic divergence, for example the German tendency to condense complex ideas into single compound words.
These findings might reflect the intricate nature of German medical nomenclature and of CHOP, which often lack direct equivalents in the more Latinate vocabularies of French and Italian. This divergence not only affects the system\u0026rsquo;s performance but also highlights the broader challenges of standardizing medical terminology across languages with different etymological roots. MAP-CARE's performance improved when the criteria were expanded to the top five matches, with accuracy exceeding 70% across all language pairs. This improvement suggests that the system captures broader semantic relationships and that flexible matching criteria can accommodate linguistic variation.\u003c/p\u003e \u003cp\u003eHowever, challenges remain, particularly in integrating specific operational markers such as '**' in OPS, which are critical for distinguishing procedural nuances such as open versus laparoscopic approaches. Another challenge arises when a nomenclature relies on mixed-term coding, in which multiple concepts are embedded within a single term, as is the case for CHOP. This lack of strict compositionality reduces semantic precision and impedes data interoperability across systems. For instance, procedural codes that fail to specify methods or surgical techniques obscure essential details, as evidenced by mismatches during CHOP-to-OPS mapping. Such ambiguity hinders cross-system integration and restricts detailed procedural analyses. One possible solution is to decompose procedures into granular, composable elements (e.g., device type, surgical approach, and anatomical site).
Integrating this approach into MAP-CARE would improve its ability to represent and map medical procedures semantically across classifications and languages, by moving beyond mixed-term coding and handling marker codes.\u003c/p\u003e \u003cp\u003eFinally, prospective applications of MAP-CARE include feature engineering for machine learning. For example, semantic cluster assignments could serve as a new feature that implicitly models the invasiveness of procedures, enriching predictive analytics and downstream decision-making. Furthermore, MAP-CARE's architecture can be extended to support natural language searches within the embedding space: by converting queries into their embedded vector representations, the system can efficiently identify and retrieve the most relevant terms, broadening its utility in diverse clinical and research settings. Concept embeddings can also be used to harmonize heterogeneous datasets and mitigate local learning bias in federated learning [\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e].\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eMAP-CARE combines large language models and embedding techniques to address the complexities of multilingual medical terminology in healthcare informatics. Its ability to align and interpret medical procedures across languages improves the accessibility and utility of procedural information across linguistic and healthcare domains. Future work will extend MAP-CARE to additional medical terminologies and refine its analytical capabilities to extract more detailed insights from complex medical data sets, with the aim of increasing accuracy.
Enhancing the system\u0026rsquo;s ability to integrate with various medical terminologies, frameworks, and electronic health record systems could further amplify its impact. This continued development will optimise the system\u0026rsquo;s architecture and scalability to support a broader range of clinical and administrative applications.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e: Ethical approval was not required for this study as no human or animal data were used.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e: Not applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAvailability of data and materials\u003c/strong\u003e: GitHub [40]\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e:\u0026nbsp;The authors have no competing interests to declare that are relevant to the content of this article.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding:\u003c/strong\u003e The research leading to these results received funding from the Swiss Personalized Health Network (SPHN) under the Demonstrator Project INFRA: INFection Radar.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors' contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eConceptualization: HGR, OE, KT, GB; Methodology: HGR, OE, KT; Formal analysis and investigation: HGR, OE, KT, BP; Writing - original draft preparation: HGR; Writing - review and editing: HGR, OE, KT, GB, CGB; Funding acquisition: OE, GB.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements:\u003c/strong\u003e We thank the University of Bern for kind permission to use UBELIX, its central Linux high-performance computing (HPC) cluster.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eGuillen, H. HugoGuillen/MAPCARE: v0.1.0-alpha.
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://zenodo.org/records/15453911\u003c/span\u003e\u003cspan address=\"https://zenodo.org/records/15453911\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 8 Jun 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUMLS Terminology Services. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://uts.nlm.nih.gov/uts/\u003c/span\u003e\u003cspan address=\"https://uts.nlm.nih.gov/uts/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 25 Apr 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eData Standardization \u0026ndash; OHDSI. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.ohdsi.org/data-standardization/\u003c/span\u003e\u003cspan address=\"https://www.ohdsi.org/data-standardization/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 25 Apr 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eInternational Classification of Health Interventions (ICHI). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.who.int/standards/classifications/international-classification-of-health-interventions\u003c/span\u003e\u003cspan address=\"https://www.who.int/standards/classifications/international-classification-of-health-interventions\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 25 Apr 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCPT\u0026reg; (Current Procedural Terminology) | AMA. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.ama-assn.org/amaone/cpt-current-procedural-terminology\u003c/span\u003e\u003cspan address=\"https://www.ama-assn.org/amaone/cpt-current-procedural-terminology\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 24 Apr 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHealthcare Common Procedure Coding System (HCPCS) | CMS. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.cms.gov/medicare/coding-billing/healthcare-common-procedure-system\u003c/span\u003e\u003cspan address=\"https://www.cms.gov/medicare/coding-billing/healthcare-common-procedure-system\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 25 Apr 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBfArM - OPS. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.bfarm.de/EN/Code-systems/Classifications/OPS-ICHI/OPS/_node.html\u003c/span\u003e\u003cspan address=\"https://www.bfarm.de/EN/Code-systems/Classifications/OPS-ICHI/OPS/_node.html\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 25 Apr 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCCAM en ligne - CCAM. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.ameli.fr/accueil-de-la-ccam/index.php\u003c/span\u003e\u003cspan address=\"https://www.ameli.fr/accueil-de-la-ccam/index.php\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 25 Apr 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSchweizerische Operationsklassifikation (CHOP) 2023 - Systematisches Verzeichnis - CSV | Publikation.
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.bfs.admin.ch/asset/de/22988091\u003c/span\u003e\u003cspan address=\"https://www.bfs.admin.ch/asset/de/22988091\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 25 Apr 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEffective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program - PubMed. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://pubmed.ncbi.nlm.nih.gov/11825149/\u003c/span\u003e\u003cspan address=\"https://pubmed.ncbi.nlm.nih.gov/11825149/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 24 Apr 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu, S., Ma, W., Moore, R., Ganesan, V. \u0026amp; Nelson, S. RxNorm: Prescription for electronic drug information exchange. \u003cem\u003eIT Prof.\u003c/em\u003e \u003cb\u003e7\u003c/b\u003e, 17\u0026ndash;23. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/MITP.2005.122\u003c/span\u003e\u003cspan address=\"10.1109/MITP.2005.122\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2005).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSavova, G. K. et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): Architecture, component evaluation and applications. \u003cem\u003eJ. Am. Med. Inform. Assoc.\u003c/em\u003e \u003cb\u003e17\u003c/b\u003e, 507\u0026ndash;513. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1136/JAMIA.2009.001560\u003c/span\u003e\u003cspan address=\"10.1136/JAMIA.2009.001560\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2010).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGrosjean, J.
et al. Health Multi-Terminology Portal: A Semantic Added-value for Patient Safety. 129\u0026ndash;138. (2011). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3233/978-1-60750-740-6-129\u003c/span\u003e\u003cspan address=\"10.3233/978-1-60750-740-6-129\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVenkataraman, G. R. et al. FasTag: Automatic text classification of unstructured medical narratives. \u003cem\u003ePLoS One\u003c/em\u003e \u003cb\u003e15\u003c/b\u003e, e0234647. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1371/JOURNAL.PONE.0234647\u003c/span\u003e\u003cspan address=\"10.1371/JOURNAL.PONE.0234647\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eInternational Classification of Diseases (ICD). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.who.int/standards/classifications/classification-of-diseases\u003c/span\u003e\u003cspan address=\"https://www.who.int/standards/classifications/classification-of-diseases\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 25 Apr 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRxNorm. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.nlm.nih.gov/research/umls/rxnorm/index.html\u003c/span\u003e\u003cspan address=\"https://www.nlm.nih.gov/research/umls/rxnorm/index.html\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 25 Apr 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuman Phenotype Ontology.
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://hpo.jax.org/\u003c/span\u003e\u003cspan address=\"https://hpo.jax.org/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 25 Apr 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSNOMED CT - Home. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://browser.ihtsdotools.org/?\u003c/span\u003e\u003cspan address=\"https://browser.ihtsdotools.org/?\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 25 Apr 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eArbabi, A., Adams, D. R., Fidler, S. \u0026amp; Brudno, M. Identifying Clinical Terms in Medical Text Using Ontology-Guided Machine Learning. \u003cem\u003eJMIR Med. Inform.\u003c/em\u003e \u003cb\u003e7\u003c/b\u003e, e12596. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2196/12596\u003c/span\u003e\u003cspan address=\"10.2196/12596\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2019).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGaudet-Blavignac, C., Foufi, V., Bjelogrlic, M. \u0026amp; Lovis, C. Use of the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) for Processing Free Text in Health Care: Systematic Scoping Review.
\u003cem\u003eJ. Med. Internet Res.\u003c/em\u003e \u003cb\u003e23\u003c/b\u003e, e24594. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2196/24594\u003c/span\u003e\u003cspan address=\"10.2196/24594\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, Y. \u0026amp; Yang, T. Word Embedding for Understanding Natural Language: A Survey. \u003cem\u003eStud. Big Data\u003c/em\u003e \u003cb\u003e26\u003c/b\u003e, 83\u0026ndash;104. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/978-3-319-53817-4_4\u003c/span\u003e\u003cspan address=\"10.1007/978-3-319-53817-4_4\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eB\u0026ouml;hringer, D. et al. Automatic inference of ICD-10 codes from German ophthalmologic physicians\u0026rsquo; letters using natural language processing. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e14\u003c/b\u003e, 9035. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/S41598-024-59926-3\u003c/span\u003e\u003cspan address=\"10.1038/S41598-024-59926-3\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKugic, A., Pfeifer, B., Schulz, S. \u0026amp; Kreuzthaler, M. Embedding-based terminology expansion via secondary use of large clinical real-world datasets. \u003cem\u003eJ. Biomed. Inf.\u003c/em\u003e \u003cb\u003e147\u003c/b\u003e, 104497.
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/J.JBI.2023.104497\u003c/span\u003e\u003cspan address=\"10.1016/J.JBI.2023.104497\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTariq, A. et al. Contrastive diagnostic embedding (CDE) model for automated coding - A case study using emergency department encounters. \u003cem\u003eInt. J. Med. Inf.\u003c/em\u003e \u003cb\u003e179\u003c/b\u003e \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/J.IJMEDINF.2023.105212\u003c/span\u003e\u003cspan address=\"10.1016/J.IJMEDINF.2023.105212\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLee, J. et al. Automating surgical procedure extraction for society of surgeons adult cardiac surgery registry using pretrained language models. \u003cem\u003eJAMIA Open.\u003c/em\u003e \u003cb\u003e7\u003c/b\u003e, ooae054. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/jamiaopen/ooae054\u003c/span\u003e\u003cspan address=\"10.1093/jamiaopen/ooae054\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTavabi, N. et al. Building large-scale registries from unstructured clinical notes using a low-resource natural language processing pipeline. \u003cem\u003eArtif. Intell. Med.\u003c/em\u003e \u003cb\u003e151\u003c/b\u003e, 102847. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/J.ARTMED.2024.102847\u003c/span\u003e\u003cspan address=\"10.1016/J.ARTMED.2024.102847\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePercha, B., Pisapati, K., Gao, C. \u0026amp; Schmidt, H. Natural language inference for curation of structured clinical registries from unstructured text. \u003cem\u003eJ. Am. Med. Inform. Assoc.\u003c/em\u003e \u003cb\u003e29\u003c/b\u003e, 97. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/JAMIA/OCAB243\u003c/span\u003e\u003cspan address=\"10.1093/JAMIA/OCAB243\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKim, J. S. et al. Can Natural Language Processing and Artificial Intelligence Automate The Generation of Billing Codes From Operative Note Dictations? \u003cem\u003eGlobal Spine J.\u003c/em\u003e \u003cb\u003e13\u003c/b\u003e, 1946\u0026ndash;1955. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1177/21925682211062831\u003c/span\u003e\u003cspan address=\"10.1177/21925682211062831\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChiu, B. \u0026amp; Baker, S. Word embeddings for biomedical natural language processing: A survey. \u003cem\u003eLang. Linguist. Compass\u003c/em\u003e
\u003cb\u003e14\u003c/b\u003e \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/LNC3.12402\u003c/span\u003e\u003cspan address=\"10.1111/LNC3.12402\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2020).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShah, N. H., Entwistle, D. \u0026amp; Pfeffer, M. A. Creation and Adoption of Large Language Models in Medicine. \u003cem\u003eJAMA\u003c/em\u003e \u003cb\u003e330\u003c/b\u003e, 866\u0026ndash;869. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1001/JAMA.2023.14217\u003c/span\u003e\u003cspan address=\"10.1001/JAMA.2023.14217\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWelcome Gemma 2 - Google\u0026rsquo;s new open LLM. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://huggingface.co/blog/gemma2\u003c/span\u003e\u003cspan address=\"https://huggingface.co/blog/gemma2\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 25 Apr 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOpen Source Strikes Bread - New Fluffy Embedding Model - Mixedbread. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.mixedbread.com/blog/mxbai-embed-large-v1\u003c/span\u003e\u003cspan address=\"https://www.mixedbread.com/blog/mxbai-embed-large-v1\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 25 Apr 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, X. \u0026amp; Li, J. AoE: Angle-optimized Embeddings for Semantic Textual Similarity. \u003cem\u003eProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)\u003c/em\u003e, 1825\u0026ndash;1839. (2024).
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.18653/V1/2024.ACL-LONG.101\u003c/span\u003e\u003cspan address=\"10.18653/V1/2024.ACL-LONG.101\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGitHub - chroma-core/chroma: the AI-native open-source embedding database. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/chroma-core/chroma\u003c/span\u003e\u003cspan address=\"https://github.com/chroma-core/chroma\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 25 Apr 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGitHub - ollama/ollama. Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/ollama/ollama\u003c/span\u003e\u003cspan address=\"https://github.com/ollama/ollama\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 25 Apr 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGitHub - OHDSI/Usagi. Usagi is an application to help create mappings between coding systems and the Vocabulary standard concepts. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/OHDSI/Usagi\u003c/span\u003e\u003cspan address=\"https://github.com/OHDSI/Usagi\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 25 Apr 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNHS Classifications OPCS-4 - TRUD. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://isd.digital.nhs.uk/trud/user/guest/group/0/pack/10\u003c/span\u003e\u003cspan address=\"https://isd.digital.nhs.uk/trud/user/guest/group/0/pack/10\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 25 Apr 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChaturvedi, J., Wang, T., Velupillai, S., Stewart, R. \u0026amp; Roberts, A. Development of a Knowledge Graph Embeddings Model for Pain. AMIA Annual Symposium Proceedings 2023:299 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhu, M., Yang, Q., Gao, Z., Yuan, Y. \u0026amp; Liu, J. FedBM: Stealing knowledge from pre-trained language models for heterogeneous federated learning. \u003cem\u003eMed. Image Anal.\u003c/em\u003e \u003cb\u003e102\u003c/b\u003e \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/J.MEDIA.2025.103524\u003c/span\u003e\u003cspan address=\"10.1016/J.MEDIA.2025.103524\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2025).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGitHub - HugoGuillen/MAPCARE: MAP-CARE: Multilingual Approach for Procedures in Clinical and Retrieval Embeddings. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/HugoGuillen/MAPCARE\u003c/span\u003e\u003cspan address=\"https://github.com/HugoGuillen/MAPCARE\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. 
Accessed 16 May 2025.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"semantic embedding, interoperability, LLM, cross-language mapping, medical procedures, non-English healthcare systems, terminology, classification","lastPublishedDoi":"10.21203/rs.3.rs-6848278/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6848278/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e\u003cstrong\u003eBackground: \u003c/strong\u003eCross-lingual information retrieval limits global exchange of data because of the high diversity in the methods to classify, document and encode medical procedures. Traditional keyword-based or single-language systems are not able to align data from surgical and interventional procedures, especially from non-English healthcare systems. This study aims to develop a pipeline for cross-lingual retrieval and integration of medical procedures data.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eResults: \u003c/strong\u003eMAP-CARE is a novel framework that leverages Large Language Models (LLMs) for translating and transforming medical procedures into a unified multilingual embedding space. Semantic embeddings are used to enhance retrieval accuracy and interoperability across languages and healthcare systems. MAP-CARE demonstrated high accuracy in the translation and mapping of clinical terms.
Its cross-language translation performance proved robust, achieving up to 90% accuracy in translating procedure classification codes across English, German, French, and Italian—when considering the correct term among the top five retrieved. The cross-classification mapping workflow also showed high accuracy in aligning two different national procedure classifications, with exact and near matches exceeding 53.8% at the most granular level.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConclusion: \u003c/strong\u003eMAP-CARE offers a flexible, scalable, and robust solution for the multilingual and cross-system integration of medical procedural data. Its innovative use of large language models (LLMs) combined with semantic embeddings sets a new standard for the accessibility and utility of multilingual medical information. The framework is designed for easy extension from a terminology file in CSV format and is publicly available [1].\u003c/p\u003e","manuscriptTitle":"MAP-CARE: Enhancing Cross-Lingual Medical Intervention Terms Analysis Through LLM-supported Semantic Embeddings","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-06-13 09:23:12","doi":"10.21203/rs.3.rs-6848278/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2025-06-13T17:47:34+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-06-10T13:03:46+00:00","index":"","fulltext":""},{"type":"submitted","content":"Scientific Reports","date":"2025-06-10T13:00:11+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"e37c0473-0deb-4583-bcfa-7eb9af53be76","owner":[],"postedDate":"June 13th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":49820703,"name":"Biological sciences/Computational biology and bioinformatics/Classification and taxonomy"},{"id":49820704,"name":"Biological sciences/Computational biology and bioinformatics/Data integration"},{"id":49820705,"name":"Biological sciences/Computational biology and bioinformatics/Computational models"}],"tags":[],"updatedAt":"2026-01-12T16:00:24+00:00","versionOfRecord":{"articleIdentity":"rs-6848278","link":"https://doi.org/10.1038/s41598-025-34778-7","journal":{"identity":"scientific-reports","isVorOnly":false,"title":"Scientific Reports"},"publishedOn":"2026-01-09 15:57:14","publishedOnDateReadable":"January 9th, 2026"},"versionCreatedAt":"2025-06-13 09:23:12","video":"","vorDoi":"10.1038/s41598-025-34778-7","vorDoiUrl":"https://doi.org/10.1038/s41598-025-34778-7","workflowStages":[]},"version":"v1","identity":"rs-6848278","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6848278","identity":"rs-6848278","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}