Semantic Traceability Across Software Artifacts: A Domain-Centric Approach Using Embedding-Based Mapping

Authors: Zaki Pauzi (ORCID 0000-0003-4032-4766), Cezar Sas (ORCID 0000-0002-3018-0140), and Andrea Capiluppi (ORCID 0000-0001-9469-6050)

Authorea preprint, V1, 12 May 2025. This is a preprint and has not been peer reviewed. Data may be preliminary.
https://doi.org/10.22541/au.174704970.03040102/v1

Abstract

As software artifacts continuously evolve and increase in number, failure to manage traceability leads to feature inconsistencies, untested critical functionality, and increased maintenance costs. Automated traceability is therefore essential to prevent gaps in coverage between source code, documentation, and tests. Tracing to application domains is critical to understanding the classification of semantics and the coverage (i.e., which application domain is present in each artifact?).
In this paper, we propose using NLP to map concepts emerging from software artifacts to application domains, and to trace these concepts between artifacts. We extracted the corpus keywords from source code, documentation, and tests. We ran an optimised Latent Dirichlet Allocation (LDA) to generate the concepts emerging from each artifact. We then calculated the similarity score of each concept against each application domain, and ranked the differences between these scores for each pair of artifacts. Results show that the ranking of the inverse of the difference represents the strength of tracing in semantics, and that different embeddings yield varying results. We observed the strong applicability of our method and its replicability by other researchers and practitioners, particularly in detecting synchronised application domains that are traced between artifacts.

Supplementary Material: spe__concept_coverage_between_artifacts.pdf (453.50 KB)

Version history: Version 1, 12 May 2025

Copyright: This work is licensed under a Non Exclusive No Reuse License.

Keywords: information retrieval, large language model, natural language processing, software traceability

Affiliations: Zaki Pauzi, Cezar Sas, and Andrea Capiluppi are all affiliated with Rijksuniversiteit Groningen, Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence.
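The last two steps of the method described in the abstract (scoring concepts against application domains, then ranking the inverse of the score difference between pairs of artifacts) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 3-dimensional toy vectors, the domain names, the use of cosine similarity, the choice to score a domain by its best-matching concept, the absolute difference, and the small `eps` that avoids division by zero. In practice the concept vectors would come from LDA topics embedded with a real embedding model.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def domain_scores(concept_vecs, domain_vecs):
    """Score each domain for one artifact by its best-matching concept (assumed aggregation)."""
    return {
        domain: max(cosine(c, d) for c in concept_vecs)
        for domain, d in domain_vecs.items()
    }


def rank_traces(scores_by_artifact, eps=1e-9):
    """Rank (artifact A, artifact B, domain) by the inverse of the score difference."""
    rows = []
    for a, b in combinations(sorted(scores_by_artifact), 2):
        for domain in scores_by_artifact[a]:
            diff = abs(scores_by_artifact[a][domain] - scores_by_artifact[b][domain])
            rows.append((a, b, domain, 1.0 / (diff + eps)))
    # A larger inverse difference is read as a stronger semantic trace.
    return sorted(rows, key=lambda r: r[3], reverse=True)


# Hypothetical embeddings; real ones would come from an embedding model.
DOMAINS = {"networking": [1.0, 0.0, 0.0], "graphics": [0.0, 1.0, 0.0]}
CONCEPTS = {
    "code": [[0.9, 0.1, 0.0], [0.2, 0.8, 0.0]],
    "docs": [[0.9, 0.1, 0.0]],
    "tests": [[0.1, 0.9, 0.1]],
}

scores = {name: domain_scores(vecs, DOMAINS) for name, vecs in CONCEPTS.items()}
ranking = rank_traces(scores)

for a, b, domain, strength in ranking:
    print(f"{a:5s} {b:5s} {domain:11s} {strength:.3g}")
```

With these toy vectors, the top-ranked trace is the networking domain between code and docs, because both artifacts contain the same concept vector and their scores for that domain are identical. A high inverse difference only says that two artifacts score a domain similarly; whether the domain is strongly present in both has to be read from the scores themselves.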
Cite as: Zaki Pauzi, Cezar Sas, Andrea Capiluppi. Semantic Traceability Across Software Artifacts: A Domain-Centric Approach Using Embedding-Based Mapping. Authorea. 12 May 2025. DOI: https://doi.org/10.22541/au.174704970.03040102/v1