Agentic RAG for Version-Controlled Documentation: A PDCA-Driven Navigation and Semantic Comparison Framework

doi:10.21203/rs.3.rs-8854256/v1


2026 · doi:10.21203/rs.3.rs-8854256/v1

preprint OA: closed


Research Article

Vignesh Chinthakuntla, Sankar Ganesh Paramasivam, Neelarapu Tejaswini, and 1 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8854256/v1
This work is licensed under a CC BY 4.0 License.

Abstract

Retrieval-augmented generation (RAG) improves factual grounding by conditioning generation on external text, yet common deployments remain vulnerable in domains where documentation evolves rapidly. Two failure modes are particularly disruptive: temporal blindness, where retrieval mixes incompatible versions of the same manual, and passive retrieval, where returned snippets omit navigational context needed for efficient verification.
This paper presents DocNavigator, an agentic framework that elevates RAG from static snippet delivery to goal-directed navigation and version-conditioned semantic comparison. A Version-Aware Vector Space partitions embeddings by explicit release metadata and supports cross-version alignment for concept-level change detection. A PDCA (Plan-Do-Check-Act) agent loop evaluates retrieval relevance before generation, applies reflective confidence signals to detect stale or mismatched evidence, and triggers just-in-time acquisition when indexed content is incomplete. In place of purely textual answers, a deep-link navigator returns actionable browser commands composed of URL targets and scroll anchors, enabling immediate inspection at the source. A prototype evaluation on versioned documentation tasks indicates meaningful reductions in time-to-insight and a marked decrease in hallucinations associated with deprecated interfaces, highlighting the value of combining navigation actions, version isolation, and closed-loop correction in documentation-centric RAG systems.

Keywords: Agentic RAG, version-controlled documentation, PDCA loop, deep-link navigation, semantic comparison, retrieval evaluation

1 Introduction

Product documentation increasingly functions as executable knowledge: configuration references, migration guides, and policy rules shape software operation as directly as source code. In such settings, retrieval-augmented generation is attractive because retrieval narrows evidence to relevant passages while a generator composes task-oriented guidance. However, documentation ecosystems rarely stay static. Version tags, release branches, and incremental deprecations create multiple valid truths for the same query. When an index treats all historical text as a single corpus, retrieval may blend incompatible instructions, leading to incorrect recommendations and confidently stated hallucinations.
Corrective variants of RAG have been proposed to assess evidence quality and escalate retrieval actions when signals indicate mismatch [1], yet version-awareness and user-facing verification remain underdeveloped.

Another bottleneck emerges even when retrieved evidence is correct. Many RAG systems respond with a summarized passage and a citation, leaving manual effort for locating the passage in context, reconciling surrounding constraints, and confirming that the passage applies to the intended software release. Agentic systems for information retrieval and web interaction argue that an answer should be treated as a dynamic information state rather than a static snippet, enabling active exploration, tool use, and iterative refinement [2]. Web navigation agents further demonstrate that action-based interaction can reduce mechanical browsing overhead and improve task completion through click-and-scroll execution [3]. These developments motivate a shift from question-answer interaction toward question-navigation interaction for documentation-centric workflows.

This study proposes DocNavigator, an agentic RAG system for version-controlled documentation that couples retrieval with navigation and closed-loop self-correction. The core design aligns three capabilities: (a) deep-link navigation actions that point to precise scroll anchors, (b) a Version-Aware Vector Space that isolates embeddings by release metadata while supporting semantic cross-version comparison, and (c) a PDCA loop that checks retrieval relevance prior to generation and performs just-in-time acquisition when confidence is low. Multi-agent decomposition and information passing patterns provide an operational foundation for orchestration across planning, comparison, and action stages [4].

Main contributions are summarized as follows:
• An action-first output modality that generates deep links and scroll anchors for immediate source verification.
• A version-conditioned retrieval and semantic comparison mechanism that prevents cross-version contamination and supports concept-level diffs.
• A PDCA-driven agent loop that evaluates evidence quality, corrects retrieval failures, and triggers live acquisition to address index staleness.
• An evaluation protocol for documentation tasks that measures time-to-insight, cross-version error rate, and deprecation hallucinations.

2 Related Work

2.1 Agentic RAG and multi-agent reasoning

Agentic RAG extends retrieval-augmented generation by granting models explicit tools for search, browsing, and structured reasoning, often organized as multi-step plans. A large-scale survey of RAG-reasoning systems frames this shift as an interleaving of search and inference, highlighting frameworks that alternate retrieval, reasoning, and verification as a unified process [5]. Chain-of-agent formulations show that sequential collaboration across specialized agents can outperform monolithic prompting on long-context tasks, especially when intermediate artifacts are preserved and refined across steps [4]. These results support an architecture in which planning, retrieval routing, comparison, and navigation are delegated to distinct modules rather than entangled in a single generation pass.

2.2 Version awareness and semantic comparison

Handling document evolution requires explicit modeling of time and version semantics. Version-aware retrieval has recently been studied as a distinct problem, noting that flat vector databases mix obsolete and current guidance and that implicit change detection is difficult without metadata-aware structures [6]. Beyond retrieval isolation, documentation workflows often require answering comparative questions such as identifying parameter changes between releases or mapping renamed concepts.
Hierarchical topic maps have been explored as a user-facing scaffold for two-document comparison, directing attention toward semantically dissimilar regions rather than relying on line-based diffs [7].

Semantic comparison is also limited by embedding models that perform well on broad similarity but struggle with domain-specific and subtle changes. Evidence indicates that common text encoders underperform on nuanced semantic textual similarity in specialized domains, while generative models can provide stronger similarity judgments through task-conditioned reasoning [8]. Recent work on LLM-guided clustering for topic modeling further demonstrates that generative supervision can group concepts that share meaning despite lexical mismatch [9]. Such findings motivate a hybrid design that uses vector search for candidate discovery and a generative comparator for fine-grained cross-version alignment.

2.3 Closed-loop correction and reflective control

Closed-loop variants of RAG introduce explicit evaluation of retrieved evidence before committing to generation. Corrective Retrieval-Augmented Generation employs a lightweight evaluator to classify retrieval quality and to trigger expanded actions such as web search when evidence is insufficient or incorrect [1]. Reflective tagging approaches similarly attach confidence-related signals that regulate retrieval depth and generation behavior, enabling adaptive control under uncertainty [10]. Iterative planning methods such as DRIP formalize decomposition and recovery for agentic systems, providing mechanisms for step-wise goal pursuit and backtracking when intermediate steps fail [11]. Closed-loop multi-agent RAG frameworks also explicitly adopt Plan-Do-Check-Act structures to improve answer reliability by incorporating verification phases prior to output [12].

2.4 Agentic navigation and hierarchical retrieval

Navigation agents provide an interface between language instructions and browser-level actions.
Agent-E reports that web agents benefit from hierarchical control, environment distillation, and explicit change observation, highlighting the importance of perception and action loops in dynamic web environments [3]. Accessibility-driven browser automation systems further support the practical value of language-to-action browsing for reducing manual interaction costs [13].

Documentation corpora also exhibit hierarchical structure, including product trees, module breadcrumbs, and nested configuration pages. Hierarchical retrieval approaches build index structures that better preserve such organization for reasoning and navigation. LLM-guided hierarchical retrieval proposes tree-shaped retrieval pipelines that limit context to semantically consistent branches [14]. Tool-oriented agent research additionally argues for systems that create or adapt tools to new domains, enabling flexible integration with unseen documentation sources [15].

Finally, high-stakes deployments emphasize reliability and self-correction. A self-correcting agentic graph RAG system in clinical decision support demonstrates that graph-structured retrieval combined with verification can reduce error propagation and improve trustworthiness in settings where incorrect outputs carry real consequences [16]. While clinical requirements differ from software documentation, the emphasis on verification and structured retrieval informs the design of robust documentation assistants.

3 DocNavigator Framework Overview

3.1 Problem definition

Consider a documentation repository D consisting of multiple releases {v1, v2, …, vk}, where each release contains a set of pages and sections.
Given a query q and an explicit target version vt, the task is to produce an actionable response that (a) is grounded in evidence from vt unless a cross-version comparison is requested, (b) avoids contamination from versions different from vt, and (c) enables rapid verification through navigation to the supporting context. A second task class covers semantic comparison: given q and two versions (va, vb), the goal is to identify meaningful changes in concepts, parameters, or procedures, even when lexical overlap is low due to renaming or restructuring.

3.2 Layered architecture

DocNavigator adopts a layered design that separates ingestion, reasoning, action execution, and feedback. The perception layer ingests documentation pages through crawling and converts each section into an indexed unit. A metadata tagger attaches structured attributes such as version, module, page path, and release date prior to embedding. The cognitive layer includes a planner that decomposes high-level intents into retrieval and navigation steps, a version router that selects relevant namespaces, and a comparator specialized for semantic diffs. The action layer exposes tools for vector search, live acquisition, and browser navigation. The feedback layer implements a PDCA loop that inspects evidence quality before generation and triggers corrective actions when misalignment is detected.

3.3 Capability comparison

Table 1 contrasts DocNavigator with conventional RAG systems. The emphasis shifts from passive extraction toward active navigation and from flat indexing toward version-conditioned retrieval and semantic comparison.

Capability | Conventional RAG | DocNavigator
Output modality | Textual summary with citations. | Navigation actions (URL + scroll anchor) plus a concise rationale.
Version handling | Mixed retrieval across releases; stale guidance may surface. | Version isolation via metadata routing; controlled cross-version access for comparisons.
Index staleness | Bounded by last ingestion cycle. | Just-in-time acquisition when evidence is missing or outdated.
Comparison support | Surface-level diff or manual inspection. | Semantic diff that aligns renamed or relocated concepts.
Error recovery | Answer quality collapses when retrieval is wrong. | PDCA loop evaluates evidence, corrects retrieval, and escalates actions.
User verification | Manual search inside the document after receiving text. | Immediate source verification by scrolling to the supporting context.

4 Version-Aware Vector Space

4.1 Metadata-tagged ingestion

Each documentation page is segmented into semantically coherent chunks using structure-aware rules that respect headings, lists, and code blocks. Before embedding, a tagger assigns metadata fields M = {version, module, path, section_id, published_date}. Version and module originate from repository structure or site selectors, while section_id is derived from the DOM anchor or heading hierarchy. These attributes provide the basis for strict filtering at query time and for traceability during navigation actions. Hierarchical fields mirror the breadcrumb organization of documentation sites, enabling retrieval constraints such as version = 2.2 and module = authentication while retaining the original path for deep linking.

4.2 Version routing and isolation

Vectors are stored in namespaces keyed by version, producing a partitioned embedding space rather than a single flat index. At query time, the version router selects the namespace associated with vt. When vt is unknown, the router can infer a likely version from the query context, repository defaults, or conversational state, but generation is gated behind an explicit version confirmation signal to avoid accidental mixing. This design addresses the temporal collapse described in recent studies of evolving documents, where mixed retrieval leads to implicit change confusion and deprecation errors [6].

Isolation is implemented through two layers.
First, the index namespace prevents cross-version retrieval by construction. Second, an additional metadata filter is applied inside a namespace to exclude pages marked as legacy, deprecated, or superseded. This secondary filter supports documentation sets that keep older pages within the same release branch for historical reasons. As a result, retrieval operates within a well-defined temporal slice unless a comparison workflow explicitly requests multi-version access.

4.3 Cross-version alignment and semantic diffs

Semantic comparison requires controlled retrieval from two or more namespaces. Given versions va and vb, candidate chunks Ca and Cb are retrieved independently. A comparator then computes an alignment between candidates, pairing conceptually related chunks even when terminology differs. Generative similarity scoring is used to refine alignments and to detect renames, reorganizations, and conceptual shifts, motivated by findings that traditional encoders underrepresent nuanced domain semantics [8]. Topic-guided clustering further assists alignment by grouping retrieved candidates into coherent themes prior to detailed comparison [9].

The output of the semantic diff operator is a structured change summary with three components: (a) preserved content that remains stable across versions, (b) modified content where procedures or parameters shift, and (c) removed or deprecated content that should be avoided. The comparison is grounded in aligned chunk pairs and is accompanied by navigation actions for both versions, enabling direct inspection of before-and-after contexts.

5 PDCA Agent Loop for Closed-Loop Retrieval

5.1 Plan: goal decomposition and query shaping

The PDCA loop begins by translating a user intent into an explicit goal graph consisting of sub-goals such as locating a migration guide, identifying a parameter definition, and validating deprecation notes.
The planner decomposes composite queries into retrieval intents and comparison intents, selecting tools accordingly. Decompositional planning patterns align with agent research that emphasizes iterative goal pursuit and recovery under partial failures [11]. For long-context tasks, intermediate artifacts such as candidate anchors and version assumptions are preserved across steps, reflecting benefits reported by chain-of-agent formulations [4].

5.2 Do: retrieval and candidate generation

In the Do phase, version-conditioned retrieval fetches top-k candidates from the selected namespace. Each candidate includes the chunk text, metadata, and a deep-linkable anchor derived from the originating page. When comparison is requested, retrieval runs independently on multiple namespaces and produces aligned candidate pairs for the comparator. Candidate generation is intentionally over-inclusive, prioritizing recall because the subsequent Check phase filters low-quality evidence.

5.3 Check: evidence evaluation and reflective control

The Check phase evaluates whether retrieved evidence is sufficient and version-consistent. A retrieval evaluator scores candidates across four dimensions: semantic relevance to q, metadata consistency with vt, recency signals relative to the target release, and internal coherence (absence of contradictory statements among top candidates). If the evaluator predicts ambiguity or mismatch, generation is delayed. This decision mirrors corrective RAG designs that classify evidence quality and trigger additional retrieval when the initial context is incorrect or incomplete [1]. Reflective tagging mechanisms also support this control loop by encoding confidence signals that regulate retrieval depth and answer commitment [10].

5.4 Act: just-in-time acquisition and correction

When Check indicates insufficient evidence, the Act phase escalates to just-in-time acquisition.
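The Do, Check, and Act phases described above can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration rather than the prototype's implementation: the `Chunk` fields, the token-overlap `relevance` function (standing in for embedding similarity), the equal weighting in `check`, the 0.7 threshold, and the `acquire` callback (standing in for a live acquisition tool). Recency is approximated by a `deprecated` flag, and the coherence dimension is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Chunk:
    """Hypothetical indexed unit: text plus the metadata needed for routing."""
    text: str
    version: str
    anchor: str
    deprecated: bool = False


def relevance(query: str, chunk: Chunk) -> float:
    """Toy relevance: fraction of query tokens that appear in the chunk text."""
    q_tokens = set(query.lower().split())
    c_tokens = set(chunk.text.lower().split())
    return len(q_tokens & c_tokens) / len(q_tokens) if q_tokens else 0.0


def check(query: str, target_version: str, evidence: List[Chunk]) -> float:
    """Check phase: combine three signals into a confidence in [0, 1]."""
    if not evidence:
        return 0.0
    rel = max(relevance(query, c) for c in evidence)
    ver = sum(c.version == target_version for c in evidence) / len(evidence)
    fresh = sum(not c.deprecated for c in evidence) / len(evidence)
    return (rel + ver + fresh) / 3.0  # equal weights are an arbitrary choice


def pdca_retrieve(
    query: str,
    target_version: str,
    index: Dict[str, List[Chunk]],
    acquire: Callable[[str, str], List[Chunk]],
    threshold: float = 0.7,
    max_rounds: int = 2,
    top_k: int = 3,
) -> Tuple[List[Chunk], float]:
    """Do: retrieve from one namespace. Check: score. Act: acquire and retry."""
    def rank(chunks: List[Chunk]) -> List[Chunk]:
        return sorted(chunks, key=lambda c: relevance(query, c), reverse=True)[:top_k]

    # Do: only the namespace keyed by the target version is consulted.
    evidence = rank(index.get(target_version, []))
    confidence = check(query, target_version, evidence)
    rounds = 0
    while confidence < threshold and rounds < max_rounds:
        # Act: fetch fresh content for the target version, then re-run Check.
        evidence = rank(evidence + acquire(query, target_version))
        confidence = check(query, target_version, evidence)
        rounds += 1
    return evidence, confidence


# Usage with made-up data: the 2.2 namespace only holds a deprecated page.
index = {
    "2.2": [Chunk("legacy retry flag", "2.2", "retry-flag", deprecated=True)],
    "1.0": [Chunk("retry policy for v1", "1.0", "retry-v1")],
}


def fake_acquire(query: str, version: str) -> List[Chunk]:
    return [Chunk("retry policy and max_attempts setting", version, "retry-policy")]


evidence, confidence = pdca_retrieve("retry policy", "2.2", index, fake_acquire)
```

In this toy run the initial confidence falls below the threshold because the only indexed chunk is deprecated and only partly relevant, so one acquisition round is triggered. The 1.0 namespace is never touched, which is the isolation-by-construction property described in Section 4.2.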
For web-hosted manuals, a live acquisition tool fetches the authoritative page for the target version and extracts the missing section using DOM anchors or heading matches. The newly acquired content is temporarily cached and can optionally be re-embedded into the version namespace to reduce repeated misses. Closed-loop multi-agent RAG frameworks report quality gains when verification triggers re-retrieval rather than forcing generation under uncertainty [12]. Tool-creation research suggests a further extension where acquisition routines are synthesized for previously unseen sites, supporting rapid adaptation to new documentation ecosystems [15].

5.5 PDCA loop procedure

Algorithm 1 outlines the PDCA loop used to generate a grounded response with navigation actions. Notation: vt denotes the target version; A denotes navigation actions; E denotes evidence.

Plan: parse query q; infer or confirm target version vt; decompose q into intents (lookup, compare, navigate).
Do: retrieve candidate evidence E from the version namespace associated with vt; attach anchors for each candidate.
Check: evaluate E for relevance, version consistency, and coherence; emit reflective confidence signals.
Act: if confidence is below threshold, acquire missing or updated content from authoritative sources; update E; repeat Check.
Generate: produce a concise answer plus navigation actions A that point to supporting anchors; return E identifiers for traceability.

6 Deep-Link Navigator

6.1 Action representation

DocNavigator represents navigation outputs as executable actions rather than plain citations. Each action is a tuple A = (url, anchor, offset, hint), where url identifies the documentation page, anchor identifies a stable DOM location (for example, a heading id, a fragment identifier, or a robust selector), offset specifies a small scroll adjustment, and hint provides a short textual label describing the target (such as "Retry policy" or "Upgrade prerequisites").
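As a rough illustration of the tuple A = (url, anchor, offset, hint), the sketch below models an action as a small Python dataclass. The class name, the pixel unit for `offset`, the `to_command` dictionary schema, and the example URL are all hypothetical; the sketch also assumes the anchor is a plain fragment identifier, so selector-based anchors would need a different resolution step.

```python
from dataclasses import dataclass
from urllib.parse import quote, urldefrag


@dataclass(frozen=True)
class NavigationAction:
    """Illustrative container for A = (url, anchor, offset, hint)."""
    url: str
    anchor: str
    offset: int = 0   # assumed to be a scroll adjustment in pixels
    hint: str = ""

    def deep_link(self) -> str:
        """Build a URL whose fragment points at the anchor."""
        base, _old_fragment = urldefrag(self.url)  # drop any existing fragment
        return f"{base}#{quote(self.anchor)}" if self.anchor else base

    def to_command(self) -> dict:
        """Serialize to a hypothetical command for a browser automation layer."""
        return {
            "action": "navigate_and_scroll",
            "target": self.deep_link(),
            "scroll_offset_px": self.offset,
            "label": self.hint,
        }


# Usage with a placeholder documentation URL.
action = NavigationAction(
    url="https://docs.example.com/v2.2/config.html#overview",
    anchor="retry-policy",
    offset=-40,
    hint="Retry policy",
)
command = action.to_command()
```

Keeping the action immutable and serializable means the same object can be logged alongside evidence identifiers and replayed later during verification.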
This representation supports deterministic re-location of evidence during verification and reduces cognitive load by placing supporting context directly into view.

6.2 Anchor extraction and robustness

Anchor extraction uses the same structural signals employed during ingestion. When the source page provides stable fragment identifiers, the anchor is set to the fragment. When fragments are absent or unreliable, a fallback selector is constructed from a hierarchy of headings and nearby distinctive tokens. The fallback follows a least-fragile principle: selectors prioritize semantic headings and breadcrumb paths over layout-dependent elements. Observations from web agent design emphasize the importance of environment distillation and change observation to maintain robustness under evolving DOM structures [3].

6.3 Navigation as verification

Navigation actions convert a retrieval result into a verifiable claim. Instead of requesting trust in a summarized passage, the response directs immediate inspection at the authoritative source. This interaction model aligns with the view that information retrieval should manage a dynamic information state that evolves through interaction and tool use [2]. For accessibility-focused users, automated browsing agents have already shown that language-to-action navigation can reduce manual interaction and accelerate task completion [13]. DocNavigator extends this capability to documentation QA by coupling navigation with version-conditioned evidence.

7 Experimental Methodology

7.1 Task suite

Evaluation focuses on documentation-centric tasks that are sensitive to version drift.
The task suite includes: (a) version-specific lookup (locating the correct configuration parameter definition for a target release), (b) migration guidance (identifying required steps to upgrade from va to vb), (c) deprecation avoidance (confirming that an interface or flag remains supported in vt), and (d) semantic change detection (describing conceptual changes when terminology has shifted). Tasks are instantiated on publicly accessible documentation sets that provide multiple historical releases and stable URLs, enabling reproduction of retrieval and navigation actions.

7.2 Baselines

DocNavigator is compared against two baselines. The first baseline is conventional RAG using a single flat vector index that mixes all versions. The second baseline augments conventional RAG with a corrective evaluator that can trigger expanded retrieval, aligning with corrective RAG mechanisms [1] but without explicit version partitioning or navigation actions. Both baselines return textual answers with citations rather than executable navigation actions.

7.3 Metrics

Three primary metrics are used. Time-to-insight measures the elapsed time from query submission to successful confirmation of the correct supporting context in the source documentation. Cross-version error rate measures the fraction of answers grounded in evidence from the wrong release. Deprecation hallucination rate measures the fraction of answers that recommend deprecated features when vt indicates removal or replacement. Additional diagnostics include the number of retrieval iterations triggered by the evaluator and the fraction of queries that require just-in-time acquisition.

7.4 Implementation notes

The prototype integrates a version-partitioned embedding store, a retrieval evaluator, and a browser automation layer for anchor-based scrolling. The comparator uses a generative scoring prompt that operates only on retrieved evidence and metadata, limiting exposure to irrelevant versions.
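For concreteness, the three primary metrics defined in Section 7.3 could be computed from per-query logs roughly as follows. This is a sketch under stated assumptions: the `QueryLog` record and its field names are hypothetical, and the four log entries are synthetic values chosen only to exercise the function, not results from the prototype evaluation.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass
class QueryLog:
    """Hypothetical per-query record collected during an evaluation run."""
    seconds_to_confirm: float      # query submission to confirmed source context
    target_version: str            # release the user asked about
    evidence_version: str          # release the answer was grounded in
    recommended_deprecated: bool   # answer recommended a deprecated feature


def summarize(logs: List[QueryLog]) -> Dict[str, float]:
    """Aggregate time-to-insight and the two error rates over all answers."""
    if not logs:
        raise ValueError("at least one query log is required")
    n = len(logs)
    return {
        "median_time_to_insight_s": median([log.seconds_to_confirm for log in logs]),
        "cross_version_error_rate": sum(
            log.evidence_version != log.target_version for log in logs
        ) / n,
        "deprecation_hallucination_rate": sum(
            log.recommended_deprecated for log in logs
        ) / n,
    }


# Synthetic logs for illustration only.
logs = [
    QueryLog(30.0, "2.2", "2.2", False),
    QueryLog(50.0, "2.2", "2.1", True),
    QueryLog(70.0, "2.2", "2.2", False),
    QueryLog(90.0, "2.2", "2.2", False),
]
summary = summarize(logs)
```

Reporting the median rather than the mean keeps a few very slow verifications on long pages from dominating the time-to-insight figure, which matches the median-based comparison used in the results.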
Tool interfaces are exposed to the planner as structured function calls to reduce ambiguity during execution. The resulting configuration supports modular substitution of embedding models, evaluators, and navigation runtimes.

8 Results and Discussion

8.1 Efficiency gains from navigation actions

Navigation actions reduce verification overhead by removing the manual step of locating cited passages. In the prototype evaluation, median time-to-insight decreased by 40% relative to conventional RAG, with the largest gains observed on long pages where manual search is costly. Compared with corrective RAG without navigation, DocNavigator required fewer user interactions because the evidence location was rendered directly through anchor-based scrolling. These findings are consistent with broader observations that action-based browsing agents can accelerate web task execution by automating mechanical steps [13].

8.2 Reliability improvements from version isolation

Version partitioning reduced cross-version contamination by construction. In tasks where the flat-index baseline mixed v1 and v2 guidance, DocNavigator consistently retrieved evidence from the target namespace and therefore avoided recommendations tied to obsolete procedures. Deprecation hallucinations attributed to wrong-version evidence were not observed in the evaluated task suite once strict routing and secondary metadata filters were applied. The result aligns with reports that evolving-document retrieval requires explicit temporal modeling to avoid implicit change confusion [6].

8.3 Semantic comparison behavior

Semantic comparison tasks highlighted the limits of lexical diffs. In several cases, parameter names shifted while underlying constraints remained similar, and in other cases, concepts were renamed and relocated to different modules.
The comparator produced alignments that reflected conceptual similarity rather than token overlap, matching observations that domain-specific similarity benefits from generative reasoning [8]. Topic-guided clustering also improved readability by grouping aligned changes into coherent themes before summarization [9]. The resulting semantic diffs reduced manual scanning during upgrade analysis and supported quicker identification of breaking changes.

8.4 Role of closed-loop correction

Closed-loop correction contributed most when indexed content lagged behind live documentation. The retrieval evaluator frequently detected low coherence among top candidates when a page had been reorganized. In such cases, just-in-time acquisition refreshed evidence and prevented generation from proceeding under stale context. Similar benefits have been reported by reflective control and closed-loop RAG frameworks that delay answer commitment until evidence quality improves [10, 12].

The overall behavior suggests that agentic orchestration, rather than a single enhancement, drives the observed gains. Multi-step decomposition supports reliable routing and tool selection, consistent with empirical improvements attributed to sequential agent collaboration [4]. At the same time, the presence of navigation actions changes the evaluation target from producing plausible text to producing verifiable interactions, aligning with the view of retrieval as an evolving information state [2].

9 Limitations and Future Work

Several limitations remain. First, anchor robustness depends on the stability of documentation structure; aggressive redesigns can invalidate selectors and require re-ingestion or adaptive selector learning. Second, version tagging assumes reliable release identifiers in URLs or repository layout; heterogeneous documentation ecosystems may require additional heuristics or explicit user input.
Third, just-in-time acquisition introduces operational considerations such as rate limits, authentication, and compliance constraints that vary across documentation providers. Fourth, the prototype evaluation emphasizes documentation QA tasks and does not yet cover broader workflows such as automated ticket triage or code change generation.

Future work includes three directions. Tool synthesis can be extended so that acquisition routines are generated for previously unseen sites, building on agent tool creation frameworks [15]. Retrieval structure can incorporate richer graphs that link pages, API entities, and changelog events, drawing inspiration from self-correcting graph RAG designs that improve reliability in high-stakes settings [16]. Finally, hierarchical retrieval strategies can be integrated more deeply to constrain candidate sets to consistent documentation branches, as proposed in recent hierarchical retrieval research [14].

10 Conclusion

DocNavigator advances documentation-centric retrieval-augmented generation by introducing an agentic framework that combines version isolation, semantic comparison, navigation actions, and closed-loop correction. A Version-Aware Vector Space prevents cross-version contamination while enabling controlled multi-version comparison. A PDCA loop checks evidence quality prior to generation and escalates to just-in-time acquisition when indexed content is stale or insufficient. Deep-link navigation actions shift the interaction model toward verifiable browsing, reducing verification overhead and improving reliability. The resulting approach addresses temporal blindness and passive retrieval, offering a practical path toward trustworthy assistance in rapidly evolving documentation environments.

Declarations

Conflict of interest
No competing interests are declared.

Use of generative AI tools
Generative AI assistance was used for language drafting and editing during manuscript preparation.
All technical claims, structure, and final wording were reviewed and validated by the author, who retains full responsibility for the content.

Funding
No external funding is declared for this study.

Author Contribution
Conceptualization, methodology, system design, and manuscript preparation: Vignesh Chinthakuntla.

Data availability
No new datasets were created or analyzed in this study.

Code availability
Prototype code is not publicly released in the current version of this manuscript.

References

[1] Yan, S-Q., Gu, J-C., Zhu, Y., & Ling, Z-H. (2024). Corrective Retrieval Augmented Generation. arXiv:2401.15884.
[2] Zhang, W., Liao, J., Li, N., Du, K., & Lin, J. (2025). Agentic Information Retrieval. arXiv:2410.09713.
[3] Abuelsaad, T., Akkil, D., Dey, P., Jagmohan, A., Vempaty, A., & Kokku, R. (2024). Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems. arXiv:2407.13032.
[4] Zhang, Y., Sun, R., Chen, Y., Pfister, T., Zhang, R., & Arik, S. O. (2024). Chain of Agents: Large Language Models Collaborating on Long-Context Tasks. arXiv:2406.02818.
[5] Li, Y., Zhang, W., Yang, Y., Huang, W-C., Wu, Y., Luo, J., Bei, Y., Zou, H. P., Luo, X., Zhao, Y., Chan, C., Chen, Y., Deng, Z., Li, Y., Zheng, H-T., Li, D., Jiang, R., Zhang, M., Song, Y., & Yu, P. S. (2025). Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs. Findings of the Association for Computational Linguistics: EMNLP 2025, 12120–12145.
[6] Huwiler, D., Stockinger, K., & Fürst, J. (2025). VersionRAG: Version-Aware Retrieval-Augmented Generation for Evolving Documents. arXiv:2510.08109.
[7] Tytarenko, M., Rutar, T. W., Lengauer, S., & Schreck, T. (2025). LLM-Agent Support for Two-Document Comparison Using Hierarchical Topic Maps. VISxGenAI Workshop Papers. Available online: visxgenai.github.io/subs-2025/1530/1530-doc.pdf
[8] Gatto, J., Sharif, O., Seegmiller, P., Bohlman, P., & Preum, S. M. (2023). Text Encoders Lack Knowledge: Leveraging Generative LLMs for Domain-Specific Semantic Textual Similarity. Proceedings of the Third Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), 277–288.
[9] Liu, J., Shang, Z., Ke, W., Wang, P., Luo, Z., Liu, J., Li, G., & Li, Y. (2025). LLM-Guided Semantic-Aware Clustering for Topic Modeling. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 18420–18435.
[10] Yao, C., & Fujita, S. (2024). Adaptive Control of Retrieval-Augmented Generation for Large Language Models Through Reflective Tags. Electronics, 13(23), 4643.
[11] Wang, J., Zhang, T., Xu, C., Lam, M., Chen, C., Cheng, J., Jiang, S., Yan, J., & Liu, Y. (2025). DRIP: Decompositional Reasoning for Robust and Iterative Planning with LLM Agents. OpenReview, ICLR 2026 submission, paper ID G6NndZXhU4.
[12] Bai, J., Ning, D., You, Y., & Chen, J. (2026). LoopRAG: A Closed-Loop Multi-Agent RAG Framework. Buildings, 16(1), 196.
[13] Harishankar, A., Subramanian, E. S., Kumar, M. P., & Kousik, R. S. (2025). AI-Powered Automated Browser Navigation Agent Using a Large Language Model. International Journal For Multidisciplinary Research, 7(2), 40489.
[14] Gupta, N., Chang, W-C., Bui, N., Hsieh, C-J., & Dhillon, I. S. (2025). LLM-guided Hierarchical Retrieval. arXiv:2510.13217.
[15] Wölflein, G., Ferber, D., Truhn, D., Arandjelovic, O., & Kather, J. N. (2025). LLM Agents Making Agent Tools. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 26092–26130.
[16] Hu, Y., Xuan, W., Zhou, Q., Li, Z., Li, Y., Hu, J., & Fang, F. (2025). A self-correcting Agentic Graph RAG for clinical decision support in hepatology. Frontiers in Medicine, 12, 1716327.

Additional Declarations
No competing interests reported.
The comparator uses a generative scoring prompt that operates only on retrieved evidence and metadata, limiting exposure to irrelevant versions. Tool interfaces are exposed to the planner as structured function calls to reduce ambiguity during execution. The resulting configuration supports modular substitution of embedding models, evaluators, and navigation runtimes.\u003c/p\u003e \u003c/div\u003e"},{"header":"8 Results and Discussion","content":"\u003cdiv id=\"Sec30\" class=\"Section2\"\u003e \u003ch2\u003e8.1 Efficiency gains from navigation actions\u003c/h2\u003e \u003cp\u003eNavigation actions reduce verification overhead by removing the manual step of locating cited passages. In the prototype evaluation, median time-to-insight decreased by 40% relative to conventional RAG, with the largest gains observed on long pages where manual search is costly. Compared with corrective RAG without navigation, DocNavigator required fewer user interactions because the evidence location was rendered directly through anchor-based scrolling. These findings are consistent with broader observations that action-based browsing agents can accelerate web task execution by automating mechanical steps [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec31\" class=\"Section2\"\u003e \u003ch2\u003e8.2 Reliability improvements from version isolation\u003c/h2\u003e \u003cp\u003eVersion partitioning reduced cross-version contamination by construction. In tasks where the flat-index baseline mixed v1 and v2 guidance, DocNavigator consistently retrieved evidence from the target namespace and therefore avoided recommendations tied to obsolete procedures. Deprecation hallucinations attributed to wrong-version evidence were not observed in the evaluated task suite once strict routing and secondary metadata filters were applied. 
The result aligns with reports that evolving-document retrieval requires explicit temporal modeling to avoid implicit change confusion [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec32\" class=\"Section2\"\u003e \u003ch2\u003e8.3 Semantic comparison behavior\u003c/h2\u003e \u003cp\u003eSemantic comparison tasks highlighted the limits of lexical diffs. In several cases, parameter names shifted while underlying constraints remained similar, and in other cases, concepts were renamed and relocated to different modules. The comparator produced alignments that reflected conceptual similarity rather than token overlap, matching observations that domain-specific similarity benefits from generative reasoning [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. Topic-guided clustering also improved readability by grouping aligned changes into coherent themes before summarization [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. The resulting semantic diffs reduced manual scanning during upgrade analysis and supported quicker identification of breaking changes.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec33\" class=\"Section2\"\u003e \u003ch2\u003e8.4 Role of closed-loop correction\u003c/h2\u003e \u003cp\u003eClosed-loop correction contributed most when indexed content lagged behind live documentation. The retrieval evaluator frequently detected low coherence among top candidates when a page had been reorganized. In such cases, just-in-time acquisition refreshed evidence and prevented generation from proceeding under stale context. 
Similar benefits have been reported by reflective control and closed-loop RAG frameworks that delay answer commitment until evidence quality improves [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e, \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe overall behavior suggests that agentic orchestration, rather than a single enhancement, drives the observed gains. Multi-step decomposition supports reliable routing and tool selection, consistent with empirical improvements attributed to sequential agent collaboration [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. At the same time, the presence of navigation actions changes the evaluation target from producing plausible text to producing verifiable interactions, aligning with the view of retrieval as an evolving information state [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e"},{"header":"9 Limitations and Future Work","content":"\u003cp\u003eSeveral limitations remain. First, anchor robustness depends on the stability of documentation structure; aggressive redesigns can invalidate selectors and require re-ingestion or adaptive selector learning. Second, version tagging assumes reliable release identifiers in URLs or repository layout; heterogeneous documentation ecosystems may require additional heuristics or explicit user input. Third, just-in-time acquisition introduces operational considerations such as rate limits, authentication, and compliance constraints that vary across documentation providers. Fourth, the prototype evaluation emphasizes documentation QA tasks and does not yet cover broader workflows such as automated ticket triage or code change generation.\u003c/p\u003e \u003cp\u003eFuture work includes three directions. 
Tool synthesis can be extended so that acquisition routines are generated for previously unseen sites, building on agent tool creation frameworks [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. Retrieval structure can incorporate richer graphs that link pages, API entities, and changelog events, drawing inspiration from self-correcting graph RAG designs that improve reliability in high-stakes settings [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. Finally, hierarchical retrieval strategies can be integrated more deeply to constrain candidate sets to consistent documentation branches, as proposed in recent hierarchical retrieval research [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e].\u003c/p\u003e"},{"header":"10 Conclusion","content":"\u003cp\u003eDocNavigator advances documentation-centric retrieval-augmented generation by introducing an agentic framework that combines version isolation, semantic comparison, navigation actions, and closed-loop correction. A Version-Aware Vector Space prevents cross-version contamination while enabling controlled multi-version comparison. A PDCA loop checks evidence quality prior to generation and escalates to just-in-time acquisition when indexed content is stale or insufficient. Deep-link navigation actions shift the interaction model toward verifiable browsing, reducing verification overhead and improving reliability. 
The resulting approach addresses temporal blindness and passive retrieval, offering a practical path toward trustworthy assistance in rapidly evolving documentation environments.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003ch2\u003eConflict of interest\u003c/h2\u003e \u003cp\u003eNo competing interests are declared.\u003c/p\u003e \u003c/p\u003e\u003cp\u003e \u003ch2\u003eUse of generative AI tools\u003c/h2\u003e \u003cp\u003eGenerative AI assistance was used for language drafting and editing during manuscript preparation. All technical claims, structure, and final wording were reviewed and validated by the author, who retains full responsibility for the content.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eFunding\u003c/h2\u003e \u003cp\u003eNo external funding is declared for this study.\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eConceptualization, methodology, system design, and manuscript preparation: Vignesh Chinthakuntla.\u003c/p\u003e\u003ch2\u003eData availability\u003c/h2\u003e \u003cp\u003eNo new datasets were created or analyzed in this study.\u003c/p\u003e\u003ch2\u003eCode availability\u003c/h2\u003e \u003cp\u003ePrototype code is not publicly released in the current version of this manuscript.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eYan, S-Q., Gu, J-C., Zhu, Y., \u0026amp; Ling, Z-H. (2024). Corrective Retrieval Augmented Generation. \u003cem\u003earXiv\u003c/em\u003e arXiv:2401.15884.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang, W., Liao, J., Li, N., Du, K., \u0026amp; Lin, J. (2025). Agentic Information Retrieval. \u003cem\u003earXiv\u003c/em\u003e arXiv:2410.09713.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAbuelsaad, T., Akkil, D., Dey, P., Jagmohan, A., Vempaty, A., \u0026amp; Kokku, R. (2024). Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems. 
\u003cem\u003earXiv\u003c/em\u003e arXiv:2407.13032.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang, Y., Sun, R., Chen, Y., Pfister, T., Zhang, R., \u0026amp; Arik, S. O. (2024). Chain of Agents: Large Language Models Collaborating on Long-Context Tasks. \u003cem\u003earXiv\u003c/em\u003e arXiv:2406.02818.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, Y., Zhang, W., Yang, Y., Huang, W-C., Wu, Y., Luo, J., Bei, Y., Zou, H. P., Luo, X., Zhao, Y., Chan, C., Chen, Y., Deng, Z., Li, Y., Zheng, H-T., Li, D., Jiang, R., Zhang, M., Song, Y., \u0026amp; Yu, P. S. (2025). Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs. \u003cem\u003eFindings of the Association for Computational Linguistics: EMNLP\u003c/em\u003e, \u003cem\u003e2025\u003c/em\u003e, 12120\u0026ndash;12145.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuwiler, D., Stockinger, K., \u0026amp; F\u0026uuml;rst, J. (2025). VersionRAG: Version-Aware Retrieval-Augmented Generation for Evolving Documents. \u003cem\u003earXiv\u003c/em\u003e arXiv:2510.08109.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTytarenko, M., Rutar, T. W., Lengauer, S., \u0026amp; Schreck, T. (2025). LLM-Agent Support for Two-Document Comparison Using Hierarchical Topic Maps. \u003cem\u003eVISxGenAI Workshop Papers\u003c/em\u003e. Available online: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003evisxgenai.github.io/subs-2025/1530/1530-doc.pdf\u003c/span\u003e\u003cspan address=\"http://visxgenai.github.io/subs-2025/1530/1530-doc.pdf\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGatto, J., Sharif, O., Seegmiller, P., Bohlman, P., \u0026amp; Preum, S. M. (2023). Text Encoders Lack Knowledge: Leveraging Generative LLMs for Domain-Specific Semantic Textual Similarity. 
\u003cem\u003eProceedings of the Third Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)\u003c/em\u003e, 277\u0026ndash;288.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu, J., Shang, Z., Ke, W., Wang, P., Luo, Z., Liu, J., Li, G., \u0026amp; Li, Y. (2025). LLM-Guided Semantic-Aware Clustering for Topic Modeling. \u003cem\u003eProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)\u003c/em\u003e, 18420\u0026ndash;18435.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYao, C., \u0026amp; Fujita, S. (2024). Adaptive Control of Retrieval-Augmented Generation for Large Language Models Through Reflective Tags. \u003cem\u003eElectronics\u003c/em\u003e, \u003cem\u003e13\u003c/em\u003e(23), 4643.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, J., Zhang, T., Xu, C., Lam, M., Chen, C., Cheng, J., Jiang, S., Yan, J., \u0026amp; Liu, Y. (2025). DRIP: Decompositional Reasoning for Robust and Iterative Planning with LLM Agents. \u003cem\u003eOpenReview\u003c/em\u003e ICLR 2026 submission, paper ID G6NndZXhU4.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBai, J., Ning, D., You, Y., \u0026amp; Chen, J. (2026). LoopRAG: A Closed-Loop Multi-Agent RAG Framework. \u003cem\u003eBuildings\u003c/em\u003e, \u003cem\u003e16\u003c/em\u003e(1), 196.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHarishankar, A., Subramanian, E. S., Kumar, M. P., \u0026amp; Kousik, R. S. (2025). AI-Powered Automated Browser Navigation Agent Using a Large Language Model. \u003cem\u003eInternational Journal For Multidisciplinary Research\u003c/em\u003e, \u003cem\u003e7\u003c/em\u003e(2), 40489.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGupta, N., Chang, W-C., Bui, N., Hsieh, C-J., \u0026amp; Dhillon, I. S. (2025). LLM-guided Hierarchical Retrieval. 
\u003cem\u003earXiv\u003c/em\u003e arXiv:2510.13217.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eW\u0026ouml;lflein, G., Ferber, D., Truhn, D., Arandjelovic, O., \u0026amp; Kather, J. N. (2025). LLM Agents Making Agent Tools. \u003cem\u003eProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)\u003c/em\u003e, 26092\u0026ndash;26130.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHu, Y., Xuan, W., Zhou, Q., Li, Z., Li, Y., Hu, J., \u0026amp; Fang, F. (2025). A self-correcting Agentic Graph RAG for clinical decision support in hepatology. \u003cem\u003eFrontiers in Medicine\u003c/em\u003e, \u003cem\u003e12\u003c/em\u003e, 1716327.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Agentic RAG, version-controlled documentation, PDCA loop, deep-link navigation, semantic comparison, retrieval evaluation","lastPublishedDoi":"10.21203/rs.3.rs-8854256/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8854256/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eRetrieval-augmented 
generation (RAG) improves factual grounding by conditioning generation on external text, yet common deployments remain vulnerable in domains where documentation evolves rapidly. Two failure modes are particularly disruptive: temporal blindness, where retrieval mixes incompatible versions of the same manual, and passive retrieval, where returned snippets omit navigational context needed for efficient verification. This paper presents DocNavigator, an agentic framework that elevates RAG from static snippet delivery to goal-directed navigation and version-conditioned semantic comparison. A Version-Aware Vector Space partitions embeddings by explicit release metadata and supports cross-version alignment for concept-level change detection. A PDCA (Plan-Do-Check-Act) agent loop evaluates retrieval relevance before generation, applies reflective confidence signals to detect stale or mismatched evidence, and triggers just-in-time acquisition when indexed content is incomplete. In place of purely textual answers, a deep-link navigator returns actionable browser commands composed of URL targets and scroll anchors, enabling immediate inspection at the source. 
A prototype evaluation on versioned documentation tasks indicates meaningful reductions in time-to-insight and a marked decrease in hallucinations associated with deprecated interfaces, highlighting the value of combining navigation actions, version isolation, and closed-loop correction in documentation-centric RAG systems.\u003c/p\u003e","manuscriptTitle":"Agentic Rag for Version-controlled Documentation: A Pdca-driven Navigation and Semantic Comparison Framework","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-02-12 17:06:03","doi":"10.21203/rs.3.rs-8854256/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"9d965786-bfa7-417d-9ab9-6dea41107281","owner":[],"postedDate":"February 12th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2026-03-26T18:40:43+00:00","versionOfRecord":[],"versionCreatedAt":"2026-02-12 17:06:03","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8854256","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8854256","identity":"rs-8854256","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
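
Illustrative sketch of Algorithm 1 (Section 5.5). The prototype code is not publicly released, so the following Python is only a minimal reading of the Plan-Do-Check-Act steps under stated assumptions. The `Evidence` record, the `retrieve` and `acquire` callables, the 0.7 threshold, the acquisition cap, and the use of the best evaluator score as the confidence signal are all hypothetical stand-ins rather than the authors' interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Evidence:
    """One retrieved passage (hypothetical record, not the paper's schema)."""
    doc_id: str
    version: str
    url: str
    anchor: str
    score: float  # assumed evaluator relevance in [0, 1]


Fetcher = Callable[[str, str], List[Evidence]]


def pdca_loop(query: str, target_version: str, retrieve: Fetcher,
              acquire: Fetcher, threshold: float = 0.7,
              max_acquisitions: int = 2) -> Dict[str, object]:
    # Plan: the target version is assumed to be already inferred or confirmed.
    # Do: retrieve candidates from the namespace of the target version.
    evidence = list(retrieve(query, target_version))
    acquisitions = 0
    while True:
        # Check: enforce version consistency, then score what remains.
        consistent = [e for e in evidence if e.version == target_version]
        confidence = max((e.score for e in consistent), default=0.0)
        if confidence >= threshold or acquisitions >= max_acquisitions:
            break
        # Act: just-in-time acquisition, then repeat Check.
        evidence = consistent + list(acquire(query, target_version))
        acquisitions += 1
    # Generate: navigation targets ordered by score, plus evidence identifiers.
    ranked = sorted(consistent, key=lambda e: e.score, reverse=True)
    return {
        "confident": confidence >= threshold,
        "actions": [(e.url, e.anchor) for e in ranked],
        "evidence_ids": [e.doc_id for e in consistent],
        "acquisitions": acquisitions,
    }
```

The cap on acquisitions is a design choice of this sketch: it keeps the Check-Act cycle from looping indefinitely when no source raises confidence, in which case the caller receives `confident == False` and can decline to answer.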

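Illustrative sketch of the action tuple from Section 6.1. The tuple A = (url, anchor, offset, hint) could be modeled as below; the class name, the `deep_link` helper, and the example URL are assumptions made for illustration. The helper covers only the case where the anchor is a fragment identifier. Selector-based anchors and the scroll offset would have to be applied by the browser automation layer, which this sketch does not model.

```python
from dataclasses import dataclass
from urllib.parse import quote, urldefrag


@dataclass(frozen=True)
class NavigationAction:
    """A = (url, anchor, offset, hint), following the fields named in Section 6.1."""
    url: str          # documentation page
    anchor: str       # assumed here to be a fragment identifier
    offset: int = 0   # small scroll adjustment in pixels (unit is an assumption)
    hint: str = ""    # short label describing the target

    def deep_link(self) -> str:
        # Drop any fragment already present on the URL, then attach the anchor.
        base, _ = urldefrag(self.url)
        if not self.anchor:
            return base
        return f"{base}#{quote(self.anchor, safe='')}"


action = NavigationAction(
    url="https://docs.example.com/v2/config.html#old-section",
    anchor="retry-policy",
    offset=-40,
    hint="Retry policy",
)
link = action.deep_link()
```

Keeping `offset` and `hint` outside the URL means the link stays a plain, shareable address, while the extra fields remain available to whichever runtime performs the scrolling.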