AI-Driven Security Risk Mitigation: Enhancing Threat Assessment in Transit Infrastructure

preprint OA: closed
Full text JSON View at publisher
Full text 118,373 characters · extracted from preprint-html · click to expand
AI-Driven Security Risk Mitigation: Enhancing Threat Assessment in Transit Infrastructure | Research Square window.SnipcartSettings = { analytics: { enabled: false } }; (function() { var accessVector = localStorage.getItem('access_vector') || ''; window.dataLayer = window.dataLayer || []; if (accessVector) { window.dataLayer.push({ user: { profile: { profileInfo: { snid: accessVector } } } }); } })(); (function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src='https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);})(window,document,'script','dataLayer','GTM-K279D39R'); Browse Preprints In Review Journals COVID-19 Preprints AJE Video Bytes Research Tools Research Promotion AJE Professional Editing AJE Rubriq About Preprint Platform In Review Editorial Policies Our Team Advisory Board Help Center Sign In Submit a Preprint Cite Share Download PDF Research Article AI-Driven Security Risk Mitigation: Enhancing Threat Assessment in Transit Infrastructure Amirhossein Saali, Raffaele Alfano This is a preprint; it has not been peer reviewed by a journal. https://doi.org/ 10.21203/rs.3.rs-7347610/v1 This work is licensed under a CC BY 4.0 License Status: Posted Version 1 posted You are reading this latest preprint version Abstract The evolving threat landscape in transit infrastructure requires more adaptive, scalable, and precise security risk assessment and proposing proper risk mitigation measures. Traditional human-led approaches are often labor-intensive, subjective, and limited in their ability to incorporate real-time contextual data. This paper presents a novel AI-driven framework that combines geospatial analysis and large language models (LLMs) to automate the generation of structured, context-aware mitigation strategies aligned with industry standards and best practices. Our methodology involves a two-stage pipeline: (1) environmental feature extraction from Google Maps imagery and architectural design documentation, and (2) structured mitigation generation via controlled prompting of LLMs (GPT-3.5 and GPT-4). We evaluate model performance across 320 real-world threat scenarios spanning 32 threat types and 37 transit assets, using a multi-criteria rubric validated by security experts. Our results determine that GPT-4 model consistently outperforms GPT-3.5 in contextual relevance, logical consistency, and adherence to classification schemes, even though at higher computational cost. The framework also demonstrates high throughput, with practical implications for both rapid network-wide assessments and in-depth expert analysis. This study highlights the capacity of hybrid computer vision (CV)–LLM architectures in advancing autonomous security planning, while identifying key limitations and pathways for future improvement. Critical infrastructure protection Transit security Threat and vulnerability risk assessment (TVRA) AI based mitigation planning LLM model application Security risk assessment Natural language processing (NLP) Automated threat modeling Crime prevention through environmental design (CPTED) Figures Figure 1 1. Introduction Security risk assessments play a crucial role in safeguarding critical infrastructure, including rail and transit systems. The identification and mitigation of threats are traditionally performed by security professionals using established frameworks such as the American Public Transportation Association (APTA) guidelines [ 1 ], the National Institute of Standards and Technology (NIST) Risk Management Framework [ 2 ], the Federal Transit Administration (FTA) Security and Emergency Management Program [ 3 ], and Crime Prevention Through Environmental Design (CPTED) principles [ 4 ]. These frameworks provide a structured approach to identifying vulnerabilities, assessing risks, and implementing protective measures to enhance the resilience of transit networks. However, manual assessments can be subjective, resource-intensive, and prone to inconsistencies due to variations in human judgment and expertise. Additionally, traditional risk assessment methods may struggle to keep pace with the rapidly evolving threat landscape, including emerging challenges such as cyber-physical attacks and coordinated disruptions. As transit systems become increasingly interconnected and digitized, there is a growing need for automated, data-driven approaches that complement existing methodologies while improving efficiency and accuracy. Recent advancements in artificial intelligence (AI) and natural language processing (NLP) offer transformative opportunities to enhance security risk assessment processes. Large language models (LLMs) such as OpenAI’s GPT-3.5-turbo [ 5 ], Google's BERT [ 6 ], and T5 [ 7 ] have demonstrated significant capabilities in analyzing unstructured threat data, identifying patterns, and generating actionable mitigation strategies. AI-driven systems can process vast datasets, including security reports, incident logs, and geospatial intelligence, to detect emerging risks and predict threat trends with high accuracy. Additionally, reinforcement learning (RL) frameworks [ 8 ] can optimize risk prioritization by continuously learning from real-time security incidents and adapting mitigation recommendations accordingly. AI-powered analysis can also improve transit security by integrating video surveillance, access control logs, and crowd movement data to detect anomalies and suggest security interventions. By leveraging these AI-driven insights, security professionals can supplement traditional assessments with automated, scalable, and adaptive recommendations, reducing response times while improving situational awareness in dynamic security environments. This research investigates the feasibility of AI-driven threat mitigation in rail and transit infrastructure by developing an automated system that analyzes threats and recommends mitigation measures. Specifically, we explore whether the integration of large language models (LLMs) into the security risk assessment workflow can improve alignment with industry standards, enhance decision-making, and support resource optimization in mitigation planning. Based on these objectives, we formulate the following research questions: Can large language models generate mitigation strategies that align with established frameworks such as CPTED and APTA? Does incorporating environmental context (e.g., lighting, surveillance, location type, neighbourhood) improve the specificity and relevance of the generated recommendations? How do different available LLM configurations models compare in terms of output quality and consistency? 2. AI in Security Risk Assessment The application of artificial intelligence in security risk assessment has been an emerging topic in recent years. Various studies have explored the potential of AI-driven solutions in enhancing security planning and risk mitigation strategies. Research on AI-driven security risk assessment has gained traction, particularly in the domains of cybersecurity and physical security. Studies such as those by Smith et al. (2021) [ 9 ] and Johnson & Lee (2020) [ 10 ] have demonstrated that machine learning models can effectively analyze threat patterns and predict vulnerabilities in critical infrastructure. Their conclusions emphasize that while AI can significantly enhance risk prediction, human oversight remains crucial to contextualize automated assessments and address model biases. Additionally, Brown et al. (2022) [ 11 ] explored the use of deep learning models for automated anomaly detection in cybersecurity applications. Their findings highlighted that AI-driven models were highly effective in identifying threats but required periodic retraining to adapt to evolving attack techniques. More recently, Garcia & Patel (2023) [ 12 ] examined the integration of reinforcement learning in predictive threat mitigation, highlighting the advantages of adaptive risk assessment models. They concluded that AI models using reinforcement learning could dynamically adjust security postures, improving response efficiency but also introducing risks related to overfitting when exposed to limited data scenarios. Another study by Wilson et al. (2021) [ 13 ] demonstrated the effectiveness of AI-driven geospatial analysis in enhancing situational awareness for physical security applications. The study found that AI-assisted geospatial monitoring improved anomaly detection rates by over 30% compared to traditional methods but also raised concerns regarding privacy and the potential for false positives in complex environments. Further, Chen et al. (2022) [ 14 ] explored the role of hybrid AI models in integrating structured and unstructured security data for risk analysis, concluding that multi-modal AI frameworks offer greater contextual accuracy than single-model approaches. Similarly, Zhao & Nguyen (2023) [ 15 ] demonstrated that federated learning can enhance privacy-preserving threat assessments while maintaining efficiency in large-scale transit networks. 2.1 AI and Threat Mitigation Existing AI models have been employed to assist in generating mitigation strategies for various security concerns. For instance, Brown et al. (2022) explored the use of natural language processing (NLP) models to analyze security reports and propose risk mitigation actions. Their findings indicated that AI-generated recommendations often align with expert-driven solutions but require human oversight to ensure contextual accuracy and feasibility. They concluded that AI-enhanced NLP models significantly reduce assessment time but must be integrated with human decision-making frameworks to prevent misinterpretations. 2.2 Application of AI in Transit Infrastructure Security Few studies have explicitly examined the impact of AI on transit infrastructure security. One notable work by Garcia & Patel (2019) proposed a predictive analytics framework for railway security, leveraging AI to detect potential threats based on surveillance data. While their approach demonstrated promise, it lacked a structured methodology for generating tailored mitigation measures [ 16 ]. Another study by Lee et al. (2022) proposed an AI-powered video surveillance system for transit hubs, which significantly enhanced real-time anomaly detection but required human intervention to verify AI-flagged incidents, highlighting the current limitations of autonomous security models [ 17 ]. 2.3 Limitations of AI in Security Decision-Making Despite the potential of AI in security risk assessment, several limitations exist. Previous research, including studies by Wilson et al. (2021) and Kim & Zhang (2020) [ 18 ], highlights concern regarding bias in AI-generated recommendations, the lack of contextual awareness, and challenges in integrating AI insights into existing security frameworks. Furthermore, Jiang & Roberts (2023) [ 19 ] explored adversarial attacks on AI-based security models, demonstrating that threat actors can manipulate AI-driven assessments, raising concerns about model robustness and adversarial resilience. Addressing these limitations is crucial for ensuring that AI-driven security assessments remain practical and reliable. 3. Methodology This section describes the dataset, AI model selection, risk assessment criteria, mitigation strategy generation, and evaluation methods used in this study. 3.1 Task Formulation The proposed system performs a two-stage pipeline: environmental context extraction from geospatial data, and structured mitigation generation using a large language model (LLM). 3.1.1. Context Extraction Given a location L i , we retrieve satellite and Street View imagery from Google Maps and use a combination of LLM-assisted interpretation and lightweight computer vision to infer context features C i , such as: Lighting (e.g., presence of poles, shadows) Surveillance (e.g., visible cameras, blind spots) Enclosure (open vs. confined) Access control (fencing, gates, turnstiles) Visibility and occupancy The output is a structured context descriptor: C i =f env (L i ) (1) where f env represents the AI-assisted environment analysis function. 3.1.2. Mitigation Generation The main task is framed as a conditional structured text generation problem. Given a threat description T i and context C i , the model generates a list of mitigation strategies M i in a structured format: M i =LLM(T i , C i ; θ) (2) where: T i : natural language threat input C i : structured context from Stage 1 θ : prompt template and model parameters (e.g., GPT-4, temperature, max tokens) M i : mitigation outputs formatted as: Mi={(m i1 , p i1 , a i1 ),...,(m ik , p ik , a ik )} (3) with each tuple containing: Mij : mitigation action (text) p ij : CPTED principle (e.g., natural surveillance) a ij : APTA mitigation category (e.g., physical barriers, signage) This setup goes beyond classification or retrieval. The LLM is used in a controlled generative setting, producing structured content based on real-world environmental inputs and security domain knowledge. This controlled prompting structure ensured that model outputs were aligned with expert expectations and compliant with mitigation classification frameworks. 3.2 Dataset and Scenario Design We curated 320 threat–context pairs covering the full 32item threat taxonomy, evaluated across 37 assets (22 stations, 8 substations, 7 yards). The overall classification of threats are shown in Table 1 . Table 1 Distribution of threat categories and security scenarios in the evaluation dataset Class Cluster Example threats security scenarios Terrorism Bodyworn IED, Heavy Bomb, Drone Attack 160 Criminal Robbery, Assault, Sabotage 96 Nuisance / Disruption Graffiti, Trespassing, Homelessness 64 Environmental levels Every threat is paired with one of 10 environmental permutations spanning lighting, surveillance, enclosure, access control, and occupancy levels. Samplediversity analysis A twoway ANOVA (threat × location) on contextfeature coverage found p = 0.48, indicating no significant interaction. This fact is confirming even distribution across the design space. 3.2.1 Dataset The dataset used in this study consists of documented security threats in Rail and Transit projects. It includes the following key attributes: Threat Description : A textual summary of the identified security threat and the scenario of occurrence. Threat Category : Classification based on predefined security risk categories (e.g., vandalism, unauthorized access, terrorism-related threats). Risk Level : Initial assessment of the risk (Very Low, Low, Moderate, High, Very High) based on the criteria available in standards such as APTA. Location Information : Detailed breakdown of where the threat is applicable. Baseline Security Controls : Existing mitigation measures already implemented to counter the identified threat. The dataset was compiled from real-world risk assessment reports, industry standards, and expert consultations. Data preprocessing included standardizing terminology, removing duplicate entries, and ensuring consistency in categorization. 3.2.1 AI Model Selection The AI models used in this study are OpenAI’s GPT-3.5-turbo and GPT-4, two LLMs known for their advanced text generation capabilities. The selection was based on the following criteria: 3.2.1. GPT-3.5-turbo Ability to Process Natural Language Security Data: GPT-3.5-turbo can efficiently interpret textual threat descriptions and generate structured mitigation strategies. However, its contextual understanding may be less refined than GPT-4 in highly complex scenarios [ 20 ]. Scalability : The model is optimized for handling large datasets and can rapidly process multiple threat assessments without significant performance degradation [ 21 ]. Flexibility : It is highly adaptable to various security contexts through prompt engineering but may require additional refinement for nuanced threat scenarios [ 22 ]. Computational Cost : GPT-3.5-turbo provides a lower-cost alternative for large-scale security risk assessments, making it suitable for rapid initial analysis before deeper refinement [ 23 ]. 3.2.2. GPT-4 Ability to Process Natural Language Security Data: GPT-4 demonstrates superior contextual awareness, allowing for more precise and contextually relevant mitigation strategies [ 24 ]. Scalability: While slower than GPT-3.5-turbo, GPT-4 can handle intricate security evaluations with greater depth and logical structuring [ 25 ]. Enhanced Decision-Making: The model can generate highly structured security recommendations, improving the quality of mitigation strategies while reducing redundancy in outputs [ 26 ]. The models were accessed via API, utilizing a structured, multi-stage prompting approach to generate mitigation strategies tailored to each identified security threat. While GPT-3.5-turbo was used primarily for rapid, large-scale assessments, GPT-4 was deployed for in-depth, high-precision security risk evaluations where nuanced threat mitigation was required. 3.3 Prompt Design and Configuration To ensure deterministic, domainaligned output we employ a threelayer prompt hierarchy as described in Table 2 : Table 2 Hierarchical prompt design structure and API role assignment. Layer API role Function System system Domain identity; JSON schema & policy guardrails Planner assistant Chainofthought decomposition; selects CPTED principles Executor user Injects threat + context YAML; requests k mitigation triplets 3.3.1 Generation hyperparameters Table 3 outlines the configuration parameters for two models, gpt-3.5 turbo and gpt-4o mini, comparing baseline and premium reasoning capabilities. Table 3 Configuration Parameters of applied models Parameter Value Rationale Model gpt-3.5turbo / gpt-4omini Baseline vs premium reasoning Temp. 0.4 Low entropy → stable schema Max tokens 400 ≤ 5 mitigations × ~60 tokens Topp 1.0 No nucleus cut—control via temp Freq pen. 0.2 Reduces repetition Pres pen. 0.1 Encourages at least one surveillance item 3.4 Risk Assessment Criteria Risk assessment was performed using established security evaluation frameworks, ensuring consistency, scalability, and alignment with industry best practices: APTA Guidelines: Used to determine initial risk severity and applicable mitigation strategies [ 27 ]. Crime Prevention Through Environmental Design (CPTED): Applied to ensure mitigation measures align with environmental and infrastructural security principles [ 28 ]. ISO 31000 Risk Management Framework: Integrated to provide a structured and repeatable risk assessment process, enhancing risk communication and mitigation planning [ 29 ]. Threat Vulnerability Risk Assessment (TVRA): Considered in prioritizing risks based on likelihood and impact, helping in the development of security strategies for transit infrastructure [ 30 ]. 3.5 Evaluation Approach Three-axis evaluation rubric was used to evaluate the quality and usefulness of the AI-generated mitigation strategies. Semantic accuracy was evaluated by asking reviewers how closely each recommendation matched the specific threat and its environmental context; they recorded this on a five-point Likert scale that ranged from “not relevant at all” (1) to “highly relevant” (5). Taxonomy alignment was checked For every mitigation triplet the reviewers verified, in a yes/no fashion, that (i) the cited CPTED principle genuinely applied and (ii) the assigned APTA mitigation category was appropriate. Communication quality was scored using five-point Likert scale to evaluate whether the recommendation was written clearly and professionally. Each mitigation list contains up to five triplets, and every triplet was independently rated by three certified transit-security professionals. Inter-rater reliability, calculated with Cohen’s κ, was 0.78, indicating strong agreement. A triplet was declared “acceptable” when it satisfied all of the following thresholds: an average relevance score of at least 4, an average clarity score of at least 4, and binary passes for both the CPTED and APTA checks. Model-level acceptance rates (reported in Section 4 ) were then computed as the fraction of triplets that met these combined criteria, broken down by threat category and by model version (GPT-3.5 versus GPT-4). 4. Results and Analysis 4.1 Dataset and Scenario Design The dataset used for this study consisted of over 320 threat scenario instances, covering 32 unique threat types (as shown in Table 4 ) evaluated across more than 40 transit infrastructure elements, including subway stations, traction power substations (TPSS), maintenance yards, and guideways. Each threat type was tested across multiple physical and environmental contexts, such as: Varying lighting conditions (e.g., daytime vs. low-light areas) Surveillance coverage (full vs. partial CCTV) Access control levels (open platforms vs. gated substations) Infrastructure types (e.g., enclosed underground vs. elevated stations) This ensured the generated mitigations were tested in realistically diverse and operationally meaningful conditions. On average, each threat type was evaluated in 10 different contextual scenarios, resulting in a total of 320 AI-generated mitigation assessments for both GPT-3.5 and GPT-4. Table 4 Number of scenarios considered under each threat category Threat Category Scenario Count Body-Worn IED 20 Mid-Weight Bomb 18 Heavy Bomb 12 Car Bomb (Medium-Scale IED) 14 Active Shooter 20 IID Attack 18 Unarmed Attacker Incident 10 Vehicle Ramming Attack 20 Drone Attack 10 Edge Weapon Attack 14 Peaceful Blockades / Rally 8 Occupational Disruption 10 Opportunistic Burglary 10 Robbery 14 Prohibited Activities 10 Violent Assault 10 Violent Theft / Extortion 8 Sabotage 16 Unauthorized Activity 12 Homelessness / Vagrancy 10 Violence Against Employee 8 Break and Enter 10 Intentional Arson (Non-Terror) 8 Drug / Alcohol Consumption 10 Spray-Paint Graffiti 10 Homicide 6 Sexual Assault 6 Loitering and Sheltering 10 Unauthorized Vehicle Access 10 Unauthorized Train Access 10 Unauthorized Trespassing 12 Debris Interference 8 Total 320 In this study, each scenario was coupled with a context descriptor, automatically extracted via geospatial imagery and design files, capturing: • Lighting conditions • Surveillance level • Enclosure/exposure • Accessibility • Occupancy 4.2 Evaluation Protocol To assess the performance and quality of the generated mitigation strategies, certified transit security professionals independently reviewed each AI-generated output. Each mitigation set was evaluated on three key criteria using a 5-point Likert scale: Contextual Relevance – how well the mitigation fits the scenario Correctness of CPTED/ APTA Classification Clarity and Specificity – whether the recommendations are actionable Outputs receiving an average score of ≥ 4 from both reviewers were considered acceptable. In addition, we tracked: Formatting Consistency – whether the model followed the structured output format Redundancy – how often duplicate or similar mitigations appeared Hallucinations – instances of false or irrelevant suggestions Each threat was tested using both GPT-3.5 and GPT-4 under identical prompt conditions, with five repeated generations per scenario to evaluate output stability. 4.3 Model Performance Comparison The performance of GPT-3.5 and GPT-4 was compared across all 320 threat scenarios. Results are summarized in the table below. Table 4 GPT Model Performance on Mitigation Generation. Metric GPT-3.5 GPT-4 Acceptance Rate (%) 63.8% 83.1% Avg. Relevance Score (1–5) 3.4 ± 0.7 4.4 ± 0.5 Formatting Consistency (%) 88% 97% Avg. Inference Time (sec/sample) 8.2 19.5 Redundancy Frequency Moderate Low Hallucination Incidence 3.1% < 1% These results demonstrate that GPT-4 consistently produced more accurate, well-structured, and context-specific recommendations, though with a longer generation time and higher computational cost. 4.5 Prompt Robustness and Variability To assess robustness, each threat scenario was prompted five times per model using identical inputs. As shown in Fig. 1 , GPT-3.5 outputs showed greater variability in format, terminology, and sometimes redundancy. GPT-4 demonstrated stronger output consistency, higher format adherence, and lower semantic drift. An essential aspect of AI-generated mitigation strategies is their stability across repeated analyses. To assess consistency, multiple iterations of threat assessments were conducted to measure the similarity of AI outputs. The findings indicate that GPT-4 exhibits a higher degree of consistency across iterations, minimizing variations in proposed mitigation strategies. This reliability is crucial in security risk assessment, as unstable AI outputs may lead to inconsistencies in threat mitigation planning. In contrast, GPT-3.5-turbo displayed greater fluctuations in suggested mitigation measures, necessitating additional human oversight to ensure response coherence. This inconsistency could impact security professionals' ability to establish standardized mitigation protocols. 5. Discussion The study set out to answer three research questions: Alignment with industry frameworks. Both GPT-3.5-turbo and GPT-4 generated mitigation strategies that mapped cleanly to CPTED principles and APTA mitigation categories. Expert reviewers accepted 83% of GPT-4 outputs and 64% of GPT-3.5 outputs, indicating that large language models can reliably produce industry-compliant recommendations. Value of environmental context. Injecting the automatically extracted context descriptor improved the “high-relevance” score from 3.3 ± 0.7 to 4.4 ± 0.5 (Likert 1–5) across both models, confirming that geospatial and design data meaningfully sharpen mitigation specificity. Model comparison. GPT-4 outperformed GPT-3.5 on every quality metric (acceptance rate, mean relevance, formatting consistency) but required roughly 2.4 times longer inference time and consumed 3 times more compute credits. Thus, practitioners must balance precision against cost and throughput. 5.2 Contributions Beyond Prompting One major innovation is the integration of geospatial imagery, particularly Google Maps satellite and Street View data, into the preprocessing pipeline. Lightweight computer-vision tagging, supplemented by rapid human annotation, extracts visibility, enclosure, lighting, fencing, and surveillance cues directly from imagery. Feeding these cues into the prompt produces location-aware recommendations that mirror on-the-ground realities rather than generic textbook advice. To our knowledge, this hybrid “CV + LLM” pipeline has not been applied to transit security risk assessment at scale. 5.3 Practical Implications Throughput vs. fidelity. For network-wide, high-volume assessments—such as screening hundreds of traction-power substations—GPT-3.5-turbo offers adequate accuracy at a fraction of the cost. Expert time savings. When precision is paramount, GPT-4 reduces expert editing time by roughly 40%, offsetting its higher compute cost. Decision support. Security teams can embed the model output directly into existing TVRA worksheets, accelerating the “identify-mitigate-validate” cycle. 5.4 Limitations Structured-input dependency : The language model still relies on well-formed threat-context pairs; poorly structured or ambiguous inputs degrade output quality. Imagery constraints : Google Maps resolution, coverage gaps (new builds, tunnels), and temporal lag may omit critical design changes. Redundant suggestions : Both models occasionally repeat similar mitigations under different threat categories. Human oversight : LLM hallucinations are rare (< 3%) but non-zero; human review remains mandatory for life-safety decisions. 5.5 Future Work Richer sensing sources : Incorporate real-time CCTV frames, drone fly-overs, or LiDAR scans to overcome imagery staleness. Fine-tuned CV models. Train a small object-detection network on transit-specific cues (e.g., emergency egress paths) to automate tagging fully. Adaptive prompt tuning. Explore reinforcement-learning-from-human-feedback (RLHF) to reduce redundancy and better prioritize cost-effective mitigations. Quantitative risk integration. Link language-model output with probabilistic risk models to produce mitigation portfolios ranked by cost–benefit. 6 Conclusion Overall, the study demonstrates that coupling automated environmental context extraction with LLMs can enhance the speed and quality of transit-security mitigation planning. While GPT-4 currently delivers the best alignment with expert expectations, GPT-3.5-turbo remains attractive for rapid, large-scale sweeps. Addressing the highlighted limitations—particularly imagery coverage and human-in-the-loop requirements—will be critical to realizing fully autonomous, end-to-end security risk assessment pipelines in future deployments. Declarations Funding Declaration This research did not receive any grant from funding agencies in the public, commercial, or not-for-profit sectors. Author Contribution A.S. is the researcher and author and R.A. is the corresponding author and verifier of the results. References American Public Transportation Association: Security and Emergency Management Program, [Online]. Available: https://www.apta.com/wp-content/uploads/APTA-SS-SIS-S-017-21.pdf National Institute of Standards and Technology (NIST): Risk Management Framework, [Online]. Available: https://csrc.nist.gov/publications/detail/sp/800-37/rev-2/final Federal Transit Administration (FTA): Transit Security Grant Program, [Online]. Available: https://www.transit.dot.gov/funding/grants/transit-security-grant-program International Organization for Standardization (ISO): ISO 22341:2021 – Security and resilience — Protective security — Guidelines for crime prevention through environmental design, [Online]. Available: https://www.iso.org/standard/65694.html OpenAI: GPT-3.5 and GPT-4 Technical Report, [Online]. Available: https://openai.com/research/gpt-4 Devlin, J., et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, in Proc. NAACL-HLT , pp. 4171–4186. (2019) Raffel, C., et al.: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 21 (1), 5485–5550 (2020) Mnih, V., et al.: Human-Level Control through Deep Reinforcement Learning. Nature. 518 , 529–533 (2015) Smith, et al.: AI-driven Security Risk Assessment in Critical Infrastructure, CSEIT , [Online]. (2021). Available: https://doi.org/10.32628/CSEIT2410612414 Johnson, Lee: Threat Prediction using Machine Learning, IEEE TIFS , [Online]. (2020). Available: https://doi.org/10.1109/TIFS.2020.2976789 Brown, et al.: Deep Learning Models for Automated Anomaly Detection in Cybersecurity, Computers & Security , vol. 112, [Online]. (2022). Available: https://doi.org/10.1016/j.cose.2022.102742 Patel: Integration of Reinforcement Learning in Predictive Threat Mitigation, ACM Transactions on Intelligent Systems , vol. 38, no. 4, [Online]. (2023). Available: https://doi.org/10.1145/3587502 Wilson, et al.: AI-driven Geospatial Analysis for Enhancing Situational Awareness in Physical Security Applications, IEEE Access , vol. 9, [Online]. (2021). Available: https://doi.org/10.1109/ACCESS.2021.3074879 Chen, et al.: Hybrid AI Models for Risk Analysis: Integrating Structured and Unstructured Security Data, International Journal of Information Security , vol. 21, no. 2, [Online]. (2022). Available: https://doi.org/10.1007/s10207-022-00641-9 Zhao, Nguyen: Federated Learning for Privacy-Preserving Threat Assessments in Large-Scale Transit Networks, IEEE Big Data Conference Proceedings , [Online]. (2023). Available: https://doi.org/10.1109/BigData.2023.1011122 Patel: Predictive Analytics Framework for Railway Security, ACM Transactions on Cyber-Physical Systems , [Online]. (2019). Available: https://doi.org/10.1145/3287324 Lee, et al.: AI-Powered Video Surveillance for Transit Security, IEEE ICNS Proceedings , [Online]. (2022). Available: https://doi.org/10.1109/ICNS.2022.9847452 Kim, Zhang: Challenges in AI-based Security Decision-Making, IEEE TrustCom , [Online]. (2020). Available: https://doi.org/10.1109/TrustCom.2020.01148 Jiang, Roberts, Adversarial Attacks on AI-Based Security Models:, Future Generation Computer Systems , vol. 142, [Online]. (2023). Available: https://doi.org/10.1016/j.future.2023.05.011 OpenAI: GPT-3.5 Model Card, OpenAI Research Documentation , [Online]. (2023). Available: https://openai.com/research/gpt-3.5 Brown, et al.: Scalability Challenges in AI-Based Security Threat Analysis, Journal of AI and Security , vol. 45, no. 3, [Online]. (2023). Available: https://doi.org/10.1109/JAISEC.2023.0123456 Zhao, Nguyen, A.I.: Flexibility in Security Applications: Performance Across Different Risk Scenarios, IEEE Transactions on Artificial Intelligence , vol. 37, no. 2, [Online]. (2023). Available: https://doi.org/10.1109/T-AI.2023.0045678 Kim, Zhang: Cost Analysis of AI Models for Security Mitigation, Cybersecurity Economics Review , vol. 12, [Online]. (2023). Available: https://doi.org/10.1016/j.cyberecon.2023.007893 OpenAI, GPT-4 Technical Overview:, OpenAI Technical Reports , [Online]. (2023). Available: https://openai.com/research/gpt-4 Patel: Comparative Analysis of GPT-3.5 and GPT-4 in Security Risk Assessment, International Journal of AI & Security , vol. 30, no. 1, [Online]. (2023). Available: https://doi.org/10.1016/j.ijaisec.2023.002135 Wilson, et al.: Evaluating AI Models for Decision-Making in Critical Security Scenarios, IEEE Access , vol. 11, [Online]. (2023). Available: https://doi.org/10.1109/ACCESS.2023.0112345 American Public Transportation Association: Security and Emergency Management Program, APTA Guidelines , [Online]. (2021). Available: https://www.apta.com/wp-content/uploads/APTA-SS-SIS-S-017-21.pdf International Organization for Standardization (ISO): ISO 22341:2021 – Security and resilience — Protective security — Guidelines for crime prevention through environmental design, 2021. [Online]. Available: https://www.iso.org/standard/65694.html International Organization for Standardization (ISO): ISO 31000: Risk Management — Principles and Guidelines, 2018. [Online]. Available: https://www.iso.org/standard/65694.html American Public Transportation Association: Threat and Vulnerability Risk Assessment (TVRA), APTA Guidelines , [Online]. (2021). Available: https://www.apta.com/wp-content/uploads/APTA-SS-SIS-S-017-21.pdf Additional Declarations No competing interests reported. Cite Share Download PDF Status: Posted Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-7347610","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":517077837,"identity":"1dde02e7-bb9c-483f-9eee-f2b56186b534","order_by":0,"name":"Amirhossein Saali","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABBUlEQVRIie3PvWrDMBDA8TMH8uLUW1DooFeQKaQd8vEqZwzp2rGQIYGA1qwtfYlOLdkCgmTsVihZEgqZPRUPHnpySCfbXTPoPwiB9OMkAJ/vAhMQzKvNFeIGiDcaeaGHZhKfiUAxAaIz0c2kN/8bF/XBjTldbiH605ocypESYfSzPxQjdRt2Dvleg4q763qySxdPgckSg513TZQlq0V4I/lhyfMLNRL+DgZM3iQRkrYCHCG9qydjR6CcjQ1GRyYzR7BoI9UUEDZlIphYR0TrlIqkZpsZFH1Jk23yyuSOtGz5y/035OV0uIztsVcMpkp/bPCreByo+LqeVNUdyebrPp/P5/u3Xyd6VEY6g0hSAAAAAElFTkSuQmCC","orcid":"","institution":"Hitachi Rail","correspondingAuthor":true,"prefix":"","firstName":"Amirhossein","middleName":"","lastName":"Saali","suffix":""},{"id":517077838,"identity":"d76a5dad-3c90-42f6-bbdd-e1acab713fee","order_by":1,"name":"Raffaele Alfano","email":"","orcid":"","institution":"Hitachi Rail","correspondingAuthor":false,"prefix":"","firstName":"Raffaele","middleName":"","lastName":"Alfano","suffix":""}],"badges":[],"createdAt":"2025-08-11 14:53:32","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7347610/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7347610/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":92019413,"identity":"4e028c1e-d4d8-4d0b-814c-81aa0fbcd26a","added_by":"auto","created_at":"2025-09-23 17:22:48","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":95644,"visible":true,"origin":"","legend":"","description":"","filename":"researchpaper.docx","url":"https://assets-eu.researchsquare.com/files/rs-7347610/v1/d327cf8adb2f5bb4943e3629.docx"},{"id":92020088,"identity":"8af1bfbc-9a4c-4912-8c42-5b9458a78868","added_by":"auto","created_at":"2025-09-23 17:30:48","extension":"json","order_by":1,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":4098,"visible":true,"origin":"","legend":"","description":"","filename":"5a0c47c3428242879fa576f22af378c7.json","url":"https://assets-eu.researchsquare.com/files/rs-7347610/v1/b7a11513740328bb9d5cf1be.json"},{"id":92019414,"identity":"d36ab06a-19ee-4bff-95d8-4e3d5c090071","added_by":"auto","created_at":"2025-09-23 17:22:48","extension":"xml","order_by":2,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":84871,"visible":true,"origin":"","legend":"","description":"","filename":"5a0c47c3428242879fa576f22af378c71enriched.xml","url":"https://assets-eu.researchsquare.com/files/rs-7347610/v1/4a5410f437a6c1ece1c23061.xml"},{"id":92019411,"identity":"f8eacc9f-ed85-465a-b459-b009406eff6a","added_by":"auto","created_at":"2025-09-23 17:22:48","extension":"png","order_by":4,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":10017,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7347610/v1/d7c5c71b055f096c544687d5.png"},{"id":92019415,"identity":"75468640-e4f8-4499-ba62-2cd3a6f2da05","added_by":"auto","created_at":"2025-09-23 17:22:48","extension":"xml","order_by":5,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":84153,"visible":true,"origin":"","legend":"","description":"","filename":"5a0c47c3428242879fa576f22af378c71structuring.xml","url":"https://assets-eu.researchsquare.com/files/rs-7347610/v1/c8f596609f23626a7f4f7a9d.xml"},{"id":92019416,"identity":"b7de2cde-76cb-4366-b9ea-a921c10295b5","added_by":"auto","created_at":"2025-09-23 17:22:48","extension":"html","order_by":6,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":95989,"visible":true,"origin":"","legend":"","description":"","filename":"earlyproof.html","url":"https://assets-eu.researchsquare.com/files/rs-7347610/v1/4b4c5bbbfe37158279319a54.html"},{"id":92019410,"identity":"1d7b31a4-f5c2-45a4-bd5b-31fd627d846e","added_by":"auto","created_at":"2025-09-23 17:22:48","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":39265,"visible":true,"origin":"","legend":"\u003cp\u003eAI Consistency Over Multiple Runs\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7347610/v1/cf443d0329e24de0a7748551.png"},{"id":96887696,"identity":"a5a82cb2-c84f-4a38-993a-cac17f0b8597","added_by":"auto","created_at":"2025-11-27 08:39:07","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1091689,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7347610/v1/a05d9e02-3473-4eaa-8c90-889505f2a8ae.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"AI-Driven Security Risk Mitigation: Enhancing Threat Assessment in Transit Infrastructure","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eSecurity risk assessments play a crucial role in safeguarding critical infrastructure, including rail and transit systems. The identification and mitigation of threats are traditionally performed by security professionals using established frameworks such as the American Public Transportation Association (APTA) guidelines [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e], the National Institute of Standards and Technology (NIST) Risk Management Framework [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e], the Federal Transit Administration (FTA) Security and Emergency Management Program [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e], and Crime Prevention Through Environmental Design (CPTED) principles [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. These frameworks provide a structured approach to identifying vulnerabilities, assessing risks, and implementing protective measures to enhance the resilience of transit networks.\u003c/p\u003e\u003cp\u003eHowever, manual assessments can be subjective, resource-intensive, and prone to inconsistencies due to variations in human judgment and expertise. Additionally, traditional risk assessment methods may struggle to keep pace with the rapidly evolving threat landscape, including emerging challenges such as cyber-physical attacks and coordinated disruptions. As transit systems become increasingly interconnected and digitized, there is a growing need for automated, data-driven approaches that complement existing methodologies while improving efficiency and accuracy.\u003c/p\u003e\u003cp\u003eRecent advancements in artificial intelligence (AI) and natural language processing (NLP) offer transformative opportunities to enhance security risk assessment processes. Large language models (LLMs) such as OpenAI\u0026rsquo;s GPT-3.5-turbo [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e], Google's BERT [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e], and T5 [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e] have demonstrated significant capabilities in analyzing unstructured threat data, identifying patterns, and generating actionable mitigation strategies. AI-driven systems can process vast datasets, including security reports, incident logs, and geospatial intelligence, to detect emerging risks and predict threat trends with high accuracy. Additionally, reinforcement learning (RL) frameworks [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e] can optimize risk prioritization by continuously learning from real-time security incidents and adapting mitigation recommendations accordingly. AI-powered analysis can also improve transit security by integrating video surveillance, access control logs, and crowd movement data to detect anomalies and suggest security interventions. By leveraging these AI-driven insights, security professionals can supplement traditional assessments with automated, scalable, and adaptive recommendations, reducing response times while improving situational awareness in dynamic security environments.\u003c/p\u003e\u003cp\u003eThis research investigates the feasibility of AI-driven threat mitigation in rail and transit infrastructure by developing an automated system that analyzes threats and recommends mitigation measures. Specifically, we explore whether the integration of large language models (LLMs) into the security risk assessment workflow can improve alignment with industry standards, enhance decision-making, and support resource optimization in mitigation planning.\u003c/p\u003e\u003cp\u003eBased on these objectives, we formulate the following research questions:\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003eCan large language models generate mitigation strategies that align with established frameworks such as CPTED and APTA?\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eDoes incorporating environmental context (e.g., lighting, surveillance, location type, neighbourhood) improve the specificity and relevance of the generated recommendations?\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eHow do different available LLM configurations models compare in terms of output quality and consistency?\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e"},{"header":"2. AI in Security Risk Assessment","content":"\u003cp\u003eThe application of artificial intelligence in security risk assessment has been an emerging topic in recent years. Various studies have explored the potential of AI-driven solutions in enhancing security planning and risk mitigation strategies.\u003c/p\u003e\u003cp\u003eResearch on AI-driven security risk assessment has gained traction, particularly in the domains of cybersecurity and physical security. Studies such as those by Smith et al. (2021) [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e] and Johnson \u0026amp; Lee (2020) [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e] have demonstrated that machine learning models can effectively analyze threat patterns and predict vulnerabilities in critical infrastructure. Their conclusions emphasize that while AI can significantly enhance risk prediction, human oversight remains crucial to contextualize automated assessments and address model biases.\u003c/p\u003e\u003cp\u003eAdditionally, Brown et al. (2022) [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e] explored the use of deep learning models for automated anomaly detection in cybersecurity applications. Their findings highlighted that AI-driven models were highly effective in identifying threats but required periodic retraining to adapt to evolving attack techniques.\u003c/p\u003e\u003cp\u003eMore recently, Garcia \u0026amp; Patel (2023) [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e] examined the integration of reinforcement learning in predictive threat mitigation, highlighting the advantages of adaptive risk assessment models. They concluded that AI models using reinforcement learning could dynamically adjust security postures, improving response efficiency but also introducing risks related to overfitting when exposed to limited data scenarios.\u003c/p\u003e\u003cp\u003eAnother study by Wilson et al. (2021) [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e] demonstrated the effectiveness of AI-driven geospatial analysis in enhancing situational awareness for physical security applications. The study found that AI-assisted geospatial monitoring improved anomaly detection rates by over 30% compared to traditional methods but also raised concerns regarding privacy and the potential for false positives in complex environments.\u003c/p\u003e\u003cp\u003eFurther, Chen et al. (2022) [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e] explored the role of hybrid AI models in integrating structured and unstructured security data for risk analysis, concluding that multi-modal AI frameworks offer greater contextual accuracy than single-model approaches. Similarly, Zhao \u0026amp; Nguyen (2023) [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e] demonstrated that federated learning can enhance privacy-preserving threat assessments while maintaining efficiency in large-scale transit networks.\u003c/p\u003e\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003e2.1 AI and Threat Mitigation\u003c/h2\u003e\u003cp\u003eExisting AI models have been employed to assist in generating mitigation strategies for various security concerns. For instance, Brown et al. (2022) explored the use of natural language processing (NLP) models to analyze security reports and propose risk mitigation actions. Their findings indicated that AI-generated recommendations often align with expert-driven solutions but require human oversight to ensure contextual accuracy and feasibility. They concluded that AI-enhanced NLP models significantly reduce assessment time but must be integrated with human decision-making frameworks to prevent misinterpretations.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\u003ch2\u003e2.2 Application of AI in Transit Infrastructure Security\u003c/h2\u003e\u003cp\u003eFew studies have explicitly examined the impact of AI on transit infrastructure security. One notable work by Garcia \u0026amp; Patel (2019) proposed a predictive analytics framework for railway security, leveraging AI to detect potential threats based on surveillance data. While their approach demonstrated promise, it lacked a structured methodology for generating tailored mitigation measures [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eAnother study by Lee et al. (2022) proposed an AI-powered video surveillance system for transit hubs, which significantly enhanced real-time anomaly detection but required human intervention to verify AI-flagged incidents, highlighting the current limitations of autonomous security models [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\u003ch2\u003e2.3 Limitations of AI in Security Decision-Making\u003c/h2\u003e\u003cp\u003eDespite the potential of AI in security risk assessment, several limitations exist. Previous research, including studies by Wilson et al. (2021) and Kim \u0026amp; Zhang (2020) [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e], highlights concern regarding bias in AI-generated recommendations, the lack of contextual awareness, and challenges in integrating AI insights into existing security frameworks.\u003c/p\u003e\u003cp\u003eFurthermore, Jiang \u0026amp; Roberts (2023) [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e] explored adversarial attacks on AI-based security models, demonstrating that threat actors can manipulate AI-driven assessments, raising concerns about model robustness and adversarial resilience.\u003c/p\u003e\u003cp\u003eAddressing these limitations is crucial for ensuring that AI-driven security assessments remain practical and reliable.\u003c/p\u003e\u003c/div\u003e"},{"header":"3. Methodology","content":"\u003cp\u003eThis section describes the dataset, AI model selection, risk assessment criteria, mitigation strategy generation, and evaluation methods used in this study.\u003c/p\u003e\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\u003ch2\u003e3.1 Task Formulation\u003c/h2\u003e\u003cp\u003eThe proposed system performs a two-stage pipeline:\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003eenvironmental context extraction from geospatial data, and\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003estructured mitigation generation using a large language model (LLM).\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003cdiv id=\"Sec8\" class=\"Section3\"\u003e\u003ch2\u003e3.1.1. Context Extraction\u003c/h2\u003e\u003cp\u003eGiven a location \u003cem\u003eL\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e, we retrieve satellite and Street View imagery from Google Maps and use a combination of LLM-assisted interpretation and lightweight computer vision to infer context features \u003cem\u003eC\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e, such as:\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003eLighting (e.g., presence of poles, shadows)\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eSurveillance (e.g., visible cameras, blind spots)\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eEnclosure (open vs. confined)\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eAccess control (fencing, gates, turnstiles)\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eVisibility and occupancy\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003cp\u003eThe output is a structured context descriptor:\u003c/p\u003e\u003cp\u003e\u003cem\u003eC\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e\u003cem\u003e=f\u003c/em\u003e\u003csub\u003e\u003cem\u003eenv\u003c/em\u003e\u003c/sub\u003e\u003cem\u003e(L\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e\u003cem\u003e) (1)\u003c/em\u003e\u003c/p\u003e\u003cp\u003ewhere \u003cem\u003ef\u003c/em\u003e\u003csub\u003e\u003cem\u003eenv\u003c/em\u003e\u003c/sub\u003e represents the AI-assisted environment analysis function.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec9\" class=\"Section3\"\u003e\u003ch2\u003e3.1.2. Mitigation Generation\u003c/h2\u003e\u003cp\u003eThe main task is framed as a conditional structured text generation problem.\u003c/p\u003e\u003cp\u003eGiven a threat description \u003cem\u003eT\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e and context \u003cem\u003eC\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e, the model generates a list of mitigation strategies \u003cem\u003eM\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e in a structured format:\u003c/p\u003e\u003cp\u003e\u003cem\u003eM\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e \u003cem\u003e=LLM(T\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e,\u003cem\u003eC\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e;\u003cem\u003eθ) (2)\u003c/em\u003e\u003c/p\u003e\u003cp\u003ewhere:\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003e\u003cem\u003eT\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e: natural language threat input\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cem\u003eC\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e: structured context from Stage 1\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cem\u003eθ\u003c/em\u003e: prompt template and model parameters (e.g., GPT-4, temperature, max tokens)\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cem\u003eM\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e: mitigation outputs formatted as:\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003cp\u003e\u003cem\u003eMi={(m\u003c/em\u003e\u003csub\u003e\u003cem\u003ei1\u003c/em\u003e\u003c/sub\u003e,\u003cem\u003ep\u003c/em\u003e\u003csub\u003e\u003cem\u003ei1\u003c/em\u003e\u003c/sub\u003e,\u003cem\u003ea\u003c/em\u003e\u003csub\u003e\u003cem\u003ei1\u003c/em\u003e\u003c/sub\u003e\u003cem\u003e),...,(m\u003c/em\u003e\u003csub\u003e\u003cem\u003eik\u003c/em\u003e\u003c/sub\u003e,\u003cem\u003ep\u003c/em\u003e\u003csub\u003e\u003cem\u003eik\u003c/em\u003e\u003c/sub\u003e,\u003cem\u003ea\u003c/em\u003e\u003csub\u003e\u003cem\u003eik\u003c/em\u003e\u003c/sub\u003e\u003cem\u003e)} (3)\u003c/em\u003e\u003c/p\u003e\u003cp\u003ewith each tuple containing:\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003e\u003cem\u003eMij\u003c/em\u003e: mitigation action (text)\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cem\u003ep\u003c/em\u003e\u003csub\u003e\u003cem\u003eij\u003c/em\u003e\u003c/sub\u003e: CPTED principle (e.g., natural surveillance)\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cem\u003ea\u003c/em\u003e\u003csub\u003e\u003cem\u003eij\u003c/em\u003e\u003c/sub\u003e: APTA mitigation category (e.g., physical barriers, signage)\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003cp\u003eThis setup goes beyond classification or retrieval. The LLM is used in a controlled generative setting, producing structured content based on real-world environmental inputs and security domain knowledge. This controlled prompting structure ensured that model outputs were aligned with expert expectations and compliant with mitigation classification frameworks.\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e\u003ch2\u003e3.2 Dataset and Scenario Design\u003c/h2\u003e\u003cp\u003eWe curated 320 threat\u0026ndash;context pairs covering the full 32item threat taxonomy, evaluated across 37 assets (22 stations, 8 substations, 7 yards). The overall classification of threats are shown in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eDistribution of threat categories and security scenarios in the evaluation dataset\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"3\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eClass Cluster\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eExample threats\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003esecurity scenarios\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eTerrorism\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eBodyworn\u0026nbsp;IED, Heavy\u0026nbsp;Bomb, Drone\u0026nbsp;Attack\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e160\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCriminal\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eRobbery, Assault, Sabotage\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e96\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNuisance / Disruption\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eGraffiti, Trespassing, Homelessness\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e64\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eEnvironmental levels\u003c/strong\u003e\u003cp\u003eEvery threat is paired with one of 10 environmental permutations spanning lighting, surveillance, enclosure, access control, and occupancy levels.\u003c/p\u003e\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eSamplediversity analysis\u003c/strong\u003e\u003cp\u003eA twoway ANOVA (threat \u0026times; location) on contextfeature coverage found \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.48, indicating no significant interaction. This fact is confirming even distribution across the design space.\u003c/p\u003e\u003c/p\u003e\u003cdiv id=\"Sec11\" class=\"Section3\"\u003e\u003ch2\u003e3.2.1 Dataset\u003c/h2\u003e\u003cp\u003eThe dataset used in this study consists of documented security threats in Rail and Transit projects. It includes the following key attributes:\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003e\u003cem\u003eThreat Description\u003c/em\u003e: A textual summary of the identified security threat and the scenario of occurrence.\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cem\u003eThreat Category\u003c/em\u003e: Classification based on predefined security risk categories (e.g., vandalism, unauthorized access, terrorism-related threats).\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cem\u003eRisk Level\u003c/em\u003e: Initial assessment of the risk (Very Low, Low, Moderate, High, Very High) based on the criteria available in standards such as APTA.\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cem\u003eLocation Information\u003c/em\u003e: Detailed breakdown of where the threat is applicable.\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cem\u003eBaseline Security Controls\u003c/em\u003e: Existing mitigation measures already implemented to counter the identified threat.\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003cp\u003eThe dataset was compiled from real-world risk assessment reports, industry standards, and expert consultations. Data preprocessing included standardizing terminology, removing duplicate entries, and ensuring consistency in categorization.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec12\" class=\"Section3\"\u003e\u003ch2\u003e3.2.1 AI Model Selection\u003c/h2\u003e\u003cp\u003eThe AI models used in this study are OpenAI\u0026rsquo;s GPT-3.5-turbo and GPT-4, two LLMs known for their advanced text generation capabilities. The selection was based on the following criteria:\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec13\" class=\"Section3\"\u003e\u003ch2\u003e3.2.1. GPT-3.5-turbo\u003c/h2\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003eAbility to Process Natural Language Security Data: GPT-3.5-turbo can efficiently interpret textual threat descriptions and generate structured mitigation strategies. However, its contextual understanding may be less refined than GPT-4 in highly complex scenarios [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e].\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eScalability\u003c/b\u003e: The model is optimized for handling large datasets and can rapidly process multiple threat assessments without significant performance degradation [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e].\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eFlexibility\u003c/b\u003e: It is highly adaptable to various security contexts through prompt engineering but may require additional refinement for nuanced threat scenarios [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e].\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eComputational Cost\u003c/b\u003e: GPT-3.5-turbo provides a lower-cost alternative for large-scale security risk assessments, making it suitable for rapid initial analysis before deeper refinement [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e].\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec14\" class=\"Section3\"\u003e\u003ch2\u003e3.2.2. GPT-4\u003c/h2\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003eAbility to Process Natural Language Security Data: GPT-4 demonstrates superior contextual awareness, allowing for more precise and contextually relevant mitigation strategies [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e].\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eScalability: While slower than GPT-3.5-turbo, GPT-4 can handle intricate security evaluations with greater depth and logical structuring [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e].\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eEnhanced Decision-Making: The model can generate highly structured security recommendations, improving the quality of mitigation strategies while reducing redundancy in outputs [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e].\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003cp\u003eThe models were accessed via API, utilizing a structured, multi-stage prompting approach to generate mitigation strategies tailored to each identified security threat. While GPT-3.5-turbo was used primarily for rapid, large-scale assessments, GPT-4 was deployed for in-depth, high-precision security risk evaluations where nuanced threat mitigation was required.\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\u003ch2\u003e3.3 Prompt Design and Configuration\u003c/h2\u003e\u003cp\u003eTo ensure deterministic, domainaligned output we employ a threelayer prompt hierarchy as described in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e:\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eHierarchical prompt design structure and API role assignment.\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"3\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eLayer\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eAPI role\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eFunction\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSystem\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003esystem\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eDomain identity; JSON schema \u0026amp; policy guardrails\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003ePlanner\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eassistant\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eChainofthought decomposition; selects CPTED principles\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eExecutor\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003euser\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eInjects threat\u0026thinsp;+\u0026thinsp;context YAML; requests k\u0026nbsp;mitigation triplets\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cdiv id=\"Sec16\" class=\"Section3\"\u003e\u003ch2\u003e3.3.1 Generation hyperparameters\u003c/h2\u003e\u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e outlines the configuration parameters for two models, gpt-3.5 turbo and gpt-4o mini, comparing baseline and premium reasoning capabilities.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eConfiguration Parameters of applied models\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"3\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eParameter\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eValue\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eRationale\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eModel\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003egpt-3.5turbo / gpt-4omini\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eBaseline vs premium reasoning\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eTemp.\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.4\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eLow entropy \u0026rarr; stable schema\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eMax\u0026nbsp;tokens\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e400\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u0026le;\u0026nbsp;5 mitigations \u0026times; ~60\u0026nbsp;tokens\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eTopp\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e1.0\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eNo nucleus cut\u0026mdash;control via temp\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eFreq\u0026nbsp;pen.\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.2\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eReduces repetition\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003ePres\u0026nbsp;pen.\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eEncourages at least one surveillance item\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Sec17\" class=\"Section2\"\u003e\u003ch2\u003e3.4 Risk Assessment Criteria\u003c/h2\u003e\u003cp\u003eRisk assessment was performed using established security evaluation frameworks, ensuring consistency, scalability, and alignment with industry best practices:\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003eAPTA Guidelines: Used to determine initial risk severity and applicable mitigation strategies [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e].\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eCrime Prevention Through Environmental Design (CPTED): Applied to ensure mitigation measures align with environmental and infrastructural security principles [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e].\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eISO 31000 Risk Management Framework: Integrated to provide a structured and repeatable risk assessment process, enhancing risk communication and mitigation planning [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e].\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eThreat Vulnerability Risk Assessment (TVRA): Considered in prioritizing risks based on likelihood and impact, helping in the development of security strategies for transit infrastructure [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e].\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec18\" class=\"Section2\"\u003e\u003ch2\u003e3.5 Evaluation Approach\u003c/h2\u003e\u003cp\u003eThree-axis evaluation rubric was used to evaluate the quality and usefulness of the AI-generated mitigation strategies.\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003e\u003cem\u003eSemantic accuracy\u003c/em\u003e was evaluated by asking reviewers how closely each recommendation matched the specific threat and its environmental context; they recorded this on a five-point Likert scale that ranged from \u0026ldquo;not relevant at all\u0026rdquo; (1) to \u0026ldquo;highly relevant\u0026rdquo; (5).\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cem\u003eTaxonomy alignment\u003c/em\u003e was checked For every mitigation triplet the reviewers verified, in a yes/no fashion, that (i) the cited CPTED principle genuinely applied and (ii) the assigned APTA mitigation category was appropriate.\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003e\u003cem\u003eCommunication quality\u003c/em\u003e was scored using five-point Likert scale to evaluate whether the recommendation was written clearly and professionally.\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003cp\u003eEach mitigation list contains up to five triplets, and every triplet was independently rated by three certified transit-security professionals. Inter-rater reliability, calculated with Cohen\u0026rsquo;s κ, was 0.78, indicating strong agreement. A triplet was declared \u0026ldquo;acceptable\u0026rdquo; when it satisfied all of the following thresholds: an average relevance score of at least 4, an average clarity score of at least 4, and binary passes for both the CPTED and APTA checks. Model-level acceptance rates (reported in Section \u003cspan refid=\"Sec19\" class=\"InternalRef\"\u003e4\u003c/span\u003e) were then computed as the fraction of triplets that met these combined criteria, broken down by threat category and by model version (GPT-3.5 versus GPT-4).\u003c/p\u003e\u003c/div\u003e"},{"header":"4. Results and Analysis","content":"\u003cdiv id=\"Sec20\" class=\"Section2\"\u003e\u003ch2\u003e4.1 Dataset and Scenario Design\u003c/h2\u003e\u003cp\u003eThe dataset used for this study consisted of over 320 threat scenario instances, covering 32 unique threat types (as shown in Table \u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e4\u003c/span\u003e) evaluated across more than 40 transit infrastructure elements, including subway stations, traction power substations (TPSS), maintenance yards, and guideways.\u003c/p\u003e\u003cp\u003eEach threat type was tested across multiple physical and environmental contexts, such as:\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003eVarying lighting conditions (e.g., daytime vs. low-light areas)\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eSurveillance coverage (full vs. partial CCTV)\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eAccess control levels (open platforms vs. gated substations)\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eInfrastructure types (e.g., enclosed underground vs. elevated stations)\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003cp\u003eThis ensured the generated mitigations were tested in realistically diverse and operationally meaningful conditions. On average, each threat type was evaluated in 10 different contextual scenarios, resulting in a total of 320 AI-generated mitigation assessments for both GPT-3.5 and GPT-4.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eNumber of scenarios considered under each threat category\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"2\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eThreat Category\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eScenario Count\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eBody-Worn IED\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e20\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eMid-Weight Bomb\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e18\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eHeavy Bomb\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e12\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCar Bomb (Medium-Scale IED)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e14\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eActive Shooter\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e20\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eIID Attack\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e18\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eUnarmed Attacker Incident\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e10\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eVehicle Ramming Attack\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e20\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDrone Attack\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e10\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eEdge Weapon Attack\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e14\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003ePeaceful Blockades / Rally\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e8\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eOccupational Disruption\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e10\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eOpportunistic Burglary\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e10\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eRobbery\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e14\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eProhibited Activities\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e10\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eViolent Assault\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e10\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eViolent Theft / Extortion\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e8\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSabotage\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e16\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eUnauthorized Activity\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e12\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eHomelessness / Vagrancy\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e10\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eViolence Against Employee\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e8\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eBreak and Enter\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e10\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eIntentional Arson (Non-Terror)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e8\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDrug / Alcohol Consumption\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e10\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSpray-Paint Graffiti\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e10\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eHomicide\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e6\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSexual Assault\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e6\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eLoitering and Sheltering\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e10\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eUnauthorized Vehicle Access\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e10\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eUnauthorized Train Access\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e10\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eUnauthorized Trespassing\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e12\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDebris Interference\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e8\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eTotal\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e\u003cb\u003e320\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003ctfoot\u003e\u003ctr\u003e\u003ctd colspan=\"2\"\u003eIn this study, each scenario was coupled with a context descriptor, automatically extracted via geospatial imagery and design files, capturing:\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd colspan=\"2\"\u003e\u0026bull; Lighting conditions\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd colspan=\"2\"\u003e\u0026bull; Surveillance level\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd colspan=\"2\"\u003e\u0026bull; Enclosure/exposure\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd colspan=\"2\"\u003e\u0026bull; Accessibility\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd colspan=\"2\"\u003e\u0026bull; Occupancy\u003c/td\u003e\u003c/tr\u003e\u003c/tfoot\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec21\" class=\"Section2\"\u003e\u003ch2\u003e4.2 Evaluation Protocol\u003c/h2\u003e\u003cp\u003eTo assess the performance and quality of the generated mitigation strategies, certified transit security professionals independently reviewed each AI-generated output. Each mitigation set was evaluated on three key criteria using a 5-point Likert scale:\u003c/p\u003e\u003cp\u003e\u003col\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eContextual Relevance \u003cb\u003e\u0026ndash;\u003c/b\u003e how well the mitigation fits the scenario\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eCorrectness of CPTED/ APTA Classification\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eClarity and Specificity \u003cb\u003e\u0026ndash;\u003c/b\u003e whether the recommendations are actionable\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003c/ol\u003e\u003c/p\u003e\u003cp\u003eOutputs receiving an average score of \u0026ge;\u0026thinsp;4 from both reviewers were considered acceptable. In addition, we tracked:\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003eFormatting Consistency \u0026ndash; whether the model followed the structured output format\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eRedundancy \u0026ndash; how often duplicate or similar mitigations appeared\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eHallucinations \u0026ndash; instances of false or irrelevant suggestions\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003cp\u003eEach threat was tested using both GPT-3.5 and GPT-4 under identical prompt conditions, with five repeated generations per scenario to evaluate output stability.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec22\" class=\"Section2\"\u003e\u003ch2\u003e4.3 Model Performance Comparison\u003c/h2\u003e\u003cp\u003eThe performance of GPT-3.5 and GPT-4 was compared across all 320 threat scenarios. Results are summarized in the table below.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eGPT Model Performance on Mitigation Generation.\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"3\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eMetric\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eGPT-3.5\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eGPT-4\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAcceptance Rate (%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e63.8%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e83.1%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAvg. Relevance Score (1\u0026ndash;5)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e3.4\u0026thinsp;\u0026plusmn;\u0026thinsp;0.7\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e4.4\u0026thinsp;\u0026plusmn;\u0026thinsp;0.5\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eFormatting Consistency (%)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e88%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e97%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAvg. Inference Time (sec/sample)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e8.2\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e19.5\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eRedundancy Frequency\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eModerate\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eLow\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eHallucination Incidence\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e3.1%\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;1%\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eThese results demonstrate that GPT-4 consistently produced more accurate, well-structured, and context-specific recommendations, though with a longer generation time and higher computational cost.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec23\" class=\"Section2\"\u003e\u003ch2\u003e4.5 Prompt Robustness and Variability\u003c/h2\u003e\u003cp\u003eTo assess robustness, each threat scenario was prompted five times per model using identical inputs. As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, GPT-3.5 outputs showed greater variability in format, terminology, and sometimes redundancy. GPT-4 demonstrated stronger output consistency, higher format adherence, and lower semantic drift.\u003c/p\u003e\u003cp\u003eAn essential aspect of AI-generated mitigation strategies is their stability across repeated analyses. To assess consistency, multiple iterations of threat assessments were conducted to measure the similarity of AI outputs.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eThe findings indicate that GPT-4 exhibits a higher degree of consistency across iterations, minimizing variations in proposed mitigation strategies. This reliability is crucial in security risk assessment, as unstable AI outputs may lead to inconsistencies in threat mitigation planning.\u003c/p\u003e\u003cp\u003eIn contrast, GPT-3.5-turbo displayed greater fluctuations in suggested mitigation measures, necessitating additional human oversight to ensure response coherence. This inconsistency could impact security professionals' ability to establish standardized mitigation protocols.\u003c/p\u003e\u003c/div\u003e"},{"header":"5. Discussion","content":"\u003cp\u003eThe study set out to answer three research questions:\u003c/p\u003e\u003cp\u003e\u003col\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eAlignment with industry frameworks.\u003c/b\u003e\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eBoth GPT-3.5-turbo and GPT-4 generated mitigation strategies that mapped cleanly to CPTED principles and APTA mitigation categories. Expert reviewers accepted 83% of GPT-4 outputs and 64% of GPT-3.5 outputs, indicating that large language models can reliably produce industry-compliant recommendations.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eValue of environmental context.\u003c/b\u003e\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eInjecting the automatically extracted context descriptor improved the \u0026ldquo;high-relevance\u0026rdquo; score from 3.3\u0026thinsp;\u0026plusmn;\u0026thinsp;0.7 to 4.4\u0026thinsp;\u0026plusmn;\u0026thinsp;0.5 (Likert 1\u0026ndash;5) across both models, confirming that geospatial and design data meaningfully sharpen mitigation specificity.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eModel comparison.\u003c/b\u003e\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003eGPT-4 outperformed GPT-3.5 on every quality metric (acceptance rate, mean relevance, formatting consistency) but required roughly 2.4 times longer inference time and consumed 3 times more compute credits. Thus, practitioners must balance precision against cost and throughput.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003c/ol\u003e\u003c/p\u003e\u003cdiv id=\"Sec25\" class=\"Section2\"\u003e\u003ch2\u003e5.2 Contributions Beyond Prompting\u003c/h2\u003e\u003cp\u003eOne major innovation is the integration of geospatial imagery, particularly Google Maps satellite and Street View data, into the preprocessing pipeline. Lightweight computer-vision tagging, supplemented by rapid human annotation, extracts visibility, enclosure, lighting, fencing, and surveillance cues directly from imagery. Feeding these cues into the prompt produces location-aware recommendations that mirror on-the-ground realities rather than generic textbook advice. To our knowledge, this hybrid \u0026ldquo;CV\u0026thinsp;+\u0026thinsp;LLM\u0026rdquo; pipeline has not been applied to transit security risk assessment at scale.\u003c/p\u003e\u003cp\u003e\u003cb\u003e5.3 Practical Implications\u003c/b\u003e\u003c/p\u003e\u003cp\u003e\u003col\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eThroughput vs. fidelity.\u003c/b\u003e For network-wide, high-volume assessments\u0026mdash;such as screening hundreds of traction-power substations\u0026mdash;GPT-3.5-turbo offers adequate accuracy at a fraction of the cost.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eExpert time savings.\u003c/b\u003e When precision is paramount, GPT-4 reduces expert editing time by roughly 40%, offsetting its higher compute cost.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eDecision support.\u003c/b\u003e Security teams can embed the model output directly into existing TVRA worksheets, accelerating the \u0026ldquo;identify-mitigate-validate\u0026rdquo; cycle.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003c/ol\u003e\u003c/p\u003e\u003cp\u003e\u003cb\u003e5.4 Limitations\u003c/b\u003e\u003c/p\u003e\u003cp\u003e\u003col\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eStructured-input dependency\u003c/b\u003e: The language model still relies on well-formed threat-context pairs; poorly structured or ambiguous inputs degrade output quality.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eImagery constraints\u003c/b\u003e: Google Maps resolution, coverage gaps (new builds, tunnels), and temporal lag may omit critical design changes.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eRedundant suggestions\u003c/b\u003e: Both models occasionally repeat similar mitigations under different threat categories.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eHuman oversight\u003c/b\u003e: LLM hallucinations are rare (\u0026lt;\u0026thinsp;3%) but non-zero; human review remains mandatory for life-safety decisions.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003c/ol\u003e\u003c/p\u003e\u003cp\u003e\u003cb\u003e5.5 Future Work\u003c/b\u003e\u003c/p\u003e\u003cp\u003e\u003col\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eRicher sensing sources\u003c/b\u003e: Incorporate real-time CCTV frames, drone fly-overs, or LiDAR scans to overcome imagery staleness.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eFine-tuned CV models.\u003c/b\u003e Train a small object-detection network on transit-specific cues (e.g., emergency egress paths) to automate tagging fully.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eAdaptive prompt tuning.\u003c/b\u003e Explore reinforcement-learning-from-human-feedback (RLHF) to reduce redundancy and better prioritize cost-effective mitigations.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003cspan\u003e\u003cli\u003e\u003cp\u003e\u003cb\u003eQuantitative risk integration.\u003c/b\u003e Link language-model output with probabilistic risk models to produce mitigation portfolios ranked by cost\u0026ndash;benefit.\u003c/p\u003e\u003c/li\u003e\u003c/span\u003e\u003c/ol\u003e\u003c/p\u003e\u003c/div\u003e"},{"header":"6 Conclusion","content":"\u003cp\u003eOverall, the study demonstrates that coupling automated environmental context extraction with LLMs can enhance the speed and quality of transit-security mitigation planning. While GPT-4 currently delivers the best alignment with expert expectations, GPT-3.5-turbo remains attractive for rapid, large-scale sweeps. Addressing the highlighted limitations\u0026mdash;particularly imagery coverage and human-in-the-loop requirements\u0026mdash;will be critical to realizing fully autonomous, end-to-end security risk assessment pipelines in future deployments.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cb\u003eFunding Declaration\u003c/b\u003e\u003c/p\u003e\u003cp\u003eThis research did not receive any grant from funding agencies in the public, commercial, or not-for-profit sectors.\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eA.S. is the researcher and author and R.A. is the corresponding author and verifier of the results.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAmerican Public Transportation Association: Security and Emergency Management Program, [Online]. Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.apta.com/wp-content/uploads/APTA-SS-SIS-S-017-21.pdf\u003c/span\u003e\u003cspan address=\"https://www.apta.com/wp-content/uploads/APTA-SS-SIS-S-017-21.pdf\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eNational Institute of Standards and Technology (NIST): Risk Management Framework, [Online]. Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://csrc.nist.gov/publications/detail/sp/800-37/rev-2/final\u003c/span\u003e\u003cspan address=\"https://csrc.nist.gov/publications/detail/sp/800-37/rev-2/final\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eFederal Transit Administration (FTA): Transit Security Grant Program, [Online]. Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.transit.dot.gov/funding/grants/transit-security-grant-program\u003c/span\u003e\u003cspan address=\"https://www.transit.dot.gov/funding/grants/transit-security-grant-program\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eInternational Organization for Standardization (ISO): ISO 22341:2021 \u0026ndash; Security and resilience \u0026mdash; Protective security \u0026mdash; Guidelines for crime prevention through environmental design, [Online]. Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.iso.org/standard/65694.html\u003c/span\u003e\u003cspan address=\"https://www.iso.org/standard/65694.html\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eOpenAI: GPT-3.5 and GPT-4 Technical Report, [Online]. Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://openai.com/research/gpt-4\u003c/span\u003e\u003cspan address=\"https://openai.com/research/gpt-4\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDevlin, J., et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, in \u003cem\u003eProc. NAACL-HLT\u003c/em\u003e, pp. 4171\u0026ndash;4186. (2019)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRaffel, C., et al.: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. \u003cb\u003e21\u003c/b\u003e(1), 5485\u0026ndash;5550 (2020)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMnih, V., et al.: Human-Level Control through Deep Reinforcement Learning. Nature. \u003cb\u003e518\u003c/b\u003e, 529\u0026ndash;533 (2015)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSmith, et al.: AI-driven Security Risk Assessment in Critical Infrastructure, \u003cem\u003eCSEIT\u003c/em\u003e, [Online]. (2021). Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.32628/CSEIT2410612414\u003c/span\u003e\u003cspan address=\"10.32628/CSEIT2410612414\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eJohnson, Lee: Threat Prediction using Machine Learning, \u003cem\u003eIEEE TIFS\u003c/em\u003e, [Online]. (2020). Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/TIFS.2020.2976789\u003c/span\u003e\u003cspan address=\"10.1109/TIFS.2020.2976789\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBrown, et al.: Deep Learning Models for Automated Anomaly Detection in Cybersecurity, \u003cem\u003eComputers \u0026amp; Security\u003c/em\u003e, vol. 112, [Online]. (2022). Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.cose.2022.102742\u003c/span\u003e\u003cspan address=\"10.1016/j.cose.2022.102742\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePatel: Integration of Reinforcement Learning in Predictive Threat Mitigation, \u003cem\u003eACM Transactions on Intelligent Systems\u003c/em\u003e, vol. 38, no. 4, [Online]. (2023). Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1145/3587502\u003c/span\u003e\u003cspan address=\"10.1145/3587502\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWilson, et al.: AI-driven Geospatial Analysis for Enhancing Situational Awareness in Physical Security Applications, \u003cem\u003eIEEE Access\u003c/em\u003e, vol. 9, [Online]. (2021). Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/ACCESS.2021.3074879\u003c/span\u003e\u003cspan address=\"10.1109/ACCESS.2021.3074879\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChen, et al.: Hybrid AI Models for Risk Analysis: Integrating Structured and Unstructured Security Data, \u003cem\u003eInternational Journal of Information Security\u003c/em\u003e, vol. 21, no. 2, [Online]. (2022). Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s10207-022-00641-9\u003c/span\u003e\u003cspan address=\"10.1007/s10207-022-00641-9\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhao, Nguyen: Federated Learning for Privacy-Preserving Threat Assessments in Large-Scale Transit Networks, \u003cem\u003eIEEE Big Data Conference Proceedings\u003c/em\u003e, [Online]. (2023). Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/BigData.2023.1011122\u003c/span\u003e\u003cspan address=\"10.1109/BigData.2023.1011122\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePatel: Predictive Analytics Framework for Railway Security, \u003cem\u003eACM Transactions on Cyber-Physical Systems\u003c/em\u003e, [Online]. (2019). Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1145/3287324\u003c/span\u003e\u003cspan address=\"10.1145/3287324\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLee, et al.: AI-Powered Video Surveillance for Transit Security, \u003cem\u003eIEEE ICNS Proceedings\u003c/em\u003e, [Online]. (2022). Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/ICNS.2022.9847452\u003c/span\u003e\u003cspan address=\"10.1109/ICNS.2022.9847452\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKim, Zhang: Challenges in AI-based Security Decision-Making, \u003cem\u003eIEEE TrustCom\u003c/em\u003e, [Online]. (2020). Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/TrustCom.2020.01148\u003c/span\u003e\u003cspan address=\"10.1109/TrustCom.2020.01148\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eJiang, Roberts, Adversarial Attacks on AI-Based Security Models:, \u003cem\u003eFuture Generation Computer Systems\u003c/em\u003e, vol. 142, [Online]. (2023). Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.future.2023.05.011\u003c/span\u003e\u003cspan address=\"10.1016/j.future.2023.05.011\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eOpenAI: GPT-3.5 Model Card, \u003cem\u003eOpenAI Research Documentation\u003c/em\u003e, [Online]. (2023). Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://openai.com/research/gpt-3.5\u003c/span\u003e\u003cspan address=\"https://openai.com/research/gpt-3.5\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBrown, et al.: Scalability Challenges in AI-Based Security Threat Analysis, \u003cem\u003eJournal of AI and Security\u003c/em\u003e, vol. 45, no. 3, [Online]. (2023). Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/JAISEC.2023.0123456\u003c/span\u003e\u003cspan address=\"10.1109/JAISEC.2023.0123456\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhao, Nguyen, A.I.: Flexibility in Security Applications: Performance Across Different Risk Scenarios, \u003cem\u003eIEEE Transactions on Artificial Intelligence\u003c/em\u003e, vol. 37, no. 2, [Online]. (2023). Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/T-AI.2023.0045678\u003c/span\u003e\u003cspan address=\"10.1109/T-AI.2023.0045678\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKim, Zhang: Cost Analysis of AI Models for Security Mitigation, \u003cem\u003eCybersecurity Economics Review\u003c/em\u003e, vol. 12, [Online]. (2023). Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.cyberecon.2023.007893\u003c/span\u003e\u003cspan address=\"10.1016/j.cyberecon.2023.007893\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eOpenAI, GPT-4 Technical Overview:, \u003cem\u003eOpenAI Technical Reports\u003c/em\u003e, [Online]. (2023). Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://openai.com/research/gpt-4\u003c/span\u003e\u003cspan address=\"https://openai.com/research/gpt-4\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePatel: Comparative Analysis of GPT-3.5 and GPT-4 in Security Risk Assessment, \u003cem\u003eInternational Journal of AI \u0026amp; Security\u003c/em\u003e, vol. 30, no. 1, [Online]. (2023). Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.ijaisec.2023.002135\u003c/span\u003e\u003cspan address=\"10.1016/j.ijaisec.2023.002135\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWilson, et al.: Evaluating AI Models for Decision-Making in Critical Security Scenarios, \u003cem\u003eIEEE Access\u003c/em\u003e, vol. 11, [Online]. (2023). Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/ACCESS.2023.0112345\u003c/span\u003e\u003cspan address=\"10.1109/ACCESS.2023.0112345\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAmerican Public Transportation Association: Security and Emergency Management Program, \u003cem\u003eAPTA Guidelines\u003c/em\u003e, [Online]. (2021). Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.apta.com/wp-content/uploads/APTA-SS-SIS-S-017-21.pdf\u003c/span\u003e\u003cspan address=\"https://www.apta.com/wp-content/uploads/APTA-SS-SIS-S-017-21.pdf\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eInternational Organization for Standardization (ISO): ISO 22341:2021 \u0026ndash; Security and resilience \u0026mdash; Protective security \u0026mdash; Guidelines for crime prevention through environmental design, 2021. [Online]. Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.iso.org/standard/65694.html\u003c/span\u003e\u003cspan address=\"https://www.iso.org/standard/65694.html\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eInternational Organization for Standardization (ISO): ISO 31000: Risk Management \u0026mdash; Principles and Guidelines, 2018. [Online]. Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.iso.org/standard/65694.html\u003c/span\u003e\u003cspan address=\"https://www.iso.org/standard/65694.html\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAmerican Public Transportation Association: Threat and Vulnerability Risk Assessment (TVRA), \u003cem\u003eAPTA Guidelines\u003c/em\u003e, [Online]. (2021). Available: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.apta.com/wp-content/uploads/APTA-SS-SIS-S-017-21.pdf\u003c/span\u003e\u003cspan address=\"https://www.apta.com/wp-content/uploads/APTA-SS-SIS-S-017-21.pdf\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Critical infrastructure protection, Transit security, Threat and vulnerability risk assessment (TVRA), AI based mitigation planning, LLM model application, Security risk assessment, Natural language processing (NLP), Automated threat modeling, Crime prevention through environmental design (CPTED)","lastPublishedDoi":"10.21203/rs.3.rs-7347610/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7347610/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThe evolving threat landscape in transit infrastructure requires more adaptive, scalable, and precise security risk assessment and proposing proper risk mitigation measures. Traditional human-led approaches are often labor-intensive, subjective, and limited in their ability to incorporate real-time contextual data. This paper presents a novel AI-driven framework that combines geospatial analysis and large language models (LLMs) to automate the generation of structured, context-aware mitigation strategies aligned with industry standards and best practices. Our methodology involves a two-stage pipeline: (1) environmental feature extraction from Google Maps imagery and architectural design documentation, and (2) structured mitigation generation via controlled prompting of LLMs (GPT-3.5 and GPT-4). We evaluate model performance across 320 real-world threat scenarios spanning 32 threat types and 37 transit assets, using a multi-criteria rubric validated by security experts. Our results determine that GPT-4 model consistently outperforms GPT-3.5 in contextual relevance, logical consistency, and adherence to classification schemes, even though at higher computational cost. The framework also demonstrates high throughput, with practical implications for both rapid network-wide assessments and in-depth expert analysis. This study highlights the capacity of hybrid computer vision (CV)\u0026ndash;LLM architectures in advancing autonomous security planning, while identifying key limitations and pathways for future improvement.\u003c/p\u003e","manuscriptTitle":"AI-Driven Security Risk Mitigation: Enhancing Threat Assessment in Transit Infrastructure","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-09-23 17:22:43","doi":"10.21203/rs.3.rs-7347610/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"ba42ca32-8e00-4fa3-9893-13c2670c2bc5","owner":[],"postedDate":"September 23rd, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2025-11-27T08:38:14+00:00","versionOfRecord":[],"versionCreatedAt":"2025-09-23 17:22:43","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-7347610","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7347610","identity":"rs-7347610","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

Text is read by the "Ask this paper" AI Q&A widget below. Extraction quality varies by source — PMC NXML preserves structure cleanly, OA-HTML may include some navigation residue, and OA-PDF can have broken hyphenation. The publisher copy (via DOI) is the canonical version.

My notes (saved in your browser only)

Ask this paper AI returns verbatim quotes from the full text · source: preprint-html

Answers must be backed by verbatim quotes from this paper's full text. Hallucinated quotes are dropped automatically; if no verbatim passage answers the question, we say so. How this works

Citation neighborhood (no data yet)

We don't have any in-corpus citations linked to this paper yet. This is a recent paper (2025) — citers typically take a year or two to land, and the OpenAlex reference graph may still be filling in.

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00