Advancing Audio-Visual Attention Analysis in 360° Videos Through Real-Time Visualization

doi:10.21203/rs.3.rs-5924870/v1

2025 · doi:10.21203/rs.3.rs-5924870/v1

preprint OA: closed

Research Article · Research Square

Amit Hirway, Yuansong Qiao, Niall Murray

This is a preprint; it has not been peer reviewed by a journal.

This work is licensed under a CC BY 4.0 License

Status: Posted. Version 1 posted.

Abstract

This study presents an open-source, real-time visualization tool designed to analyse audio-visual attention in 360° video environments under varying sound conditions. Traditional methods, such as static saliency maps and post-hoc analyses, often fail to capture the dynamic and participant-specific nature of attention shifts in immersive environments.
To address these limitations, the proposed tool dynamically integrates head pose fixation maps with sound intensity heatmaps, enabling real-time tracking of attention patterns across different audio conditions, including No Sound (NS), Stereo (ST), First-Order Ambisonics (FO), and Third-Order Ambisonics (HO). Attention shifts across sound conditions were quantified using the Jaccard Index, which measures the overlap of the top 5% most-viewed regions across participants. The results demonstrate that increasing auditory complexity, from silence to spatial audio, significantly broadens visual exploration. First-Order Ambisonics (FO) led to the most dispersed attention patterns, with a 62.4% reduction in attention overlap indoors and 58.8% outdoors compared to NS. Third-Order Ambisonics (HO) resulted in a 61.2% reduction indoors and 52.0% outdoors, suggesting that while FO encourages broader exploration, HO facilitates a more focused distribution of attention. Notably, HO conditions led to a 3.2% increase indoors and a 16.6% increase outdoors in attention overlap compared to FO, indicating that higher-order spatial audio helps guide attention more precisely in complex environments. Unlike conventional approaches, which rely on static analyses, this tool provides real-time, participant-specific insights into attention shifts, offering a dynamic perspective on how spatial audio influences exploration. These capabilities empower VR content creators and researchers with actionable insights, optimizing spatial audio design and enhancing user engagement. By offering a robust and adaptable framework, this study advances the understanding of audio-visual interactions in immersive media environments.

Keywords: 360° video, Spatial Audio, Ambisonics, Audio-Visual attention, Real-Time Visualization, Fixation Maps, Heat Maps, Open-Source

1. Introduction

Virtual Reality (VR) has revolutionized digital experiences by creating immersive environments where users engage with both visual and auditory stimuli. Among the most popular VR content formats are 360° videos, which allow users to explore spherical environments through Head-Mounted Displays (HMDs). These experiences enhance user presence and interactivity but pose challenges in understanding how auditory and visual cues interact to shape user attention. A critical factor in VR engagement is audio-visual attention (AVA), the process by which users selectively focus on visual elements while responding to auditory stimuli (Almquist & Pasquero, 2019). Understanding AVA is essential for optimizing VR content design, improving user experiences, and advancing immersive storytelling.

Spatial audio, particularly ambisonics (Calamia et al., 2019), has emerged as a powerful tool for directing attention in 360° environments. By enabling precise localization of sound sources, spatial audio can shift focus from centrally salient visual elements to peripheral regions, encouraging users to explore their surroundings more comprehensively. However, effectively capturing how users respond to spatial audio remains an ongoing challenge. Traditional methods, such as static saliency maps and post-hoc evaluations of head pose and gaze data (Min & Hou, 2021), have provided valuable insights but struggle to capture the dynamic interplay between sound and vision or account for real-time, participant-specific interactions. Previous studies (Hirway et al., 2020; Hirway et al., 2022) have shown that spatial audio influences visual attention patterns, but their reliance on retrospective and static analyses has limited their ability to uncover attention shifts as they occur. These constraints underscore the need for tools capable of offering real-time, data-driven insights into the effects of auditory stimuli on attention behaviour.
To overcome these challenges, this study presents a novel, open-source, real-time visualization tool that evaluates AVA in 360° videos. The tool dynamically integrates head pose fixation maps with sound intensity heatmaps, offering both qualitative visualizations and a quantitative framework to assess attention patterns under different sound conditions. Unlike traditional methods, this participant-specific, real-time approach enables deeper exploration of the interplay between sound and vision, providing a more comprehensive understanding of how auditory stimuli influence user behaviour.

To validate the tool's effectiveness, a user study involving 73 participants was conducted (Hirway et al., 2024), evaluating their visual attention while viewing a dataset of 360° videos (Farina, 2020) under four distinct sound conditions: No Sound (NS), Stereo Sound (ST), First-Order Ambisonics (FO), and Third-Order Ambisonics (HO). These conditions facilitated a systematic investigation into how increasing auditory complexity influences attention patterns, highlighting the advantages of spatial audio, particularly HO, in guiding exploration and enhancing engagement within VR environments.

This paper makes the following key contributions:

1. Introduction of an open-source tool that integrates real-time fixation maps and sound intensity heatmaps with a quantitative framework for understanding AVA in 360° videos.
2. Demonstration of the tool's effectiveness through a user study with 73 participants, capturing both qualitative visualizations and quantitative comparisons to reveal how spatial audio influences attention patterns, particularly highlighting HO’s role in fostering dynamic exploration.
3. Actionable insights for immersive content design, offering guidance on optimizing spatial audio to enhance engagement and exploration in VR environments.
4. Broader implications across industries, with potential applications in fields such as training simulations, virtual tourism, and immersive storytelling, where spatial audio plays a crucial role in enhancing user engagement.

The remainder of this paper is organized as follows: Section 2 reviews the existing literature and contextualizes our work within the field of audio-visual attention in immersive media. Section 3 presents a demonstration of the developed tool and details the data acquisition process. Section 4 describes the tool's functionality and its evaluation through various performance metrics. Section 5 provides an in-depth analysis of the results obtained from the dataset, highlighting key trends and findings. Section 6 explores the factors influencing visual attention under different audio conditions. Section 7 discusses the broader implications of our findings for immersive media research and VR content design. Section 8 concludes the paper and outlines potential directions for future work.

2. RELATED WORK

Research into attention patterns in 360° videos has primarily focused on visual attention (VA), utilizing head and eye tracking to understand user engagement with immersive content. These studies have yielded valuable insights and produced public datasets, such as the dataset of head and eye movements for 360° videos by David et al. (2018) and the Panonut360 dataset by Xu et al. (2024), contributing to the foundational understanding of user behaviour in virtual environments. However, VA studies often prioritize visual cues, neglecting the role of auditory stimuli, particularly spatial audio, in shaping attention in 360° environments.

2.1 Visual Attention Studies

Early visual attention (VA) research in 360° videos employed saliency maps to model attention patterns. For instance, Lo et al. (2017) explored how video pacing influenced viewer orientations by using OpenTrack to capture head movements across fast-paced and slow-paced video categories.
Similarly, David et al. (2018) combined saliency maps and scan paths to analyse head and eye movement data in VA, revealing how visual elements drive attention shifts. While these studies provided significant insights, their focus on static visual analyses limited their applicability to dynamic, audio-visual interactions.

2.2 Audio-Visual Attention and Sound

Recognizing the importance of sound, recent studies have incorporated auditory stimuli into attention analyses. Min et al. (2014) demonstrated that distinct audio cues could redirect user focus, particularly when sound sources were independent of visually salient objects. Marighetto et al. (2017) highlighted the role of audio-visual interactions in reducing gaze dispersion in non-360° videos, showing how sound can guide visual focus. Similarly, Wu et al. (2017) and Almquist (2018) examined audio-visual attention by linking head pose and content types to attention patterns, emphasizing the interplay between audio and visual cues. However, these studies often relied on static or non-spatial soundscapes, failing to capture real-time, participant-specific interactions.

Expanding on these prior efforts, Hirway et al. (2020, 2022, 2024) have conducted a series of studies to systematically investigate the influence of spatial audio on visual attention (VA) and Quality of Experience (QoE) in 360° videos. Their first study (Hirway et al., 2020) examined how spatial audio impacts user immersion and attention by comparing non-spatial (stereo) and spatial (third-order Ambisonics) audio conditions. The findings revealed that spatial audio led to a more immersive experience, with users demonstrating higher maximum head pose pitch values and focusing more on sound-emitting regions, highlighting a quicker integration into the spatial environment. Building on this, Hirway et al.
(2022) expanded the scope by incorporating additional physiological and behavioural measures such as pupil diameter, gaze fixations, and audio energy maps to analyse user behaviour under various sound conditions (no sound, stereo, and third-order Ambisonics). Their results showed significant variations in viewing patterns and physiological responses, with spatial audio leading to more dynamic exploration of the 360° scene. The study provided practical insights for optimizing audio-visual content by demonstrating how different sound configurations influence user engagement.

Most recently, Hirway et al. (2024) conducted an extensive empirical study with 73 participants, comparing the effects of no sound, stereo audio, and both first- and third-order Ambisonics on head pose, eye gaze, pupil dilation, and heart rate. The results demonstrated that spatial audio, particularly third-order Ambisonics, captured heightened attention as evidenced by increased physiological arousal and diverse head movements in response to distributed sound sources. This study also emphasized the importance of spatial audio in enhancing user engagement and providing practical implications for optimizing content processing, encoding, distribution, and rendering. The availability of open-source datasets and scripts further enhances reproducibility and provides valuable resources for future immersive media research.

2.3 Eye-tracking and Spatial audio tools

Advancements in eye-tracking technologies and spatial audio tools have significantly contributed to immersive media research, enabling deeper insights into audio-visual interactions. Eye-tracking systems, such as Tobii Pro Insight (2020) and Pupil Labs (Kassner et al., 2014), provide high-precision gaze tracking capabilities that allow researchers to analyse user attention in virtual environments. These tools have been widely adopted in VR and 360° video studies to measure gaze patterns, fixation durations, and attentional shifts.
However, while these tools excel in tracking visual behaviour, they often lack seamless integration with spatial audio analysis, which plays a crucial role in immersive media experiences.

On the other hand, spatial audio frameworks such as SPARTA and COMPASS (Politis et al., 2022), developed at Aalto University's Acoustics Lab, offer real-time spatial sound reproduction and visualization capabilities. These frameworks enable audio rendering with Ambisonics and directional audio techniques, enhancing immersion in virtual environments. Similarly, SoundSpaces 2.0 (Chen et al., 2022) provides geometry-based audio rendering, facilitating tasks such as source localization and spatial navigation within virtual spaces. While these spatial audio tools provide powerful features for auditory analysis, they primarily focus on audio processing and lack the necessary components to incorporate real-time visual attention tracking.

Despite the availability of standalone tools for both eye-tracking and spatial audio analysis, the challenge lies in their integration to form a comprehensive framework that can simultaneously analyse audio-visual attention dynamics. Existing tools either prioritize visual tracking without considering spatial sound influences or focus on sound reproduction without capturing user gaze behaviour in real time. Bridging this gap is crucial for understanding how spatial audio affects visual attention, particularly in 360° immersive environments. This work aims to address these limitations by introducing an open-source framework that integrates real-time spatial audio visualization with eye-tracking analytics, offering a holistic approach to studying attention behaviours in immersive content.

2.4 Addressing the Gaps

The reviewed literature highlights a significant gap in the integration of spatial audio with real-time visual attention analysis in immersive environments.
While existing studies have demonstrated the role of audio cues in guiding visual attention, they predominantly rely on static or non-spatial audio conditions, post-experiment data analysis, retrospective evaluations, and offline processing methods. These approaches, though informative, fail to capture the dynamic and interactive nature of immersive experiences in real time. Moreover, current eye-tracking and spatial audio tools, despite their sophistication, function as standalone systems that limit their ability to provide a holistic understanding of audio-visual interactions in 360° content.

Bridging this gap is essential, as the interplay between spatial audio and visual attention is fundamental to optimizing user experiences in virtual reality storytelling, training simulations, and interactive media. A deeper understanding of these interactions can lead to improved content placement strategies, enhanced user engagement, and more effective VR applications.

To address these challenges, this study underscores the necessity of developing integrated, real-time analysis tools that combine spatial audio with eye-tracking data. Such solutions can provide a deeper, data-driven understanding of how users explore immersive content under different sound conditions, leading to more informed design choices and optimized content delivery strategies.

3. TOOL DEMONSTRATION AND DATA ACQUISITION

This study employed an open-source, real-time visualization tool to analyse audio-visual attention (AVA) in 360° videos. The dataset used in this study was previously explored in our earlier works (Hirway et al., 2020; 2022; 2024), which examined visual attention patterns under varying sound conditions using traditional methods such as saliency maps and post-hoc evaluations of head pose and gaze data. While these approaches provided valuable insights, they were limited in their ability to capture real-time, participant-specific interactions between auditory and visual stimuli.
To overcome these challenges, the proposed tool offers an enhanced analytical capability, enabling a more dynamic and granular exploration of user attention patterns in response to spatial audio cues. The tool was applied to the dataset to provide deeper insights into how different audio conditions influence gaze behaviour and content engagement in immersive environments.

3.1 Laboratory Design

The experiment followed ISO 8589:2007 standards (ISO, 2007) to create a controlled sensory environment, minimizing external distractions. Participants were seated on a motorized rotating chair with three degrees of freedom (Fig. 1), allowing unrestricted head movements while viewing 360° videos. The hardware and software used in the experiment are summarized in Table 1, ensuring high accuracy in data capture and seamless tool operation.

3.2 360° Video Stimuli

The study employed a curated set of 360° videos to evaluate AVA under four sound conditions: No Sound (NS), Stereo Sound (ST), First-Order Ambisonics (FO), and Third-Order Ambisonics (HO). These videos adhered to ITU-T P.910 standards (International Telecommunication Union, 2023) for resolution (4096 × 2048 pixels), frame rate (29.970 FPS), and duration (60 seconds), ensuring consistency across all stimuli. The videos were categorized into two groups:

Indoor Scenes: Centralized focus points, such as opera performances, with limited visual dispersion (Fig. 2a-e).
Outdoor Scenes: Open environments with dispersed sound-emitting objects, such as clock towers or animal sounds (Fig. 2f-j).

To standardize clip duration and format, FFmpeg (FFmpeg.org, 2021) was used for video pre-processing. The videos were randomized into 5-minute sequences to minimize participant bias during exposure.

Table 1. Experiment Setup: Hardware and Software

Component | Details and Utility
PC | Intel Core™ i5 – 4590 CPU @ 3.30GHz, 10.0 GB RAM, 16GB nVidia GTX 970 Graphics Card, running Windows 10. Used to operate hardware and software for the immersive environment.
HMD | HTC Vive with Tobii Pro VR Integration (Tobii Pro, 2018). Enables participants to watch 360° videos.
Headphones | Beyerdynamic DT 990 Pro (Beyerdynamic, 2020). Used for listening to non-spatial and spatial audio.
360° Player | GoPro VR Player (GoPro, 2020). Plays 360° videos on the HMD and records head orientation as yaw, pitch, and roll.
360° Videos | Provide the audio-visual stimuli for participants.
E4 Wristband | Empatica E4 (Empatica, 2023). Collects physiological data, including heart rate, for analysis.

Figure 2. Representative frames for videos in the Indoor (a-e) and Outdoor (f-j) categories

3.3 Data Acquisition for Tool Evaluation

The data acquisition process focused on capturing participants’ interactions with the immersive 360° environment, collecting four key metrics:

Head Pose Data: Captured at 120 Hz as unit quaternions (yaw, pitch, roll), representing participants’ head orientation within the spherical video space. These inputs formed the foundation for generating fixation maps (Fig. 3).
Gaze Data: Tracked at 120 Hz to identify regions where participants directed their attention. Although collected, gaze data was not analysed in this study; it remains a target for future integration to enhance gaze accuracy analysis.
Pupil Diameter: Monitored in real-time to observe physiological responses during video playback. Although visualized by the tool, pupil diameter data was not quantitatively analysed in the current study.
Heart Rate: Measured using the Empatica E4 wristband to capture physiological responses throughout video playback. This metric was collected but not integrated into the tool’s visualizations, offering potential for future studies.

Figure 3. Yaw, Pitch and Roll Angles in Degrees

The tool processed these inputs in real-time, dynamically generating fixation maps and sound intensity heatmaps.
By integrating these metrics, the tool overcomes critical limitations of traditional methods, enabling participant-specific, dynamic analyses of audio-visual attention. This innovative approach provides a deeper understanding of how spatial audio influences exploration patterns in immersive environments.

4. FUNCTIONALITY AND EVALUATION

This section demonstrates the capabilities of the real-time visualization tool, focusing on its integration of multimodal metrics for analysing audio-visual attention (AVA) in 360° video environments. By combining fixation maps, sound intensity heatmaps, and quantitative indices, the tool addresses limitations of traditional static methods, offering richer, participant-specific insights into how auditory cues influence visual exploration.

4.1 Tool Overview

The custom web-based tool (Fig. 4) was developed using JavaScript libraries, including three.js (Cabello, 2024), omnitone.js (Google Creative Lab, 2024), and JSAmbisonics (Maddams, 2024). It dynamically processes head pose, sound intensity, and pupil diameter data to generate real-time visual outputs that enable participant-specific analysis. The tool offers several key features:

1. Current Field of View (FOV): The tool visualizes participants’ real-time perspectives within the 360° environment by leveraging head pose data to approximate gaze direction. While head pose is less precise than direct eye-tracking, it serves as a reliable proxy for attention, especially during stabilized head movements. Future iterations may incorporate eye-tracking for enhanced precision.
2. Head Pose Scan Path: Tracks yaw and pitch movements over time, providing a visual representation of participants’ exploration patterns within the 360° environment.
3. Sound Intensity Heatmap: Dynamically overlays spatial audio intensity onto the video, highlighting correlations between auditory cues and visual attention shifts.
This real-time overlay distinguishes the tool from static methods, enabling researchers to evaluate how sound conditions influence participant exploration dynamically.

By enabling real-time analysis, the tool overcomes critical limitations of static, post-hoc approaches and provides actionable insights into how auditory cues shape attention patterns in immersive environments.

4.2 Fixation and Sound Intensity Heatmap Generation

The framework processes head pose data to identify fixations, defined as moments when head orientation remains stable within a 0.5-degree angular deviation for at least 200 milliseconds (Duchowski, 2007). These fixations are mapped onto the equirectangular projection of the video to create fixation maps, revealing areas of concentrated attention. Simultaneously, spatial audio signals are processed using JSAmbisonics (Politis & Poirier-Quinot, 2020) to compute real-time sound intensity across ambisonics channels. These intensity values are visualized as heatmaps overlaid onto the corresponding regions of the 360° video.

Figures 5a–5d and 6a–6d illustrate fixation patterns and sound intensity overlays for representative indoor and outdoor videos across sound conditions. In the No Sound (NS) condition (Figs. 5a, 6a), fixations were concentrated on central visual elements, indicating reliance on visual saliency in the absence of auditory cues. Peripheral regions remained largely unexplored. In the Stereo Sound (ST) condition (Figs. 5b, 6b), fixations showed slight deviations from the NS condition, reflecting minimal shifts in attention. In the First-Order Ambisonics (FO) condition (Figs. 5c, 6c), spatial audio successfully directed attention toward peripheral regions, expanding fixation patterns beyond central elements. The Third-Order Ambisonics (HO) condition (Figs. 5d, 6d) produced the most dispersed fixation patterns, with participants dynamically exploring the entire 360° environment.
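The fixation criterion given in Section 4.2 (head orientation stable within 0.5 degrees for at least 200 ms) and the mapping onto the equirectangular frame can be illustrated with a minimal Python sketch. It is not the tool's implementation, which is written in JavaScript. The function names, the choice to measure deviation from the first sample of each candidate window, and the yaw/pitch-to-pixel convention (yaw 0 at the image centre, pitch +90 at the top row) are all assumptions made for illustration.

```python
import math

def direction(yaw_deg, pitch_deg):
    """Unit view vector for a head pose given as yaw and pitch in degrees."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (math.cos(pitch) * math.cos(yaw),
            math.cos(pitch) * math.sin(yaw),
            math.sin(pitch))

def angle_between(a, b):
    """Angle in degrees between two unit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def detect_fixations(samples, max_dev_deg=0.5, min_dur_s=0.2):
    """samples: time-ordered list of (t_seconds, yaw_deg, pitch_deg).

    Assumption: a fixation is a run of samples that all stay within
    max_dev_deg of the first sample of the run and that lasts at least
    min_dur_s.
    """
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        anchor = direction(samples[i][1], samples[i][2])
        j = i + 1
        while j < n and angle_between(
                anchor, direction(samples[j][1], samples[j][2])) <= max_dev_deg:
            j += 1
        duration = samples[j - 1][0] - samples[i][0]
        if duration >= min_dur_s:
            # Average the unit vectors so yaw wrap-around is handled correctly.
            vecs = [direction(y, p) for _, y, p in samples[i:j]]
            mx = sum(v[0] for v in vecs)
            my = sum(v[1] for v in vecs)
            mz = sum(v[2] for v in vecs)
            fixations.append({
                "start": samples[i][0],
                "duration": duration,
                "yaw": math.degrees(math.atan2(my, mx)),
                "pitch": math.degrees(math.atan2(mz, math.hypot(mx, my))),
            })
            i = j
        else:
            i += 1
    return fixations

def to_equirect(yaw_deg, pitch_deg, width=4096, height=2048):
    """Map yaw in [-180, 180] and pitch in [-90, 90] to a pixel (x, y).

    Assumed convention: yaw 0 is the horizontal centre, pitch +90 the top row.
    """
    x = int((yaw_deg + 180.0) / 360.0 * width) % width
    y = max(0, min(height - 1, int((90.0 - pitch_deg) / 180.0 * height)))
    return x, y
```

Accumulating the pixel positions returned by `to_equirect` for every detected fixation into a 2D count grid would give one simple form of fixation map; the default frame size matches the 4096 × 2048 stimuli described in Section 3.2.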
Peripheral regions received significant attention, highlighting HO’s superior ability to enhance exploration. This combined use of fixation maps and sound intensity heatmaps enables researchers to qualitatively evaluate how auditory cues influence visual exploration and provides a comprehensive understanding of user behaviour.

4.3 Visual Attention Analysis Using Jaccard Index

To complement the qualitative findings, the Jaccard Index (Jaccard, 1901) was employed to quantify the overlap of visual attention patterns across different sound conditions (NS, ST, FO, and HO). The Jaccard Index provides valuable insights into how spatial audio influences gaze behaviour by measuring the similarity between the top 5% of fixations under each sound condition.

The choice of the Jaccard Index in this study is motivated by several factors. First, its focus on overlap allows for a direct assessment of the extent to which spatial audio manipulations influence visual exploration, aligning well with the study’s core research question. Second, its binary comparison approach, which treats fixation locations as either present or absent, simplifies the analysis and reduces computational complexity compared to distance-based metrics. Finally, the interpretability of the Jaccard Index, with values ranging from 0 (no overlap) to 1 (complete overlap), makes it an intuitive and effective metric for assessing variations in gaze distribution across conditions.

The results from the Jaccard Index analysis provide compelling evidence of how spatial audio complexity influences attention patterns. A high Jaccard Index value suggests that gaze patterns remain consistent across sound conditions, implying a limited effect of audio on visual attention. Conversely, a low Jaccard Index value indicates significant divergence in attention, suggesting that spatial audio cues effectively drive exploration and engagement with different parts of the scene.
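The overlap measure described above can be sketched in a few lines of Python. This is a hedged illustration, not the authors' code. It assumes each sound condition is summarized as a 2D grid of fixation counts, that "top 5%" means the 5% of grid cells with the highest counts, and that ties are broken by row-major order. The names `top_fraction_cells` and `jaccard` are hypothetical.

```python
def top_fraction_cells(density, fraction=0.05):
    """Return the set of (row, col) cells holding the highest counts.

    density: 2D list of non-negative fixation counts.
    Assumption: "top 5%" is taken over grid cells; empty cells are never
    selected, so sparse maps may yield fewer cells than the quota.
    """
    cells = [((r, c), v)
             for r, row in enumerate(density)
             for c, v in enumerate(row)]
    k = max(1, int(round(len(cells) * fraction)))
    # Python's sort is stable, so ties keep their row-major order.
    cells.sort(key=lambda item: item[1], reverse=True)
    return {pos for pos, v in cells[:k] if v > 0}

def jaccard(a, b):
    """Jaccard Index |A ∩ B| / |A ∪ B| of two cell sets (1.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Toy 10 x 10 maps: two conditions whose hot spots partly overlap.
ns_map = [[0] * 10 for _ in range(10)]
ho_map = [[0] * 10 for _ in range(10)]
for col in range(5):
    ns_map[0][col] = 10 - col      # hot cells in columns 0..4
for col in range(2, 7):
    ho_map[0][col] = 10            # hot cells in columns 2..6

overlap = jaccard(top_fraction_cells(ns_map), top_fraction_cells(ho_map))
```

In this toy case three of the seven distinct hot cells are shared, so `overlap` is 3/7. A value near 1 would correspond to the near-identical NS and ST patterns reported in the paper, and a value near 0 to strongly diverging attention.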
Notably, the analysis revealed that FO and HO conditions consistently encouraged participants to explore peripheral areas, leading to more dynamic and dispersed attention patterns compared to NS and ST conditions. These findings support the argument that higher-order spatial audio enhances spatial awareness and encourages broader visual engagement.

By integrating qualitative visualizations with quantitative metrics, the framework offers a comprehensive means of analysing audio-visual attention in both indoor and outdoor environments. The use of the Jaccard Index in this context not only reinforces previous observations but also provides a measurable and reproducible way to assess the impact of spatial audio on gaze behaviour. This discussion highlights the importance of selecting appropriate analytical methods to gain a deeper understanding of user interactions within immersive media.

5. RESULTS

This section demonstrates the real-time visualization framework’s ability to analyse audio-visual attention patterns across different sound conditions in 360° videos. By leveraging examples from both indoor and outdoor environments, the framework effectively captures and quantifies attention shifts, offering valuable insights into how auditory cues influence user behaviour in immersive settings.

Through the processing of head pose data and spatial audio cues, the framework provides a dynamic representation of attention distribution across varying auditory conditions, from silence (NS) to third-order ambisonics (HO). The integration of fixation maps with sound intensity heatmaps facilitates an intuitive understanding of how user attention adapts in response to spatial audio stimuli. The examples presented illustrate the framework’s potential to advance immersive media research by enabling a deeper exploration of the interplay between audio and visual stimuli in real-time scenarios.
5.1 Indoor Videos

Indoor environments, often characterized by static and centralized focal points, provide an ideal scenario for observing constrained attention patterns. These settings highlight the tool’s capacity to reveal how attention evolves in response to increasing auditory complexity.

For example, Fig. 7 depicts the top 5% most-viewed areas in an indoor video across sound conditions. Under NS and ST, attention remains concentrated on central visual elements, suggesting a reliance on visual saliency alone. However, with FO and HO, the tool visualizes a noticeable broadening of focus, as participants begin to explore peripheral areas influenced by spatial audio.

Quantitative analysis using the Jaccard Index, presented in Table 2, further underscores this shift. The high overlap values between NS and ST (e.g., 0.98) reflect minimal auditory impact, while the lower overlaps observed for FO and HO (e.g., 0.58 and 0.49, respectively) highlight the diversifying effect of spatial audio on attention distribution. These trends are visually reinforced in Fig. 8, which compares overlap regions and showcases the tool’s ability to quantify shifts in exploration.

These findings demonstrate how the tool captures not only the broadening effect of spatial audio but also the persistence of centralized attention in constrained environments. Such insights pave the way for future investigations into how spatial constraints interact with auditory cues to shape attention patterns.
Table 2. Jaccard indices for Indoor and Outdoor videos across the four sound conditions

Indoor | Sound | NS   | ST   | FO   | HO | Outdoor | Sound | NS   | ST   | FO   | HO
1      | NS    | 1    |      |      |    | 1       | NS    | 1    |      |      |
       | ST    | 0.98 | 1    |      |    |         | ST    | 0.94 | 1    |      |
       | FO    | 0.58 | 0.58 | 1    |    |         | FO    | 0.24 | 0.23 | 1    |
       | HO    | 0.49 | 0.49 | 0.6  | 1  |         | HO    | 0.19 | 0.18 | 0.76 | 1
3      | NS    | 1    |      |      |    | 2       | NS    | 1    |      |      |
       | ST    | 1    | 1    |      |    |         | ST    | 0.99 | 1    |      |
       | FO    | 0.1  | 0.1  | 1    |    |         | FO    | 0.37 | 0.37 | 1    |
       | HO    | 0.1  | 0.1  | 0.16 | 1  |         | HO    | 0.34 | 0.34 | 0.61 | 1
5      | NS    | 1    |      |      |    | 4       | NS    | 1    |      |      |
       | ST    | 0.82 | 1    |      |    |         | ST    | 0.99 | 1    |      |
       | FO    | 0.37 | 0.34 | 1    |    |         | FO    | 0.36 | 0.36 | 1    |
       | HO    | 0.51 | 0.51 | 0.4  | 1  |         | HO    | 0.64 | 0.64 | 0.57 | 1
6      | NS    | 1    |      |      |    | 5       | NS    | 1    |      |      |
       | ST    | 0.9  | 1    |      |    |         | ST    | 0.98 | 1    |      |
       | FO    | 0.48 | 0.45 | 1    |    |         | FO    | 0.76 | 0.76 | 1    |
       | HO    | 0.41 | 0.39 | 0.61 | 1  |         | HO    | 0.73 | 0.73 | 0.87 | 1
7      | NS    | 1    |      |      |    | 6       | NS    | 1    |      |      |
       | ST    | 0.9  | 1    |      |    |         | ST    | 0.91 | 1    |      |
       | FO    | 0.35 | 0.35 | 1    |    |         | FO    | 0.37 | 0.36 | 1    |
       | HO    | 0.43 | 0.42 | 0.69 | 1  |         | HO    | 0.49 | 0.49 | 0.7  | 1

5.2 Outdoor Videos

In contrast to indoor environments, outdoor settings introduce dynamic and dispersed visual and auditory elements, providing a more complex scenario for attention analysis. These environments emphasize the tool's versatility in capturing broader and more exploratory attention patterns.

As shown in Fig. 9, the top 5% most-viewed areas in an outdoor video reveal an increasing dispersion of attention as sound complexity rises. Under NS and ST, participants primarily focus on prominent visual elements, while FO and HO guide attention toward peripheral and less salient regions. This effect is particularly pronounced in expansive outdoor scenes, where spatial audio encourages dynamic exploration.

Quantitative data in Table 2 aligns with these observations, with Jaccard Index values for FO and HO showing significantly reduced overlaps compared to NS and ST. Figure 10 further illustrates this divergence, visualizing the overlap regions and emphasizing how spatial audio facilitates broader engagement with the 360° environment. These examples highlight the tool’s ability to analyse the complex interplay of visual and auditory stimuli in diverse contexts.
5.3 Broader Implications

The combined analysis of fixation maps, sound intensity overlays, and quantitative indices such as the Jaccard Index underscores the tool’s comprehensive approach to studying audio-visual interactions. While indoor environments demonstrate the impact of auditory cues within constrained spaces, outdoor examples showcase the tool’s adaptability to dynamic and spatially complex settings. Figures 7 through 10 and Table 2 collectively illustrate how the tool bridges qualitative visualization with quantitative assessment. By enabling researchers to observe and measure attention shifts in real time, the tool not only addresses limitations of static methods but also provides a foundation for exploring a wide range of research questions.

5.4 Establishing a Framework

Beyond its immediate applications, the tool establishes a framework for understanding audio-visual attention in immersive media. By integrating real-time visualization and quantitative analysis, it opens new avenues for investigating questions such as:

How do varying auditory complexities influence attention across different content types?
What is the role of spatial audio in fostering engagement in interactive virtual environments?
How can the findings inform design strategies for VR content creators and immersive storytellers?

These questions highlight the tool’s broader relevance, emphasizing its potential to advance research in immersive media and related fields.

6. KEY INFLUENCES ON VISUAL ATTENTION IN 360° VIDEO ENVIRONMENTS

The analysis of visual attention patterns in 360° video environments reveals significant variations based on spatial complexity, sound localization, exploration potential, sound-scene congruence, and environmental immersion. Indoor and outdoor environments exhibit distinct attention patterns, with notable reductions in attention overlap observed under higher spatial audio conditions.
Table 3 summarizes the percentage reductions in the Jaccard Index across sound conditions (NS to HO, ST to HO, FO to HO) for the indoor and outdoor videos.

Table 3. Percentage reduction in Jaccard Index values across sound conditions for Indoor and Outdoor videos

Category  Video  NS to HO (%)  ST to HO (%)  FO to HO (%)
Indoor    1      81            50            15
Indoor    3      90            90            0
Indoor    5      49            37            38
Indoor    6      59            54            15
Indoor    7      57            52            23
Outdoor   1      81            79            21
Outdoor   2      66            65            8
Outdoor   4      36            35            -78
Outdoor   5      27            25            -4
Outdoor   6      51            46            32

This table presents the reductions in attention overlap (Jaccard Index values) across different transitions, illustrating how spatial audio affects exploration behaviour in different environments. These values serve as a reference throughout the following discussion, providing quantitative support for the observed attention patterns and comparisons across sound conditions.

6.1 Spatial Complexity and Visual Attention

Spatial complexity influences attention distribution, with indoor environments exhibiting more centralized focus compared to outdoor scenes. In Video 1 (indoor, two actors on stage), the transition from NS to HO resulted in an 81% reduction in Jaccard Index values, indicating a significant shift in visual attention. A similar pattern was observed in Video 5 (indoor, multiple actors moving across the stage), where a 49% reduction was noted, suggesting that increased scene complexity encourages broader exploration under spatial audio conditions.

In contrast, outdoor environments such as Video 2 (hilltop with moving motorbike and dogs) exhibited a 66% reduction in Jaccard Index values from NS to HO, demonstrating that dynamic elements significantly enhance attention dispersion. Static outdoor environments, such as Video 6 (two actors sitting below a monument), resulted in a lower reduction of 51%, indicating that when visual elements are fixed, attention patterns remain more concentrated despite spatial audio enhancements.
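The text does not state how the percentages in Table 3 are derived from Table 2. One reading that reproduces most entries to within about a percentage point (Table 2 values are rounded) is a relative drop toward the NS/HO overlap: the baseline is 1 for "NS to HO", the NS/ST overlap for "ST to HO", and the NS/FO overlap for "FO to HO". The sketch below encodes that assumed reading; the function names are hypothetical and the formula is an inference, not the authors' stated method.

```python
def percent_reduction(baseline, value):
    """Relative drop from `baseline` to `value`, in percent.

    A negative result means `value` exceeds `baseline`, i.e. the overlap
    increased instead of decreasing.
    """
    if baseline == 0:
        raise ValueError("baseline overlap must be non-zero")
    return (baseline - value) / baseline * 100.0


def reductions_toward_ho(j_ns_st, j_ns_fo, j_ns_ho):
    """Assumed derivation of one Table 3 row from Table 2 overlaps with NS."""
    return {
        "NS to HO": percent_reduction(1.0, j_ns_ho),
        "ST to HO": percent_reduction(j_ns_st, j_ns_ho),
        "FO to HO": percent_reduction(j_ns_fo, j_ns_ho),
    }


# Outdoor Video 4, using the overlaps with NS listed in Table 2.
outdoor_4 = reductions_toward_ho(j_ns_st=0.99, j_ns_fo=0.36, j_ns_ho=0.64)
```

Rounded to whole percentages, the Outdoor Video 4 values come out at 36, 35, and -78, in line with the corresponding Table 3 row; the negative last value corresponds to an overlap with NS that is higher under HO than under FO.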
6.2 Sound Localization and Visual Focus

Sound localization significantly influenced attention shifts, particularly in dynamic outdoor environments. In Video 5 (outdoor, market square with a hidden musician), the Jaccard Index values decreased by 27% from NS to HO, suggesting that spatial audio effectively guided participants' gaze toward the off-screen sound source. Similarly, in Video 2, with moving sound sources, a reduction of 66% further supports the role of higher-order ambisonics in enhancing sound-driven attention shifts.

Indoor environments, where sound sources were more congruent with visual elements, showed relatively smaller reductions. In Video 1, with performers fixed on stage, the transition from NS to HO led to a decrease of 81%, indicating that sound localization was less influential compared to outdoor scenarios with moving elements.

6.3 Exploration Potential

Exploration potential varies based on scene openness and auditory complexity. Indoor environments with minimal visual complexity, such as Video 6, exhibited a 59% reduction in Jaccard overlap from NS to HO, reflecting limited exploratory behaviour due to the confined nature of the scene. In contrast, outdoor environments such as Video 4 (people and birds near water) showed a 36% reduction, suggesting moderate exploratory tendencies under spatial audio conditions.

Notably, in Video 2, exploration potential was at its highest, with a 66% reduction, reflecting how spatial audio encourages users to engage with dynamic soundscapes. However, cases like Video 4, where the reduction from FO to HO was negative (-77.8%, i.e., the overlap increased), indicate that user behaviour may not always align with spatial audio cues in more uniform environments.

6.4 Sound-Scene Congruence

The congruence between auditory and visual elements also impacts attention patterns.
In Video 1 (indoor, two actors on stage), high congruence between sound and visuals led to an 81% reduction in Jaccard Index, reinforcing the idea that sound-scene alignment limits the influence of spatial audio. Conversely, in Video 5 (outdoor, hidden musician), a 27% reduction suggests that incongruent audio-visual elements encourage users to explore the environment more actively.

6.5 Environmental Immersion and Presence

Environmental immersion was more pronounced in outdoor settings with dynamic sound sources. In Video 2, the significant reduction of 66% from NS to HO implies that the presence of dynamic auditory cues enhances the sense of presence, encouraging broader exploration of the environment. In contrast, the smaller reduction observed in indoor environments, such as Video 6 (single actor close to the camera), highlights how confined spaces restrict the immersive potential of spatial audio.

6.6 Summary of Key Findings

The analysis of visual attention patterns across different video environments reveals key insights into the role of spatial audio:

Spatial Complexity: Higher scene complexity correlates with greater attention dispersion, with reductions of 81% in indoor Video 1 and 66% in outdoor Video 2, demonstrating the impact of dynamic elements.
Sound Localization: Dynamic environments, such as Video 5 (market square) and Video 2 (hilltop with motorbike), showed substantial reductions in attention overlap, reinforcing the effectiveness of spatial audio in guiding attention.
Exploration Potential: Outdoor environments encourage broader exploration, with spatial audio leading to significant reductions in Jaccard Index values.
Sound-Scene Congruence: High congruence in indoor settings results in limited attention dispersion, while lower congruence encourages exploration in outdoor scenarios.
Environmental Immersion: Outdoor environments exhibit stronger immersion effects, as indicated by larger reductions in attention overlap under spatial audio conditions.

These findings provide actionable insights for VR content creators, highlighting the importance of tailoring spatial audio strategies to different environments to optimize user engagement and immersion.

7. DISCUSSION

This study highlights the critical role of auditory cues in shaping visual attention within 360° video environments. By integrating real-time visualization of sound intensity with head pose and fixation data, it provides novel insights into how spatial audio conditions influence user behaviour dynamically. Unlike traditional approaches that rely on static saliency maps and post-hoc analyses, the real-time visualization tool used in this study captures attention shifts instantaneously, allowing for a deeper understanding of how sound conditions interact with spatial complexity to guide exploration. The findings demonstrate that increasing auditory complexity leads to more dispersed visual attention patterns, with notable differences between indoor and outdoor environments.

7.1 Impact of Sound Complexity on Attention Patterns

The results demonstrate that as sound conditions grew more complex, progressing from no sound (NS) to stereo sound (ST), first-order ambisonics (FO), and third-order ambisonics (HO), participants exhibited increasingly diverse attention patterns. Under NS conditions, attention remained concentrated on central visual elements, particularly in indoor videos such as Video 3, where the stage and performer were the dominant visual elements. The Jaccard Index values for indoor settings under NS remained high, indicating minimal exploration beyond these focal points. As spatial audio complexity increased, a shift in attention distribution was observed.
In Video 5 (indoor, multiple actors moving across the stage), the transition from NS to HO resulted in a 49% reduction in attention overlap, highlighting how spatial audio facilitated exploration beyond the initial fixation zones. Similarly, in outdoor environments, such as Video 2 (hilltop scene with a moving motorbike and dogs), the Jaccard Index values decreased by 66% from NS to HO, suggesting that dynamic sound elements prompted participants to explore the scene more thoroughly. Notably, the shift from FO to HO in some videos, such as Video 6 (indoor, single actor close to the camera), resulted in only a 15% reduction, indicating that in scenarios with limited spatial complexity, third-order ambisonics offered only marginal benefits in broadening attention distribution.

7.2 Third-Order Ambisonics: Enabling Broader and Dynamic Attention Shifts

Third-order ambisonics (HO) emerged as the most effective sound condition for facilitating diverse and dynamic attention shifts, particularly in outdoor environments. The Jaccard Index analysis revealed significant differences between FO and HO conditions, with HO consistently resulting in broader exploration. In Video 5 (market square with a hidden musician), the transition from FO to HO resulted in a 4% reduction in attention overlap, highlighting how the increased spatial resolution of HO helped guide attention toward previously overlooked sound sources. In dynamic outdoor settings such as Video 2, where sound sources moved across a large area, HO contributed to an 8% reduction in overlap compared to FO, demonstrating its effectiveness in expanding the scope of exploration.

These findings underscore the role of third-order ambisonics in enhancing auditory spatialization, leading to a more distributed and dynamic visual attention pattern, particularly in environments with high spatial complexity.
7.3 Comparing Indoor and Outdoor Environments

A key finding of this study is the distinct attention dynamics observed between indoor and outdoor environments, driven by differences in spatial complexity and environmental openness. Indoor settings, such as Video 1 (two actors singing on stage), exhibited relatively centralized attention patterns, with a significant 81% reduction in attention overlap from NS to HO, indicating that spatial audio encouraged only moderate exploration within the confined environment. This suggests that spatial constraints, such as walls and limited movement, restrict the potential impact of spatial audio.

Conversely, outdoor environments, such as Video 4 (people and birds near a water body), demonstrated a broader dispersion of attention under spatial audio conditions, with a 36% reduction in Jaccard Index values from NS to HO. The presence of open spaces and widely distributed sound sources contributed to more exploratory behaviour. Similarly, in Video 5, the presence of hidden sound sources resulted in a 27% reduction, highlighting the role of spatial audio in guiding attention beyond immediate visual cues.

These findings suggest that spatial audio plays a more significant role in expanding attention in outdoor environments, where users are less constrained by physical boundaries and are more likely to engage with peripheral elements.

7.4 Addressing Limitations of Traditional Approaches

Traditional approaches, such as static saliency maps and post-hoc analyses, often fail to capture the dynamic interplay between auditory and visual stimuli in immersive environments. The real-time visualization tool developed in this study provides a more comprehensive analysis by integrating head pose data, fixation maps, and sound intensity overlays to track attention patterns as they evolve.
For example, in Video 6 (indoor, single performer close to the camera), where the spatial complexity was minimal, static methods might suggest uniform attention distribution across conditions. However, the real-time tool revealed nuanced shifts in attention, with a 15% reduction in overlap from FO to HO, showing how third-order ambisonics influenced subtle shifts in user focus. Similarly, in outdoor scenarios like Video 2, the tool effectively captured significant reductions in attention overlap (66%) as sound sources moved dynamically across the scene.

These insights highlight the importance of real-time analysis in understanding the full impact of spatial audio, offering both qualitative and quantitative perspectives that traditional methods often overlook.

7.5 Key Contributions

This study makes several important contributions to the field of immersive media research. First, it demonstrates how increasing sound complexity, particularly through third-order ambisonics, can broaden and diversify attention patterns in 360° environments. The analysis revealed that in dynamic outdoor environments such as Video 2, spatial audio contributed to a 66% reduction in attention overlap, fostering greater exploration and engagement.

Second, it highlights the interplay between spatial complexity and auditory cues, revealing how environmental context, whether indoor or outdoor, affects exploration patterns. The findings from Video 1 (indoor, confined environment with high congruence) and Video 5 (outdoor, market square with hidden sound sources) demonstrate the varying influence of spatial audio in different settings.

Finally, the study introduces a novel real-time visualization tool that integrates qualitative visualizations with quantitative metrics such as the Jaccard Index.
By addressing limitations in traditional methods, this tool provides a robust framework for analysing dynamic attention patterns, offering valuable insights for designing more engaging and user-driven content in immersive environments.

8. CONCLUSION & FUTURE WORK

This study introduces an open-source, real-time visualization tool designed to analyse the influence of auditory cues, particularly spatial audio, on visual attention in 360° video environments. By dynamically integrating fixation maps and sound intensity heatmaps, the tool provides not only real-time visualizations but also robust quantitative insights, such as tracking attention shifts through overlap reduction metrics.

The results demonstrate that as auditory conditions become more complex, progressing from No Sound (NS) to Stereo Sound (ST), First-Order Ambisonics (FO), and Third-Order Ambisonics (HO), participants exhibit increasingly diverse and dynamic attention patterns. In outdoor environments, such as Video 2, attention overlap was reduced by 66%, indicating a significant increase in exploration under spatial audio conditions. Conversely, indoor environments, such as Video 1, exhibited an 81% reduction, suggesting that spatial audio’s influence is constrained by physical boundaries and scene structure.

HO emerged as the most effective auditory condition for guiding attention, fostering broader exploration. In outdoor environments characterized by spatially complex stimuli, spatial audio contributed to a more significant reduction in attention overlap compared to indoor environments, where the effect of spatial audio was more constrained due to physical limitations. These findings underscore spatial audio’s dual role in enhancing immersion and serving as a mechanism to direct attention toward key elements within a scene.
The tool overcomes critical limitations in traditional methodologies, such as static saliency maps and retrospective analysis, by offering real-time participant-specific insights that capture attention shifts dynamically across varying soundscapes. This capability has practical implications for improving narrative coherence, engagement, and content design in VR experiences. Understanding how attention dispersion varies across environments provides actionable insights for optimizing VR content layout, narrative pacing, and user engagement strategies.

By establishing a robust framework for analysing audio-visual interactions through real-time metrics and dynamic visualizations, this study provides content creators and researchers with an adaptable tool for optimizing user experiences in immersive environments. These contributions lay the groundwork for advancing immersive media research and content creation.

Building on the insights provided by this study, future research could explore several promising directions to deepen our understanding of the dynamics of audio-visual attention in immersive environments. The system could support additional metrics such as gaze data, pupil diameter, and heart rate. The range of applications could also be extended to domains such as education, training, and storytelling.

Declarations

FUNDING

This research is supported by Science Foundation Ireland through the ADAPT Centre and the European Regional Development Fund under Grant 13/RC/2106_P2. Additionally, it is funded by the Horizon Europe Framework Programme (HORIZON) under Grant Agreement 101070109 (TRANSMIXR) https://transmixr.eu. No additional external funding was received for this study.

SUPPLEMENTARY MATERIALS

To ensure transparency, facilitate replication, and support further research, all supplementary materials from this study are publicly available.
These resources provide comprehensive support for the qualitative and quantitative analyses presented and enable deeper exploration of the interplay between auditory cues and visual attention.

ETHICS APPROVAL AND CONSENT TO PARTICIPATE

All participants provided informed consent prior to their involvement in the study. The research was conducted in accordance with ethical guidelines and was approved by the university’s Research Ethics Committee.

References

Almquist, E., & Pasquero, J. (2019). Audio-visual attention in immersive environments: Understanding the role of spatial sound. Journal of VR Interaction Research, 5, 135-150.
Almquist, M., & Almquist, V. (2018). Analysis of 360° video viewing behaviours. Dissertation.
Beyerdynamic. (2020). Beyerdynamic DT990 Pro. Retrieved March 27, 2023, from https://europe.beyerdynamic.com/dt-990-pro.html
Calamia, P. T., Murphy, D. A., & Wakefield, G. (2019). Influence of ambisonic spatialization on head movements in immersive 360° audio-visual experiences. Applied Acoustics, 150, 194–202.
Cabello, R. (n.d.). three.js - JavaScript 3D library. Retrieved September 15, 2024, from https://threejs.org/
Chen, C., Zhang, Z., Wu, Y., & Lee, J. (2022). SoundSpaces 2.0: Geometry-Aware Audio Rendering for 3D Environments [arXiv preprint]. Retrieved from https://arxiv.org/abs/2206.08312
David, E., Gutierrez, J., Coutrot, A., Perreira da Silva, M., & Le Callet, P. (2018). A Dataset of Head and Eye Movements for 360° Videos. Proceedings of the 9th ACM Multimedia Systems Conference, 432–437. Retrieved from https://hal.science/hal-01994923
Duchowski, A. T. (2007). Eye Tracking Methodology: Theory and Practice (2nd ed.). Springer-Verlag. doi: 10.1007/978-1-84628-609-4
Empatica. (n.d.). E4 wristband support page.
Retrieved March 27, 2023, from https://support.empatica.com/hc/en-us/categories/200023126-E4-wristband
Farina, A. (2020). Index of /Public. Retrieved March 7, 2021, from http://www.angelofarina.it/Public/
FFmpeg.org. (2021). FFmpeg. Retrieved March 27, 2023, from https://ffmpeg.org/
Google Creative Lab. (n.d.). Omnitone: Spatial audio on the web. Retrieved September 15, 2024, from https://google.github.io/omnitone/
Hirway, A., Qiao, Y., & Murray, N. (2020). A QoE and Visual Attention Evaluation on the Influence of Spatial Audio in 360 Videos. In Proceedings of the 2020 IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR), pp. 345-350. doi: 10.1109/AIVR50618.2020.00071
Hirway, A., Qiao, Y., & Murray, N. (2022). Spatial audio in 360° videos: does it influence visual attention? In Proceedings of the 13th ACM Multimedia Systems Conference (MMSys ’22), pp. 39–51. Association for Computing Machinery. doi: 10.1145/3524273.3528179
Hirway, A., Qiao, Y., & Murray, N. (2024). A Quality of Experience and Visual Attention Evaluation for 360° Videos with Non-spatial and Spatial Audio. ACM Transactions on Multimedia Computing, Communications, and Applications, 20(9), Article 271. doi: 10.1145/3650208
International Telecommunication Union. (2023). ITU-T P.910: Subjective video quality assessment methods for multimedia applications. Retrieved March 27, 2023, from https://www.itu.int/rec/T-REC-P.910-202310-I/en
ISO. (2007). ISO 8589:2007 Sensory analysis - General guidance for the design of test rooms. International Standards Organization. Retrieved September 15, 2024, from https://www.iso.org/obp/ui/#iso:std:iso:8589:ed-2:v1:en
Jaccard, P. (1901).
Nouvelles recherches sur la distribution florale. Bulletin de la Société Vaudoise des Sciences Naturelles, 37, 241-270.
Kassner, M., Patera, W., & Bulling, A. (2014). Pupil: An open source platform for pervasive eye tracking and mobile gaze-based interaction. In Proc. ACM MobiSys Workshop on Mobile and Pervasive Eye Tracking, pp. XX–XX. doi: 10.1145/2611009.2611013
Lo, W.-C., Fan, C.-L., Lee, J., Huang, C.-Y., Chen, K.-T., & Hsu, C.-H. (2017). 360° video viewing dataset in head-mounted virtual reality. In Proc. 8th ACM Multimedia Systems Conf. (MMSys ’17), pp. 211–216. doi: 10.1145/3083187.3083219
Maddams, J. (n.d.). Ambisonics.js - B-format ambisonic decoder for the web. Retrieved September 15, 2024, from https://github.com/jmaddams/Ambisonics.js
Marighetto, P., Coutrot, A., Riche, N., Guyader, N., Mancas, M., Gosselin, B., & Laganiere, R. (2017). Audio-visual attention: Eye-tracking dataset and analysis toolbox. In Proc. IEEE Int. Conf. Image Processing (ICIP 2017), pp. 1802–1806. doi: 10.1109/ICIP.2017.8296592
Min, J., & Hou, Y. (2021). Audio-visual saliency in omnidirectional videos: a review. IEEE Transactions on Multimedia, 23(5), 1902–1915.
Min, X., Zhai, G., Gao, Z., Hu, C., & Yang, X. (2014). Sound influences visual attention discriminately in videos. In Proc. 6th Int. Workshop on Quality of Multimedia Experience (QoMEX 2014), pp. 153–158. doi: 10.1109/QoMEX.2014.6982312
Politis, A., & Poirier-Quinot, D. (2020). JSAmbisonics: JavaScript library for first-order and higher-order ambisonic processing. Retrieved September 15, 2024, from https://github.com/polarch/JSAmbisonics
Politis, A. et al. (2022). SPARTA & COMPASS Suite for Spatial Audio. Aalto University Acoustics Lab. Retrieved from https://aaltodoc.aalto.fi/items/72a60211-d51a-4404-b90d-096ae3970b97
Privitera, A. G., Fontana, F., & Geronazzo, M. (2024).
The Role of Audio in Immersive Storytelling: a Systematic Review in Cultural Heritage. Multimedia Tools and Applications. Advance online publication. doi: 10.1007/s11042-024-19288-4
Tobii Pro. (2018). Tobii Pro VR Integration – based on HTC Vive Development Kit Description. Retrieved March 27, 2023, from https://www.tobiipro.com/siteassets/tobii-pro/product-descriptions/tobii-pro-vr-integration-product-description.pdf/?v=1.7
Tobii Pro Insight. (2020). Tobii Pro eye-tracking technology for research. Tobii Pro. Retrieved from https://www.tobii.com
Vilkamo, S., Backman, J., & Pulkki, V. (2019). Binaural cue coding and rendering toolbox. Retrieved from https://github.com/savil/binaural-cue-coding
Wu, C., Tan, Z., Wang, Z., & Yang, S. (2017). A dataset for exploring user behaviors in VR spherical video streaming. In Proc. 8th ACM Multimedia Systems Conf. (MMSys ’17), New York, NY, USA, pp. 193–198. doi: 10.1145/3083187.3083210
Xu, Y., Du, J., Wang, J., Ning, Y., Zhou, S., & Cao, Y. (2024). Panonut360: A Head and Eye Tracking Dataset for Panoramic Video. arXiv preprint arXiv:2403.17708. Retrieved from https://arxiv.org/abs/2403.17708

Additional Declarations

No competing interests reported.
Hirway","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA1ElEQVRIiWNgGAWjYFACxgYgkgCzHoBINiK0NDZAtTAbHCBOC9AOoEVg8yUOEOMs/mmH2x8w7rDINzh+xqz6Y1sdAx97A34tErcTgQ47I2G54UyO2Y2DbYcZ2HgI2QXW0iZhYHYArOUA0HkJ+HXIw7Wcf2NWcBDoMDb5B/i1GMC13MgxYzjYxgy0hYC7DIFaZiQCtdjfeFYscebcYR42HgIOk7ud/uADMKAMJPuTN36oKKuTk28/QMAaEIAYy2EAInmIUA8H7A9IUT0KRsEoGAUjCAAA0UFFXb/wcpYAAAAASUVORK5CYII=","orcid":"","institution":"Technological University of the Shannon – Midlands Midwest","correspondingAuthor":true,"prefix":"","firstName":"Amit","middleName":"","lastName":"Hirway","suffix":""},{"id":411683074,"identity":"69a238bf-7c1e-476c-b10d-c4916b8624cb","order_by":1,"name":"Yuansong Qiao","email":"","orcid":"","institution":"Technological University of the Shannon – Midlands Midwest","correspondingAuthor":false,"prefix":"","firstName":"Yuansong","middleName":"","lastName":"Qiao","suffix":""},{"id":411683075,"identity":"dba981d3-6836-4311-8b4f-8591e56cc0f7","order_by":2,"name":"Niall Murray","email":"","orcid":"","institution":"Technological University of the Shannon – Midlands Midwest","correspondingAuthor":false,"prefix":"","firstName":"Niall","middleName":"","lastName":"Murray","suffix":""}],"badges":[],"createdAt":"2025-01-29 14:23:23","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-5924870/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-5924870/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":75948545,"identity":"eec93b77-499b-4752-9adf-c881c826616f","added_by":"auto","created_at":"2025-02-10 21:55:36","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":363261,"visible":true,"origin":"","legend":"\u003cp\u003eParticipant experiencing the 360° environment in our 
lab\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-5924870/v1/3bbb2f75983c6bb9fd6ba0b5.png"},{"id":75948238,"identity":"1ef5d6a0-e0ee-4a4b-a8bf-e8b652c80ec1","added_by":"auto","created_at":"2025-02-10 21:47:36","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":1042031,"visible":true,"origin":"","legend":"\u003cp\u003eRepresentative frames for videos in the Indoor (a-e) and Outdoor (f-j) categories\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-5924870/v1/19dbe22aef1007d5cae24006.png"},{"id":75948239,"identity":"e6e4d0f6-800c-4e9c-936f-bedc1174613a","added_by":"auto","created_at":"2025-02-10 21:47:36","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":84999,"visible":true,"origin":"","legend":"\u003cp\u003eYaw, Pitch and Roll Angles in Degrees\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-5924870/v1/b5e599ad708f763d74aa3db4.png"},{"id":75948546,"identity":"b088f386-151c-46ed-b218-e6f8c840cd67","added_by":"auto","created_at":"2025-02-10 21:55:36","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":376193,"visible":true,"origin":"","legend":"\u003cp\u003eWeb-based tool in action\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-5924870/v1/6420abf6526305e179372a7d.png"},{"id":75948551,"identity":"4680f762-216c-428e-a350-6870e0186cb4","added_by":"auto","created_at":"2025-02-10 21:55:37","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":1200011,"visible":true,"origin":"","legend":"\u003cp\u003ea. Fixations for a participant for Indoor video 1 in the NS condition. Red dots represent the fixations.\u003c/p\u003e\n\u003cp\u003eb. 
Fixations for a participant for the same Indoor video in the ST (non-spatial audio) condition\u003c/p\u003e\n\u003cp\u003ec. Fixations and sound heatmap for a participant for the same Indoor video in the FO (spatial audio) condition\u003c/p\u003e\n\u003cp\u003ed. Fixations and so\u003cem\u003eu\u003c/em\u003end heatmap for a participant for the \u0026nbsp;\u0026nbsp;same Indoor video in the HO (spatial audio) condition\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-5924870/v1/b1271b499f8b116ecaa9c2b5.png"},{"id":75948262,"identity":"d2d2672d-888c-48a6-be67-5aa181ef327b","added_by":"auto","created_at":"2025-02-10 21:47:37","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":2296473,"visible":true,"origin":"","legend":"\u003cp\u003ea. Fixations for a participant for Outdoor video 2 in the NS condition. Red dots represent the fixations\u003c/p\u003e\n\u003cp\u003eb. Fixations for a participant for the same Outdoor video in the ST (non-spatial audio) condition.\u003c/p\u003e\n\u003cp\u003ec. Fixations and sound heatmap for a participant for the same Outdoor video in the FO (spatial audio) condition.\u003c/p\u003e\n\u003cp\u003ed. 
Fixations and sound heatmap for a participant for the same Outdoor video in the HO (spatial audio) condition.\u003c/p\u003e","description":"","filename":"6.png","url":"https://assets-eu.researchsquare.com/files/rs-5924870/v1/75d7b6d6354fe41e92a05d8d.png"},{"id":75948673,"identity":"ab16d206-e45a-412e-b76a-592cd34160fc","added_by":"auto","created_at":"2025-02-10 22:03:36","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":423039,"visible":true,"origin":"","legend":"\u003cp\u003eTop 5 percent most looked at areas of the Indoor video 1 across the four sound conditions\u003c/p\u003e","description":"","filename":"7.png","url":"https://assets-eu.researchsquare.com/files/rs-5924870/v1/916e12ed375907d279ad6015.png"},{"id":75948261,"identity":"af68e9ab-6bdc-43d5-aa11-a645e2b4f357","added_by":"auto","created_at":"2025-02-10 21:47:36","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":245473,"visible":true,"origin":"","legend":"\u003cp\u003eOverlap of the most looked at areas for the Indoor Video 1 across the four sound conditions. The colour gradient ranges from dark colours (low overlap) to bright colours (high overlap), with yellow and white areas showing the highest concentration of attention. 
Orange/Red arcs or strokes represent attention paths or regions influenced by spatial audio.\u003c/p\u003e","description":"","filename":"8.png","url":"https://assets-eu.researchsquare.com/files/rs-5924870/v1/cab132f3f50c4f8b9185edf1.png"},{"id":75948289,"identity":"bd59dae6-c8d3-4690-b5d2-5ef43de0b1cf","added_by":"auto","created_at":"2025-02-10 21:47:37","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":452954,"visible":true,"origin":"","legend":"\u003cp\u003eTop 5 percent most looked at areas of the Outdoor video 2 across the four sound conditions\u003c/p\u003e","description":"","filename":"9.png","url":"https://assets-eu.researchsquare.com/files/rs-5924870/v1/62ddcf0ce01fc0ea33c12c2a.png"},{"id":75948281,"identity":"17681241-346c-4881-84e3-493d60b6c34b","added_by":"auto","created_at":"2025-02-10 21:47:37","extension":"png","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":278299,"visible":true,"origin":"","legend":"\u003cp\u003eOverlap of the most looked at areas for the Outdoor Video 2 across the four sound conditions. The colour gradient ranges from dark colours (low overlap) to bright colours (high overlap), with yellow and white areas showing the highest concentration of attention. 
Orange/Red arcs or strokes represent attention paths or regions influenced by spatial audio.\u003c/p\u003e","description":"","filename":"10.png","url":"https://assets-eu.researchsquare.com/files/rs-5924870/v1/d60165d8785af8432a4cf338.png"},{"id":108492366,"identity":"2d356d8d-0a0d-41e0-9dea-831f1a35f14b","added_by":"auto","created_at":"2026-05-05 09:57:36","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":9543015,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5924870/v1/6b733bb2-8df6-48b0-8c47-fe4b310c2d3a.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Advancing Audio-Visual Attention Analysis in 360° Videos Through Real-Time Visualization","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eVirtual Reality (VR) has revolutionized digital experiences by creating immersive environments where users engage with both visual and auditory stimuli. Among the most popular VR content formats are 360\u0026deg; videos, which allow users to explore spherical environments through Head-Mounted Displays (HMDs). These experiences enhance user presence and interactivity but pose challenges in understanding how auditory and visual cues interact to shape user attention.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eA critical factor in VR engagement is audio-visual attention (AVA), the process by which users selectively focus on visual elements while responding to auditory stimuli (Almquist \u0026amp; Pasquero, 2019). Understanding AVA is essential for optimizing VR content design, improving user experiences, and advancing immersive storytelling.\u003c/p\u003e\n\u003cp\u003eSpatial audio, particularly ambisonics (Calamia et al., 2019), has emerged as a powerful tool for directing attention in 360\u0026deg; environments. 
By enabling precise localization of sound sources, spatial audio can shift focus from centrally salient visual elements to peripheral regions, encouraging users to explore their surroundings more comprehensively. However, effectively capturing how users respond to spatial audio remains an ongoing challenge.\u003c/p\u003e\n\u003cp\u003eTraditional methods, such as static saliency maps and post-hoc evaluations of head pose and gaze data (Min \u0026amp; Hou, 2021), have provided valuable insights but struggle to capture the dynamic interplay between sound and vision or account for real-time, participant-specific interactions. Previous studies (Hirway et al., 2020; Hirway et al., 2022) have shown that spatial audio influences visual attention patterns, but their reliance on retrospective and static analyses has limited their ability to uncover attention shifts as they occur. These constraints underscore the need for tools capable of offering real-time, data-driven insights into the effects of auditory stimuli on attention behaviour.\u003c/p\u003e\n\u003cp\u003eTo overcome these challenges, this study presents a novel, open-source, real-time visualization tool that evaluates AVA in 360\u0026deg; videos. The tool dynamically integrates head pose fixation maps with sound intensity heatmaps, offering both qualitative visualizations and a quantitative framework to assess attention patterns under different sound conditions. 
Unlike traditional methods, this participant-specific, real-time approach enables deeper exploration of the interplay between sound and vision, providing a more comprehensive understanding of how auditory stimuli influence user behaviour.\u003c/p\u003e\n\u003cp\u003eTo validate the tool\u0026apos;s effectiveness, a user study involving 73 participants was conducted (Hirway et al., 2024), evaluating their visual attention while viewing a dataset of 360\u0026deg; videos (Farina, 2020) under four distinct sound conditions: No Sound (NS), Stereo Sound (ST), First-Order Ambisonics (FO), and Third-Order Ambisonics (HO). These conditions facilitated a systematic investigation into how increasing auditory complexity influences attention patterns, highlighting the advantages of spatial audio\u0026mdash;particularly HO\u0026mdash;in guiding exploration and enhancing engagement within VR environments.\u003c/p\u003e\n\u003cp\u003eThis paper makes the following key contributions: \u0026nbsp;\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e1. Introduction of an open-source tool that integrates real-time fixation maps and sound intensity heatmaps with a quantitative framework for understanding AVA in 360\u0026deg; videos. \u0026nbsp;\u003c/p\u003e\n\u003cp\u003e2. Demonstration of the tool\u0026apos;s effectiveness through a user study with 73 participants, capturing both qualitative visualizations and quantitative comparisons to reveal how spatial audio influences attention patterns, particularly highlighting HO\u0026rsquo;s role in fostering dynamic exploration. \u0026nbsp;\u003c/p\u003e\n\u003cp\u003e3. Actionable insights for immersive content design, offering guidance on optimizing spatial audio to enhance engagement and exploration in VR environments. \u0026nbsp;\u003c/p\u003e\n\u003cp\u003e4. 
Broader implications across industries, with potential applications in fields such as training simulations, virtual tourism, and immersive storytelling, where spatial audio plays a crucial role in enhancing user engagement.\u003c/p\u003e\n\u003cp\u003eThe remainder of this paper is organized as follows: \u0026nbsp;Section 2 reviews the existing literature and contextualizes our work within the field of audio-visual attention in immersive media. \u0026nbsp; Section 3 presents a demonstration of the developed tool and details the data acquisition process. \u0026nbsp; Section 4 describes the tool\u0026apos;s functionality and its evaluation through various performance metrics. \u0026nbsp; Section 5 provides an in-depth analysis of the results obtained from the dataset, highlighting key trends and findings. \u0026nbsp;Section 6 explores the factors influencing visual attention under different audio conditions. \u0026nbsp; Section 7 discusses the broader implications of our findings for immersive media research and VR content design. \u0026nbsp; Section 8 concludes the paper and outlines potential directions for future work. \u0026nbsp;\u003c/p\u003e"},{"header":"2. RELATED WORK","content":"\u003cp\u003eResearch into attention patterns in 360\u0026deg; videos has primarily focused on visual attention (VA), utilizing head and eye tracking to understand user engagement with immersive content. These studies have yielded valuable insights and produced public datasets, such as the dataset of head and eye movements for 360\u0026deg; videos by David et al. (2018) and the Panonut360 dataset by Xu et al. (2024), contributing to the foundational understanding of user behaviour in virtual environments. 
However, VA studies often prioritize visual cues, neglecting the role of auditory stimuli\u0026mdash;particularly spatial audio\u0026mdash;in shaping attention in 360\u0026deg; environments.\u003c/p\u003e \u003cdiv id=\"Sec2\" class=\"Section2\"\u003e \u003ch2\u003e2.1 Visual Attention Studies\u003c/h2\u003e \u003cp\u003eEarly visual attention (VA) research in 360\u0026deg; videos employed saliency maps to model attention patterns. For instance, Lo et al. (2017) explored how video pacing influenced viewer orientations by using OpenTrack to capture head movements across fast-paced and slow-paced video categories. Similarly, David et al. (2018) combined saliency maps and scan paths to analyse head and eye movement data in VA, revealing how visual elements drive attention shifts. While these studies provided significant insights, their focus on static visual analyses limited their applicability to dynamic, audio-visual interactions.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.2 Audio-Visual Attention and Sound\u003c/h2\u003e \u003cp\u003eRecognizing the importance of sound, recent studies have incorporated auditory stimuli into attention analyses. Min et al. (2014) demonstrated that distinct audio cues could redirect user focus, particularly when sound sources were independent of visually salient objects. Marighetto et al. (2017) highlighted the role of audio-visual interactions in reducing gaze dispersion in non-360\u0026deg; videos, showing how sound can guide visual focus. Similarly, Wu et al. (2017) and Almquist (2018) examined audio-visual attention by linking head pose and content types to attention patterns, emphasizing the interplay between audio and visual cues. However, these studies often relied on static or non-spatial soundscapes, failing to capture real-time, participant-specific interactions.\u003c/p\u003e \u003cp\u003eExpanding on these prior efforts, Hirway et al. 
(2020, 2022, 2024) have conducted a series of studies to systematically investigate the influence of spatial audio on visual attention (VA) and Quality of Experience (QoE) in 360\u0026deg; videos. Their first study (Hirway et al., 2020) examined how spatial audio impacts user immersion and attention by comparing non-spatial (stereo) and spatial (third-order Ambisonics) audio conditions. The findings revealed that spatial audio led to a more immersive experience, with users demonstrating higher maximum head pose pitch values and focusing more on sound-emitting regions, highlighting a quicker integration into the spatial environment.\u003c/p\u003e \u003cp\u003eBuilding on this, Hirway et al. (2022) expanded the scope by incorporating additional physiological and behavioural measures such as pupil diameter, gaze fixations, and audio energy maps to analyse user behaviour under various sound conditions (no sound, stereo, and third-order Ambisonics). Their results showed significant variations in viewing patterns and physiological responses, with spatial audio leading to more dynamic exploration of the 360\u0026deg; scene. The study provided practical insights for optimizing audio-visual content by demonstrating how different sound configurations influence user engagement.\u003c/p\u003e \u003cp\u003eMost recently, Hirway et al. (2024) conducted an extensive empirical study with 73 participants, comparing the effects of no sound, stereo audio, and both first- and third-order Ambisonics on head pose, eye gaze, pupil dilation, and heart rate. The results demonstrated that spatial audio, particularly third-order Ambisonics, captured heightened attention as evidenced by increased physiological arousal and diverse head movements in response to distributed sound sources. This study also emphasized the importance of spatial audio in enhancing user engagement and providing practical implications for optimizing content processing, encoding, distribution, and rendering. 
The availability of open-source datasets and scripts further enhances reproducibility and provides valuable resources for future immersive media research.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.3 Eye-tracking and Spatial audio tools\u003c/h2\u003e \u003cp\u003eAdvancements in eye-tracking technologies and spatial audio tools have significantly contributed to immersive media research, enabling deeper insights into audio-visual interactions. Eye-tracking systems, such as Tobii Pro Insight (2020) and Pupil Labs (Kassner et al., 2014), provide high-precision gaze tracking capabilities that allow researchers to analyse user attention in virtual environments. These tools have been widely adopted in VR and 360\u0026deg; video studies to measure gaze patterns, fixation durations, and attentional shifts. However, while these tools excel in tracking visual behaviour, they often lack seamless integration with spatial audio analysis, which plays a crucial role in immersive media experiences.\u003c/p\u003e \u003cp\u003eOn the other hand, spatial audio frameworks such as SPARTA and COMPASS (Politis et al., 2022), developed at Aalto University's Acoustics Lab, offer real-time spatial sound reproduction and visualization capabilities. These frameworks enable audio rendering with Ambisonics and directional audio techniques, enhancing immersion in virtual environments. Similarly, SoundSpaces 2.0 (Chen et al., 2022) provides geometry-based audio rendering, facilitating tasks such as source localization and spatial navigation within virtual spaces. 
While these spatial audio tools provide powerful features for auditory analysis, they primarily focus on audio processing and lack the necessary components to incorporate real-time visual attention tracking.\u003c/p\u003e \u003cp\u003eDespite the availability of standalone tools for both eye-tracking and spatial audio analysis, the challenge lies in their integration to form a comprehensive framework that can simultaneously analyse audio-visual attention dynamics. Existing tools either prioritize visual tracking without considering spatial sound influences or focus on sound reproduction without capturing user gaze behaviour in real time. Bridging this gap is crucial for understanding how spatial audio affects visual attention, particularly in 360\u0026deg; immersive environments.\u003c/p\u003e \u003cp\u003eThis work aims to address these limitations by introducing an open-source framework that integrates real-time spatial audio visualization with eye-tracking analytics, offering a holistic approach to studying attention behaviours in immersive content.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.4 Addressing the Gaps\u003c/h2\u003e \u003cp\u003eThe reviewed literature highlights a significant gap in the integration of spatial audio with real-time visual attention analysis in immersive environments. While existing studies have demonstrated the role of audio cues in guiding visual attention, they predominantly rely on static or non-spatial audio conditions, post-experiment data analysis, retrospective evaluations, and offline processing methods. These approaches, though informative, fail to capture the dynamic and interactive nature of immersive experiences in real time. 
Moreover, current eye-tracking and spatial audio tools, despite their sophistication, function as standalone systems that limit their ability to provide a holistic understanding of audio-visual interactions in 360\u0026deg; content.\u003c/p\u003e \u003cp\u003eBridging this gap is essential, as the interplay between spatial audio and visual attention is fundamental to optimizing user experiences in virtual reality storytelling, training simulations, and interactive media. A deeper understanding of these interactions can lead to improved content placement strategies, enhanced user engagement, and more effective VR applications. To address these challenges, this study underscores the necessity of developing integrated, real-time analysis tools that combine spatial audio with eye-tracking data. Such solutions can provide a deeper, data-driven understanding of how users explore immersive content under different sound conditions, leading to more informed design choices and optimized content delivery strategies.\u003c/p\u003e \u003c/div\u003e"},{"header":"3. TOOL DEMONSTRATION AND DATA ACQUISITION","content":"\u003cp\u003eThis study employed an open-source, real-time visualization tool to analyse audio-visual attention (AVA) in 360\u0026deg; videos. The dataset used in this study was previously explored in our earlier works (Hirway et al., 2020; 2022; 2024), which examined visual attention patterns under varying sound conditions using traditional methods such as saliency maps and post-hoc evaluations of head pose and gaze data. While these approaches provided valuable insights, they were limited in their ability to capture real-time, participant-specific interactions between auditory and visual stimuli.\u003c/p\u003e\n\u003cp\u003eTo overcome these challenges, the proposed tool offers an enhanced analytical capability, enabling a more dynamic and granular exploration of user attention patterns in response to spatial audio cues. 
The tool was applied to the dataset to provide deeper insights into how different audio conditions influence gaze behaviour and content engagement in immersive environments.\u003c/p\u003e\n\u003cdiv id=\"Sec7\"\u003e\n \u003ch2\u003e3.1 Laboratory Design\u003c/h2\u003e\n \u003cp\u003eThe experiment followed ISO 8589:2007 standards (ISO, 2007) to create a controlled sensory environment, minimizing external distractions. Participants were seated on a motorized rotating chair with three degrees of freedom (Fig. \u003cspan\u003e1\u003c/span\u003e), allowing unrestricted head movements while viewing 360\u0026deg; videos. The hardware and software used in the experiment are summarized in Table \u003cspan\u003e1\u003c/span\u003e, ensuring high accuracy in data capture and seamless tool operation.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec8\"\u003e\n \u003ch2\u003e3.2 360\u0026deg; Video Stimuli\u003c/h2\u003e\n \u003cp\u003eThe study employed a curated set of 360\u0026deg; videos to evaluate AVA under four sound conditions: No Sound (NS), Stereo Sound (ST), First-Order Ambisonics (FO), and Third-Order Ambisonics (HO). These videos adhered to ITU-T P.910 standards (International Telecommunication Union, 2023) for resolution (4096 \u0026times; 2048 pixels), frame rate (29.970 FPS), and duration (60 seconds), ensuring consistency across all stimuli. The videos were categorized into two groups:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eIndoor Scenes: Centralized focus points, such as opera performances, with limited visual dispersion (Fig.\u0026nbsp;2a-e).\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eOutdoor Scenes: Open environments with dispersed sound-emitting objects, such as clock towers or animal sounds (Fig.\u0026nbsp;2f-j).\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eTo standardize clip duration and format, FFmpeg (FFmpeg.org, 2021) was used for video pre-processing. 
The videos were randomized into 5-minute sequences to minimize participant bias during exposure.\u003c/p\u003e\n \u003cp\u003eTable 1. Experiment Setup: Hardware and Software\u003c/p\u003e\n \u003cdiv\u003e\n \u003ctable id=\"Tab1\" border=\"1\"\u003e\u003c/table\u003e\n \u003c/div\u003e\n \u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"592\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e\u003cstrong\u003eComponent\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e\u003cstrong\u003eDetails and Utility\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003ePC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eIntel Core\u0026trade; i5 \u0026ndash; 4590 CPU @ 3.30GHz, 10.0 GB RAM, 16GB nVidia GTX 970 Graphics Card, running Windows 10. Used to operate hardware and software for the immersive environment.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eHMD\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eHTC Vive with Tobii Pro VR Integration (Tobii Pro, 2018). Enables participants to watch 360\u0026deg; videos.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eHeadphones\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eBeyerdynamic DT 990 Pro (Beyerdynamic, 2020). Used for listening to non-spatial and spatial audio.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e360\u0026deg; Player\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eGoPro VR Player (GoPro, 2020). 
Plays 360\u0026deg; videos on the HMD and records head orientation as yaw, pitch, and roll.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e360\u0026deg; Videos\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003e\u0026nbsp;Provide the audio-visual stimuli for participants.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eE4 Wristband\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eEmpatica E4 (Empatica, 2023). Collects physiological data, including heart rate, for analysis.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003cp\u003eFigure\u0026nbsp;2. Representative frames for videos in the Indoor (a-e) and Outdoor (f-j) categories\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec9\"\u003e\n \u003ch2\u003e3.3 Data Acquisition for Tool Evaluation\u003c/h2\u003e\n \u003cp\u003eThe data acquisition process focused on capturing participants\u0026rsquo; interactions with the immersive 360\u0026deg; environment, collecting four key metrics:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eHead Pose Data: Captured at 120 Hz as unit quaternions (yaw, pitch, roll), representing participants\u0026rsquo; head orientation within the spherical video space. These inputs formed the foundation for generating fixation maps (Fig.\u0026nbsp;3).\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eGaze Data: Tracked at 120 Hz to identify regions where participants directed their attention. While collected, gaze data was not analysed in this study, but it remains a target for future integration to enhance gaze accuracy analysis.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003ePupil Diameter: Monitored in real-time to observe physiological responses during video playback. 
Although visualized by the tool, pupil diameter data was not quantitatively analysed in the current study.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eHeart Rate: Measured using the Empatica E4 wristband to capture physiological responses throughout video playback. This metric was collected but not integrated into the tool\u0026rsquo;s visualizations, offering potential for future studies.\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eFigure\u0026nbsp;3. Yaw, Pitch and Roll Angles in Degrees\u003c/p\u003e\n \u003cp\u003eThe tool processed these inputs in real-time, dynamically generating fixation maps and sound intensity heatmaps. By integrating these metrics, the tool overcomes critical limitations of traditional methods, enabling participant-specific, dynamic analyses of audio-visual attention. This innovative approach provides a deeper understanding of how spatial audio influences exploration patterns in immersive environments.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"4. FUNCTIONALITY AND EVALUATION","content":"\u003cp\u003eThis section demonstrates the capabilities of the real-time visualization tool, focusing on its integration of multimodal metrics for analysing audio-visual attention (AVA) in 360\u0026deg; video environments. By combining fixation maps, sound intensity heatmaps, and quantitative indices, the tool addresses limitations of traditional static methods, offering richer, participant-specific insights into how auditory cues influence visual exploration.\u003c/p\u003e\n\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\n \u003ch2\u003e4.1 Tool Overview\u003c/h2\u003e\n \u003cp\u003eThe custom web-based tool (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e) was developed using JavaScript libraries, including three.js (Cabello, 2024), omnitone.js (Google Creative Lab, 2024), and JSAmbisonics (Maddams, 2024). 
It dynamically processes head pose, sound intensity, and pupil diameter data to generate real-time visual outputs that enable participant-specific analysis.\u003c/p\u003e\n \u003cp\u003eThe tool offers several key features:\u003c/p\u003e\u003cspan\u003e\n \u003cp\u003e1. Current Field of View (FOV): The tool visualizes participants\u0026rsquo; real-time perspectives within the 360\u0026deg; environment by leveraging head pose data to approximate gaze direction. While head pose is less precise than direct eye-tracking, it serves as a reliable proxy for attention, especially during stabilized head movements. Future iterations may incorporate eye-tracking for enhanced precision.\u003c/p\u003e\n \u003c/span\u003e \u003cspan\u003e\n \u003cp\u003e2. Head Pose Scan Path: Tracks yaw and pitch movements over time, providing a visual representation of participants\u0026rsquo; exploration patterns within the 360\u0026deg; environment.\u003c/p\u003e\n \u003c/span\u003e \u003cspan\u003e\n \u003cp\u003e3. Sound Intensity Heatmap: Dynamically overlays spatial audio intensity onto the video, highlighting correlations between auditory cues and visual attention shifts. This real-time overlay distinguishes the tool from static methods, enabling researchers to evaluate how sound conditions influence participant exploration dynamically.\u003c/p\u003e\n \u003c/span\u003e\n \u003cp\u003eBy enabling real-time analysis, the tool overcomes critical limitations of static, post-hoc approaches and provides actionable insights into how auditory cues shape attention patterns in immersive environments.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\n \u003ch2\u003e4.2 Fixation and Sound Intensity Heatmap Generation\u003c/h2\u003e\n \u003cp\u003eThe framework processes head pose data to identify fixations, defined as moments when head orientation remains stable within a 0.5-degree angular deviation for at least 200 milliseconds (Duchowski, 2007). 
These fixations are mapped onto the equirectangular projection of the video to create fixation maps, revealing areas of concentrated attention. Simultaneously, spatial audio signals are processed using JSAmbisonics (Politis \u0026amp; Poirier-Quinot, 2020) to compute real-time sound intensity across ambisonics channels. These intensity values are visualized as heatmaps overlaid onto the corresponding regions of the 360\u0026deg; video. Figures 5a\u0026ndash;5d and 6a\u0026ndash;6d illustrate fixation patterns and sound intensity overlays for representative indoor and outdoor videos across sound conditions.\u0026nbsp;\u003c/p\u003e\n \u003cp\u003eIn the No Sound (NS) condition (Figs.\u0026nbsp;5a, 6a), fixations were concentrated on central visual elements, indicating reliance on visual saliency in the absence of auditory cues. Peripheral regions remained largely unexplored. In the Stereo Sound (ST) condition (Figs.\u0026nbsp;5b, 6b), fixations showed slight deviations from the NS condition, reflecting minimal shifts in attention. In the First-Order Ambisonics (FO) condition (Figs.\u0026nbsp;5c, 6c), spatial audio successfully directed attention toward peripheral regions, expanding fixation patterns beyond central elements. The Third-Order Ambisonics (HO) condition (Figs.\u0026nbsp;5d, 6d) produced the most dispersed fixation patterns, with participants dynamically exploring the entire 360\u0026deg; environment. Peripheral regions received significant attention, highlighting HO\u0026rsquo;s superior ability to enhance exploration. 
This combined use of fixation maps and sound intensity heatmaps enables researchers to qualitatively evaluate how auditory cues influence visual exploration and provides a comprehensive understanding of user behaviour.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\n \u003ch2\u003e4.3 Visual Attention Analysis Using Jaccard Index\u003c/h2\u003e\n \u003cp\u003eTo complement the qualitative findings, the Jaccard Index (Jaccard, 1901) was employed to quantify the overlap of visual attention patterns across different sound conditions (NS, ST, FO, and HO). The Jaccard Index provides valuable insights into how spatial audio influences gaze behaviour by measuring the similarity between the top 5% of fixations under each sound condition.\u003c/p\u003e\n \u003cp\u003eThe choice of the Jaccard Index in this study is motivated by several factors. First, its focus on overlap allows for a direct assessment of the extent to which spatial audio manipulations influence visual exploration, aligning well with the study\u0026rsquo;s core research question. Second, its binary comparison approach, which treats fixation locations as either present or absent, simplifies the analysis and reduces computational complexity compared to distance-based metrics. Finally, the interpretability of the Jaccard Index, with values ranging from 0 (no overlap) to 1 (complete overlap), makes it an intuitive and effective metric for assessing variations in gaze distribution across conditions.\u003c/p\u003e\n \u003cp\u003eThe results from the Jaccard Index analysis provide compelling evidence of how spatial audio complexity influences attention patterns. A high Jaccard Index value suggests that gaze patterns remain consistent across sound conditions, implying a limited effect of audio on visual attention. 
Conversely, a low Jaccard Index value indicates significant divergence in attention, suggesting that spatial audio cues effectively drive exploration and engagement with different parts of the scene. Notably, the analysis revealed that FO and HO conditions consistently encouraged participants to explore peripheral areas, leading to more dynamic and dispersed attention patterns compared to NS and ST conditions. These findings support the argument that higher-order spatial audio enhances spatial awareness and encourages broader visual engagement.\u003c/p\u003e\n \u003cp\u003eBy integrating qualitative visualizations with quantitative metrics, the framework offers a comprehensive means of analysing audio-visual attention in both indoor and outdoor environments. The use of the Jaccard Index in this context not only reinforces previous observations but also provides a measurable and reproducible way to assess the impact of spatial audio on gaze behaviour. This discussion highlights the importance of selecting appropriate analytical methods to gain a deeper understanding of user interactions within immersive media.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"5. RESULTS","content":"\u003cp\u003eThis section demonstrates the real-time visualization framework\u0026rsquo;s ability to analyse audio-visual attention patterns across different sound conditions in 360\u0026deg; videos. By leveraging examples from both indoor and outdoor environments, the framework effectively captures and quantifies attention shifts, offering valuable insights into how auditory cues influence user behaviour in immersive settings.\u003c/p\u003e \u003cp\u003eThrough the processing of head pose data and spatial audio cues, the framework provides a dynamic representation of attention distribution across varying auditory conditions, from silence (NS) to third-order ambisonics (HO). 
The integration of fixation maps with sound intensity heatmaps facilitates an intuitive understanding of how user attention adapts in response to spatial audio stimuli. The examples presented illustrate the framework\u0026rsquo;s potential to advance immersive media research by enabling a deeper exploration of the interplay between audio and visual stimuli in real-time scenarios.\u003c/p\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003e5.1 Indoor Videos\u003c/h2\u003e \u003cp\u003eIndoor environments, often characterized by static and centralized focal points, provide an ideal scenario for observing constrained attention patterns. These settings highlight the tool\u0026rsquo;s capacity to reveal how attention evolves in response to increasing auditory complexity. For example, Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e7\u003c/span\u003e depicts the top 5% most-viewed areas in an indoor video across sound conditions. Under NS and ST, attention remains concentrated on central visual elements, suggesting a reliance on visual saliency alone. However, with FO and HO, the tool visualizes a noticeable broadening of focus, as participants begin to explore peripheral areas influenced by spatial audio.\u003c/p\u003e \u003cp\u003eQuantitative analysis using the Jaccard Index, presented in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, further underscores this shift. The high overlap values between NS and ST (e.g., 0.98) reflect minimal auditory impact, while the lower overlaps observed for FO and HO (e.g., 0.58 and 0.49, respectively) highlight the diversifying effect of spatial audio on attention distribution. These trends are visually reinforced in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e8\u003c/span\u003e, which compares overlap regions and showcases the tool\u0026rsquo;s ability to quantify shifts in exploration. 
These findings demonstrate how the tool captures not only the broadening effect of spatial audio but also the persistence of centralized attention in constrained environments. Such insights pave the way for future investigations into how spatial constraints interact with auditory cues to shape attention patterns.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eJaccard indices for Indoor and Outdoor videos across the four sound conditions\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"12\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c11\" colnum=\"11\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c12\" colnum=\"12\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003eIndoor\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSound\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNS\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eST\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eFO\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eHO\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eOutdoor\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eSound\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003eNS\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c10\"\u003e \u003cp\u003eST\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c11\"\u003e \u003cp\u003eFO\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c12\"\u003e \u003cp\u003eHO\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003e\u003cb\u003e1\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eNS\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003e\u003cb\u003e1\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003eNS\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c9\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eST\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003eST\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eFO\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.58\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.58\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003eFO\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.23\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eHO\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.49\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.49\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003eHO\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.19\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.18\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e0.76\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003e\u003cb\u003e3\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eNS\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003e\u003cb\u003e2\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003eNS\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eST\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003eST\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eFO\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003eFO\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e 
\u003cp\u003e0.37\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.37\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eHO\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003eHO\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.34\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.34\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e0.61\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003e\u003cb\u003e5\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eNS\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\" morerows=\"3\" 
rowspan=\"4\"\u003e \u003cp\u003e\u003cb\u003e4\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003eNS\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eST\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.82\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003eST\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eFO\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.37\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.34\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e 
\u003cp\u003e\u003cb\u003eFO\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eHO\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.51\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.51\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003eHO\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e0.57\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003e\u003cb\u003e6\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eNS\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003e\u003cb\u003e5\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003eNS\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eST\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003eST\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eFO\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.48\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.45\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003eFO\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.76\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.76\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eHO\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.41\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.39\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.61\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003eHO\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.73\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.73\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003e\u003cb\u003e7\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eNS\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003e\u003cb\u003e6\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003eNS\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eST\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003eST\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eFO\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.35\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e 
\u003cp\u003e0.35\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003eFO\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.37\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c12\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eHO\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.43\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.42\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.69\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003eHO\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e0.49\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e0.49\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e0.7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003e5.2 Outdoor Videos\u003c/h2\u003e \u003cp\u003eIn 
contrast to indoor environments, outdoor settings introduce dynamic and dispersed visual and auditory elements, providing a more complex scenario for attention analysis. These environments emphasize the tool\u0026rsquo;s versatility in capturing broader and more exploratory attention patterns. As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e9\u003c/span\u003e, the top 5% most-viewed areas in an outdoor video reveal an increasing dispersion of attention as sound complexity rises. Under NS and ST, participants primarily focus on prominent visual elements, while FO and HO guide attention toward peripheral and less salient regions. This effect is particularly pronounced in expansive outdoor scenes, where spatial audio encourages dynamic exploration.\u003c/p\u003e \u003cp\u003eThe quantitative data in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e align with these observations: the Jaccard Index values of FO and HO against NS and ST are markedly lower than the value between NS and ST. Figure\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e10\u003c/span\u003e further illustrates this divergence, visualizing the overlap regions and emphasizing how spatial audio facilitates broader engagement with the 360\u0026deg; environment. These examples highlight the tool\u0026rsquo;s ability to analyse the complex interplay of visual and auditory stimuli in diverse contexts.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003e5.3 Broader Implications\u003c/h2\u003e \u003cp\u003eThe combined analysis of fixation maps, sound intensity overlays, and quantitative indices such as the Jaccard Index underscores the tool\u0026rsquo;s comprehensive approach to studying audio-visual interactions. 
While indoor environments demonstrate the impact of auditory cues within constrained spaces, outdoor examples showcase the tool\u0026rsquo;s adaptability to dynamic and spatially complex settings.\u003c/p\u003e \u003cp\u003eFigures \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e7\u003c/span\u003e through \u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e10\u003c/span\u003e and Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e collectively illustrate how the tool bridges qualitative visualization with quantitative assessment. By enabling researchers to observe and measure attention shifts in real time, the tool not only addresses limitations of static methods but also provides a foundation for exploring a wide range of research questions.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003e5.4 Establishing a Framework\u003c/h2\u003e \u003cp\u003eBeyond its immediate applications, the tool establishes a framework for understanding audio-visual attention in immersive media. By integrating real-time visualization and quantitative analysis, it opens new avenues for investigating questions such as:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eHow do varying auditory complexities influence attention across different content types?\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eWhat is the role of spatial audio in fostering engagement in interactive virtual environments?\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eHow can the findings inform design strategies for VR content creators and immersive storytellers?\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThese questions highlight the tool\u0026rsquo;s broader relevance, emphasizing its potential to advance research in immersive media and related fields.\u003c/p\u003e \u003c/div\u003e"},{"header":"6. 
KEY INFLUENCES ON VISUAL ATTENTION IN 360° VIDEO ENVIRONMENTS","content":"\u003cp\u003eThe analysis of visual attention patterns in 360\u0026deg; video environments reveals significant variations based on spatial complexity, sound localization, exploration potential, sound-scene congruence, and environmental immersion. Indoor and outdoor environments exhibit distinct attention patterns, with notable reductions in attention overlap observed under higher spatial audio conditions.\u003c/p\u003e \u003cp\u003eTable 3 summarizes the percentage reductions in the Jaccard Index between sound conditions (NS to HO, ST to HO, FO to HO) for the indoor and outdoor videos:\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003ePercentage Reduction in Jaccard Index Values Across Sound Conditions for Indoor and Outdoor Videos\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCategory\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eVideo Number\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNS to HO (%)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" 
colname=\"c4\"\u003e \u003cp\u003eST to HO (%)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eFO to HO (%)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eIndoor\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e1\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e15\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eIndoor\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e3\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e90\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e90\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eIndoor\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e5\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e49\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e37\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e38\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eIndoor\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e6\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e59\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e54\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e15\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eIndoor\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e7\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e57\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e52\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e23\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eOutdoor\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e1\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e79\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e21\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eOutdoor\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e2\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c3\"\u003e \u003cp\u003e66\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e65\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e8\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eOutdoor\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e4\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e35\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e-78\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eOutdoor\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e5\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e27\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e-4\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eOutdoor\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e51\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e46\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e32\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e 
\u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThis table presents the reductions in attention overlap (Jaccard Index values) across different transitions, illustrating how spatial audio affects exploration behaviour in different environments. These values serve as a reference throughout the following discussion, providing quantitative support for the observed attention patterns and comparisons across sound conditions.\u003c/p\u003e \u003cdiv id=\"Sec20\" class=\"Section2\"\u003e \u003ch2\u003e6.1 Spatial Complexity and Visual Attention\u003c/h2\u003e \u003cp\u003eSpatial complexity influences attention distribution, with indoor environments exhibiting more centralized focus compared to outdoor scenes. In Video 1 (indoor, two actors on stage), the transition from NS to HO resulted in an 81% reduction in Jaccard Index values, indicating a significant shift in visual attention. A similar pattern was observed in Video 5 (indoor, multiple actors moving across the stage), where a 49% reduction was noted, suggesting that increased scene complexity encourages broader exploration under spatial audio conditions.\u003c/p\u003e \u003cp\u003eIn contrast, outdoor environments such as Video 2 (hilltop with moving motorbike and dogs) exhibited a 66% reduction in Jaccard Index values from NS to HO, demonstrating that dynamic elements significantly enhance attention dispersion. Static outdoor environments, such as Video 6 (two actors sitting below a monument), resulted in a lower reduction of 51%, indicating that when visual elements are fixed, attention patterns remain more concentrated despite spatial audio enhancements.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003e6.2 Sound Localization and Visual Focus\u003c/h2\u003e \u003cp\u003e Sound localization significantly influenced attention shifts, particularly in dynamic outdoor environments. 
In Video 5 (outdoor, market square with a hidden musician), the Jaccard Index values decreased by 27% from NS to HO, suggesting that spatial audio effectively guided participants' gaze toward the off-screen sound source. Similarly, in Video 2, with moving sound sources, a reduction of 66% further supports the role of higher-order ambisonics in enhancing sound-driven attention shifts.\u003c/p\u003e \u003cp\u003eIndoor environments, where sound sources were more congruent with visual elements, showed relatively smaller reductions. In Video 1, with performers fixed on stage, the transition from NS to HO led to a decrease of 81%, indicating that sound localization was less influential compared to outdoor scenarios with moving elements.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec22\" class=\"Section2\"\u003e \u003ch2\u003e6.3 Exploration Potential\u003c/h2\u003e \u003cp\u003eExploration potential varies based on scene openness and auditory complexity. Indoor environments with minimal visual complexity, such as Video 6, exhibited a 59% reduction in Jaccard overlap from NS to HO, reflecting limited exploratory behaviour due to the confined nature of the scene. In contrast, outdoor environments such as Video 4 (people and birds near water) showed a 36% reduction, suggesting moderate exploratory tendencies under spatial audio conditions.\u003c/p\u003e \u003cp\u003eNotably, in outdoor Video 2, exploration potential was among the highest, with a 66% reduction, reflecting how spatial audio encourages users to engage with dynamic soundscapes. However, cases like outdoor Video 4, where attention overlap increased by about 78% from FO to HO (a reduction of -78% in Table 3), indicate that user behaviour may not always align with spatial audio cues in more uniform environments.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec23\" class=\"Section2\"\u003e \u003ch2\u003e6.4 Sound-Scene Congruence\u003c/h2\u003e \u003cp\u003eThe congruence between auditory and visual elements also impacts attention patterns. 
In Video 1 (indoor, two actors on stage), high congruence between sound and visuals led to an 81% reduction in Jaccard Index, reinforcing the idea that sound-scene alignment limits the influence of spatial audio. Conversely, in Video 5 (outdoor, hidden musician), a 27% reduction suggests that incongruent audio-visual elements encourage users to explore the environment more actively.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec24\" class=\"Section2\"\u003e \u003ch2\u003e6.5 Environmental Immersion and Presence\u003c/h2\u003e \u003cp\u003eEnvironmental immersion was more pronounced in outdoor settings with dynamic sound sources. In Video 2, the significant reduction of 66% from NS to HO implies that the presence of dynamic auditory cues enhances the sense of presence, encouraging broader exploration of the environment. In contrast, the smaller reduction observed in indoor environments, such as Video 6 (single actor close to the camera), highlights how confined spaces restrict the immersive potential of spatial audio.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec25\" class=\"Section2\"\u003e \u003ch2\u003e6.6 Summary of Key Findings\u003c/h2\u003e \u003cp\u003eThe analysis of visual attention patterns across different video environments reveals key insights into the role of spatial audio:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eSpatial Complexity: Higher scene complexity correlates with greater attention dispersion, with up to 81% reduction in indoor settings and 66% in outdoor scenes, demonstrating the impact of dynamic elements.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eSound Localization: Dynamic environments, such as Video 5 (market square) and Video 2 (hilltop with motorbike), showed substantial reductions in attention overlap, reinforcing the effectiveness of spatial audio in guiding attention.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eExploration Potential: Outdoor environments encourage 
broader exploration, with spatial audio leading to significant reductions in Jaccard Index values.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eSound-Scene Congruence: High congruence in indoor settings results in limited attention dispersion, while lower congruence encourages exploration in outdoor scenarios.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eEnvironmental Immersion: Outdoor environments exhibit stronger immersion effects, as indicated by larger reductions in attention overlap under spatial audio conditions.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThese findings provide actionable insights for VR content creators, highlighting the importance of tailoring spatial audio strategies to different environments to optimize user engagement and immersion.\u003c/p\u003e \u003c/div\u003e"},{"header":"7. DISCUSSION","content":"\u003cp\u003eThis study highlights the critical role of auditory cues in shaping visual attention within 360\u0026deg; video environments. By integrating real-time visualization of sound intensity with head pose and fixation data, it provides novel insights into how spatial audio conditions influence user behaviour dynamically. Unlike traditional approaches that rely on static saliency maps and post-hoc analyses, the real-time visualization tool used in this study captures attention shifts instantaneously, allowing for a deeper understanding of how sound conditions interact with spatial complexity to guide exploration. 
The findings demonstrate that increasing auditory complexity leads to more dispersed visual attention patterns, with notable differences between indoor and outdoor environments.\u003c/p\u003e \u003cdiv id=\"Sec27\" class=\"Section2\"\u003e \u003ch2\u003e7.1 Impact of Sound Complexity on Attention Patterns\u003c/h2\u003e \u003cp\u003e The results demonstrate that as the complexity of sound conditions increased\u0026mdash;from no sound (NS) to stereo sound (ST), first-order ambisonics (FO), and third-order ambisonics (HO)\u0026mdash;participants exhibited increasingly diverse attention patterns. Under NS conditions, attention remained concentrated on central visual elements, particularly in indoor videos such as Video 3, where the stage and performer were the dominant visual elements. The Jaccard Index values for indoor settings under NS remained high, indicating minimal exploration beyond these focal points.\u003c/p\u003e \u003cp\u003eAs spatial audio complexity increased, a shift in attention distribution was observed. In Video 5 (indoor, multiple actors moving across the stage), the transition from NS to HO resulted in a 49% reduction in attention overlap, highlighting how spatial audio facilitated exploration beyond the initial fixation zones. 
Similarly, in outdoor environments, such as Video 2 (hilltop scene with a moving motorbike and dogs), the Jaccard Index values decreased by 66% from NS to HO, suggesting that dynamic sound elements prompted participants to explore the scene more thoroughly.\u003c/p\u003e \u003cp\u003eNotably, the shift from FO to HO in some videos, such as Video 6 (indoor, single actor close to the camera), resulted in only a 15% reduction, indicating that in scenarios with limited spatial complexity, third-order ambisonics offered only marginal benefits in broadening attention distribution.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec28\" class=\"Section2\"\u003e \u003ch2\u003e7.2 Third-Order Ambisonics: Enabling Broader and Dynamic Attention Shifts\u003c/h2\u003e \u003cp\u003eThird-order ambisonics (HO) emerged as the most effective sound condition for facilitating diverse and dynamic attention shifts, particularly in outdoor environments. The Jaccard Index analysis revealed differences between FO and HO conditions, with HO resulting in broader exploration in most videos (Table 3). In Video 5 (market square with a hidden musician), the transition from FO to HO resulted in a 4% increase in attention overlap (a reduction of -4% in Table 3), suggesting that the increased spatial resolution of HO guided attention toward the hidden sound source rather than dispersing it.\u003c/p\u003e \u003cp\u003eIn dynamic outdoor settings such as Video 2, where sound sources moved across a large area, HO contributed to an 8% reduction in overlap compared to FO, demonstrating its effectiveness in expanding the scope of exploration. 
These findings underscore the role of third-order ambisonics in enhancing auditory spatialization, leading to a more distributed and dynamic visual attention pattern, particularly in environments with high spatial complexity.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec29\" class=\"Section2\"\u003e \u003ch2\u003e7.3 Comparing Indoor and Outdoor Environments\u003c/h2\u003e \u003cp\u003eA key finding of this study is the distinct attention dynamics observed between indoor and outdoor environments, driven by differences in spatial complexity and environmental openness. Indoor settings, such as Video 1 (two actors singing on stage), exhibited relatively centralized attention patterns, with a significant 81% reduction in attention overlap from NS to HO, indicating that spatial audio encouraged only moderate exploration within the confined environment. This suggests that spatial constraints, such as walls and limited movement, restrict the potential impact of spatial audio.\u003c/p\u003e \u003cp\u003eConversely, outdoor environments, such as Video 4 (people and birds near a water body), demonstrated a broader dispersion of attention under spatial audio conditions, with a 36% reduction in Jaccard Index values from NS to HO. The presence of open spaces and widely distributed sound sources contributed to more exploratory behaviour. 
Similarly, in Video 5, the presence of hidden sound sources resulted in a 27% reduction, highlighting the role of spatial audio in guiding attention beyond immediate visual cues.\u003c/p\u003e \u003cp\u003eThese findings suggest that spatial audio plays a more significant role in expanding attention in outdoor environments, where users are less constrained by physical boundaries and are more likely to engage with peripheral elements.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec30\" class=\"Section2\"\u003e \u003ch2\u003e7.4 Addressing Limitations of Traditional Approaches\u003c/h2\u003e \u003cp\u003eTraditional approaches, such as static saliency maps and post-hoc analyses, often fail to capture the dynamic interplay between auditory and visual stimuli in immersive environments. The real-time visualization tool developed in this study provides a more comprehensive analysis by integrating head pose data, fixation maps, and sound intensity overlays to track attention patterns as they evolve.\u003c/p\u003e \u003cp\u003eFor example, in Video 6 (indoor, single performer close to the camera), where the spatial complexity was minimal, static methods might suggest uniform attention distribution across conditions. However, the real-time tool revealed nuanced shifts in attention, with a 15% reduction in overlap from FO to HO, showing how third-order ambisonics influenced subtle shifts in user focus. 
Similarly, in outdoor scenarios like Video 2, the tool effectively captured significant reductions in attention overlap (66%) as sound sources moved dynamically across the scene.\u003c/p\u003e \u003cp\u003eThese insights highlight the importance of real-time analysis in understanding the full impact of spatial audio, offering both qualitative and quantitative perspectives that traditional methods often overlook.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec31\" class=\"Section2\"\u003e \u003ch2\u003e7.5 Key Contributions\u003c/h2\u003e \u003cp\u003eThis study makes several important contributions to the field of immersive media research. First, it demonstrates how increasing sound complexity, particularly through third-order ambisonics, can broaden and diversify attention patterns in 360\u0026deg; environments. The analysis revealed that in dynamic outdoor environments such as Video 2, spatial audio contributed to a 66% reduction in attention overlap, fostering greater exploration and engagement.\u003c/p\u003e \u003cp\u003eSecond, it highlights the interplay between spatial complexity and auditory cues, revealing how environmental context\u0026mdash;whether indoor or outdoor\u0026mdash;affects exploration patterns. The findings from Video 1 (indoor, confined environment with high congruence) and Video 5 (outdoor, market square with hidden sound sources) demonstrate the varying influence of spatial audio in different settings.\u003c/p\u003e \u003cp\u003eFinally, the study introduces a novel real-time visualization tool that integrates qualitative visualizations with quantitative metrics such as the Jaccard Index. By addressing limitations in traditional methods, this tool provides a robust framework for analysing dynamic attention patterns, offering valuable insights for designing more engaging and user-driven content in immersive environments.\u003c/p\u003e \u003c/div\u003e"},{"header":"8. 
CONCLUSION \u0026 FUTURE WORK","content":"\u003cp\u003eThis study introduces an open-source, real-time visualization tool designed to analyse the influence of auditory cues, particularly spatial audio, on visual attention in 360\u0026deg; video environments. By dynamically integrating fixation maps and sound intensity heatmaps, the tool provides not only real-time visualizations but also robust quantitative insights, such as tracking attention shifts through overlap reduction metrics. The results demonstrate that as auditory conditions become more complex\u0026mdash;from No Sound (NS) to Stereo Sound (ST), First-Order Ambisonics (FO), and Third-Order Ambisonics (HO)\u0026mdash;participants exhibit increasingly diverse and dynamic attention patterns. In outdoor environments, such as Video 2, attention overlap was reduced by 66%, indicating a significant increase in exploration under spatial audio conditions. Conversely, indoor environments, such as Video 1, exhibited an 81% reduction, suggesting that spatial audio\u0026rsquo;s influence is constrained by physical boundaries and scene structure.\u003c/p\u003e \u003cp\u003eHO emerged as the most effective auditory condition for guiding attention, fostering broader exploration. In outdoor environments characterized by spatially complex stimuli, spatial audio contributed to a more significant reduction in attention overlap compared to indoor environments, where the effect of spatial audio was more constrained due to physical limitations. These findings underscore spatial audio\u0026rsquo;s dual role in enhancing immersion and serving as a mechanism to direct attention toward key elements within a scene.\u003c/p\u003e \u003cp\u003eThe tool overcomes critical limitations in traditional methodologies, such as static saliency maps and retrospective analysis, by offering real-time participant-specific insights that capture attention shifts dynamically across varying soundscapes. 
This capability has practical implications for improving narrative coherence, engagement, and content design in VR experiences. Understanding how attention dispersion varies across environments provides actionable insights for optimizing VR content layout, narrative pacing, and user engagement strategies.\u003c/p\u003e \u003cp\u003eBy establishing a robust framework for analysing audio-visual interactions through real-time metrics and dynamic visualizations, this study provides content creators and researchers with an adaptable tool for optimizing user experiences in immersive environments. These contributions lay the groundwork for advancing immersive media research and content creation. Building on the insights provided by this study, future research could explore several promising directions to deepen our understanding of the dynamics of audio-visual attention in immersive environments. The system could also support additional measures such as gaze data, pupil diameter, and heart rate, and its range of applications could be broadened to domains such as education, training, and storytelling.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003eFUNDING\u003c/p\u003e\n\u003cp\u003eThis research is supported by Science Foundation Ireland through the ADAPT Centre and the European Regional Development Fund under Grant 13/RC/2106_P2. Additionally, it is funded by the Horizon Europe Framework Programme (HORIZON) under Grant Agreement 101070109 (TRANSMIXR) https://transmixr.eu. No additional external funding was received for this study.\u003c/p\u003e\n\u003cp\u003eSUPPLEMENTARY MATERIALS\u003c/p\u003e\n\u003cp\u003eTo ensure transparency, facilitate replication, and support further research, all supplementary materials from this study are publicly available. 
These resources provide comprehensive support for the qualitative and quantitative analyses presented and enable deeper exploration of the interplay between auditory cues and visual attention.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eETHICS APPROVAL AND CONSENT TO PARTICIPATE\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAll participants provided informed consent prior to their involvement in the study. The research was conducted in accordance with ethical guidelines and was approved by the university\u0026rsquo;s Research Ethics Committee. \u0026nbsp;\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAlmquist, E., \u0026amp; Pasquero, J. (2019). Audio-visual attention in immersive environments: Understanding the role of spatial sound. Journal of VR Interaction Research, 5, 135-150.\u003c/li\u003e\n\u003cli\u003eAlmquist, M., \u0026amp; Almquist, V. (2018). Analysis of 360\u0026deg; video viewing behaviours. Dissertation.\u003c/li\u003e\n\u003cli\u003eBeyerdynamic. (2020). Beyerdynamic DT990 Pro. Retrieved March 27, 2023, from [https://europe.beyerdynamic.com/dt-990-pro.html](https://europe.beyerdynamic.com/dt-990-pro.html)\u003c/li\u003e\n\u003cli\u003eCalamia, P. T., Murphy, D. A., \u0026amp; Wakefield, G. (2019). Influence of ambisonic spatialization on head movements in immersive 360\u0026deg; audio-visual experiences. Applied Acoustics, 150, 194\u0026ndash;202.\u003c/li\u003e\n\u003cli\u003eCabello, R. (n.d.). three.js - JavaScript 3D library. Retrieved September 15, 2024, from [https://threejs.org/](https://threejs.org/) \u003c/li\u003e\n\u003cli\u003eChen, C., Zhang, Z., Wu, Y., \u0026amp; Lee, J. (2022). SoundSpaces 2.0: Geometry-Aware Audio Rendering for 3D Environments [arXiv preprint]. Retrieved from [arxiv.org](https://arxiv.org/abs/2206.08312)\u003c/li\u003e\n\u003cli\u003eDavid, E., Gutierrez, J., Coutrot, A., Perreira da Silva, M., \u0026amp; Le Callet, P. (2018). A Dataset of Head and Eye Movements for 360\u0026deg; Videos. 
Proceedings of the 9th ACM Multimedia Systems Conference, 432\u0026ndash;437. Retrieved from [https://hal.science/hal-01994923](https://hal.science/hal-01994923)\u003c/li\u003e\n\u003cli\u003eDuchowski, A. T. (2007). Eye Tracking Methodology: Theory and Practice (2nd ed.). Springer-Verlag. doi: 10.1007/978-1-84628-609-4\u003c/li\u003e\n\u003cli\u003eEmpatica. (n.d.). E4 wristband support page. Retrieved March 27, 2023, from [https://support.empatica.com/hc/en-us/categories/200023126-E4-wristband](https://support.empatica.com/hc/en-us/categories/200023126-E4-wristband)\u003c/li\u003e\n\u003cli\u003eFarina, A. (2020). Index of /Public. Retrieved March 7, 2021, from [http://www.angelofarina.it/Public/](http://www.angelofarina.it/Public/) \u003c/li\u003e\n\u003cli\u003eFFmpeg.org. (2021). FFmpeg. Retrieved March 27, 2023, from [https://ffmpeg.org/](https://ffmpeg.org/)\u003c/li\u003e\n\u003cli\u003eGoogle Creative Lab. (n.d.). Omnitone: Spatial audio on the web. Retrieved September 15, 2024, from [https://google.github.io/omnitone/](https://google.github.io/omnitone/)\u003c/li\u003e\n\u003cli\u003eHirway, A., Qiao, Y., \u0026amp; Murray, N. (2020). A QoE and Visual Attention Evaluation on the Influence of Spatial Audio in 360 Videos. In Proceedings of the 2020 IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR), pp. 345-350. doi: 10.1109/AIVR50618.2020.00071\u003c/li\u003e\n\u003cli\u003eHirway, A., Qiao, Y., \u0026amp; Murray, N. (2022). Spatial audio in 360\u0026deg; videos: does it influence visual attention? In Proceedings of the 13th ACM Multimedia Systems Conference (MMSys \u0026rsquo;22), pp. 39\u0026ndash;51. Association for Computing Machinery. doi: 10.1145/3524273.3528179\u003c/li\u003e\n\u003cli\u003eHirway, A., Qiao, Y., \u0026amp; Murray, N. (2024). A Quality of Experience and Visual Attention Evaluation for 360\u0026deg; Videos with Non-spatial and Spatial Audio. 
ACM Transactions on Multimedia Computing, Communications, and Applications, 20(9), Article 271. doi: 10.1145/3650208\u003c/li\u003e\n\u003cli\u003eInternational Telecommunication Union. (2023). ITU-T P.910: Subjective video quality assessment methods for multimedia applications. Retrieved March 27, 2023, from [https://www.itu.int/rec/T-REC-P.910-202310-I/en](https://www.itu.int/rec/T-REC-P.910-202310-I/en)\u003c/li\u003e\n\u003cli\u003eISO. (2007). ISO 8589:2007 Sensory analysis \u0026mdash; General guidance for the design of test rooms. International Standards Organization. Retrieved September 15, 2024, from [https://www.iso.org/obp/ui/#iso:std:iso:8589:ed-2:v1:en](https://www.iso.org/obp/ui/#iso:std:iso:8589:ed-2:v1:en)\u003c/li\u003e\n\u003cli\u003eJaccard, P. (1901). Nouvelles recherches sur la distribution florale. Bulletin de la Soci\u0026eacute;t\u0026eacute; Vaudoise des Sciences Naturelles, 37, 241-270. \u003c/li\u003e\n\u003cli\u003eKassner, M., Patera, W., \u0026amp; Bulling, A. (2014). Pupil: An open source platform for pervasive eye tracking and mobile gaze-based interaction. In Proc. ACM MobiSys Workshop on Mobile and Pervasive Eye Tracking, pp. XX\u0026ndash;XX. doi: 10.1145/2611009.2611013\u003c/li\u003e\n\u003cli\u003eLo, W.-C., Fan, C.-L., Lee, J., Huang, C.-Y., Chen, K.-T., \u0026amp; Hsu, C.-H. (2017). 360\u0026deg; video viewing dataset in head-mounted virtual reality. In Proc. 8th ACM Multimedia Systems Conf. (MMSys \u0026rsquo;17), pp. 211\u0026ndash;216. doi: 10.1145/3083187.3083219\u003c/li\u003e\n\u003cli\u003eMaddams, J. (n.d.). Ambisonics.js - B-format ambisonic decoder for the web. Retrieved September 15, 2024, from [https://github.com/jmaddams/Ambisonics.js](https://github.com/jmaddams/Ambisonics.js)\u003c/li\u003e\n\u003cli\u003eMarighetto, P., Coutrot, A., Riche, N., Guyader, N., Mancas, M., Gosselin, B., \u0026amp; Laganiere, R. (2017). Audio-visual attention: Eye-tracking dataset and analysis toolbox. In Proc. IEEE Int. Conf. 
Image Processing (ICIP 2017), pp. 1802\u0026ndash;1806. doi: 10.1109/ICIP.2017.8296592\u003c/li\u003e\n\u003cli\u003eMin, J., \u0026amp; Hou, Y. (2021). Audio-visual saliency in omnidirectional videos: a review. IEEE Transactions on Multimedia, 23(5), 1902\u0026ndash;1915. \u003c/li\u003e\n\u003cli\u003eMin, X., Zhai, G., Gao, Z., Hu, C., \u0026amp; Yang, X. (2014). Sound influences visual attention discriminately in videos. In Proc. 6th Int. Workshop on Quality of Multimedia Experience (QoMEX 2014), pp. 153\u0026ndash;158. doi: 10.1109/QoMEX.2014.6982312\u003c/li\u003e\n\u003cli\u003ePolitis, A., \u0026amp; Poirier-Quinot, D. (2020). JSAmbisonics: JavaScript library for first-order and higher-order ambisonic processing. Retrieved September 15, 2024, from [https://github.com/polarch/JSAmbisonics](https://github.com/polarch/JSAmbisonics)\u003c/li\u003e\n\u003cli\u003ePolitis, A. et al. (2022). SPARTA \u0026amp; COMPASS Suite for Spatial Audio. Aalto University Acoustics Lab. Retrieved from [aaltodoc.aalto.fi](https://aaltodoc.aalto.fi/items/72a60211-d51a-4404-b90d-096ae3970b97)\u003c/li\u003e\n\u003cli\u003ePrivitera, A. G., Fontana, F., \u0026amp; Geronazzo, M. (2024). The Role of Audio in Immersive Storytelling: a Systematic Review in Cultural Heritage. Multimedia Tools and Applications. Advance online publication. doi: 10.1007/s11042-024-19288-4\u003c/li\u003e\n\u003cli\u003eTobii Pro. (2018). Tobii Pro VR Integration \u0026ndash; based on HTC Vive Development Kit Description. Retrieved March 27, 2023, from [https://www.tobiipro.com/siteassets/tobii-pro/product-descriptions/tobii-pro-vr-integration-product-description.pdf/?v=1.7](https://www.tobiipro.com/siteassets/tobii-pro/product-descriptions/tobii-pro-vr-integration-product-description.pdf/?v=1.7)\u003c/li\u003e\n\u003cli\u003eTobii Pro Insight. (2020). Tobii Pro eye-tracking technology for research. Tobii Pro. 
Retrieved from [https://www.tobii.com](https://www.tobii.com)\u003c/li\u003e\n\u003cli\u003eVilkamo, S., Backman, J., \u0026amp; Pulkki, V. (2019). Binaural cue coding and rendering toolbox. Retrieved from [https://github.com/savil/binaural-cue-coding](https://github.com/savil/binaural-cue-coding)\u003c/li\u003e\n\u003cli\u003eWu, C., Tan, Z., Wang, Z., \u0026amp; Yang, S. (2017). A dataset for exploring user behaviors in VR spherical video streaming. In Proc. 8th ACM Multimedia Systems Conf. (MMSys \u0026rsquo;17), New York, NY, USA, pp. 193\u0026ndash;198. doi: 10.1145/3083187.3083210\u003c/li\u003e\n\u003cli\u003eXu, Y., Du, J., Wang, J., Ning, Y., Zhou, S., \u0026amp; Cao, Y. (2024). Panonut360: A Head and Eye Tracking Dataset for Panoramic Video. arXiv preprint arXiv:2403.17708. Retrieved from [https://arxiv.org/abs/2403.17708](https://arxiv.org/abs/2403.17708)\u003c/li\u003e\n\u003cli\u003eBeyerdynamic. (2020). Beyerdynamic DT990 Pro. Retrieved March 27, 2023, from [https://europe.beyerdynamic.com/dt-990-pro.html](https://europe.beyerdynamic.com/dt-990-pro.html)\u003c/li\u003e\n\u003cli\u003eCabello, R. (n.d.). three.js - JavaScript 3D library. Retrieved September 15, 2024, from [https://threejs.org/](https://threejs.org/) \u003c/li\u003e\n\u003cli\u003eChen, C., Zhang, Z., Wu, Y., \u0026amp; Lee, J. (2022). SoundSpaces 2.0: Geometry-Aware Audio Rendering for 3D Environments [arXiv preprint]. Retrieved from [arxiv.org](https://arxiv.org/abs/2206.08312)\u003c/li\u003e\n\u003cli\u003eDavid, E., Gutierrez, J., Coutrot, A., Perreira da Silva, M., \u0026amp; Le Callet, P. (2018). A Dataset of Head and Eye Movements for 360\u0026deg; Videos. Proceedings of the 9th ACM Multimedia Systems Conference, 432\u0026ndash;437. 
Retrieved from [https://hal.science/hal-01994923](https://hal.science/hal-01994923) \u003c/li\u003e\n\u003c/ol\u003e"}],"keywords":"360° video, Spatial Audio, Ambisonics, Audio-Visual attention, Real-Time Visualization, Fixation Maps, Heat Maps, Open-Source","lastPublishedDoi":"10.21203/rs.3.rs-5924870/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-5924870/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptTitle":"Advancing Audio-Visual Attention Analysis in 360° Videos Through Real-Time Visualization","postedDate":"February 10th, 2025","published":true,"status":"posted","versionCreatedAt":"2025-02-10 21:47:31"},"version":"v1","identity":"rs-5924870","journalConfig":"researchsquare"}}}
