Drift-free BIM Alignment for Mixed Reality Visualization through Image Style Transfer and Feature Matching

Mohamed Zahlan Abdul Muthalif, Davood Shojaei, Kourosh Khoshelham, and 1 more

Research Square, Research Article. This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-7812864/v1. This work is licensed under a CC BY 4.0 License. Status: Posted. Version 1 posted.

Abstract

This research introduces a novel methodology that automates the precise alignment between real and virtual environments in Mixed Reality (MR) applications, specifically tailored for the construction industry. A significant challenge in MR systems is the accumulation of camera pose estimation errors, leading to trajectory drift and reduced localization accuracy over time.
Our approach addresses this by integrating HoloLens' spatial mapping capabilities with image style transfer and geometric feature matching, enabling robust alignment between real-world HoloLens images and Building Information Modeling (BIM). By bridging the visual domain gap through image style transfer, we enhance feature correspondence, effectively eliminating drift errors that accumulate during device movement. A comprehensive evaluation using 1,408 image pairs demonstrates improved localization accuracy and reliable alignment of BIM in the real world for enhancing efficiency in the construction industry.

Keywords: Localization, Mixed Reality, Image Style Transfer, BIM, HoloLens

1. Introduction

Mixed Reality (MR) is a technology that facilitates the integration of physical environments with virtual elements, thereby creating immersive user experiences. MR enables the interaction between real and virtual components, leading to a seamless blend of the two realms (M. Muthalif et al., 2022). These technologies are increasingly applied in industries such as education (Osadchyi et al., 2021), tourism (Gharaibeh et al., 2021), navigation (Liu et al., 2022), military (Livingston et al., 2010), and construction (Bouchlaghem et al., 2005; Shin & Dunston, 2008), where accurate visual representation and manipulation of digital data are crucial. Building Information Modeling (BIM) plays a vital role in enhancing the effectiveness of MR in the construction industry (Garbett et al., 2021; Irizarry et al., 2013; Volk et al., 2014). BIM serves as a digital representation of the physical and functional characteristics of a building, allowing for enhanced visualization and facilitating better decision-making and project management (Alizadehsalehi et al., 2020; Li et al., 2018).
Integrating BIM with MR allows for more intuitive and real-time interaction between the virtual and the real world (M. Radanovic, 2023). This combination enables more effective visualization of hidden elements (Abdul Muthalif et al., 2024 ; M. Z. A. Muthalif et al., 2022 ) and facilitates tasks like progress tracking, maintenance, and scenario simulation in construction (Albahbah et al., 2021 ; Hsieh et al., 2023 ). A key requirement for MR visualization of BIM geometries is accurate estimation of the MR camera pose, which involves determining the position and orientation of the devices within an indoor space. The absence of Global Navigation Satellite System (GNSS) signals indoors complicates this process, prompting research into alternative methods that can provide reliable, real-time localization without relying on GNSS (Milad Ramezani et al., 2017 ). To address these challenges, infrastructure-based techniques such as WiFi, Bluetooth, ultrasound, and ultra-wideband (UWB) have been developed. These systems estimate position based on metrics like signal strength and time-of-flight, but require considerable infrastructure investments, which may not always be practical (Williams et al., 2015 ). As a result, there is growing interest in infrastructure-independent methods that do not depend on additional hardware. Infrastructure-independent methods, like visual-odometry, utilize solely visual observations to estimate the movement of a device along the trajectory (Ramezani et al., 2018 ). This method relies heavily on the quality of the images, and any degradation in image clarity or detail can significantly impact the accuracy of the motion estimates (Qin et al., 2019 ). 
Another popular infrastructure-independent method is Simultaneous Localization and Mapping (SLAM), a process by which the device constructs a map of an unknown environment while simultaneously determining its position within that map, using sensor data from cameras, LiDAR, or inertial measurement units (Mur-Artal et al., 2015). However, these methods suffer from errors that accumulate along the trajectory, growing with time and with distance from the point where the device was initialized (Acharya, 2020; Hsieh et al., 2023; Mur-Artal et al., 2015).

Model-based localization methods have gained increasing attention for their ability to align camera poses using digital representations such as BIM. These approaches offer an infrastructure-independent solution by leveraging pre-existing 3D models of the environment to estimate camera positions without the need for physical markers or external hardware (Acharya, Khoshelham, et al., 2019; Acharya, Ramezani, et al., 2019; K. Chen et al., 2019; Mahmood et al., 2020; Vermandere et al., 2022). However, the practical deployment of these systems faces persistent limitations. The disparity in visual appearance between synthetic BIM renderings and real-world camera images, caused by lighting variations, lack of texture in BIM, and differences in environmental conditions, often leads to inaccurate feature matching. Additionally, indoor environments with symmetrical architectural layouts can introduce ambiguity in pose estimation. These challenges become more pronounced in large-scale or dynamic settings, where visual drift and cumulative error along the device’s trajectory compromise localization accuracy and reliability.

In recent years, domain adaptation techniques like the Cycle-Consistent Generative Adversarial Network (CycleGAN) have gained popularity in addressing visual mismatches between synthetic BIM renderings and real-world images.
By translating synthetic images into photorealistic styles and vice versa, image feature correspondence is improved, which enhances camera pose estimation accuracy (Acharya et al., 2023; J. Chen, Li, Liu, et al., 2022; J. Chen et al., 2021). However, existing CycleGAN-based approaches face several limitations. Their performance often degrades in visually uniform or repetitive environments, where the lack of strong visual cues hinders accurate matching. Additionally, artifacts from GAN training, such as texture inconsistencies or noise, can introduce distortions that compromise localization precision. Moreover, these methods typically focus on the image translation task in isolation, without integrating geometric alignment procedures such as Perspective-n-Point (PnP), which limits their effectiveness in practical AR/MR applications requiring precise spatial alignment.

To overcome these limitations, this paper introduces a novel approach that leverages the spatial mapping capabilities of the Microsoft HoloLens, an advanced MR headset equipped with integrated depth sensors, combined with CycleGAN-based image translation to enhance camera pose estimation accuracy in MR applications. The target domain is construction, where more accurate pose estimation supports more efficient project execution, progress monitoring, and quality assurance. Our proposed method effectively eliminates the drift that can occur in MR devices, facilitating continuous and accurate localization throughout the trajectory at any time. Moreover, because it combines HoloLens tracking, CycleGAN, and PnP, it can localize anywhere in the environment, including visually repetitive areas. The contributions of this paper are as follows:

(i) A new method is developed for refining camera pose estimation by combining HoloLens’ spatial mapping capabilities with CycleGAN-based domain adaptation and geometric matching, enabling robust alignment between real-world HoloLens images and BIM.

(ii) An investigation of image style transfer to bridge the visual domain gap between BIM images and HoloLens captures, enhancing feature correspondence and effectively eliminating drift errors that accumulate during device movement.

(iii) A comprehensive evaluation of the proposed method using 1,408 image pairs, which demonstrates improved localization accuracy and reliable alignment in indoor environments.

The remainder of this paper is organized as follows: Section 2 presents a comprehensive review of related literature; Section 3 outlines the proposed methodology; Section 4 provides a detailed account of the experimental procedures; Section 5 presents the results and discussions; Section 6 concludes the findings; and finally, Section 7 suggests directions for future research.

2. Related works

Marker-based visual localization methods (Einizinab et al., 2023; Saito et al., 2007) are the traditional approach: they use physical markers placed throughout a building as fixed points for tracking the camera's position. Despite their reliability, these methods have limitations, including the need for manual marker placement and vulnerability to occlusion or damage, which can affect system performance. Although recent advancements aim to automate marker placement, reliance on physical markers remains a constraint.

Markerless localization techniques eliminate the need for physical markers by leveraging natural features within the environment to estimate camera pose. These methods utilize textures, geometry, and other environmental characteristics, providing greater flexibility in dynamic or expansive spaces (Abhishek et al., 2018; Scargill, 2021). Prominent markerless approaches include visual odometry (Qin et al., 2019), SLAM (Mur-Artal et al., 2015), VISLAM (Jinyu et al., 2019), and RGB-D SLAM (Qin et al., 2019), which have proven effective in MR applications.
SLAM enables the simultaneous construction of maps while tracking the camera's position in real-time (Mur-Artal et al., 2015 ). This capability is particularly crucial in scenarios where continuous localization and map-building are essential (Qin et al., 2019 ). However, SLAM systems can encounter challenges, particularly issues of drift, where small errors accumulate over time, resulting in significant misalignment between the virtual and real environments (Hansen et al., 2021 ). Model-based localization has been instrumental in improving the accuracy of indoor localization systems. For instance, Acharya, Ramezani, et al. ( 2019 ) introduced the BIM-Tracker model, which aligns real-time camera views with BIM to provide accurate pose estimation without the need for map-building during runtime. Similarly, Mahmood et al. ( 2020 ) improved localization accuracy by integrating point cloud data with BIM models. The integration of SLAM with model-based tracking has further extended its applications, particularly in Augmented Reality (AR) presentations for indoor construction sites. One significant advancement is BIM-PoseNet, developed by Acharya, Khoshelham, et al. (2019) which utilizes synthetic images generated from 3D indoor models to estimate camera pose. However, this approach encountered challenges in environments with symmetrical features, leading to localization ambiguities. To address these limitations, Acharya et al. ( 2020 ) enhanced BIM-PoseNet by incorporating recurrent deep networks that leverage image sequences, thereby improving error reduction and robustness in complex environments through the use of temporal data. Similarly, Sattler et al. ( 2019 ) analyzed CNN-based absolute pose regression (APR) methods, noting that these models tend to approximate poses rather than accurately generalizing to real-world environments and have poor accuracy in a dynamic environment. Ha et al. 
( 2018 ) encountered similar limitations in their work while addressing indoor localization by matching indoor real images to BIM images using VGG-16 features. While effective, reliance on single images limited performance, especially in dynamic environments. Radanovic et al. ( 2023 ) developed an end-to-end CNN that used real and synthetic BIM image pairs to estimate the 6 DoF (Degrees of Freedom) relative camera pose. Some of the main challenges of these methods are in environments with repetitive architectural features, where similarities can lead to localization ambiguities. Additionally, the application of these techniques in large-scale construction projects remains an area ripe for further research (Hsieh et al., 2023 ; Vermandere et al., 2022 ). In addition, the above studies highlight deep learning’s potential to improve indoor localization while revealing ongoing challenges, such as discrepancies between real-world environments and BIM-rendered geometry. Variations in furniture, lighting, and other dynamic elements can affect the accuracy of alignment, signaling the need for more refined pose estimation techniques, particularly in highly dynamic indoor environments where geometric changes are frequent. The domain adaptation techniques like CycleGAN have emerged as important tools for bridging the gap between BIM and real-world images. Domain adaptation is crucial to enhancing the accuracy of pose estimation when models trained on synthetic data are applied to real environments. Zhu et al. ( 2017 ) introduced CycleGAN, which addresses this challenge by transforming synthetic BIM images into photorealistic versions, minimizing visual differences between synthetic and real images. This transformation improves feature correspondence and camera pose estimation accuracy across domains. Recent work by J. Chen, Li, Liu, et al. ( 2022 ) demonstrated CycleGAN's effectiveness in indoor localization. 
By converting BIM renderings into photorealistic images, their method achieved a camera pose accuracy of 1.38 meters and 10.1°, significantly reducing the visual gap between synthetic and real images. However, deep learning methods like CycleGAN still face limitations in uniform architectural environments, where the lack of distinctive features makes it difficult to generate detailed images (Acharya et al., 2023). Sufiyan et al. (2024) approached the problem differently by introducing a deep CNN-based workflow for indoor localization using 360-degree panoramic images. Their approach leveraged synthetic data generated from photogrammetry, OpenStreetMap (OSM), and 3D building models to create comprehensive datasets, leading to improved localization accuracy. Similarly, Hong et al. (2020) utilized CycleGAN to enhance scene understanding in indoor facility management. However, like many others, they encountered noise pattern issues during GAN training that affected the quality of the synthetic data, highlighting the need for better GAN stabilization techniques to ensure higher-quality datasets. To address some of these challenges, H. Chen et al. (2024) proposed the CycleGAN-Swin Transformer-SRPnP framework, which optimized global image retrieval and 2D-to-3D image coordinate detection. This approach improved computation time and enhanced robustness against noise and motion blur, common challenges in indoor environments. CycleGAN played a crucial role in reducing visual discrepancies between BIM renderings and real images, further improving localization accuracy. Acharya et al. (2023) also investigated synthetic-to-real (S2R-PoseNet) and real-to-synthetic (R2S-PoseNet) adaptation strategies for indoor pose regression. Their findings revealed that real-to-synthetic adaptation outperformed synthetic-to-real adaptation, reducing artifacts from motion blur and incomplete data coverage.
This shift emphasizes a growing trend in real-to-synthetic domain adaptation, which simplifies visual matching and reduces the need for highly detailed BIM models, thereby improving localization accuracy in complex environments. While the aforementioned studies demonstrate the growing success of domain adaptation techniques in improving camera pose estimation, several limitations persist. Model-based approaches such as BIM-PoseNet and CNN-based regressors often struggle in geometrically repetitive indoor environments, where ambiguous visual cues lead to mislocalization and drift accumulation over time. These errors are further exacerbated by discrepancies between synthetic BIM visuals and real-world images, as well as by changes in indoor scenes due to lighting, occlusion, or layout variations. While CycleGAN-based methods have helped bridge the visual domain gap, they have been found to produce unstable outputs in texture-sparse or visually uniform environments, limiting their effectiveness in challenging conditions. To overcome these limitations, this research leverages the mapping and spatial tracking capabilities of Microsoft HoloLens, which offers robust sensor fusion and real-time scene understanding. The HoloLens’s integrated VISLAM and RGB-D SLAM helps mitigate pose ambiguities in environments with repetitive features, offering a more consistent localization baseline. Building upon this foundation, the proposed method introduces a hybrid pipeline that integrates HoloLens imagery with initial pose information with BIM, employs CycleGAN to transform real images into BIM-style visuals, and then performs precise feature matching using KAZE descriptors. This is followed by geometric pose estimation through the PnP algorithm. By integrating deep learning for domain adaptation and geometry-based localization in a unified workflow, the proposed method effectively reduces accumulated trajectory drift and enhances pose estimation accuracy in HoloLens. 
This approach not only addresses key limitations of prior markerless and model-based systems but also demonstrates robustness in complex indoor environments where traditional methods falter.

3. Methodology

The proposed workflow for improving indoor localization accuracy is based on matching image features with the corresponding view of the BIM, which provides an accurate estimate of the camera pose in the coordinate system of the BIM and ensures alignment between the BIM and the image. The different appearances of real-world and BIM images lead to numerous false matches. To tackle this issue, we employ an image style transfer method known as CycleGAN to convert real images into BIM-looking images. Next, we extract and match keypoints between the translated image and the BIM image, and then estimate the camera pose using PnP. The methodology is divided into four key stages, as described in Fig. 1: Generating BIM, Image Capture, Domain Adaptation and Feature Matching, and Error Analysis. Each stage is designed to progressively bridge the visual gap between synthetic and real images and refine the alignment between virtual and physical environments.

3.1 Generating BIM

The proposed method requires that the BIM be a faithful replica of the real building. The BIM was created using Autodesk Revit, leveraging a dense point cloud acquired through a mobile laser scanning system, specifically the GeoSLAM Zeb Horizon. This scanner facilitated the detailed capture of architectural and structural elements with an accuracy of 1 ± 3 cm, including walls, doors, windows, and other critical features essential for accurate spatial modeling. Each architectural component was carefully modeled in Revit to replicate the actual spatial geometry, dimensions, and materials observed in the real-world environment. To ensure the precision of the model, the inter-wall distances and other critical dimensions were manually verified using an Electronic Distance Measurement (EDM) laser device.
This verification step served to cross-check the geometric fidelity of the BIM against the actual structure, thereby increasing the confidence in the model's suitability for synthetic image generation and subsequent 3D coordinate mapping. Once the BIM was finalized, it was imported into Unity, a real-time 3D development platform chosen for its compatibility with complex 3D models and its capability to render environments with high efficiency (Büyüksalih et al., 2020). In Unity, the imported BIM was further optimized for real-time performance. This included configuring lighting conditions to reflect those in the actual house and minimizing computational overhead by excluding interior furnishings and detailed textures. As such, the resulting model was classified as a Low Level of Detail (LoD 300) BIM, suitable for pose estimation and localization experiments (Fig. 2).

3.2 Image Capture

The image acquisition phase involves the collection of real-world and synthetic datasets, both of which are fundamental to pose estimation and alignment evaluation. A trajectory was walked with the HoloLens to capture the real-world data. The images, their poses, and the point clouds (processed from the depth information captured by the depth cameras) were obtained from the HoloLens using a repository initially introduced by Ungureanu et al. (2020). Each image was accompanied by pose information derived from the HoloLens’ internal sensors, including its depth sensors and grayscale tracking cameras. BIM images were captured within Unity, together with the 3D coordinates of each pixel in every image frame, exported in CSV file format. These coordinates are the primary ground truth used to calculate the updated pose that corrects the HoloLens drift. In total, 1,408 real-world images (Fig. 3(b)) and their corresponding synthetic BIM images (Fig. 3(a)) were captured.
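The per-pixel 3D coordinates exported from Unity are what later turn a matched 2D keypoint into a 2D-3D correspondence. The sketch below shows one way such a lookup could work. The CSV layout (columns `u`, `v`, `X`, `Y`, `Z`), the function names, and the sample values are all assumptions made for illustration; the source does not describe the file format, and the study itself used MATLAB rather than Python.

```python
import csv
import io


def load_pixel_to_xyz(csv_text):
    """Build a {(u, v): (X, Y, Z)} table from CSV text.

    Assumed (hypothetical) header: u,v,X,Y,Z with integer pixel indices.
    """
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (int(row["u"]), int(row["v"]))
        table[key] = (float(row["X"]), float(row["Y"]), float(row["Z"]))
    return table


def lookup_3d(table, keypoint):
    """Return the 3D point stored for the pixel nearest to a sub-pixel keypoint.

    Returns None when that pixel has no entry (for example, background).
    """
    u, v = keypoint
    return table.get((int(round(u)), int(round(v))))


# Hypothetical two-pixel export for one BIM frame.
SAMPLE_CSV = """u,v,X,Y,Z
10,20,1.5,0.2,3.0
11,20,1.6,0.2,3.0
"""

pixel_table = load_pixel_to_xyz(SAMPLE_CSV)
hit = lookup_3d(pixel_table, (10.4, 19.6))   # nearest pixel is (10, 20)
miss = lookup_3d(pixel_table, (50.0, 50.0))  # no entry for this pixel
```

Rounding to the nearest pixel is the simplest choice; interpolating between neighbouring pixels would be an alternative where sub-pixel precision matters.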
3.3 Domain Adaptation and Feature Matching

To bridge the visual discrepancies between real-world HoloLens images and synthetic BIM renderings, a Cycle-Consistent Generative Adversarial Network (CycleGAN) was employed. CycleGANs are particularly effective for unpaired image-to-image translation tasks, allowing for domain adaptation without requiring exact correspondence between source and target image sets. In this context, the CycleGAN was trained on unpaired datasets comprising BIM images and real-world HoloLens captures. The objective was to produce style-transferred images that preserved geometric structure while simulating the texture and lighting characteristics of the real-world scenes (Zhu et al., 2017). Following successful training, CycleGAN-generated images were matched with their corresponding synthetic BIM images using KAZE feature detection. KAZE is a robust feature descriptor designed to operate efficiently across varying scales and image nonlinearity (Tareen & Saleem, 2018; Zhang & Yan, 2023) and was experimentally found more suitable for this task. The detection process involved grayscale conversion of both CycleGAN and BIM images, followed by extraction and matching of salient features. These feature correspondences were crucial for subsequent pose estimation. The PnP algorithm was then used to estimate camera poses, leveraging the 2D-3D correspondences identified in the previous step. The PnP algorithm computed the optimal rotation and translation vectors that aligned the HoloLens images with the BIM-derived virtual scene. This transformation enabled the accurate projection of 3D points onto the corresponding 2D image plane (Wu & Hu, 2006).

3.4 Error Analysis

The final phase of the methodology focuses on evaluating the alignment accuracy achieved through the PnP-based transformation.
Using the transformation matrix derived from the PnP algorithm (Gao et al., 2003), 3D coordinates of matched features were reprojected onto the image plane (2D image coordinate system). These were then compared against the original 2D correspondences in the CycleGAN-translated HoloLens images to compute the Root Mean Square Error (RMSE) after PnP (RMSE-after). This error metric quantitatively represents the alignment accuracy between virtual and real environments and is computed from the Euclidean distances between projected and actual 2D points (Lepetit & Fua, 2005). To evaluate the effectiveness of the workflow, the RMSE-before, which is the RMSE before performing PnP, was calculated using the transformation matrix obtained during the image registration phase, before any CycleGAN or PnP processing. Elimination of this error after applying PnP would suggest that the proposed pipeline, incorporating CycleGAN-based domain adaptation and PnP-based pose estimation, effectively aligns the HoloLens camera with the Unity camera. This outcome substantiates the capability of the proposed approach to enhance the accuracy of real-to-virtual world registration, thereby supporting its applicability in practical MR localization scenarios.

4. Experiments

This study was conducted to evaluate the robustness, consistency, and effectiveness of the proposed localization enhancement methodology within a controlled offline setting. MATLAB served as the primary computational environment due to its versatile image processing, computer vision, and mathematical analysis capabilities. All datasets, including those captured from Unity, HoloLens, and CycleGAN, were carefully imported and organized within MATLAB to enable an integrated and iterative experimental workflow.

4.1 HoloLens Data Acquisition

Before initiating formal image acquisition, the head-mounted HoloLens was moved along a predefined trajectory within a residential indoor environment.
This preliminary phase was essential for allowing the HoloLens to build a consistent spatial understanding of the environment, thereby minimizing tracking drift and improving the accuracy of pose estimation during actual image capture. The goal was to establish a stable operational context, which is critical for reliable data acquisition in real-world MR scenarios. Note, however, that such a preliminary walkthrough is not always possible in practice. Following this initialization, the HoloLens device captured 1,454 real-world RGB images at a consistent frame rate of 30 frames per second (fps). The trajectory followed by the operator is illustrated in Fig. 4, with the start and end locations designated as point "A." The chosen path covered varied lighting, geometry, and material conditions within the indoor environment. A significant challenge was encountered in a narrow corridor denoted by "B" in Fig. 4. This section involved two abrupt turns over a short distance, which made it difficult for the HoloLens to maintain accurate pose estimation. Consequently, 46 images from this section were deemed unreliable due to incorrect or missing pose data. Despite multiple attempts to re-capture data in this specific corridor, the localization failures persisted, and the associated frames were ultimately excluded, leaving 1,408 images in the final dataset. The dataset, post-processing, included RGB images, corresponding camera pose information in the HoloLens local coordinate system, and spatially contextualized point cloud segments generated from the HoloLens’ internal depth sensors. These elements together formed a foundational multimodal dataset necessary for subsequent alignment, synthetic data generation, and evaluation stages.

4.2 Registration of HoloLens point cloud with BIM

It was crucial to align the HoloLens and BIM coordinate systems to capture BIM images within Unity.
This alignment step provided the spatial transformation required to convert real-world camera poses into the coordinate space used by the BIM, thereby ensuring consistency between synthetic and real-world datasets. The initial step in this alignment process involved merging the segmented point clouds obtained from the HoloLens into a single cohesive 3D point cloud. This merged point cloud was imported into CloudCompare software, which was used to perform a two-step registration process. The first step was a coarse alignment using point-pair registration, based on manually selected reference features. The second step was a fine alignment through Iterative Closest Point (ICP) registration, which minimized the Euclidean distance between corresponding points in the merged HoloLens point cloud and the BIM-derived point cloud (Fig. 5). The final registration achieved an RMSE of 0.024. The resulting transformation matrix, denoted \(T_{HC}\), maps HoloLens spatial data into the BIM coordinate system:

\(T_{HC} = \left[\begin{array}{cccc} 1.000 & 0.006 & -0.006 & -1.0395 \\ -0.035 & -0.001 & -1.000 & 0.479 \\ -0.007 & 1.000 & 0.000 & 1.1738 \\ 0 & 0 & 0 & 1 \end{array}\right]\)

This matrix served as a critical spatial bridge, enabling the transformation of all HoloLens poses into the BIM’s reference frame for direct comparison and data fusion. Although this step is performed manually in this work, a range of algorithms exists that can facilitate automated registration in real-time applications (J. Chen, Li, & Lu, 2022; M. Radanovic, 2023; Radanovic et al., 2023; Vermandere et al., 2022).

4.3 Unity Data Preparation and Image Capture

After the BIM was generated, it was imported into Unity to capture BIM imagery.
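Rendering BIM images from the HoloLens viewpoints relies on expressing HoloLens coordinates in the BIM frame through the registration matrix of Section 4.2. As a minimal sketch of that mapping, the snippet below applies the printed matrix to a point in homogeneous coordinates. The function name `transform_point` and the sample points are illustrative assumptions, and the study itself carried out its processing in MATLAB rather than Python.

```python
# Registration matrix as printed in Section 4.2 (HoloLens frame -> BIM frame).
T_HC = [
    [1.000, 0.006, -0.006, -1.0395],
    [-0.035, -0.001, -1.000, 0.479],
    [-0.007, 1.000, 0.000, 1.1738],
    [0.0, 0.0, 0.0, 1.0],
]


def transform_point(T, p):
    """Apply a 4x4 homogeneous transform T to a 3D point p = (x, y, z)."""
    h = (p[0], p[1], p[2], 1.0)  # homogeneous coordinates
    out = [sum(T[r][c] * h[c] for c in range(4)) for r in range(4)]
    return tuple(value / out[3] for value in out[:3])


# The HoloLens origin lands on the translation column of T_HC.
origin_in_bim = transform_point(T_HC, (0.0, 0.0, 0.0))

# A point one unit along the HoloLens x-axis (hypothetical sample).
x_axis_in_bim = transform_point(T_HC, (1.0, 0.0, 0.0))
```

A full camera pose would additionally need its orientation composed with the rotation block of the matrix; only positions are handled here.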
To ensure that the BIM and real images have identical geometry, the virtual camera in Unity was configured with the same intrinsic camera settings as the HoloLens RGB camera used for capturing the real-world images. Thus, the initial step was to calibrate the HoloLens RGB camera: its intrinsic matrix (K) was determined using a checkerboard pattern and the “detectCheckerboardPoints” function in MATLAB.

4.4 CycleGAN Training for Domain Adaptation

In order to reduce domain discrepancies between real and synthetic images, a CycleGAN model was trained for unpaired image-to-image translation (Zhu et al., 2017). The dataset comprised BIM-rendered images and HoloLens images with no one-to-one pairing; these were split in a 9:1:1 ratio into training, validation, and test subsets. Following the original architecture, we used generators with nine residual blocks and PatchGAN discriminators. Instance normalization was applied consistently across both generators and discriminators to stabilize style transfer and preserve structure. The model’s training objective balanced three loss components: adversarial loss to encourage realism in translated images, cycle-consistency loss to enforce that mapping there and back returns the original image, and identity loss to prevent unnecessary style shifts when the input already lies in the target domain. During training, various checkpoints were evaluated using the validation set to assess the trade-off between visual realism and structural fidelity. Ultimately, the model at epoch 200 produced the best style-transferred results, renderings that most closely matched real-world textures while maintaining the geometric integrity of BIM structures (Fig. 6(a), (b)).
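For reference, the three loss components described above correspond to the objective of Zhu et al. (2017), which can be written as follows. Here \(X\) denotes the HoloLens image domain and \(Y\) the BIM rendering domain (labels chosen only for illustration), \(G: X \rightarrow Y\) and \(F: Y \rightarrow X\) are the two generators, and \(D_X\), \(D_Y\) are the discriminators. The weights \(\lambda_{cyc}\) and \(\lambda_{id}\) are placeholders, since the values used in this study are not reported.

\[
\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y}\left[\log D_Y(y)\right] + \mathbb{E}_{x}\left[\log\left(1 - D_Y(G(x))\right)\right]
\]

\[
\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x}\left[\lVert F(G(x)) - x \rVert_1\right] + \mathbb{E}_{y}\left[\lVert G(F(y)) - y \rVert_1\right]
\]

\[
\mathcal{L}_{id}(G, F) = \mathbb{E}_{y}\left[\lVert G(y) - y \rVert_1\right] + \mathbb{E}_{x}\left[\lVert F(x) - x \rVert_1\right]
\]

\[
\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda_{cyc}\,\mathcal{L}_{cyc}(G, F) + \lambda_{id}\,\mathcal{L}_{id}(G, F)
\]

The log form of the adversarial term is the one stated in Zhu et al. (2017); implementations commonly substitute a least-squares variant for training stability, and the source does not say which variant was used here.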
4.5 Image Rescaling

Although the original resolution of both the BIM and HoloLens images was 760×428 pixels, the CycleGAN pipeline resized all training inputs to 256×256 pixels. To restore spatial consistency, the generated CycleGAN outputs were rescaled by factors of 2.968 (width) and 1.672 (height), bringing them back to the original resolution. Nearest-neighbor interpolation was used for the resampling (Fig. 7).

4.6 Image matching

The rescaled CycleGAN-transformed images and their corresponding BIM images underwent feature matching using the KAZE algorithm (Zhang & Yan, 2023), implemented with the “detectKAZEFeatures” function in MATLAB. The images were first converted to grayscale, and keypoints were extracted using KAZE, which is known for its robustness to non-linear illumination and scale changes. Descriptors were then matched between image pairs, and the matched keypoints were visualized and color-coded for interpretability (Fig. 8).

4.7 Perspective-n-Point (PnP) Pose Estimation

To refine the estimated camera poses and correct accumulated drift, the PnP algorithm was employed to compute the transformation between the image space and the BIM’s 3D coordinate system. Specifically, the “estimateWorldCameraPose” function in MATLAB was used to solve the PnP problem by aligning 2D image coordinates extracted from CycleGAN images with their corresponding 3D points from the BIM, as explained in Section 3.2. The 3D coordinates were retrieved from a pre-generated BIM dataset exported during Unity rendering, while the 2D image coordinates were obtained through feature matching as outlined in Section 3.3. These correspondences were passed to the PnP solver, which uses the Perspective-Three-Point (P3P) algorithm as its underlying method. P3P provides an efficient closed-form solution and is suitable when at least four 2D–3D point correspondences are available.
To enhance robustness against mismatches and noise, the M-estimator Sample Consensus (MSAC) algorithm (Aijazi et al., 2019; M. Ramezani et al., 2017) was used to reject outlier correspondences with reprojection errors exceeding 2 pixels. The MSAC implementation used a maximum of 2,000 iterations and a 99% confidence level, supporting reliable pose estimation even in the presence of challenging visual conditions or erroneous matches.

4.8 Reprojection of 3D Points

To validate the improvement in localisation accuracy, 3D BIM points were reprojected onto the 2D image plane before and after applying the PnP algorithm. This comparison was used to quantify the drift errors present in the initial HoloLens poses and to demonstrate the refinement achieved by the proposed method. The initial camera poses (\(R_{tran}\), \(T_{tran}\)) were extracted from HoloLens tracking data and used to project known 3D BIM coordinates onto the 2D image plane, giving the initial set of reprojected points. The corrected camera poses (\(R_{cam}\), \(T_{cam}\)) were estimated through the CycleGAN-enhanced PnP algorithm from matched 2D–3D keypoint correspondences. Both sets of projections were computed using the intrinsic parameters of the HoloLens RGB camera, calibrated before experimentation. The 2D correspondences were extracted from CycleGAN-translated images using geometric feature matching (as described in Section 3.3). These 2D image points were compared with the reprojected BIM points derived from both the initial and the corrected poses. As illustrated in Fig. 9, green points denote the projections based on the refined pose, representing the expected location of features in the absence of drift. In contrast, red points represent the projections from the initial HoloLens poses, highlighting the effect of accumulated drift.
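The reprojection comparison of Section 4.8, together with the 2-pixel outlier criterion of Section 4.7, can be sketched as follows. This is a hedged illustration rather than the MATLAB implementation: it assumes a pinhole model with a world-to-camera pose (camera coordinates equal R·X + t), and the intrinsics, poses, and 3D points are invented for the example.

```python
import math


def project(point, R, t, fx, fy, cx, cy):
    """Project a 3D world point to pixel coordinates, assuming x_cam = R @ X + t."""
    xc, yc, zc = (sum(R[i][k] * point[k] for k in range(3)) + t[i]
                  for i in range(3))
    if zc <= 0:
        raise ValueError("point is behind the camera")
    return (fx * xc / zc + cx, fy * yc / zc + cy)


def reprojection_errors(points_3d, points_2d, R, t, intrinsics):
    """Pixel distance between each observed 2D point and its reprojection."""
    errors = []
    for X, (u, v) in zip(points_3d, points_2d):
        pu, pv = project(X, R, t, *intrinsics)
        errors.append(math.hypot(pu - u, pv - v))
    return errors


def inlier_mask(errors, threshold_px=2.0):
    """Flag correspondences whose reprojection error is within the threshold."""
    return [e <= threshold_px for e in errors]


# Hypothetical values (assumptions): intrinsics centred on a 760x428 image.
intrinsics = (500.0, 500.0, 380.0, 214.0)  # fx, fy, cx, cy
R_identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t_refined = (0.0, 0.0, 0.0)
t_drifted = (0.1, 0.0, 0.0)  # 10 cm lateral drift

bim_points = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
observed = [project(p, R_identity, t_refined, *intrinsics) for p in bim_points]

errors_refined = reprojection_errors(bim_points, observed, R_identity, t_refined, intrinsics)
errors_drifted = reprojection_errors(bim_points, observed, R_identity, t_drifted, intrinsics)
```

With these made-up numbers, a 10 cm lateral offset at 2 m depth shifts each projection by 25 pixels, so `inlier_mask(errors_drifted)` rejects both points while `inlier_mask(errors_refined)` keeps them.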
4.9 Error Evaluation

The accuracy of the camera pose refinement was evaluated quantitatively by computing the RMSE between the 2D image correspondences and the reprojected points generated using the initial and the corrected poses. RMSE-before was calculated for the initial HoloLens poses, whereas RMSE-after was derived using the refined poses. Both values are in image pixels:

\(\mathrm{RMSE}_{before} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}{\left\| P_{i}^{C}-P_{i}^{before}\right\|}^{2}}\)

\(\mathrm{RMSE}_{after} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}{\left\| P_{i}^{C}-P_{i}^{after}\right\|}^{2}}\)

where \(P_{i}^{after}\) is the reprojected point using the estimated camera pose (\(R_{cam}\), \(T_{cam}\)), \(P_{i}^{before}\) is the reprojected point using the HoloLens pose (\(R_{tran}\), \(T_{tran}\)), \(N\) is the number of inlier correspondences in each image pair, and \(P_{i}^{C}\) is the corresponding 2D image point. These RMSE values were used to assess the geometric accuracy of the alignment process and to validate the impact of the proposed method in correcting accumulated drift. The 2D correspondences were treated as ground truth, and reductions in RMSE indicated improved localisation performance. The evaluation confirmed that the proposed CycleGAN-enhanced pose refinement pipeline significantly reduced trajectory drift and improved spatial alignment between real and virtual environments across the 1,408 tested image pairs.

5. Results and Discussion

The proposed methodology was evaluated systematically by repeating the entire process for all 1,408 captured image pairs. To enhance the reliability of the findings, the MATLAB-based workflow was executed 100 times for each image pair. This allowed the RMSE to be calculated for each pair, capturing the average reprojection error for both the initial and the refined camera poses.
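The RMSE definitions of Section 4.9 translate directly into code. The following Python sketch is illustrative only: the point coordinates are invented, and averaging the per-run RMSE values with `mean_rmse` is an assumption about how the 100 repetitions were combined.

```python
import math


def rmse(observed, reprojected):
    """Root-mean-square pixel distance between matched 2D point lists."""
    n = len(observed)
    if n == 0 or n != len(reprojected):
        raise ValueError("point lists must be non-empty and of equal length")
    squared = sum((u - pu) ** 2 + (v - pv) ** 2
                  for (u, v), (pu, pv) in zip(observed, reprojected))
    return math.sqrt(squared / n)


def mean_rmse(per_run_values):
    """Average RMSE over repeated runs for one image pair (assumed aggregation)."""
    return sum(per_run_values) / len(per_run_values)


# Made-up example: two inlier correspondences in one image pair.
observed = [(100.0, 100.0), (200.0, 150.0)]          # P_i^C
reproj_before = [(103.0, 104.0), (203.0, 154.0)]     # from the HoloLens pose
reproj_after = [(100.0, 101.0), (200.0, 149.0)]      # from the refined pose

rmse_before = rmse(observed, reproj_before)
rmse_after = rmse(observed, reproj_after)
```

In this toy case each initial reprojection is off by a 3-4-5 triangle, so `rmse_before` is 5 pixels, while the refined reprojections are each one pixel away and `rmse_after` is 1 pixel.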
Figure 10 illustrates the distribution of RMSE values for each image pair, providing a comparative analysis of pose estimation accuracy across the entire dataset. The red line in the graph represents the RMSE prior to the application of the PnP, with values ranging approximately from 1 to 90 pixels, indicating a substantial degree of drift. In contrast, the blue line illustrates the RMSE after the implementation of the PnP, showcasing a remarkable reduction in error to a range of 1 to 2 pixels. The vertical axis is scaled logarithmically to improve visibility, allowing a clearer comparison of the differences between the two phases of the methodology and highlighting the effectiveness of the pose refinement process. Notably, the gaps observed in the graphs correspond to specific images that were excluded from the PnP evaluation due to an insufficient number of required correspondences identified within those image pairs after feature matching. Additionally, certain images were omitted because of challenges encountered during rapid movements or quick turns with the HoloLens, which resulted in temporary loss of localization. To further illustrate the results, Fig. 11 presents a colorized RMSE plot along the trajectory of the camera. This visualization delineates both the initial and final RMSE values in relation to the camera's movement throughout the captured scene. Gray points in this plot indicate the images that contributed to the gaps noted in Fig. 10 , resulting from insufficient correspondences. The gaps in the trajectory also reflect images captured during periods of compromised localization, thereby enhancing the understanding of how pose estimation accuracy varied under different conditions. It is important to note that during these periods, HoloLens tracking remains active with only minor drift, which gradually increases from below a 2-pixel value until the next drift correction is applied. 
This contrasts with the significantly larger initial drift observed at the same location when the proposed method is not employed.

Not all 1,408 image pairs were included in the RMSE analysis. Specifically, 398 image pairs were excluded because they lacked the number of reliable feature correspondences required for accurate PnP estimation. A minimum threshold of 10 inlier correspondences was established through empirical tuning to balance analytical coverage against pose estimation accuracy. Lowering this threshold increased the number of usable image pairs, but it also led to more erroneous or spurious feature matches, thereby compromising the reliability of the estimated poses. This trade-off is illustrated in Fig. 12, which shows an example of an image pair with erroneous correspondences and emphasizes the need to enforce a minimum inlier constraint.

Further, the environment was segmented into distinct sections to facilitate a region-specific analysis. Section A, located at the beginning of the trajectory, demonstrated consistently low RMSE values, indicating high localization accuracy during the initial phase (Fig. 13); the reprojection error observed there was minimal (Fig. 14(a)). However, in sections involving turns, such as Turning Points “T” and “U”, a noticeable increase in RMSE was observed (Figs. 14(b) and (c)). This rise in error is attributed to motion-induced blur and reduced image sharpness, which affected the CycleGAN-generated imagery and compromised the HoloLens localization. Sections B and C, characterized by a wider hallway and fewer distinctive features, showed progressively increasing RMSE values. The larger scale and uniform textures of these regions posed challenges for HoloLens mapping and further contributed to the accumulation of pose estimation errors (Figs. 14(d) and (e)).
In Section D, RMSE continued to increase due to compounding trajectory errors. Nevertheless, a sharp reduction in RMSE occurred at the start of Section E. This improvement resulted from the camera’s ability to view extended spatial features, allowing the relocalization process to self-correct based on the broader field of view and increased environmental cues (Figs. 14(f) and (g)). Conversely, Section E’s confined geometry limited feature visibility, preventing effective relocalization (Figs. 14(h) and (i)). Section F presented some of the highest RMSE values across the entire dataset. This trend is associated with the prolonged accumulation of drift errors and the limited number of distinctive features available for accurate relocalization (Fig. 14(j)). The final segment, spanning image pairs 1351 to 1408, was particularly problematic. Many of these frames were excluded from the RMSE calculations due to insufficient feature correspondences, often because the camera's view was dominated by homogeneous elements such as plain doors or featureless walls (Fig. 15).

To further quantify the distribution of pose estimation accuracy, we generated a Cumulative Distribution Function (CDF) plot of RMSE-before (red line) and RMSE-after (blue line), as illustrated in Fig. 16. The x-axis is scaled logarithmically so that both lines are clearly visible. This plot provides an aggregated statistical perspective, indicating that only 60% of RMSE values are around 20 pixels, while 90% of RMSE values exceed 30 pixels before implementing the PnP method. After applying PnP, however, 100% of RMSE values are below 2 pixels. These results validate the effectiveness of the proposed localization refinement framework and underscore its robustness in eliminating drift errors that can accumulate along the trajectory over distance and time.

6. Conclusion

The primary objective of this research is to demonstrate the effectiveness of a novel method for aligning the real and virtual worlds, particularly for MR applications in the construction industry. MR devices like the HoloLens commonly experience cumulative trajectory errors, such as mapping inaccuracies, which typically increase progressively over time and distance from the initial reference point. These incremental errors can significantly degrade localization accuracy, affecting the overall effectiveness and reliability of MR applications. The proposed approach calculates an updated camera pose for the HoloLens device based on a virtual coordinate system. Our experimental findings show that the proposed methodology effectively eliminates these trajectory-induced errors, reducing the reprojection RMSE to below 2 pixels for the evaluated image pairs and substantially improving localization accuracy throughout the trajectory.

The experimental results provide quantitative evidence that MR devices frequently encounter localization inaccuracies and that these can be significantly reduced by integrating domain adaptation methods, specifically CycleGAN, with BIM. Using CycleGAN-generated synthetic images that closely match real-world visuals improves the quality of feature correspondences, leading to more accurate camera pose estimation and lower overall localization errors in practical MR scenarios. This in turn reduces the need for manual calibration and intervention, streamlining construction workflows and enhancing efficiency. However, the current implementation was conducted in an offline environment using MATLAB, which limits its immediate applicability to real-time MR experiences. In its current form, the method supports camera pose correction and BIM alignment after data acquisition, but does not yet enable dynamic visualisation or interactive overlay of virtual elements in live settings.
As such, future work will focus on adapting and optimising this pipeline for real-time deployment on MR devices like the HoloLens, addressing computational constraints and integrating more efficient inference strategies. Additionally, challenges remain in environments with low texture, occlusion, or architectural symmetry, where feature matching becomes difficult. To extend the robustness of the system, future research should investigate advanced correspondence extraction methods, including multi-view tracking and learning-based keypoint detection, that can support accurate pose estimation under more varied and complex conditions. By addressing these limitations, the proposed method has the potential to evolve into a fully automated, real-time MR localisation and tracking solution that enhances not only visualization, but also task automation, progress monitoring, and decision-making across construction and infrastructure applications.

Statements and Declarations

Acknowledgement

The author would like to express sincere appreciation to Dr Davood Shojaei, Professor Kourosh Khoshelham and Dr Debaditya Acharya for their continued supervision, valuable guidance, technical insights and collaborative support during the implementation and experimentation phases of this work. Their input played a vital role in advancing the localization component of this study. The assistance provided by the University of Melbourne in facilitating access to necessary tools and computational resources is also gratefully acknowledged.

The authors have no competing interests to declare that are relevant to the content of this article.

During the preparation of this work, the corresponding author (Mohamed Zahlan Abdul Muthalif) used Grammarly in order to check grammar. After using this tool, the author reviewed and edited the content as needed and takes full responsibility for the content of the published article.
The source code is available for download at: https://github.com/Mabdulmuthal/MR-localization

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

This work was supported by the University of Melbourne (Application Reference: 644655. 2020).

Authorship contribution

Mohamed Zahlan Abdul Muthalif is the Corresponding Author and main researcher, who led the research and contributed approximately 60% of the work, including experiment design and execution, data analysis, code development, and manuscript writing. Dr Davood Shojaei served as the principal supervisor; he contributed to the conceptual development of the methodology and provided ongoing guidance throughout the research. Professor Kourosh Khoshelham is a secondary supervisor who provided supervisory support, contributed to the development of ideas, and reviewed and revised the manuscript. Dr Debaditya Acharya is a secondary supervisor who contributed to the conceptual framing, supervised technical implementation, assisted in code development, and reviewed the manuscript.

References

Abdul Muthalif, M. Z., Shojaei, D., & Khoshelham, K. (2024). Interactive Mixed Reality Methods for Visualization of Underground Utilities. PFG – Journal of Photogrammetry, Remote Sensing and Geoinformation Science. doi:10.1007/s41064-024-00295-x Abhishek, M. T., Aswin, P. S., Akhil, N. C., Souban, A., Muhammedali, S. K., & Vial, A. (2018, 4-7 Dec. 2018). Virtual Lab Using Markerless Augmented Reality. Paper presented at the 2018 IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE). Acharya, D. (2020). Visual indoor localisation using a 3D building model. UNIVERSITY OF MELBOURNE, Acharya, D., Khoshelham, K., & Winter, S. (2019). BIM-PoseNet: Indoor camera localisation using a 3D indoor model and deep learning from synthetic images.
ISPRS Journal of Photogrammetry and Remote Sensing, 150 , 245-258. Acharya, D., Ramezani, M., Khoshelham, K., & Winter, S. (2019). BIM-Tracker: A model-based visual tracking approach for indoor localisation using a 3D building model. ISPRS Journal of Photogrammetry and Remote Sensing, 150 , 157-171. doi:10.1016/j.isprsjprs.2019.02.014 Acharya, D., Singha Roy, S., Khoshelham, K., & Winter, S. (2020). A Recurrent Deep Network for Estimating the Pose of Real Indoor Images from Synthetic Image Sequences. Sensors, 20 (19), 5492. Acharya, D., Tatli, C. J., & Khoshelham, K. (2023). Synthetic-real image domain adaptation for indoor camera pose regression using a 3D model. ISPRS Journal of Photogrammetry and Remote Sensing, 202 , 405-421. doi:https://doi.org/10.1016/j.isprsjprs.2023.06.013 Aijazi, A. K., Malaterre, L., Trassoudaine, L., Chateau, T., & Checchin, P. (2019). Automatic Detection and Modeling of Underground Pipes Using a Portable 3D LiDAR System. Sensors (Basel), 19 (24). doi:10.3390/s19245345 Albahbah, M., Kıvrak, S., & Arslan, G. (2021). Application areas of augmented reality and virtual reality in construction project management: A scoping review. Journal of Construction Engineering, 4 (3), 151-172. Alizadehsalehi, S., Hadavi, A., & Huang, J. C. (2020). From BIM to extended reality in AEC industry. Automation in Construction, 116 . doi:10.1016/j.autcon.2020.103254 Bouchlaghem, D., Shang, H., Whyte, J., & Ganah, A. (2005). Visualisation in architecture, engineering and construction (AEC). Automation in Construction, 14 (3), 287-295. doi:10.1016/j.autcon.2004.08.012 Büyüksalih, G., Kan, T., Özkan, G. E., Meriç, M., Isın, L., & Kersten, T. P. (2020). Preserving the Knowledge of the Past Through Virtual Visits: From 3D Laser Scanning to Virtual Reality Visualisation at the Istanbul Çatalca İnceğiz Caves. PFG – Journal of Photogrammetry, Remote Sensing and Geoinformation Science, 88 (2), 133-146. 
doi:10.1007/s41064-020-00091-3 Chen, H., Yang, H., Chen, J., Zhang, S., & Jing, X. (2024). Bim Aided Indoor Camera Pose Estimation Based on Cross-Domain Image Retrieval. Available at SSRN 4913115 . Chen, J., Li, S., Liu, D., & Lu, W. (2022). Indoor camera pose estimation via style-transfer 3D models. Computer-Aided Civil and Infrastructure Engineering, 37 (3), 335-353. doi:https://doi.org/10.1111/mice.12714 Chen, J., Li, S., & Lu, W. (2022). Align to locate: Registering photogrammetric point clouds to BIM for robust indoor localization. Building and Environment, 209 , 108675. doi:https://doi.org/10.1016/j.buildenv.2021.108675 Chen, J., Li, S., Lu, W., Liu, D., Hu, D., & Tang, M. (2021). Markerless Augmented Reality for Facility Management: Automated Spatial Registration based on Style Transfer Generative Network. Paper presented at the Proceedings of the 38th International Symposium on Automation and Robotics in Construction (ISARC), International Association for Automation and Robotics in Construction (IAARC). Chen, K., Chen, W., Li, C. T., & Cheng, J. C. (2019). A BIM-based location aware AR collaborative framework for facility maintenance management. J. Inf. Technol. Constr., 24 , 360-380. Einizinab, S., Khoshelham, K., Winter, S., & Christopher, P. (2023, 8-11 Oct. 2023). Offset-Based Marker Placement for BIM Alignment in Mixed Reality. Paper presented at the 2023 IEEE International Conference on Image Processing Challenges and Workshops (ICIPCW). Gao, X.-S., Hou, X.-R., Tang, J., & Cheng, H.-F. (2003). Complete solution classification for the perspective-three-point problem. IEEE transactions on pattern analysis and machine intelligence, 25 (8), 930-943. Garbett, J., Hartley, T., & Heesom, D. (2021). A multi-user collaborative BIM-AR system to support design and construction. Automation in Construction, 122 . doi:10.1016/j.autcon.2020.103487 Gharaibeh, M. K., Gharaibeh, N. K., Khan, M. A., karim Abu-ain, W. A., & Alqudah, M. K. (2021). 
Intention to Use Mobile Augmented Reality in the Tourism Sector. Ha, I., Kim, H., Park, S., & Kim, H. (2018). Image-based Indoor Localization using BIM and Features of CNN. In (Vol. 35, pp. 1-4). Waterloo: IAARC Publications. Hansen, L. H., Fleck, P., Stranner, M., Schmalstieg, D., & Arth, C. (2021). Augmented Reality for Subsurface Utility Engineering, Revisited. IEEE transactions on visualization and computer graphics, 27 (11), 4119-4128. Hong, Y., Park, S., & Kim, H. (2020). Synthetic data generation for indoor scene understanding using BIM. Paper presented at the ISARC. Proceedings of the International Symposium on Automation and Robotics in Construction. Hsieh, C.-C., Chen, H.-M., & Wang, S.-K. (2023). On-site Visual Construction Management System Based on the Integration of SLAM-based AR and BIM on a Handheld Device. KSCE Journal of Civil Engineering, 27 (11), 4688-4707. doi:10.1007/s12205-023-1939-2 Irizarry, J., Karan, E. P., & Jalaei, F. (2013). Integrating BIM and GIS to improve the visual monitoring of construction supply chain management. Automation in Construction, 31 , 241-254. doi:10.1016/j.autcon.2012.12.005 Jinyu, L., Bangbang, Y., Danpeng, C., Nan, W., Guofeng, Z., & Hujun, B. (2019). Survey and evaluation of monocular visual-inertial SLAM algorithms for augmented reality. Virtual Reality & Intelligent Hardware, 1 (4), 386-410. doi:10.1016/j.vrih.2019.07.002 Lepetit, V., & Fua, P. (2005). Monocular model-based 3D tracking of rigid objects : Now Publishers Inc. Li, X., Yi, W., Chi, H.-L., Wang, X., & Chan, A. P. C. (2018). A critical review of virtual and augmented reality (VR/AR) applications in construction safety. Automation in Construction, 86 , 150-162. doi:10.1016/j.autcon.2017.11.003 Liu, B., Ding, L., Wang, S., & Meng, L. (2022). Designing Mixed Reality-Based Indoor Navigation for User Studies. KN-Journal of Cartography and Geographic Information , 1-10. Livingston, M. A., Ai, Z., Karsch, K., & Gibson, G. O. (2010). 
User interface design for military AR applications. Virtual Reality, 15 (2-3), 175-184. doi:10.1007/s10055-010-0179-1 M. Radanovic, K. K., C. S. Fraser, D. Acharya. (2023). CONTINUOUS BIM ALIGNMENT FOR MIXED REALITY VISUALISATION. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, X-1/W1-2023 , 279-286. doi:10.5194/isprs-annals-X-1-W1-2023-279-2023 Mahmood, B., Han, S., & Lee, D.-E. (2020). BIM-Based Registration and Localization of 3D Point Clouds of Indoor Scenes Using Geometric Features for Augmented Reality. Remote Sensing, 12 (14), 2302. Retrieved from https://www.mdpi.com/2072-4292/12/14/2302 Mur-Artal, R., Montiel, J. M. M., & Tardos, J. D. (2015). ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE Transactions on Robotics, 31 (5), 1147-1163. doi:10.1109/tro.2015.2463671 Muthalif, M., Shojaei, D., & Khoshelham, K. (2022). A review of augmented reality visualization methods for subsurface utilities. Advanced Engineering Informatics, 51 , 101498. Muthalif, M. Z. A., Shojaei, D., & Khoshelham, K. (2022). RESOLVING PERCEPTUAL CHALLENGES OF VISUALIZING UNDERGROUND UTILITIES IN MIXED REALITY. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XLVIII-4/W4-2022 , 101-108. doi:10.5194/isprs-archives-XLVIII-4-W4-2022-101-2022 Osadchyi, V., Valko, N., & Kuzmich, L. (2021). Using augmented reality technologies for STEM education organization. Paper presented at the Journal of Physics: Conference Series. Qin, J., Li, M., Liao, X., & Zhong, J. (2019). Accumulative Errors Optimization for Visual Odometry of ORB-SLAM2 Based on RGB-D Cameras. ISPRS International Journal of Geo-Information, 8 (12), 581. Retrieved from https://www.mdpi.com/2220-9964/8/12/581 Radanovic, M., Khoshelham, K., Fraser, C., & Acharya, D. (2023). Continuous Bim Alignment for Mixed Reality Visualisation. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 10 , 279-286. Ramezani, M., Acharya, D., Gu, F., & Khoshelham, K. 
(2017). INDOOR POSITIONING BY VISUAL-INERTIAL ODOMETRY. ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information Sciences, 4 . Ramezani, M., Acharya, D., Gu, F., & Khoshelham, K. (2017). Indoor Positioning by Visual-Inertial Odometry. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, IV-2/W4 , 371-376. doi:10.5194/isprs-annals-IV-2-W4-371-2017 Ramezani, M., Khoshelham, K., & Fraser, C. (2018). Pose estimation by Omnidirectional Visual-Inertial Odometry. Robotics and Autonomous Systems, 105 , 26-37. doi:10.1016/j.robot.2018.03.007 Saito, S., Hiyama, A., Tanikawa, T., & Hirose, M. (2007, 10-14 March 2007). Indoor Marker-based Localization Using Coded Seamless Pattern for Interior Decoration. Paper presented at the 2007 IEEE Virtual Reality Conference. Sattler, T., Zhou, Q., Pollefeys, M., & Leal-Taixe, L. (2019). Understanding the limitations of cnn-based absolute camera pose regression. Paper presented at the Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. Scargill, T. (2021). Context-Aware Markerless Augmented Reality for Shared Educational Spaces. Paper presented at the 2021 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct). Shin, D. H., & Dunston, P. S. (2008). Identification of application areas for Augmented Reality in industrial construction based on technology suitability. Automation in Construction, 17 (7), 882-894. doi:doi.org/10.1016/j.autcon.2008.02.012 Sufiyan, D., Win, L. S. T., Win, S. K. H., Tan, U. X., & Foong, S. (2024, 15-19 July 2024). Direct Aerial Visual Localization using Panoramic Synthetic Images and Domain Adaptation. Paper presented at the 2024 IEEE International Conference on Advanced Intelligent Mechatronics (AIM). Tareen, S. A. K., & Saleem, Z. (2018, 3-4 March 2018). A comparative analysis of SIFT, SURF, KAZE, AKAZE, ORB, and BRISK. 
Paper presented at the 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET). Ungureanu, D., Bogo, F., Galliani, S., Sama, P., Duan, X., Meekhof, C., . . . Schönberger, J. L. (2020). Hololens 2 research mode as a tool for computer vision research. arXiv preprint arXiv:2008.11239 . Vermandere, J., Bassier, M., & Vergauwen, M. (2022). Two-Step Alignment of Mixed Reality Devices to Existing Building Data. Remote Sensing, 14 (11), 2680. Retrieved from https://www.mdpi.com/2072-4292/14/11/2680 Volk, R., Stengel, J., & Schultmann, F. (2014). Building Information Modeling (BIM) for existing buildings — Literature review and future needs. Automation in Construction, 38 , 109-127. doi:10.1016/j.autcon.2013.10.023 Williams, G., Gheisari, M., Chen, P.-J., & Irizarry, J. (2015). BIM2MAR: An Efficient BIM Translation to Mobile Augmented Reality Applications. Journal of Management in Engineering, 31 (1). doi:10.1061/(asce)me.1943-5479.0000315 Wu, Y., & Hu, Z. (2006). PnP problem revisited. Journal of Mathematical Imaging and Vision, 24 , 131-141. Zhang, P., & Yan, X. (2023). Application of Improved KAZE Algorithm in Image Feature Extraction and Matching. IEEE Access, 11 , 122625-122637. doi:10.1109/ACCESS.2023.3328778 Zhu, J.-Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. Paper presented at the Proceedings of the IEEE international conference on computer vision.

Additional Declarations

No competing interests reported.
14:50:57","extension":"eps","order_by":10,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":53213,"visible":true,"origin":"","legend":"","description":"","filename":"drawingimage7.eps","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/abd4442a749aa3ee92c449dd.eps"},{"id":93342446,"identity":"a034ac56-d261-45fd-a2da-2d2690b1ad77","added_by":"auto","created_at":"2025-10-12 14:42:56","extension":"eps","order_by":11,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":52369,"visible":true,"origin":"","legend":"","description":"","filename":"drawingimage8.eps","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/ba94cab29bd08065697e30ec.eps"},{"id":93343290,"identity":"d82011d8-2432-48c6-bc99-0cfbd34f63b7","added_by":"auto","created_at":"2025-10-12 14:50:56","extension":"eps","order_by":12,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":52097,"visible":true,"origin":"","legend":"","description":"","filename":"drawingimage9.eps","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/7cf416a0f563c913a57a1ae5.eps"},{"id":93343956,"identity":"31ef584a-1437-4c99-a94d-03367bd95bff","added_by":"auto","created_at":"2025-10-12 14:58:55","extension":"jpeg","order_by":13,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":96566,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/16397de2d8d0f2ee774bca7c.jpeg"},{"id":93343313,"identity":"1ab108d0-9f25-466c-8a65-63f8618ec51b","added_by":"auto","created_at":"2025-10-12 
14:50:58","extension":"png","order_by":14,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":79433,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage10.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/828c809200e1479fdc4ce551.png"},{"id":93342473,"identity":"042bb4fb-1b60-4bfe-8ca5-68f9308822c9","added_by":"auto","created_at":"2025-10-12 14:42:57","extension":"png","order_by":15,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":65723,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage11.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/d0c6e376a24db2c0172df781.png"},{"id":93343311,"identity":"7f938f3b-e1dc-4370-b152-75ce8d16720e","added_by":"auto","created_at":"2025-10-12 14:50:58","extension":"png","order_by":16,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":72489,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage12.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/8b52cc19bd36b2fbf7b63748.png"},{"id":93343964,"identity":"41cf59e5-5cd4-4a8c-9b02-d46c2f74be6f","added_by":"auto","created_at":"2025-10-12 14:58:57","extension":"png","order_by":17,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":24395,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage13.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/e423c0cc395dd6808596543b.png"},{"id":93342464,"identity":"ff5f72b3-d45c-4751-8ceb-43865430071b","added_by":"auto","created_at":"2025-10-12 
14:42:57","extension":"png","order_by":18,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":1867794,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage14.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/e7d4916d9592eca86b49b802.png"},{"id":93343299,"identity":"3f56649d-632e-4366-b91c-88f8ff96f542","added_by":"auto","created_at":"2025-10-12 14:50:57","extension":"jpeg","order_by":19,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":61224,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage15.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/2f5eff96f41f3684b4985451.jpeg"},{"id":93342429,"identity":"dc055933-c3af-4763-906c-a1c98e32292a","added_by":"auto","created_at":"2025-10-12 14:42:55","extension":"png","order_by":20,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":50586,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage16.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/65316b13023686338e274a9e.png"},{"id":93343285,"identity":"02b7e875-4e0a-4ccf-961f-3b709763fc41","added_by":"auto","created_at":"2025-10-12 14:50:55","extension":"png","order_by":21,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":54318,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage17.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/3c1b2d2059149f1be6400c61.png"},{"id":93342426,"identity":"e6935ca9-b0db-4048-a6bf-0d512b67c803","added_by":"auto","created_at":"2025-10-12 
14:42:55","extension":"jpeg","order_by":22,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":348092,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage18.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/ab4660a2c7f3b609a81f1784.jpeg"},{"id":93342431,"identity":"8810db2e-bc96-449e-830e-d722dee819bd","added_by":"auto","created_at":"2025-10-12 14:42:55","extension":"png","order_by":23,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":56361,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage19.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/35dca88275537b189a381a9c.png"},{"id":93342450,"identity":"bf8dd61f-30f4-42f1-91a2-d5e488660ccd","added_by":"auto","created_at":"2025-10-12 14:42:56","extension":"png","order_by":24,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":771591,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/8c83962be98886423181b365.png"},{"id":93342459,"identity":"fbe888ec-a1af-4949-870c-d1c2392d2d1b","added_by":"auto","created_at":"2025-10-12 14:42:57","extension":"png","order_by":25,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":527515,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage20.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/1fba011985e306b823f63e36.png"},{"id":93343287,"identity":"c8abb962-3476-4575-b0bd-3957e985cfaf","added_by":"auto","created_at":"2025-10-12 
14:50:56","extension":"png","order_by":26,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":658251,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage21.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/7f77aa550d7f50e804c5b855.png"},{"id":93343304,"identity":"aabf947b-1afc-48ba-ae64-4a690bd43e65","added_by":"auto","created_at":"2025-10-12 14:50:57","extension":"png","order_by":27,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":522263,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage22.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/765749ad3b5012857226fbbf.png"},{"id":93344086,"identity":"ee0cbeb5-faae-48ee-9e61-dab07a415a3c","added_by":"auto","created_at":"2025-10-12 15:06:56","extension":"png","order_by":28,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":499073,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage23.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/7a51faa21007af24d3c3c118.png"},{"id":93342436,"identity":"21ab1f6e-86e7-4d1b-9876-04e4e8d2f2dd","added_by":"auto","created_at":"2025-10-12 14:42:56","extension":"png","order_by":29,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":431866,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage24.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/375a3e64fcf85dc40bb1a339.png"},{"id":93343289,"identity":"bbcaa83f-7611-44ae-9efa-aebd5ae43acc","added_by":"auto","created_at":"2025-10-12 
14:50:56","extension":"png","order_by":30,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":396419,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage25.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/b9c2c5976ee9de8e01ba40eb.png"},{"id":93343958,"identity":"9c210f4f-802f-4859-a477-0f17fa315b33","added_by":"auto","created_at":"2025-10-12 14:58:56","extension":"png","order_by":31,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":384363,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage26.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/eb41d4093436f8c37b18a8b8.png"},{"id":93342475,"identity":"2cb2ea06-b9ee-4c8d-81ff-aa6c435cf936","added_by":"auto","created_at":"2025-10-12 14:42:57","extension":"png","order_by":32,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":358834,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage27.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/e241f876b6a896ee26b7d1b2.png"},{"id":93342444,"identity":"607ba96d-79cb-4d6a-b059-a6470f0912df","added_by":"auto","created_at":"2025-10-12 14:42:56","extension":"png","order_by":33,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":417010,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage28.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/5b32523ea819081a63072610.png"},{"id":93344088,"identity":"2443371c-92df-4480-b81a-e28d6a299e64","added_by":"auto","created_at":"2025-10-12 
15:06:57","extension":"png","order_by":34,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":378632,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage29.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/8108a5b00cedd0e1686ee541.png"},{"id":93342457,"identity":"07188f2a-c425-4fdc-bb94-1c6b3ef85e45","added_by":"auto","created_at":"2025-10-12 14:42:57","extension":"png","order_by":35,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":888652,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/89d7bb0f513684b203e7202c.png"},{"id":93343957,"identity":"998d2d0d-74cc-4c0a-bb68-9e381c7c7dc1","added_by":"auto","created_at":"2025-10-12 14:58:56","extension":"jpeg","order_by":36,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":1074,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage30.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/bddacf5c45654edf56b682a9.jpeg"},{"id":93343298,"identity":"e6a022f4-a20a-44d3-834f-d7bf02a465c1","added_by":"auto","created_at":"2025-10-12 14:50:57","extension":"jpeg","order_by":37,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":564956,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage31.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/97d8eb0ab9333cc2a8824183.jpeg"},{"id":93342470,"identity":"a527cd40-f387-4528-bc0d-0db2b6880982","added_by":"auto","created_at":"2025-10-12 
14:42:57","extension":"jpeg","order_by":38,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":1074,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage30.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/74be2cf46e773b5c8346001f.jpeg"},{"id":93343297,"identity":"3473d430-3bbf-424b-b9d7-aca985d438cc","added_by":"auto","created_at":"2025-10-12 14:50:57","extension":"png","order_by":39,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":24260,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage33.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/d4225adaeffc8b6ce2a9acd8.png"},{"id":93342443,"identity":"c816ea5b-439e-4127-b964-60b0d7f820ca","added_by":"auto","created_at":"2025-10-12 14:42:56","extension":"png","order_by":40,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":888652,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/cc87d41b648fe1bc06dc637c.png"},{"id":93342448,"identity":"466b1b88-febe-4801-b252-1fc61e14a450","added_by":"auto","created_at":"2025-10-12 14:42:56","extension":"png","order_by":41,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":1035796,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/197e1f24a90b9f860c044336.png"},{"id":93342487,"identity":"e97b9d90-0a3e-4d3a-9384-c01776983c3d","added_by":"auto","created_at":"2025-10-12 
14:42:58","extension":"png","order_by":42,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":1035796,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/5e5c59d6023d524bd79c3fbc.png"},{"id":93342467,"identity":"5b868b9a-0ef0-41d1-ab25-e4a164f82554","added_by":"auto","created_at":"2025-10-12 14:42:57","extension":"png","order_by":43,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":68257,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/30a180735d2cb2e324c129ab.png"},{"id":93344089,"identity":"b00b86ae-dc12-4806-9121-e3753d6585e3","added_by":"auto","created_at":"2025-10-12 15:06:58","extension":"png","order_by":44,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":196575,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage8.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/ef7d5c860b3285858e9b52c9.png"},{"id":93342461,"identity":"5d3838ef-633d-4ff7-abfa-7689b29a7f35","added_by":"auto","created_at":"2025-10-12 14:42:57","extension":"png","order_by":45,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":80046,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage9.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/b1f10cff267154d698fcdd1c.png"},{"id":93342453,"identity":"cfeba009-e17d-4be7-8420-d1c26526c381","added_by":"auto","created_at":"2025-10-12 
14:42:56","extension":"png","order_by":46,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":28154,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/a86eeef8ef49fc79d4e18f20.png"},{"id":93342499,"identity":"45fb4c0d-a569-40a6-a520-e4ac43e53e1c","added_by":"auto","created_at":"2025-10-12 14:42:58","extension":"png","order_by":47,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":15778,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage10.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/b0c78e77a4d0b63e10a37982.png"},{"id":93342500,"identity":"d4ef61af-7ce3-4e49-9988-e7f149aad174","added_by":"auto","created_at":"2025-10-12 14:42:59","extension":"png","order_by":48,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":15804,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage11.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/7b75665f9bc198d844263487.png"},{"id":93343963,"identity":"7c2398f0-305d-4fb7-8299-01d6b56a08a4","added_by":"auto","created_at":"2025-10-12 14:58:57","extension":"png","order_by":49,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":15747,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage12.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/8586170f0bb71ff36ec12c89.png"},{"id":93343968,"identity":"527d8204-2bf9-4373-94b7-56a508b5913d","added_by":"auto","created_at":"2025-10-12 
14:58:58","extension":"png","order_by":50,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":9099,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage13.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/e97ed345133bcff57ec14364.png"},{"id":93342484,"identity":"4bf2295b-5826-4d34-9453-7908720a5375","added_by":"auto","created_at":"2025-10-12 14:42:58","extension":"png","order_by":51,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":154063,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage14.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/a065a5f59c402200e68e0e66.png"},{"id":93343293,"identity":"4e1e2e0d-691e-4464-a28b-2c0ef6bb4d04","added_by":"auto","created_at":"2025-10-12 14:50:56","extension":"png","order_by":52,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":28413,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage15.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/613e2cde5ad4eb357b2a1c20.png"},{"id":93343302,"identity":"44a23606-845f-4f89-83b4-a76d0266351b","added_by":"auto","created_at":"2025-10-12 14:50:57","extension":"png","order_by":53,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":33762,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage16.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/3cbd7bfbe32a3d6a0ca00bb3.png"},{"id":93342510,"identity":"ab74dbd5-76bf-4e50-9482-d48e63029e20","added_by":"auto","created_at":"2025-10-12 
14:42:59","extension":"png","order_by":54,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":18444,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage17.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/27f1d0d4975e8666aae04fe6.png"},{"id":93342490,"identity":"a1473209-ee57-4dc5-b9ff-4fa3630d38c6","added_by":"auto","created_at":"2025-10-12 14:42:58","extension":"png","order_by":55,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":68375,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage18.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/bfcab81bfa293979cdad3913.png"},{"id":93343959,"identity":"22c0c622-2e19-4dc2-9ed2-07efbc1d0309","added_by":"auto","created_at":"2025-10-12 14:58:56","extension":"png","order_by":56,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":20123,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage19.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/2f90d3be3c03c5882ebd05e4.png"},{"id":93343310,"identity":"e8b99775-0075-4f34-b78a-b84e033ab573","added_by":"auto","created_at":"2025-10-12 14:50:58","extension":"png","order_by":57,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":129154,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/351f0592e80fef43b41b5ba0.png"},{"id":93342489,"identity":"97bc5e17-b7e4-464e-ac3f-e07215fd5145","added_by":"auto","created_at":"2025-10-12 
14:42:58","extension":"png","order_by":58,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":37762,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage20.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/c7eac1b79cd7df1276af3aec.png"},{"id":93343301,"identity":"c6f5f615-1534-465a-b955-7411125ce819","added_by":"auto","created_at":"2025-10-12 14:50:57","extension":"png","order_by":59,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":40827,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage21.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/663a61b5c394c1990f00796c.png"},{"id":93343305,"identity":"35dd3af1-42e1-48fb-b0e2-fc492c7775bc","added_by":"auto","created_at":"2025-10-12 14:50:57","extension":"png","order_by":60,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":35586,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage22.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/55438d915005c2e96f12f49a.png"},{"id":93342486,"identity":"dc37323f-b529-4d22-80b2-c97d28d3fd87","added_by":"auto","created_at":"2025-10-12 14:42:58","extension":"png","order_by":61,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":35430,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage23.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/128b4160e11280745f7c0cfe.png"},{"id":93342468,"identity":"e28e7bba-baef-494c-8705-c8c8be795e2a","added_by":"auto","created_at":"2025-10-12 
14:42:57","extension":"png","order_by":62,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":26389,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage24.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/5a58b63c80fa0a900aabf748.png"},{"id":93343969,"identity":"d1446588-bd9c-4be6-8bc2-14b03f098e3b","added_by":"auto","created_at":"2025-10-12 14:58:59","extension":"png","order_by":63,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":26028,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage25.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/1c28bad9f25a9b7367f4b12e.png"},{"id":93342465,"identity":"a6c39176-95e9-4392-b0d3-dce6f354a802","added_by":"auto","created_at":"2025-10-12 14:42:57","extension":"png","order_by":64,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":23845,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage26.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/5ff69ab6561325f162dba6e9.png"},{"id":93342507,"identity":"94c03a36-a512-4d32-8366-3065d11f67dd","added_by":"auto","created_at":"2025-10-12 14:42:59","extension":"png","order_by":65,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":16961,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage27.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/b714b5da4b363b320d260c63.png"},{"id":93343317,"identity":"f9b3e5f9-9201-4fb5-9e5d-82ab566bd141","added_by":"auto","created_at":"2025-10-12 
14:50:58","extension":"png","order_by":66,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":25981,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage28.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/cedd3f981342336a797816f8.png"},{"id":93343966,"identity":"b1e809f2-6a31-4258-bc52-e295158e0202","added_by":"auto","created_at":"2025-10-12 14:58:58","extension":"png","order_by":67,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":33381,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage29.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/d8f0cd0ca5faa40a025501c4.png"},{"id":93342496,"identity":"af4ab68d-cd78-4dfe-964b-47e19ecd0782","added_by":"auto","created_at":"2025-10-12 14:42:58","extension":"png","order_by":68,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":61422,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/5935b33271c1b2472e4b38d2.png"},{"id":93342506,"identity":"0f61bd45-ffe5-4f43-9b7b-03ea2b0491a8","added_by":"auto","created_at":"2025-10-12 14:42:59","extension":"png","order_by":69,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":935,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage30.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/ffe31987597ff74ed72a79f6.png"},{"id":93342503,"identity":"d532081f-ce46-404c-a459-916514d09d1c","added_by":"auto","created_at":"2025-10-12 
14:42:59","extension":"png","order_by":70,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":94613,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage31.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/042497a614fcc983985cef00.png"},{"id":93342449,"identity":"7781a606-6a05-4437-9ee1-06b49d0d345f","added_by":"auto","created_at":"2025-10-12 14:42:56","extension":"png","order_by":71,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":935,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage30.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/9f39b35e03b784d05f6de13b.png"},{"id":93343320,"identity":"a60e88be-3499-4eb2-a1e9-39376957beff","added_by":"auto","created_at":"2025-10-12 14:50:59","extension":"png","order_by":72,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":11737,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage33.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/d9982940cfa0bfc6c5ac15d9.png"},{"id":93342492,"identity":"ef332c8a-dbc1-45db-8088-3e2960dcb731","added_by":"auto","created_at":"2025-10-12 14:42:58","extension":"png","order_by":73,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":61422,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/d11fe98bc9093c8cab970179.png"},{"id":93342488,"identity":"c27a97d6-30e1-4a5a-8be7-fa12b6df7af7","added_by":"auto","created_at":"2025-10-12 
14:42:58","extension":"png","order_by":74,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":81562,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/ff7cd1dcb03b75bbb64b7a2f.png"},{"id":93342494,"identity":"50dae4ed-572c-4d59-8acf-010fd781a4d5","added_by":"auto","created_at":"2025-10-12 14:42:58","extension":"png","order_by":75,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":81562,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/1c4a407b10194665ba6af162.png"},{"id":93342456,"identity":"be146a95-a914-4c9d-ab8d-80589dab6205","added_by":"auto","created_at":"2025-10-12 14:42:57","extension":"png","order_by":76,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":10760,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/2464390850e0f5e7d5678e78.png"},{"id":93342472,"identity":"2be11dae-f514-467f-9efe-c8bc5bbc4716","added_by":"auto","created_at":"2025-10-12 14:42:57","extension":"png","order_by":77,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":52426,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage8.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/bf1a6af3de6096947443b4dc.png"},{"id":93342483,"identity":"ad3ea3f7-d447-4698-bc97-e1e6ffb3bd67","added_by":"auto","created_at":"2025-10-12 
14:42:58","extension":"png","order_by":78,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":21637,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage9.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/2393b709f7c3a27a93fc4f2a.png"},{"id":93342477,"identity":"cbf663ac-e09a-4597-959b-7267e3883c53","added_by":"auto","created_at":"2025-10-12 14:42:57","extension":"xml","order_by":79,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":152466,"visible":true,"origin":"","legend":"","description":"","filename":"7bd9bf4479fc4623a9e141e1efeb44b61structuring.xml","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/97f4556cb892d8f114c78642.xml"},{"id":93343967,"identity":"bc8291a1-60f0-4a47-a5fa-c00603ae567f","added_by":"auto","created_at":"2025-10-12 14:58:58","extension":"html","order_by":80,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":168537,"visible":true,"origin":"","legend":"","description":"","filename":"earlyproof.html","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/b6173d590de43a3201b44496.html"},{"id":93343279,"identity":"108b8a0c-adca-4068-85e8-e5555c5de891","added_by":"auto","created_at":"2025-10-12 14:50:55","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":158359,"visible":true,"origin":"","legend":"\u003cp\u003eMethodology of the proposed approach, including four stages: BIM Generation, Image Capture, Domain Adaptation and Feature Matching, and Error Analysis\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/e804934e03e5dececb2431b7.png"},{"id":93342434,"identity":"04d0638d-0f74-4db9-8d1e-c04c9eb30847","added_by":"auto","created_at":"2025-10-12 14:42:56","extension":"png","order_by":2,"title":"Figure 
2","display":"","copyAsset":false,"role":"figure","size":240254,"visible":true,"origin":"","legend":"\u003cp\u003eBIM generated from a point cloud captured by a mobile laser scanning system\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/f5c81b8eedfea5a0f80e1193.png"},{"id":93342437,"identity":"938cb9cf-cb0d-4318-871c-fa36014522eb","added_by":"auto","created_at":"2025-10-12 14:42:56","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":197104,"visible":true,"origin":"","legend":"\u003cp\u003e(a) Sample BIM images along the trajectory (b) Corresponding real images\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/d0b150f7c1834ff39bbe60be.png"},{"id":93342412,"identity":"6aaecced-1a7a-4988-b46f-e37fd2d5af22","added_by":"auto","created_at":"2025-10-12 14:42:55","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":80264,"visible":true,"origin":"","legend":"\u003cp\u003eHoloLens trajectory with the start and end locations as point \"A\" and two abrupt turns over a short distance as point \"B\".\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/4cca500436f6d4275bc8e74d.png"},{"id":93342414,"identity":"63fe815c-c8d3-4bd5-be99-ae1aa4340633","added_by":"auto","created_at":"2025-10-12 14:42:55","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":96297,"visible":true,"origin":"","legend":"\u003cp\u003ePoint cloud registered to BIM\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/b606cdf84a38342ee6b00dc9.png"},{"id":93342422,"identity":"77848fec-295f-48ca-b775-f52e8d16bb90","added_by":"auto","created_at":"2025-10-12
14:42:55","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":133732,"visible":true,"origin":"","legend":"\u003cp\u003e(a) Real Images (b) CycleGAN style-transferred Images\u003c/p\u003e","description":"","filename":"6.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/fb9fea139e5c9a6fb83ebbfb.png"},{"id":93343961,"identity":"2af37ab6-c7a0-45a3-89b2-8817c8c8332a","added_by":"auto","created_at":"2025-10-12 14:58:57","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":16264,"visible":true,"origin":"","legend":"\u003cp\u003eImage Rescaling\u003c/p\u003e","description":"","filename":"7.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/72d7700616768f471899f060.png"},{"id":93342424,"identity":"51edb996-7d0f-49fc-88d8-bd48a285eaa3","added_by":"auto","created_at":"2025-10-12 14:42:55","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":145474,"visible":true,"origin":"","legend":"\u003cp\u003eKeypoints (Left: CycleGAN style transferred Image; Right: Corresponding BIM image)\u003c/p\u003e","description":"","filename":"8.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/482e5042aa350495ef90238e.png"},{"id":93342433,"identity":"4e644c53-2c73-4b4a-a8c9-636b9c8a129d","added_by":"auto","created_at":"2025-10-12 14:42:56","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":71025,"visible":true,"origin":"","legend":"\u003cp\u003eReprojected points\u003c/p\u003e","description":"","filename":"9.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/e40bc9f5d6d6ad2f3b7bbad0.png"},{"id":93342423,"identity":"b1bdb73b-061b-43a3-8121-cd905f3627dd","added_by":"auto","created_at":"2025-10-12 14:42:55","extension":"png","order_by":10,"title":"Figure 
10","display":"","copyAsset":false,"role":"figure","size":47221,"visible":true,"origin":"","legend":"\u003cp\u003eRMSE in each image pair along the trajectory of the camera\u003c/p\u003e","description":"","filename":"10.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/86e81854396651f537aca131.png"},{"id":93343286,"identity":"e416e334-25d3-45bc-aedd-c3b858e84810","added_by":"auto","created_at":"2025-10-12 14:50:56","extension":"png","order_by":11,"title":"Figure 11","display":"","copyAsset":false,"role":"figure","size":39547,"visible":true,"origin":"","legend":"\u003cp\u003eColorized initial and final RMSE along the trajectory before and after PnP\u003c/p\u003e","description":"","filename":"11.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/6d23e8b96d227d238b8405cf.png"},{"id":93343954,"identity":"c1201c59-d436-4f83-bc71-90c1b99a9434","added_by":"auto","created_at":"2025-10-12 14:58:55","extension":"png","order_by":12,"title":"Figure 12","display":"","copyAsset":false,"role":"figure","size":114681,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003e(a) CycleGAN image, (b) BIM image. \u003c/strong\u003eErroneous correspondence (circled)\u003c/p\u003e","description":"","filename":"12.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/52a8c4be03f6880dd4bd94d3.png"},{"id":93342435,"identity":"91bdc471-b1be-4bc5-8ba9-ec6abbd9293b","added_by":"auto","created_at":"2025-10-12 14:42:56","extension":"png","order_by":13,"title":"Figure 13","display":"","copyAsset":false,"role":"figure","size":42677,"visible":true,"origin":"","legend":"\u003cp\u003eSection analysis\u003c/p\u003e","description":"","filename":"13.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/75e1c6e4672dcdd89116649f.png"},{"id":93342419,"identity":"814bc59b-d8fb-4787-8501-34794cb803c4","added_by":"auto","created_at":"2025-10-12 14:42:55","extension":"png","order_by":14,"title":"Figure
14","display":"","copyAsset":false,"role":"figure","size":474757,"visible":true,"origin":"","legend":"\u003cp\u003e(a) Start of the trajectory (S), (b) Turning Point (T), (c) Turning Point (U), (d) Middle of section B, (e) Middle of section C, (f) Middle of section D, (g) Middle of section E, (h) End of Section D, (i) Beginning of Section E, (j) Middle of section F.\u003c/p\u003e","description":"","filename":"14.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/7a7283c3771fe2771ac3f997.png"},{"id":93343283,"identity":"fc72285b-daff-4cbe-920b-baf5dd38abbe","added_by":"auto","created_at":"2025-10-12 14:50:55","extension":"png","order_by":15,"title":"Figure 15","display":"","copyAsset":false,"role":"figure","size":203278,"visible":true,"origin":"","legend":"\u003cp\u003eImage pairs removed from the RMSE calculation: (a) location V, (b) location W, (c) location Y, (d) location Z\u003c/p\u003e","description":"","filename":"15.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/2b345d2bbac320b83bb9edb4.png"},{"id":93343282,"identity":"451bcd83-bd7f-401a-817d-334c3947f0d8","added_by":"auto","created_at":"2025-10-12 14:50:55","extension":"png","order_by":16,"title":"Figure 16","display":"","copyAsset":false,"role":"figure","size":42086,"visible":true,"origin":"","legend":"\u003cp\u003eCDF plot of alignment errors before and after PnP\u003c/p\u003e","description":"","filename":"16.png","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/0358a9a9d97b93bb2a1ad630.png"},{"id":99695201,"identity":"f6d7f59d-74d3-4f66-b5a4-ec98e22923df","added_by":"auto","created_at":"2026-01-07 10:55:43","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":3227174,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7812864/v1/0f764d8c-89cf-425a-8d3b-140a0496f101.pdf"}],"financialInterests":"No competing interests
reported.","formattedTitle":"Drift-free BIM Alignment for Mixed Reality Visualization through Image Style Transfer and Feature Matching","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eMixed Reality (MR) is a technology that facilitates the integration of physical environments with virtual elements, thereby creating immersive user experiences. MR enables the interaction between real and virtual components, leading to a seamless blend of the two realms (M. Muthalif et al., \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). These technologies are increasingly applied in industries such as education (Osadchyi et al., \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e2021\u003c/span\u003e), tourism (Gharaibeh et al., \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2021\u003c/span\u003e), navigation (Liu et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2022\u003c/span\u003e), military (Livingston et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2010\u003c/span\u003e), and construction (Bouchlaghem et al., \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2005\u003c/span\u003e; Shin \u0026amp; Dunston, \u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e2008\u003c/span\u003e), where accurate visual representation and manipulation of digital data are crucial.\u003c/p\u003e\u003cp\u003eBuilding Information Modeling (BIM) plays a vital role in enhancing the effectiveness of MR in the construction industry (Garbett et al., \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Irizarry et al., \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2013\u003c/span\u003e; Volk et al., \u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). 
BIM serves as a digital representation of the physical and functional characteristics of a building, allowing for enhanced visualization and facilitating better decision-making and project management (Alizadehsalehi et al., \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Li et al., \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). Integrating BIM with MR allows for more intuitive and real-time interaction between the virtual and the real world (M. Radanovic, 2023). This combination enables more effective visualization of hidden elements (Abdul Muthalif et al., \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; M. Z. A. Muthalif et al., \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2022\u003c/span\u003e) and facilitates tasks like progress tracking, maintenance, and scenario simulation in construction (Albahbah et al., \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Hsieh et al., \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2023\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eA key requirement for MR visualization of BIM geometries is accurate estimation of the MR camera pose, which involves determining the position and orientation of the devices within an indoor space. The absence of Global Navigation Satellite System (GNSS) signals indoors complicates this process, prompting research into alternative methods that can provide reliable, real-time localization without relying on GNSS (Milad Ramezani et al., \u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e2017\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eTo address these challenges, infrastructure-based techniques such as WiFi, Bluetooth, ultrasound, and ultra-wideband (UWB) have been developed. 
These systems estimate position based on metrics like signal strength and time-of-flight, but require considerable infrastructure investments, which may not always be practical (Williams et al., \u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e2015\u003c/span\u003e). As a result, there is growing interest in infrastructure-independent methods that do not depend on additional hardware.\u003c/p\u003e\u003cp\u003eInfrastructure-independent methods, like visual odometry, use solely visual observations to estimate the movement of a device along the trajectory (Ramezani et al., \u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). This method relies heavily on the quality of the images, and any degradation in image clarity or detail can significantly impact the accuracy of the motion estimates (Qin et al., \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). Another popular infrastructure-independent method is Simultaneous Localization and Mapping (SLAM), a process by which the device constructs a map of an unknown environment while simultaneously determining its position within that map, using sensor data such as camera images, LiDAR scans, or inertial measurements (Mur-Artal et al., \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2015\u003c/span\u003e). However, these methods suffer from errors that accumulate along the trajectory with time and distance from the initialization of the device (Acharya, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Hsieh et al., \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Mur-Artal et al., \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2015\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eModel-based localization methods have gained increasing attention for their ability to align camera poses using digital representations such as BIM.
These approaches offer an infrastructure-independent solution by leveraging pre-existing 3D models of the environment to estimate camera positions without the need for physical markers or external hardware (Acharya, Khoshelham, et al., 2019; Acharya, Ramezani, et al., \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; K. Chen et al., \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Mahmood et al., \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Vermandere et al., \u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). However, the practical deployment of these systems faces persistent limitations. The disparity in visual appearance between synthetic BIM renderings and real-world camera images caused by lighting variations, lack of texture in BIM, and differences in environmental conditions often leads to inaccurate feature matching. Additionally, indoor environments with symmetrical architectural layouts can introduce ambiguity in pose estimation. These challenges become more pronounced in large-scale or dynamic settings, where visual drift and cumulative error along the device\u0026rsquo;s trajectory compromise localization accuracy and reliability.\u003c/p\u003e\u003cp\u003eIn recent years, domain adaptation techniques like Cycle-Consistent Generative Adversarial Network (CycleGAN) have gained popularity in addressing visual mismatches between synthetic BIM renderings and real-world images. By translating synthetic images into photorealistic styles and vice versa, image feature correspondence is improved, which enhances camera pose estimation accuracy (Acharya et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; J. Chen, Li, Liu, et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; J. 
Chen et al., \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). However, existing CycleGAN-based approaches face several limitations. Their performance often degrades in visually uniform or repetitive environments, where the lack of strong visual cues hinders accurate matching. Additionally, artifacts from GAN training, such as texture inconsistencies or noise, can introduce distortions that compromise localization precision. Moreover, these methods typically focus on the image translation task in isolation, without integrating geometric alignment procedures such as Perspective-n-Point (PnP), which limits their effectiveness in practical AR/MR applications requiring precise spatial alignment.\u003c/p\u003e\u003cp\u003eTo overcome these limitations, this paper introduces a novel approach that leverages the spatial mapping capabilities of the Microsoft HoloLens, an advanced MR headset equipped with integrated depth sensors, combined with CycleGAN-based image translation to enhance camera pose estimation accuracy in MR applications, specifically in construction, leading to more efficient project execution, progress monitoring and quality assurance. Our proposed method effectively eliminates the drift that can occur in MR devices, facilitating continuous and accurate localization throughout the trajectory at any time. 
Moreover, it can effectively navigate anywhere in the environment, including visually repetitive environments, as it combines multiple components, such as HoloLens, CycleGAN, and PnP.\u003c/p\u003e\u003cp\u003eThe contributions of this paper are as follows:\u003c/p\u003e\u003cp\u003e\u003cul\u003e\u003cli\u003e\u003cp\u003eA new method is developed for refining camera pose estimation by combining HoloLens\u0026rsquo; spatial mapping capabilities with CycleGAN-based domain adaptation and geometric matching, enabling robust alignment between real-world HoloLens images and BIM.\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eAn investigation of image style transfer to bridge the visual domain gap between BIM images and HoloLens captures, enhancing feature correspondence and effectively eliminating drift errors that accumulate during device movement.\u003c/p\u003e\u003c/li\u003e\u003cli\u003e\u003cp\u003eA comprehensive evaluation of the proposed method using 1,408 image pairs demonstrates improved localization accuracy and reliable alignment in indoor environments.\u003c/p\u003e\u003c/li\u003e\u003c/ul\u003e\u003c/p\u003e\u003cp\u003eThe remainder of this paper is organized as follows: Section 2 presents a comprehensive review of related literature; Section 3 outlines the proposed methodology; Section 4 provides a detailed account of the experimental procedures; Section 5 presents the results and discussion; Section 6 concludes the findings; and Section 7 suggests directions for future research.\u003c/p\u003e"},{"header":"2. Related works","content":"\u003cp\u003eMarker-based visual localization methods (Einizinab et al., \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Saito et al., \u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e2007\u003c/span\u003e) are the traditional localization methods, which use physical markers placed throughout a building as fixed points for tracking the camera's position.
Despite its reliability, this method has limitations, including the need for manual marker placement and vulnerability to occlusion or damage, which can affect system performance. Although recent advancements aim to automate marker placement, reliance on physical markers remains a constraint.\u003c/p\u003e\u003cp\u003eMarkerless localization techniques eliminate the need for physical markers by leveraging natural features within the environment to estimate camera pose. These methods utilize textures, geometry, and other environmental characteristics, providing greater flexibility in dynamic or expansive spaces (Abhishek et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Scargill, \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Prominent markerless localization approaches include visual odometry (Qin et al., \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e2019\u003c/span\u003e), SLAM (Mur-Artal et al., \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2015\u003c/span\u003e), VISLAM (Jinyu et al., \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2019\u003c/span\u003e), and RGB-D SLAM (Qin et al., \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e2019\u003c/span\u003e), which have proven to be effective solutions in MR applications. SLAM enables the simultaneous construction of maps while tracking the camera's position in real-time (Mur-Artal et al., \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2015\u003c/span\u003e). This capability is particularly crucial in scenarios where continuous localization and map-building are essential (Qin et al., \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e2019\u003c/span\u003e).
However, SLAM systems can encounter challenges, particularly issues of drift, where small errors accumulate over time, resulting in significant misalignment between the virtual and real environments (Hansen et al., \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2021\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eModel-based localization has been instrumental in improving the accuracy of indoor localization systems. For instance, Acharya, Ramezani, et al. (\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2019\u003c/span\u003e) introduced the BIM-Tracker model, which aligns real-time camera views with BIM to provide accurate pose estimation without the need for map-building during runtime. Similarly, Mahmood et al. (\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2020\u003c/span\u003e) improved localization accuracy by integrating point cloud data with BIM models. The integration of SLAM with model-based tracking has further extended its applications, particularly in Augmented Reality (AR) presentations for indoor construction sites. One significant advancement is BIM-PoseNet, developed by Acharya, Khoshelham, et al. (2019) which utilizes synthetic images generated from 3D indoor models to estimate camera pose. However, this approach encountered challenges in environments with symmetrical features, leading to localization ambiguities. To address these limitations, Acharya et al. (\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2020\u003c/span\u003e) enhanced BIM-PoseNet by incorporating recurrent deep networks that leverage image sequences, thereby improving error reduction and robustness in complex environments through the use of temporal data. Similarly, Sattler et al. 
(\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e2019\u003c/span\u003e) analyzed CNN-based absolute pose regression (APR) methods, noting that these models tend to approximate poses rather than accurately generalizing to real-world environments and have poor accuracy in a dynamic environment. Ha et al. (\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2018\u003c/span\u003e) encountered similar limitations in their work while addressing indoor localization by matching indoor real images to BIM images using VGG-16 features. While effective, reliance on single images limited performance, especially in dynamic environments. Radanovic et al. (\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e2023\u003c/span\u003e) developed an end-to-end CNN that used real and synthetic BIM image pairs to estimate the 6 DoF (Degrees of Freedom) relative camera pose. Some of the main challenges of these methods are in environments with repetitive architectural features, where similarities can lead to localization ambiguities. Additionally, the application of these techniques in large-scale construction projects remains an area ripe for further research (Hsieh et al., \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Vermandere et al., \u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). In addition, the above studies highlight deep learning\u0026rsquo;s potential to improve indoor localization while revealing ongoing challenges, such as discrepancies between real-world environments and BIM-rendered geometry. 
Variations in furniture, lighting, and other dynamic elements can affect the accuracy of alignment, signaling the need for more refined pose estimation techniques, particularly in highly dynamic indoor environments where geometric changes are frequent.\u003c/p\u003e\u003cp\u003eDomain adaptation techniques such as CycleGAN have emerged as important tools for bridging the gap between BIM and real-world images. Domain adaptation is crucial to enhancing the accuracy of pose estimation when models trained on synthetic data are applied to real environments. Zhu et al. (\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e2017\u003c/span\u003e) introduced CycleGAN, which can address this challenge by transforming synthetic BIM images into photorealistic versions, minimizing visual differences between synthetic and real images. This transformation improves feature correspondence and camera pose estimation accuracy across domains.\u003c/p\u003e\u003cp\u003eRecent work by J. Chen, Li, Liu, et al. (\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2022\u003c/span\u003e) demonstrated CycleGAN's effectiveness in indoor localization. By converting BIM renderings into photorealistic images, their method achieved a camera pose accuracy of 1.38 meters and 10.1\u0026deg;, significantly reducing the visual gap between synthetic and real images. However, deep learning methods like CycleGAN still face limitations in uniform architectural environments, where the lack of distinctive features makes it difficult to generate detailed images (Acharya et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2023\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eSufiyan et al. (\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e2024\u003c/span\u003e) approached the problem differently by introducing a deep CNN-based workflow for indoor localization using 360-degree panoramic images.
Their approach leveraged synthetic data generated from photogrammetry, Open Street Map (OSM), and 3D building models to create comprehensive datasets, leading to improved localization accuracy. Similarly, Hong et al. (\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2020\u003c/span\u003e) utilized CycleGAN to enhance scene understanding in indoor facility management. However, like many others, they encountered noise pattern issues during GAN training, affecting the quality of the synthetic data, highlighting the need for better GAN stabilization techniques to ensure higher-quality datasets.\u003c/p\u003e\u003cp\u003eTo address some of these challenges, H. Chen et al. (\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2024\u003c/span\u003e) proposed the CycleGAN-Swin Transformer-SRPnP framework, which optimized global image retrieval and 2D-to-3D image coordinate detection. This approach improved computation time and enhanced robustness against noise and motion blur, common challenges in indoor environments. CycleGAN played a crucial role in reducing visual discrepancies between BIM renderings and real images, further improving localization accuracy.\u003c/p\u003e\u003cp\u003eAcharya et al. (\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2023\u003c/span\u003e) also investigated synthetic-to-real (S2R-PoseNet) and real-to-synthetic (R2S-PoseNet) adaptation strategies for indoor pose regression. Their findings revealed that real-to-synthetic adaptation outperformed synthetic-to-real adaptation, reducing artifacts from motion blur and incomplete data coverage. 
This shift emphasizes a growing trend in real-to-synthetic domain adaptation, which simplifies visual matching and reduces the need for highly detailed BIM models, thereby improving localization accuracy in complex environments.\u003c/p\u003e\u003cp\u003eWhile the aforementioned studies demonstrate the growing success of domain adaptation techniques in improving camera pose estimation, several limitations persist. Model-based approaches such as BIM-PoseNet and CNN-based regressors often struggle in geometrically repetitive indoor environments, where ambiguous visual cues lead to mislocalization and drift accumulation over time. These errors are further exacerbated by discrepancies between synthetic BIM visuals and real-world images, as well as by changes in indoor scenes due to lighting, occlusion, or layout variations. While CycleGAN-based methods have helped bridge the visual domain gap, they have been found to produce unstable outputs in texture-sparse or visually uniform environments, limiting their effectiveness in challenging conditions.\u003c/p\u003e\u003cp\u003eTo overcome these limitations, this research leverages the mapping and spatial tracking capabilities of Microsoft HoloLens, which offers robust sensor fusion and real-time scene understanding. The HoloLens\u0026rsquo;s integrated VISLAM and RGB-D SLAM help mitigate pose ambiguities in environments with repetitive features, offering a more consistent localization baseline. Building upon this foundation, the proposed method introduces a hybrid pipeline that combines HoloLens imagery and initial pose information with the BIM, employs CycleGAN to transform real images into BIM-style visuals, and then performs precise feature matching using KAZE descriptors. This is followed by geometric pose estimation through the PnP algorithm.
By integrating deep learning for domain adaptation and geometry-based localization in a unified workflow, the proposed method effectively reduces accumulated trajectory drift and enhances pose estimation accuracy in HoloLens. This approach not only addresses key limitations of prior markerless and model-based systems but also demonstrates robustness in complex indoor environments where traditional methods falter.\u003c/p\u003e"},{"header":"3. Methodology","content":"\u003cp\u003eThe proposed workflow for improving indoor localization accuracy is based on matching image features with the corresponding view of the BIM, which provides an accurate estimate of the camera pose in the coordinate system of the BIM and ensures alignment between the BIM and the image. The different appearances of real-world and BIM images lead to numerous false matches. To tackle this issue, we employ an image style transfer method known as CycleGAN to convert real images into BIM-looking images. Next, we perform keypoint extraction and image matching, followed by estimating the camera pose using PnP. The methodology is divided into four key stages, as described in Fig. \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e: Generating BIM, Image Capture, Domain Adaptation and Feature Matching, and Error Analysis. Each stage is designed to progressively bridge the visual gap between synthetic and real images and refine the alignment between virtual and physical environments.\u003c/p\u003e\n\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\n \u003ch2\u003e3.1 Generating BIM\u003c/h2\u003e\n \u003cp\u003eThe proposed method requires that the BIM be a faithful replica of the real building. The BIM was created using Autodesk Revit, leveraging a dense point cloud acquired through a mobile laser scanning system, specifically the GeoSLAM Zeb Horizon.
This scanner facilitated the detailed capture of architectural and structural elements with an accuracy of 1\u0026thinsp;\u0026plusmn;\u0026thinsp;3 cm, including walls, doors, windows, and other critical features essential for accurate spatial modeling. Each architectural component was carefully modeled in Revit to replicate the actual spatial geometry, dimensions, and materials observed in the real-world environment. To ensure the precision of the model, the inter-wall distances and other critical dimensions were manually verified using an Electro Distance Measurement (EDM) laser device. This verification step served to cross-check the geometric fidelity of the BIM against the actual structure, thereby increasing the confidence in the model\u0026apos;s suitability for synthetic image generation and subsequent 3D coordinate mapping.\u003c/p\u003e\n \u003cp\u003eOnce the BIM was finalized, it was imported into Unity, which is a real-time 3D development platform chosen for its compatibility with complex 3D models and its capability to render environments with high efficiency (B\u0026uuml;y\u0026uuml;ksalih et al., \u003cspan class=\"CitationRef\"\u003e2020\u003c/span\u003e). In Unity, the imported BIM was further optimized for real-time performance. This included configuring lighting conditions to reflect those in the actual house and minimizing computational overhead by excluding interior furnishings and detailed textures. As such, the resulting model was classified as a Low Level of Detail (LoD 300) BIM, suitable for pose estimation and localization experiments (Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\n \u003ch2\u003e3.2 Image Capture\u003c/h2\u003e\n \u003cp\u003eThe image acquisition phase involves the simultaneous collection of real-world and synthetic datasets, both of which are fundamental to pose estimation and alignment evaluation. 
Real-world images were captured along a walked trajectory; the image poses and point clouds (processed from the depth information captured by the depth cameras) were obtained from the HoloLens using a repository initially introduced by Ungureanu et al. (\u003cspan class=\"CitationRef\"\u003e2020\u003c/span\u003e). Each image was accompanied by pose information derived from the HoloLens\u0026rsquo; internal sensors, including its depth sensors and grayscale tracking cameras.\u003c/p\u003e\n \u003cp\u003eBIM images were captured within Unity, including the 3D coordinates of each pixel in every image frame in CSV file format. These coordinates are the primary ground truth used to calculate the updated pose that corrects the HoloLens drift. In total, 1,408 real-world images (Fig. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e(b)) and their corresponding synthetic BIM images (Fig. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e(a)) were captured.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e\n \u003ch2\u003e3.3 Domain Adaptation and Feature Matching\u003c/h2\u003e\n \u003cp\u003eTo bridge the visual discrepancies between real-world HoloLens images and synthetic BIM renderings, a Cycle-Consistent Generative Adversarial Network (CycleGAN) was employed. CycleGANs are particularly effective for unpaired image-to-image translation tasks, allowing for domain adaptation without requiring exact correspondence between source and target image sets. In this context, the CycleGAN was trained on unpaired datasets comprising BIM images and real-world HoloLens captures. 
The objective was to produce style-transferred images that preserved geometric structure while simulating the texture and lighting characteristics of the real-world scenes (Zhu et al., \u003cspan class=\"CitationRef\"\u003e2017\u003c/span\u003e).\u003c/p\u003e\n \u003cp\u003eFollowing successful training, CycleGAN-generated images were matched with their corresponding synthetic BIM images using KAZE feature detection. KAZE is a robust feature descriptor designed to operate efficiently across varying scales and image nonlinearity (Tareen \u0026amp; Saleem, \u003cspan class=\"CitationRef\"\u003e2018\u003c/span\u003e; Zhang \u0026amp; Yan, \u003cspan class=\"CitationRef\"\u003e2023\u003c/span\u003e) and was found experimentally to be more suitable for this task. The detection process involved grayscale conversion of both CycleGAN and BIM images, followed by extraction and matching of salient features. These feature correspondences were crucial for subsequent pose estimation.\u003c/p\u003e\n \u003cp\u003eThe PnP algorithm was then used to estimate camera poses, leveraging the 2D-3D correspondences identified in the previous step. The PnP algorithm computed the optimal rotation and translation vectors that aligned the HoloLens images with the BIM-derived virtual scene. This transformation enabled the accurate projection of 3D points onto the corresponding 2D image plane (Wu \u0026amp; Hu, \u003cspan class=\"CitationRef\"\u003e2006\u003c/span\u003e).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\n \u003ch2\u003e3.4 Error Analysis\u003c/h2\u003e\n \u003cp\u003eThe final phase of the methodology focuses on evaluating the alignment accuracy achieved through the PnP-based transformation. Using the transformation matrix derived from the PnP algorithm (Gao et al., \u003cspan class=\"CitationRef\"\u003e2003\u003c/span\u003e), 3D coordinates of matched features were reprojected onto the image plane (2D image coordinate system). 
These were then compared against the original 2D correspondences in the CycleGAN-translated HoloLens images to compute the Root Mean Square Error (RMSE) after PnP (RMSE-after). This error metric quantifies the alignment accuracy between virtual and real environments and is computed from the Euclidean distances between projected and actual 2D points (Lepetit \u0026amp; Fua, \u003cspan class=\"CitationRef\"\u003e2005\u003c/span\u003e).\u003c/p\u003e\n \u003cp\u003eTo evaluate the effectiveness of the workflow, the RMSE-before, which is the RMSE before performing PnP, was calculated using the transformation matrix obtained during the image registration phase, before any CycleGAN or PnP processing. Elimination of this error after applying PnP would suggest that the proposed pipeline, incorporating CycleGAN-based domain adaptation and PnP-based pose estimation, effectively aligns the HoloLens camera with the Unity camera. This outcome substantiates the capability of the proposed approach to enhance the accuracy of real-to-virtual world registration, thereby supporting its applicability in practical MR localization scenarios.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"4. Experiments","content":"\u003cp\u003eThis study was conducted to evaluate the robustness, consistency, and effectiveness of the proposed localization enhancement methodology within a controlled offline setting. MATLAB served as the primary computational environment due to its versatile image processing, computer vision, and mathematical analysis capabilities. 
All datasets, including those captured from Unity, HoloLens, and CycleGAN, were carefully imported and organized within MATLAB to enable an integrated and iterative experimental workflow.\u003c/p\u003e\n\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\n \u003ch2\u003e4.1 HoloLens Data Acquisition\u003c/h2\u003e\n \u003cp\u003eBefore initiating formal image acquisition, the head-mounted HoloLens was moved along a predefined trajectory within a residential indoor environment. This preliminary phase was essential for allowing the HoloLens to build a consistent spatial understanding of the environment, thereby minimizing tracking drift and improving the accuracy of pose estimation during actual image capture. The goal was to establish a stable operational context, which is critical for reliable data acquisition in real-world MR scenarios. Note, however, that such a preliminary pass is not always possible in practical situations.\u003c/p\u003e\n \u003cp\u003eFollowing this initialization, the HoloLens device captured 1,454 real-world RGB images at a consistent frame rate of 30 frames per second (fps). The trajectory followed by the operator is illustrated in Fig. \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e, with the start and end locations designated as point \u0026quot;A.\u0026quot; The chosen path covered varied lighting, geometry, and material conditions within the indoor environment.\u003c/p\u003e\n \u003cp\u003eA significant challenge was encountered in a narrow corridor denoted by \u0026quot;B\u0026quot; in Fig. \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e. This section involved two abrupt turns over a short distance, which posed difficulties for the HoloLens in maintaining accurate pose estimation. Consequently, 46 images from this section were deemed unreliable due to incorrect or missing pose data. 
Despite multiple attempts to re-capture data in this specific corridor, the localization failures persisted, and the associated frames were ultimately excluded from the final dataset.\u003c/p\u003e\n \u003cp\u003eThe dataset, post-processing, included RGB images, corresponding camera pose information in the HoloLens local coordinate system, and spatially contextualized point cloud segments generated from the HoloLens\u0026rsquo; internal depth sensors. These elements together formed a foundational multimodal dataset necessary for subsequent alignment, synthetic data generation, and evaluation stages.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e\n \u003ch2\u003e4.2 Registration of HoloLens point cloud with BIM\u003c/h2\u003e\n \u003cp\u003eIt was crucial to align the HoloLens and BIM coordinate systems to capture BIM images within Unity. This alignment step provided the spatial transformation required to convert real-world camera poses into the coordinate space used by the BIM, thereby ensuring consistency between synthetic and real-world datasets.\u003c/p\u003e\n \u003cp\u003eThe initial step in this alignment process involved merging several segmented point clouds obtained from HoloLens image segments into a single cohesive 3D point cloud. This comprehensive point cloud was imported into CloudCompare software, which was used to perform a two-step registration process. The first step involved a coarse alignment using point-pair registration, allowing rough alignment based on manually selected reference features. The second step involved fine-tuning through ICP registration, which minimized the Euclidean distance between corresponding point features in the merged HoloLens point cloud and the BIM-derived point cloud (Fig. \u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e). The final registration achieved an RMSE of 0.024. 
The resulting transformation matrix, denoted as T\u003csub\u003eHC\u003c/sub\u003e, maps HoloLens spatial data into the BIM coordinate system:\u003c/p\u003e\n \u003cp\u003eT\u003csub\u003eHC\u003c/sub\u003e= \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\:\\left[\\begin{array}{cccc}1.000\u0026amp;\\:0.006\\:\u0026amp;\\:-0.006\\:\u0026amp;\\:-1.0395\\\\\\:-0.035\\:\u0026amp;\\:-0.001\\:\u0026amp;\\:-1.000\\:\u0026amp;\\:0.479\\\\\\:-0.007\\:\u0026amp;\\:1.000\\:\u0026amp;\\:0.000\\:\u0026amp;\\:1.1738\\:\\\\\\:0\u0026amp;\\:0\u0026amp;\\:0\u0026amp;\\:1\\end{array}\\right]\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e\n \u003cp\u003eThis matrix served as a critical spatial bridge, enabling the transformation of all HoloLens poses into the BIM\u0026rsquo;s reference frame for direct comparison and data fusion. Although this step is performed manually in this work, a range of algorithms exists that can facilitate automated registration in real-time applications (J. Chen, Li, \u0026amp; Lu, \u003cspan class=\"CitationRef\"\u003e2022\u003c/span\u003e; M. Radanovic, 2023; Radanovic et al., \u003cspan class=\"CitationRef\"\u003e2023\u003c/span\u003e; Vermandere et al., \u003cspan class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\n \u003ch2\u003e4.3 Unity Data Preparation and Image Capture\u003c/h2\u003e\n \u003cp\u003eAfter the BIM was generated, it was imported into Unity to capture BIM imagery. To ensure that the BIM and real images have identical geometry, the intrinsic camera settings of the HoloLens RGB camera, which was used to capture the real-world images, were applied to the virtual camera in Unity. 
Thus, the initial step involved calibrating the HoloLens RGB camera.\u003c/p\u003e\n \u003cp\u003eTo accurately configure the Unity camera, the HoloLens RGB camera was calibrated to determine the intrinsic matrix (K) using the \u0026ldquo;detectCheckerboardPoints\u0026rdquo; function in MATLAB and a checkerboard pattern.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\n \u003ch2\u003e4.4 CycleGAN Training for Domain Adaptation\u003c/h2\u003e\n \u003cp\u003eIn order to reduce domain discrepancies between real and synthetic images, a CycleGAN model was trained for unpaired image-to-image translation (Zhu et al., \u003cspan class=\"CitationRef\"\u003e2017\u003c/span\u003e). The dataset comprised BIM-rendered images and HoloLens images with no one-to-one pairing; these were split in a 9:1:1 ratio into training, validation, and test subsets. Following the original architecture, we used generators with nine residual blocks and PatchGAN discriminators. Instance normalization was applied consistently across both generators and discriminators to stabilize style transfer and preserve structure. The model\u0026rsquo;s training objective balanced three loss components: adversarial loss to encourage realism in translated images, cycle-consistency loss to enforce that mapping there and back returns the original image, and identity loss to prevent unnecessary style shifts when the input already lies in the target domain. During training, various checkpoints were evaluated using the validation set to assess the trade-off between visual realism and structural fidelity. Ultimately, the model at epoch 200 produced the best style-transferred results, renderings that most closely matched real-world textures while maintaining the geometric integrity of BIM structures (Fig. 
\u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003e (a), (b)).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\n \u003ch2\u003e4.5 Image Rescaling\u003c/h2\u003e\n \u003cp\u003eAlthough the original image resolution for both BIM and HoloLens datasets was 760\u0026times;428 pixels, CycleGAN\u0026rsquo;s architecture resized all training inputs to 256\u0026times;256 pixels. To restore spatial consistency, a rescaling operation was conducted using factors of 2.968 (width) and 1.672 (height), bringing the generated CycleGAN outputs back to their original resolution. Nearest-neighbor interpolation was used for the resampling (Fig. \u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003e).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\n \u003ch2\u003e4.6 Image matching\u003c/h2\u003e\n \u003cp\u003eThe rescaled CycleGAN-transformed images and their corresponding BIM images underwent feature matching using the KAZE (Zhang \u0026amp; Yan, \u003cspan class=\"CitationRef\"\u003e2023\u003c/span\u003e) algorithm, implemented in MATLAB using the \u0026ldquo;detectKAZEFeatures\u0026rdquo; function. The images were first converted to grayscale, and keypoints were extracted using KAZE, known for its robustness to non-linear illumination and scale changes. The descriptors were matched between image pairs, and the matched keypoints were visualized and color-coded for interpretability (Fig. \u003cspan class=\"InternalRef\"\u003e8\u003c/span\u003e).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\n \u003ch2\u003e4.7 Perspective-n-Point (PnP) Pose Estimation\u003c/h2\u003e\n \u003cp\u003eTo refine the estimated camera poses and correct accumulated drift, the PnP algorithm was employed to compute the transformation between the image space and the BIM\u0026rsquo;s 3D coordinate system. 
Specifically, the \u0026ldquo;estimateWorldCameraPose\u0026rdquo; function in MATLAB was used to solve the PnP problem by aligning 2D image coordinates extracted from CycleGAN images with their corresponding 3D points from the BIM, as explained in section 3.2.\u003c/p\u003e\n \u003cp\u003eThe 3D spatial coordinates were retrieved from a pre-generated dataset of BIM exported during Unity rendering, while the 2D image coordinates were extracted through feature matching as outlined in Section 3.3. These correspondences were passed to the PnP solver, which uses the Perspective-Three-Point (P3P) algorithm as its underlying method. The P3P approach provides an efficient closed-form solution, especially suitable when at least four 2D\u0026ndash;3D point correspondences are available.\u003c/p\u003e\n \u003cp\u003eTo enhance robustness against mismatches and noise, the M-estimator Sample Consensus (MSAC) (Aijazi et al., \u003cspan class=\"CitationRef\"\u003e2019\u003c/span\u003e; M. Ramezani et al., \u003cspan class=\"CitationRef\"\u003e2017\u003c/span\u003e) was used to reject outlier correspondences with reprojection errors exceeding 2 pixels. The MSAC implementation involved a maximum of 2,000 iterations and a 99% confidence level, ensuring reliable pose estimation even in the presence of challenging visual conditions or erroneous matches.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e\n \u003ch2\u003e4.8 Reprojection of 3D Points\u003c/h2\u003e\n \u003cp\u003eTo validate the improvement in localisation accuracy, the reprojection of 3D BIM points onto the 2D image plane has been carried out before and after applying the PnP algorithm. 
This comparison has been used to quantify the drift errors present in the initial HoloLens poses and to demonstrate the refinement achieved through the proposed method.\u003c/p\u003e\n \u003cp\u003eThe initial camera poses R\u003csub\u003etran\u003c/sub\u003e, T\u003csub\u003etran\u003c/sub\u003e have been extracted from HoloLens tracking data and used to project known 3D BIM coordinates into the 2D image plane, resulting in the initial set of reprojected points. The corrected camera poses R\u003csub\u003ecam\u003c/sub\u003e, T\u003csub\u003ecam\u003c/sub\u003e have been estimated through the CycleGAN-enhanced PnP algorithm based on matched 2D\u0026ndash;3D keypoint correspondences. Both sets of projections have been computed using the intrinsic parameters of the HoloLens RGB camera, calibrated before experimentation.\u003c/p\u003e\n \u003cp\u003eThe 2D correspondences have been extracted from CycleGAN-translated images using geometric feature matching techniques (as described in Section 3.3). These 2D image points have been compared with the reprojected BIM points derived from both the initial and corrected poses.\u003c/p\u003e\n \u003cp\u003eAs illustrated in Fig. \u003cspan class=\"InternalRef\"\u003e9\u003c/span\u003e, green points denote the projections based on the refined pose, representing the expected location of features in the absence of drift. In contrast, red points represent the projections from the initial HoloLens poses, highlighting the effect of accumulated drift.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec17\" class=\"Section2\"\u003e\n \u003ch2\u003e4.9 Error Evaluation\u003c/h2\u003e\n \u003cp\u003eThe accuracy of the camera pose refinement has been quantitatively evaluated by computing the RMSE between the 2D image correspondences and the reprojected points generated from both the initial and corrected poses, using the following formulas. 
RMSE-before has been calculated for the initial HoloLens poses, whereas RMSE-after has been derived using the refined poses. Both values are in image pixels.\u003c/p\u003e\n \u003cp\u003eRMSE-before = \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\sqrt{\\frac{1}{N}\\sum_{i=1}^{N}{‖{P}_{i}^{C}-{P}_{i}^{before}‖}^{2}}\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e\n \u003cp\u003eRMSE-after = \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\sqrt{\\frac{1}{N}\\sum_{i=1}^{N}{‖{P}_{i}^{C}-{P}_{i}^{after}‖}^{2}}\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e\n \u003cp\u003ewhere P\u003csub\u003ei\u003c/sub\u003e\u003csup\u003eafter\u003c/sup\u003e is the reprojected point using the estimated camera pose (R\u003csub\u003ecam\u003c/sub\u003e, T\u003csub\u003ecam\u003c/sub\u003e), P\u003csub\u003ei\u003c/sub\u003e\u003csup\u003ebefore\u003c/sup\u003e is the reprojected point using the HoloLens pose (R\u003csub\u003etran\u003c/sub\u003e, T\u003csub\u003etran\u003c/sub\u003e), \u003cem\u003eN\u003c/em\u003e is the number of inlier correspondences in each image pair, and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({P}_{i}^{C}\\)\u003c/span\u003e\u003c/span\u003e is the corresponding 2D image point.\u003c/p\u003e\n \u003cp\u003eThese RMSE values have been used to assess the geometric accuracy of the alignment process and to validate the impact of the proposed method in correcting accumulated drift. 
The 2D correspondences have been treated as ground truth, and reductions in RMSE have indicated improved localisation performance.\u003c/p\u003e\n \u003cp\u003eThe evaluation has confirmed that the proposed CycleGAN-enhanced pose refinement pipeline significantly reduced trajectory drift and improved spatial alignment between real and virtual environments across the 1,408 tested image pairs.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"5. 
Results and Discussion","content":"\u003cp\u003eThe comprehensive evaluation of the proposed methodology was conducted systematically, repeating the entire process for all 1,408 captured image pairs. To ensure statistical significance and enhance the reliability of the findings, a MATLAB-based computational workflow was executed iteratively 100 times for each image pair. This thorough approach facilitated the calculation of the RMSE for each pair, effectively capturing the average reprojection error between the initial and refined camera poses.\u003c/p\u003e\n\u003cp\u003eFigure \u003cspan class=\"InternalRef\"\u003e10\u003c/span\u003e illustrates the distribution of RMSE values for each image pair, providing a comparative analysis of pose estimation accuracy across the entire dataset. The red line in the graph represents the RMSE prior to the application of the PnP, with values ranging approximately from 1 to 90 pixels, indicating a substantial degree of drift. In contrast, the blue line illustrates the RMSE after the implementation of the PnP, showcasing a remarkable reduction in error to a range of 1 to 2 pixels. The vertical axis is scaled logarithmically to improve visibility, allowing a clearer comparison of the differences between the two phases of the methodology and highlighting the effectiveness of the pose refinement process.\u003c/p\u003e\n\u003cp\u003eNotably, the gaps observed in the graphs correspond to specific images that were excluded from the PnP evaluation due to an insufficient number of required correspondences identified within those image pairs after feature matching. Additionally, certain images were omitted because of challenges encountered during rapid movements or quick turns with the HoloLens, which resulted in temporary loss of localization.\u003c/p\u003e\n\u003cp\u003eTo further illustrate the results, Fig. \u003cspan class=\"InternalRef\"\u003e11\u003c/span\u003e presents a colorized RMSE plot along the trajectory of the camera. 
This visualization delineates both the initial and final RMSE values in relation to the camera\u0026apos;s movement throughout the captured scene. Gray points in this plot indicate the images that contributed to the gaps noted in Fig. \u003cspan class=\"InternalRef\"\u003e10\u003c/span\u003e, resulting from insufficient correspondences. The gaps in the trajectory also reflect images captured during periods of compromised localization, thereby enhancing the understanding of how pose estimation accuracy varied under different conditions. It is important to note that during these periods, HoloLens tracking remains active with only minor drift, which gradually increases from below a 2-pixel value until the next drift correction is applied. This contrasts with the significantly larger initial drift observed at the same location when the proposed method is not employed.\u003c/p\u003e\n\u003cp\u003eIt is important to note that not all 1,408 image pairs were included in the RMSE analysis. Specifically, 398 image pairs were excluded due to an insufficient number of reliable feature correspondences required for accurate PnP estimation. A minimum threshold of 10 inlier correspondences was established based on empirical tuning, ensuring a balance between analytical coverage and pose estimation accuracy. Lowering this threshold increased the total number of usable image pairs, but it also led to a higher occurrence of erroneous or spurious feature matches, thereby compromising the reliability of the estimated poses. This trade-off is illustrated in Fig. \u003cspan class=\"InternalRef\"\u003e12\u003c/span\u003e, where an example of an image pair with erroneous correspondences is shown, emphasizing the necessity of enforcing a minimum inlier constraint.\u003c/p\u003e\n\u003cp\u003eFurther, the environment was segmented into distinct sections to facilitate a region-specific analysis. 
Section A, located at the beginning of the trajectory, demonstrated consistently low RMSE values, indicating high accuracy in localization during the initial phase (Fig. \u003cspan class=\"InternalRef\"\u003e13\u003c/span\u003e).\u003c/p\u003e\n\u003cp\u003eMinimal reprojection error was observed in Section A (Fig. \u003cspan class=\"InternalRef\"\u003e14\u003c/span\u003e (a)) at the start of the trajectory. However, in sections involving turns, such as Turning Points \u0026ldquo;T\u0026rdquo; and \u0026ldquo;U\u0026rdquo;, a noticeable increase in RMSE was observed (Figs. 14 (b) and (c)). This rise in error is attributed to motion-induced blur and reduced image sharpness, which affected the CycleGAN-generated imagery and led to compromised localization from the HoloLens.\u003c/p\u003e\n\u003cp\u003eSections B and C, characterized by a wider hallway and fewer distinctive features, showed progressively increasing RMSE values. The larger scale and uniform textures of these regions posed challenges for HoloLens mapping and further contributed to the accumulation of pose estimation errors (Figs. 14 (d) and (e)).\u003c/p\u003e\n\u003cp\u003eIn Section D, RMSE continued to increase due to compounding trajectory errors. Nevertheless, a sharp reduction in RMSE occurred at the start of Section E. This improvement resulted from the camera\u0026rsquo;s ability to view extended spatial features, allowing the relocalization process to self-correct based on the broader field of view and increased environmental cues (Figs. 14 (f) and (g)). Conversely, Section E\u0026rsquo;s confined geometry limited feature visibility, preventing effective relocalization (Figs. 14 (h) and (i)).\u003c/p\u003e\n\u003cp\u003eSection F presented some of the highest RMSE values across the entire dataset. This trend is associated with the prolonged accumulation of errors due to the drift and the limited number of distinctive features available for accurate relocalization (Fig. 
\u003cspan class=\"InternalRef\"\u003e14\u003c/span\u003e (j)). The final segment, spanning image pairs from index 1351 to 1408, was particularly problematic. Many of these frames were excluded from RMSE calculations due to insufficient feature correspondences, often because the camera\u0026apos;s view was dominated by homogeneous elements such as plain doors or featureless walls (Fig. \u003cspan class=\"InternalRef\"\u003e15\u003c/span\u003e).\u003c/p\u003e\n\u003cp\u003eTo further quantify the distribution of pose estimation accuracy, we generated a Cumulative Distribution Function (CDF) plot of RMSE-before (red line) and RMSE-after (blue line), as illustrated in Fig. \u003cspan class=\"InternalRef\"\u003e16\u003c/span\u003e. The x-axis is scaled logarithmically to enhance the visual representation of both lines on the graph. This plot provides an aggregated statistical perspective, indicating that only 60% of RMSE values are around 20 pixels, while 90% of RMSE values exceed 30 pixels before implementing the PnP method. However, after applying PnP, all 100% of RMSE values are less than 2 pixels. These results validate the effectiveness of the proposed localization refinement framework and underscore its robustness in eliminating drift errors that can accumulate along the trajectory over distance and time.\u003c/p\u003e"},{"header":"6. Conclusion","content":"\u003cp\u003eThe primary objective of this research is to demonstrate the effectiveness of a novel method for aligning the real and virtual worlds, particularly for MR applications in the construction industry. MR devices like the HoloLens commonly experience cumulative trajectory errors, such as mapping inaccuracies, which typically increase progressively over time and distance from the initial reference point. These incremental errors can significantly degrade localization accuracy, affecting the overall effectiveness and reliability of MR applications. 
The proposed approach calculates an updated camera pose for the HoloLens device based on a virtual coordinate system.\u003c/p\u003e\u003cp\u003eOur experimental findings clearly illustrate that the proposed methodology effectively eliminates these trajectory-induced errors, substantially improving localization accuracy throughout the trajectory. The experimental results provide quantitative evidence supporting the assertion that MR devices frequently encounter localization inaccuracies, which can be significantly minimized by integrating domain adaptation methods, specifically CycleGAN, with BIM. Using CycleGAN-generated synthetic images that closely match real-world visuals enhances feature correspondence quality, leading to more accurate camera pose estimation and lower overall localization errors in practical MR scenarios. This in turn reduces the need for manual calibration and intervention, streamlining construction workflows and enhancing efficiency.\u003c/p\u003e\u003cp\u003eHowever, the current implementation has been conducted in an offline environment using MATLAB, which limits its immediate applicability to real-time MR experiences. In its current form, the method supports camera pose correction and BIM alignment after data acquisition, but does not yet enable dynamic visualisation or interactive overlay of virtual elements in live settings. As such, future work will focus on adapting and optimising this pipeline for real-time deployment on MR devices like the HoloLens, addressing computational constraints and integrating more efficient inference strategies. Additionally, challenges remain in environments with low texture, occlusion, or architectural symmetry, where feature matching becomes difficult. 
To extend the robustness of the system, future research should investigate advanced correspondence extraction methods, including multi-view tracking and learning-based keypoint detection, that can support accurate pose estimation under more varied and complex conditions.\u003c/p\u003e\u003cp\u003eBy addressing these limitations, the proposed method has the potential to evolve into a fully automated, real-time MR localisation and tracking solution that enhances not only visualization, but also task automation, progress monitoring, and decision-making across construction and infrastructure applications.\u003c/p\u003e"},{"header":"Statements and Declarations","content":"\u003cp\u003eAcknowledgement\u003c/p\u003e\n\u003cp\u003eThe author would like to express sincere appreciation to Dr Davood Shojaei, Professor Kourosh Khoshelham and Dr Debaditya Acharya for their continued supervision and valuable guidance, technical insights and collaborative support during the implementation and experimentation phases throughout the development of this work. Their input played a vital role in advancing the localization component of this study. The assistance provided by the University of Melbourne in facilitating access to necessary tools and computational resources is also gratefully acknowledged.\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003eThe authors have no competing interests to declare that are relevant to the content of this article.\u003c/li\u003e\n\u003cli\u003eDuring the preparation of this work, the corresponding author (Mohamed Zahlan Abdul Muthalif) used Grammarly in order to check grammar. 
After using this tool, the author reviewed and edited the content as needed and takes full responsibility for the content of the published article.\u003c/li\u003e\n\u003cli\u003eThe source codes are available for downloading at the link: https://github.com/Mabdulmuthal/MR-localization\u003c/li\u003e\n\u003cli\u003eThe datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.\u003c/li\u003e\n\u003cli\u003eThis work was supported by the University of Melbourne (Application Reference: 644655. 2020).\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eAuthorship contribution\u003c/p\u003e\n\u003cp\u003eMohamed Zahlan Abdul Muthalif is the Corresponding Author and main researcher, who led the research and contributed approximately 60% of the work, including experiment design and execution, data analysis, code development, and manuscript writing.\u003c/p\u003e\n\u003cp\u003eDr Davood Shojaei is the principal supervisor, who contributed to the conceptual development of the methodology and provided ongoing guidance throughout the research.\u003c/p\u003e\n\u003cp\u003eProfessor Kourosh Khoshelham is a secondary supervisor who has provided supervisory support, contributed to the development of ideas, and reviewed and revised the manuscript.\u003c/p\u003e\n\u003cp\u003eDr Debaditya Acharya is a secondary supervisor who has contributed to the conceptual framing, supervised technical implementation, assisted in code development, and reviewed the manuscript.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAbdul Muthalif, M. Z., Shojaei, D., \u0026amp; Khoshelham, K. (2024). Interactive Mixed Reality Methods for Visualization of Underground Utilities. \u003cem\u003ePFG \u0026ndash; Journal of Photogrammetry, Remote Sensing and Geoinformation Science\u003c/em\u003e. doi:10.1007/s41064-024-00295-x\u003c/li\u003e\n\u003cli\u003eAbhishek, M. T., Aswin, P. S., Akhil, N. 
C., Souban, A., Muhammedali, S. K., \u0026amp; Vial, A. (2018, 4-7 December). \u003cem\u003eVirtual Lab Using Markerless Augmented Reality.\u003c/em\u003e Paper presented at the 2018 IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE).\u003c/li\u003e\n\u003cli\u003eAcharya, D. (2020). \u003cem\u003eVisual indoor localisation using a 3D building model.\u003c/em\u003e University of Melbourne.\u003c/li\u003e\n\u003cli\u003eAcharya, D., Khoshelham, K., \u0026amp; Winter, S. (2019). BIM-PoseNet: Indoor camera localisation using a 3D indoor model and deep learning from synthetic images. \u003cem\u003eISPRS Journal of Photogrammetry and Remote Sensing, 150\u003c/em\u003e, 245-258.\u003c/li\u003e\n\u003cli\u003eAcharya, D., Ramezani, M., Khoshelham, K., \u0026amp; Winter, S. (2019). BIM-Tracker: A model-based visual tracking approach for indoor localisation using a 3D building model. \u003cem\u003eISPRS Journal of Photogrammetry and Remote Sensing, 150\u003c/em\u003e, 157-171. doi:10.1016/j.isprsjprs.2019.02.014\u003c/li\u003e\n\u003cli\u003eAcharya, D., Singha Roy, S., Khoshelham, K., \u0026amp; Winter, S. (2020). A Recurrent Deep Network for Estimating the Pose of Real Indoor Images from Synthetic Image Sequences. \u003cem\u003eSensors, 20\u003c/em\u003e(19), 5492.\u003c/li\u003e\n\u003cli\u003eAcharya, D., Tatli, C. J., \u0026amp; Khoshelham, K. (2023). Synthetic-real image domain adaptation for indoor camera pose regression using a 3D model. \u003cem\u003eISPRS Journal of Photogrammetry and Remote Sensing, 202\u003c/em\u003e, 405-421. doi:10.1016/j.isprsjprs.2023.06.013\u003c/li\u003e\n\u003cli\u003eAijazi, A. K., Malaterre, L., Trassoudaine, L., Chateau, T., \u0026amp; Checchin, P. (2019). Automatic Detection and Modeling of Underground Pipes Using a Portable 3D LiDAR System. \u003cem\u003eSensors (Basel), 19\u003c/em\u003e(24).
doi:10.3390/s19245345\u003c/li\u003e\n\u003cli\u003eAlbahbah, M., Kıvrak, S., \u0026amp; Arslan, G. (2021). Application areas of augmented reality and virtual reality in construction project management: A scoping review. \u003cem\u003eJournal of Construction Engineering, 4\u003c/em\u003e(3), 151-172.\u003c/li\u003e\n\u003cli\u003eAlizadehsalehi, S., Hadavi, A., \u0026amp; Huang, J. C. (2020). From BIM to extended reality in AEC industry. \u003cem\u003eAutomation in Construction, 116\u003c/em\u003e. doi:10.1016/j.autcon.2020.103254\u003c/li\u003e\n\u003cli\u003eBouchlaghem, D., Shang, H., Whyte, J., \u0026amp; Ganah, A. (2005). Visualisation in architecture, engineering and construction (AEC). \u003cem\u003eAutomation in Construction, 14\u003c/em\u003e(3), 287-295. doi:10.1016/j.autcon.2004.08.012\u003c/li\u003e\n\u003cli\u003eB\u0026uuml;y\u0026uuml;ksalih, G., Kan, T., \u0026Ouml;zkan, G. E., Meri\u0026ccedil;, M., Isın, L., \u0026amp; Kersten, T. P. (2020). Preserving the Knowledge of the Past Through Virtual Visits: From 3D Laser Scanning to Virtual Reality Visualisation at the Istanbul \u0026Ccedil;atalca İnceğiz Caves. \u003cem\u003ePFG \u0026ndash; Journal of Photogrammetry, Remote Sensing and Geoinformation Science, 88\u003c/em\u003e(2), 133-146. doi:10.1007/s41064-020-00091-3\u003c/li\u003e\n\u003cli\u003eChen, H., Yang, H., Chen, J., Zhang, S., \u0026amp; Jing, X. (2024). BIM Aided Indoor Camera Pose Estimation Based on Cross-Domain Image Retrieval. \u003cem\u003eAvailable at SSRN 4913115\u003c/em\u003e.\u003c/li\u003e\n\u003cli\u003eChen, J., Li, S., Liu, D., \u0026amp; Lu, W. (2022). Indoor camera pose estimation via style-transfer 3D models. \u003cem\u003eComputer-Aided Civil and Infrastructure Engineering, 37\u003c/em\u003e(3), 335-353. doi:10.1111/mice.12714\u003c/li\u003e\n\u003cli\u003eChen, J., Li, S., \u0026amp; Lu, W. (2022). Align to locate: Registering photogrammetric point clouds to BIM for robust indoor localization.
\u003cem\u003eBuilding and Environment, 209\u003c/em\u003e, 108675. doi:10.1016/j.buildenv.2021.108675\u003c/li\u003e\n\u003cli\u003eChen, J., Li, S., Lu, W., Liu, D., Hu, D., \u0026amp; Tang, M. (2021). \u003cem\u003eMarkerless Augmented Reality for Facility Management: Automated Spatial Registration based on Style Transfer Generative Network.\u003c/em\u003e Paper presented at the 38th International Symposium on Automation and Robotics in Construction (ISARC), International Association for Automation and Robotics in Construction (IAARC).\u003c/li\u003e\n\u003cli\u003eChen, K., Chen, W., Li, C. T., \u0026amp; Cheng, J. C. (2019). A BIM-based location aware AR collaborative framework for facility maintenance management. \u003cem\u003eJournal of Information Technology in Construction, 24\u003c/em\u003e, 360-380.\u003c/li\u003e\n\u003cli\u003eEinizinab, S., Khoshelham, K., Winter, S., \u0026amp; Christopher, P. (2023, 8-11 October). \u003cem\u003eOffset-Based Marker Placement for BIM Alignment in Mixed Reality.\u003c/em\u003e Paper presented at the 2023 IEEE International Conference on Image Processing Challenges and Workshops (ICIPCW).\u003c/li\u003e\n\u003cli\u003eGao, X.-S., Hou, X.-R., Tang, J., \u0026amp; Cheng, H.-F. (2003). Complete solution classification for the perspective-three-point problem. \u003cem\u003eIEEE Transactions on Pattern Analysis and Machine Intelligence, 25\u003c/em\u003e(8), 930-943.\u003c/li\u003e\n\u003cli\u003eGarbett, J., Hartley, T., \u0026amp; Heesom, D. (2021). A multi-user collaborative BIM-AR system to support design and construction. \u003cem\u003eAutomation in Construction, 122\u003c/em\u003e. doi:10.1016/j.autcon.2020.103487\u003c/li\u003e\n\u003cli\u003eGharaibeh, M. K., Gharaibeh, N. K., Khan, M. A., karim Abu-ain, W. A., \u0026amp; Alqudah, M. K. (2021). Intention to Use Mobile Augmented Reality in the Tourism Sector.\u003c/li\u003e\n\u003cli\u003eHa, I., Kim, H., Park, S., \u0026amp; Kim, H. (2018).
Image-based Indoor Localization using BIM and Features of CNN. In (Vol. 35, pp. 1-4). Waterloo: IAARC Publications.\u003c/li\u003e\n\u003cli\u003eHansen, L. H., Fleck, P., Stranner, M., Schmalstieg, D., \u0026amp; Arth, C. (2021). Augmented Reality for Subsurface Utility Engineering, Revisited. \u003cem\u003eIEEE transactions on visualization and computer graphics, 27\u003c/em\u003e(11), 4119-4128.\u003c/li\u003e\n\u003cli\u003eHong, Y., Park, S., \u0026amp; Kim, H. (2020). \u003cem\u003eSynthetic data generation for indoor scene understanding using BIM.\u003c/em\u003e Paper presented at the ISARC. Proceedings of the International Symposium on Automation and Robotics in Construction.\u003c/li\u003e\n\u003cli\u003eHsieh, C.-C., Chen, H.-M., \u0026amp; Wang, S.-K. (2023). On-site Visual Construction Management System Based on the Integration of SLAM-based AR and BIM on a Handheld Device. \u003cem\u003eKSCE Journal of Civil Engineering, 27\u003c/em\u003e(11), 4688-4707. doi:10.1007/s12205-023-1939-2\u003c/li\u003e\n\u003cli\u003eIrizarry, J., Karan, E. P., \u0026amp; Jalaei, F. (2013). Integrating BIM and GIS to improve the visual monitoring of construction supply chain management. \u003cem\u003eAutomation in Construction, 31\u003c/em\u003e, 241-254. doi:10.1016/j.autcon.2012.12.005\u003c/li\u003e\n\u003cli\u003eJinyu, L., Bangbang, Y., Danpeng, C., Nan, W., Guofeng, Z., \u0026amp; Hujun, B. (2019). Survey and evaluation of monocular visual-inertial SLAM algorithms for augmented reality. \u003cem\u003eVirtual Reality \u0026amp; Intelligent Hardware, 1\u003c/em\u003e(4), 386-410. doi:10.1016/j.vrih.2019.07.002\u003c/li\u003e\n\u003cli\u003eLepetit, V., \u0026amp; Fua, P. (2005). \u003cem\u003eMonocular model-based 3D tracking of rigid objects\u003c/em\u003e: Now Publishers Inc.\u003c/li\u003e\n\u003cli\u003eLi, X., Yi, W., Chi, H.-L., Wang, X., \u0026amp; Chan, A. P. C. (2018). 
A critical review of virtual and augmented reality (VR/AR) applications in construction safety. \u003cem\u003eAutomation in Construction, 86\u003c/em\u003e, 150-162. doi:10.1016/j.autcon.2017.11.003\u003c/li\u003e\n\u003cli\u003eLiu, B., Ding, L., Wang, S., \u0026amp; Meng, L. (2022). Designing Mixed Reality-Based Indoor Navigation for User Studies. \u003cem\u003eKN-Journal of Cartography and Geographic Information\u003c/em\u003e, 1-10.\u003c/li\u003e\n\u003cli\u003eLivingston, M. A., Ai, Z., Karsch, K., \u0026amp; Gibson, G. O. (2010). User interface design for military AR applications. \u003cem\u003eVirtual Reality, 15\u003c/em\u003e(2-3), 175-184. doi:10.1007/s10055-010-0179-1\u003c/li\u003e\n\u003cli\u003eM. Radanovic, K. K., C. S. Fraser, D. Acharya. (2023). CONTINUOUS BIM ALIGNMENT FOR MIXED REALITY VISUALISATION. \u003cem\u003eISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, X-1/W1-2023\u003c/em\u003e, 279-286. doi:10.5194/isprs-annals-X-1-W1-2023-279-2023\u003c/li\u003e\n\u003cli\u003eMahmood, B., Han, S., \u0026amp; Lee, D.-E. (2020). BIM-Based Registration and Localization of 3D Point Clouds of Indoor Scenes Using Geometric Features for Augmented Reality. \u003cem\u003eRemote Sensing, 12\u003c/em\u003e(14), 2302. Retrieved from https://www.mdpi.com/2072-4292/12/14/2302\u003c/li\u003e\n\u003cli\u003eMur-Artal, R., Montiel, J. M. M., \u0026amp; Tardos, J. D. (2015). ORB-SLAM: A Versatile and Accurate Monocular SLAM System. \u003cem\u003eIEEE Transactions on Robotics, 31\u003c/em\u003e(5), 1147-1163. doi:10.1109/tro.2015.2463671\u003c/li\u003e\n\u003cli\u003eMuthalif, M., Shojaei, D., \u0026amp; Khoshelham, K. (2022). A review of augmented reality visualization methods for subsurface utilities. \u003cem\u003eAdvanced Engineering Informatics, 51\u003c/em\u003e, 101498.\u003c/li\u003e\n\u003cli\u003eMuthalif, M. Z. A., Shojaei, D., \u0026amp; Khoshelham, K. (2022). 
Resolving Perceptual Challenges of Visualizing Underground Utilities in Mixed Reality. \u003cem\u003eInt. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XLVIII-4/W4-2022\u003c/em\u003e, 101-108. doi:10.5194/isprs-archives-XLVIII-4-W4-2022-101-2022\u003c/li\u003e\n\u003cli\u003eOsadchyi, V., Valko, N., \u0026amp; Kuzmich, L. (2021). \u003cem\u003eUsing augmented reality technologies for STEM education organization.\u003c/em\u003e Paper presented at the Journal of Physics: Conference Series.\u003c/li\u003e\n\u003cli\u003eQin, J., Li, M., Liao, X., \u0026amp; Zhong, J. (2019). Accumulative Errors Optimization for Visual Odometry of ORB-SLAM2 Based on RGB-D Cameras. \u003cem\u003eISPRS International Journal of Geo-Information, 8\u003c/em\u003e(12), 581. Retrieved from https://www.mdpi.com/2220-9964/8/12/581\u003c/li\u003e\n\u003cli\u003eRadanovic, M., Khoshelham, K., Fraser, C., \u0026amp; Acharya, D. (2023). Continuous BIM Alignment for Mixed Reality Visualisation. \u003cem\u003eISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 10\u003c/em\u003e, 279-286.\u003c/li\u003e\n\u003cli\u003eRamezani, M., Acharya, D., Gu, F., \u0026amp; Khoshelham, K. (2017). Indoor Positioning by Visual-Inertial Odometry. \u003cem\u003eISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, IV-2/W4\u003c/em\u003e, 371-376. doi:10.5194/isprs-annals-IV-2-W4-371-2017\u003c/li\u003e\n\u003cli\u003eRamezani, M., Khoshelham, K., \u0026amp; Fraser, C. (2018). Pose estimation by Omnidirectional Visual-Inertial Odometry. \u003cem\u003eRobotics and Autonomous Systems, 105\u003c/em\u003e, 26-37.
doi:10.1016/j.robot.2018.03.007\u003c/li\u003e\n\u003cli\u003eSaito, S., Hiyama, A., Tanikawa, T., \u0026amp; Hirose, M. (2007, 10-14 March). \u003cem\u003eIndoor Marker-based Localization Using Coded Seamless Pattern for Interior Decoration.\u003c/em\u003e Paper presented at the 2007 IEEE Virtual Reality Conference.\u003c/li\u003e\n\u003cli\u003eSattler, T., Zhou, Q., Pollefeys, M., \u0026amp; Leal-Taixe, L. (2019). \u003cem\u003eUnderstanding the limitations of CNN-based absolute camera pose regression.\u003c/em\u003e Paper presented at the IEEE/CVF Conference on Computer Vision and Pattern Recognition.\u003c/li\u003e\n\u003cli\u003eScargill, T. (2021). \u003cem\u003eContext-Aware Markerless Augmented Reality for Shared Educational Spaces.\u003c/em\u003e Paper presented at the 2021 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct).\u003c/li\u003e\n\u003cli\u003eShin, D. H., \u0026amp; Dunston, P. S. (2008). Identification of application areas for Augmented Reality in industrial construction based on technology suitability. \u003cem\u003eAutomation in Construction, 17\u003c/em\u003e(7), 882-894. doi:10.1016/j.autcon.2008.02.012\u003c/li\u003e\n\u003cli\u003eSufiyan, D., Win, L. S. T., Win, S. K. H., Tan, U. X., \u0026amp; Foong, S. (2024, 15-19 July). \u003cem\u003eDirect Aerial Visual Localization using Panoramic Synthetic Images and Domain Adaptation.\u003c/em\u003e Paper presented at the 2024 IEEE International Conference on Advanced Intelligent Mechatronics (AIM).\u003c/li\u003e\n\u003cli\u003eTareen, S. A. K., \u0026amp; Saleem, Z. (2018, 3-4 March). \u003cem\u003eA comparative analysis of SIFT, SURF, KAZE, AKAZE, ORB, and BRISK.\u003c/em\u003e Paper presented at the 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET).\u003c/li\u003e\n\u003cli\u003eUngureanu, D., Bogo, F., Galliani, S., Sama, P., Duan, X., Meekhof, C., . . .
Sch\u0026ouml;nberger, J. L. (2020). Hololens 2 research mode as a tool for computer vision research. \u003cem\u003earXiv preprint arXiv:2008.11239\u003c/em\u003e.\u003c/li\u003e\n\u003cli\u003eVermandere, J., Bassier, M., \u0026amp; Vergauwen, M. (2022). Two-Step Alignment of Mixed Reality Devices to Existing Building Data. \u003cem\u003eRemote Sensing, 14\u003c/em\u003e(11), 2680. Retrieved from https://www.mdpi.com/2072-4292/14/11/2680\u003c/li\u003e\n\u003cli\u003eVolk, R., Stengel, J., \u0026amp; Schultmann, F. (2014). Building Information Modeling (BIM) for existing buildings \u0026mdash; Literature review and future needs. \u003cem\u003eAutomation in Construction, 38\u003c/em\u003e, 109-127. doi:10.1016/j.autcon.2013.10.023\u003c/li\u003e\n\u003cli\u003eWilliams, G., Gheisari, M., Chen, P.-J., \u0026amp; Irizarry, J. (2015). BIM2MAR: An Efficient BIM Translation to Mobile Augmented Reality Applications. \u003cem\u003eJournal of Management in Engineering, 31\u003c/em\u003e(1). doi:10.1061/(asce)me.1943-5479.0000315\u003c/li\u003e\n\u003cli\u003eWu, Y., \u0026amp; Hu, Z. (2006). PnP problem revisited. \u003cem\u003eJournal of Mathematical Imaging and Vision, 24\u003c/em\u003e, 131-141.\u003c/li\u003e\n\u003cli\u003eZhang, P., \u0026amp; Yan, X. (2023). Application of Improved KAZE Algorithm in Image Feature Extraction and Matching. \u003cem\u003eIEEE Access, 11\u003c/em\u003e, 122625-122637. doi:10.1109/ACCESS.2023.3328778\u003c/li\u003e\n\u003cli\u003eZhu, J.-Y., Park, T., Isola, P., \u0026amp; Efros, A. A. (2017). 
\u003cem\u003eUnpaired image-to-image translation using cycle-consistent adversarial networks.\u003c/em\u003e Paper presented at the Proceedings of the IEEE international conference on computer vision.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Localization, Mixed Reality, Image Style Transfer, BIM, HoloLens","lastPublishedDoi":"10.21203/rs.3.rs-7812864/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7812864/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThis research introduces a novel methodology that automates the precise alignment between real and virtual environments in Mixed Reality (MR) applications, specifically tailored for the construction industry. A significant challenge in MR systems is the accumulation of camera pose estimation errors, leading to trajectory drift and reduced localization accuracy over time. Our approach addresses this by integrating HoloLens' spatial mapping capabilities with Image style transfer and geometric feature matching, enabling robust alignment between real-world HoloLens images and Building Information Modeling (BIM). By bridging the visual domain gap through image style transfer, we enhance feature correspondence, effectively eliminating drift errors that accumulate during device movement. 
A comprehensive evaluation using 1,408 image pairs demonstrates improved localization accuracy and reliable alignment of BIM in the real world for enhancing efficiency in the construction industry.\u003c/p\u003e","manuscriptTitle":"Drift-free BIM Alignment for Mixed Reality Visualization through Image Style Transfer and Feature Matching","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-10-12 14:42:50","doi":"10.21203/rs.3.rs-7812864/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"6a92a8a1-d6da-4149-8ca9-4887aa673dfb","owner":[],"postedDate":"October 12th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2026-01-07T10:55:15+00:00","versionOfRecord":[],"versionCreatedAt":"2025-10-12 14:42:50","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-7812864","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7812864","identity":"rs-7812864","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}