Whole Genome of Petroleum Hydrocarbon Degrading Rhodococcus indonesiensis isolated from Nacharam, Hyderabad, India

doi:10.21203/rs.3.rs-6309542/v1

2025 · doi:10.21203/rs.3.rs-6309542/v1

preprint OA: closed


Syed Arshi Uz Zaman, Khushboo Sharma, Anuraj Nayarisseri, Kamal A. Khazanehdari, and 1 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-6309542/v1

This work is licensed under a CC BY 4.0 License. Status: Published. Journal publication published 02 Dec, 2025; the published version appears in Scientific Reports. Version 1 posted 11 (latest preprint version).

Abstract

Petroleum contamination presents a significant environmental challenge, contributing to soil and water pollution. Bioremediation provides a sustainable and cost-effective approach. In this study, we isolated and characterized a novel petroleum-degrading strain, Rhodococcus indonesiensis SARSHI1.
Whole-genome sequencing of SARSHI1 was conducted using a hybrid sequencing approach, integrating Oxford Nanopore Technologies (ONT) (PromethION) and Illumina (NovaSeq 6000) platforms. The complete genome of SARSHI1 comprises 5.7 Mbp, along with a plasmid of 159,118 bp, encoding a total of 5,150 coding sequences (CDS). The genome consists of 5,695,289 base pairs, with 5,220 identified genes comprising 5,094 protein-coding genes. Additionally, it contains 12 ribosomal RNA (rRNA) genes, 55 transfer RNA (tRNA) genes, one non-coding RNA, one CRISPR array, 56 pseudogenes, and 243 hypothetical proteins. The raw reads obtained were 13,900,477 from Illumina and 2,539,063 from ONT, with processed reads of 13,169,190 and 1,567,736, respectively. Genome assembly achieved 100% completeness, confirming the reconstruction of a fully intact genome without missing sequences. A total of 570 single-copy marker genes were identified, resulting in a coding density of 91.4%. Functional annotation and comparative genomic analysis revealed key genes associated with hydrocarbon degradation, including alkB , ahyA , and almA (Group I) families for long-chain alkane degradation, as well as bph , ben , and xylC clusters for aromatic hydrocarbon degradation under aerobic conditions. Additionally, multiple antibiotic resistance genes, including those conferring resistance to beta-lactams, were identified. Secondary metabolite analysis identified 19 distinct biosynthetic gene clusters (BGCs), encoding variants of known compounds, highlighting the genomic potential for diverse secondary metabolite production. The complete genome sequence has been deposited in GenBank under accession numbers CP180630 (chromosome) and CP180631 (plasmid). The raw sequencing reads have been submitted to the Sequence Read Archive (SRA), NCBI, under accession numbers SRX27520007 (Illumina) and SRX27520006 (ONT). 
Subjects: Biological sciences/Biological techniques; Biological sciences/Biotechnology; Biological sciences/Computational biology and bioinformatics; Biological sciences/Microbiology

Keywords: Petroleum hydrocarbon-degrading bacteria; Bioremediation and degradation; Whole Genome Sequencing; Oxford Nanopore; Illumina; Genome Annotation; Petroleum hydrocarbon-degrading genes

1. Introduction

Petroleum-derived products, including fuels and petrochemicals, are indispensable to modern economies and daily life. Since the 19th century, global crude oil consumption has surged, with projections indicating a demand of 121.5 million barrels per day (BPD) by 2050 [ 1 ]. India, a major consumer, recorded 137.6 million metric tonnes of petroleum consumption between April and October 2023, reflecting a 3% year-on-year increase [ 2 ]. However, petroleum extraction, refining, and transportation pose significant environmental and health risks, particularly due to accidental spills and industrial emissions. In India, petroleum refineries are major contributors to air, soil, and water pollution, exacerbating greenhouse gas emissions and environmental degradation[ 3 ]. The persistence of petroleum hydrocarbons (PHs) in ecosystems leads to extensive ecological disruptions[ 4 ]. In aquatic environments, PHs integrate into sediments, bioaccumulate in organisms, and disrupt trophic interactions, ultimately compromising the food chain[ 5 ][ 6 ]. In terrestrial ecosystems, PH contamination at high concentrations (> 20,000 mg/kg) significantly reduces microbial diversity, impairing essential biogeochemical cycles. Conversely, moderate contamination (4,000–20,000 mg/kg) can enhance microbial species coexistence[ 7 ]. The toxicological impact of PH exposure in humans varies with concentration, duration, and route of exposure.
Chronic exposure has been linked to carcinogenicity, immunosuppression, and developmental toxicity, while acute exposure can lead to dermatological reactions, respiratory distress, and ocular infections[ 8 ]. A study conducted on Indian petroleum workers in 2013 reported significant DNA damage even at low exposure levels, with benzene and its metabolites identified as key contributors [ 9 ]. Conventional petroleum remediation strategies, including physical methods such as skimming and chemical dispersants, are often inefficient for large-scale spills and may induce secondary ecological disturbances [ 10 ][ 11 ]. Bioremediation is a cost-effective, sustainable approach that leverages microbial metabolic pathways to degrade hydrocarbons into non-toxic byproducts. However, the degradation is influenced by various environmental factors, including temperature, pH, oxygen availability, nutrient concentrations, and hydrocarbon bioavailability. Among microbial degraders, bacteria exhibit exceptional adaptability and metabolic versatility, making them prime candidates for bioremediation[ 12 ]. Notable genera include Pseudomonas, Burkholderia, Gordonia, and Rhodococcus, all of which utilize hydrocarbons as sole carbon and energy sources[ 13 ]. For instance, Rhodococcus strains isolated from Antarctic oil-contaminated soils have demonstrated hydrocarbon metabolism at sub-zero temperatures (-2°C)[ 14 ]. Likewise, recent research has identified three petroleum-degrading bacterial species (Bacillus thuringiensis, Bacillus pumilus, and Rhodococcus hoagii) isolated from the rhizosphere of Panicum aquaticum Poir., a plant that thrives in oil-contaminated soils[ 15 ]. Rhodococcus species have garnered significant interest due to their metabolic plasticity and resilience under extreme environmental conditions, such as high salinity, low temperatures, and nutrient scarcity[ 16 ].
These bacteria exhibit an extraordinary ability to degrade a wide spectrum of petroleum hydrocarbons and xenobiotic compounds[ 17 ]. However, despite their recognized potential, genomic insights into the enzymatic pathways and regulatory networks governing hydrocarbon degradation remain limited. Recent advancements in whole-genome sequencing (WGS) have facilitated the identification of key genetic determinants and metabolic pathways involved in hydrocarbon degradation, enhancing bacterial applications for large-scale environmental remediation[ 18 ]. WGS enables comprehensive genomic analysis, pinpointing critical genetic elements and protein-coding sequences involved in petroleum biodegradation [ 19 ][ 20 ]. However, traditional next-generation sequencing (NGS), relying primarily on short-read sequencing, encounters limitations in resolving complex genomic architectures, particularly repetitive sequences, structural variations, and large insertions often present in bacterial genomes [ 21 ]. Hybrid sequencing, which integrates short-read and long-read sequencing technologies, effectively addresses these challenges. While short-read sequencing provides high precision in detecting small-scale genetic variations, long-read sequencing resolves large genomic rearrangements, repetitive regions, and complex structural variations[ 22 ]. This synergistic approach offers an unprecedentedly detailed view of bacterial genomes, facilitating the identification of novel genes, enzymatic pathways, and functional networks essential for hydrocarbon metabolism [ 23 ]. This approach provides insights into the metabolic networks and regulatory pathways that enable bacterial resilience in polluted environments[ 24 ]. In the present study, we have isolated a novel petroleum-degrading bacterial strain and performed an in-depth genomic characterization using hybrid sequencing [Figure 1 ]. 
By integrating advanced sequencing platforms, we achieved high-resolution insights into the strain’s genetic framework, elucidating the metabolic pathways and enzymatic systems involved in hydrocarbon degradation. These findings provide a deeper understanding of the molecular mechanisms underpinning petroleum biodegradation and underscore the potential of the strain for application in environmental bioremediation strategies. Our research highlights the transformative role of genomic approaches in discovering novel bacterial resources for sustainable and eco-friendly remediation of petroleum-contaminated environments.

2. Methodology

2.1 Sample Collection and Isolation of Petroleum Hydrocarbon Degrading Bacteria

Oil-contaminated soil samples were obtained from polluted sites near service and gas stations in Nacharam, Hyderabad, by excavating surface layers at a depth of 10–12 inches. The collected samples were homogenized and stored at 4°C before enrichment in nutrient agar (NA) containing crude oil. The soil samples were inoculated into minimal salt medium (MSM) supplemented with 2% (w/v) crude oil, while an additional flask containing nutrient media with crude oil served as a control. The flasks were incubated at 30°C for 24 hours, and bacterial growth was monitored by assessing turbidity. All experiments were performed in triplicate. The cultures were serially diluted and spread onto MSM plates containing 2% (w/v) crude oil to isolate petroleum-degrading bacterial strains. The plates were incubated at 30°C for 24–48 hours[ 17 ][ 25 ]. Among the isolates, only SARSHI1 exhibited significant hydrocarbon degradation. The characterization of SARSHI1 was performed following Bergey’s Manual of Determinative Bacteriology[ 26 ][ 27 ]. Biochemical and microbial analyses included motility assays, Gram staining, and metabolic tests such as indole, Methyl Red (MR), and Voges-Proskauer (VP) assays to evaluate glucose oxidation and non-acidic end-product formation[ 28 ][ 29 ].
The biodegradation efficiency of SARSHI1 was further assessed using crude oil concentrations of 5%, 10%, and 15% (w/v) over a seven-day incubation period. The experiment was performed in replicates. Following incubation, cultures were centrifuged at 3,000–5,000 rpm for 5 minutes to pellet bacterial cells. The supernatant was discarded, and the pellet was washed with 0.9% physiological saline. Bacterial growth was quantified spectrophotometrically at OD600[ 30 ][ 31 ][ 32 ].

2.2 Genomic DNA Extraction, Qualitative and Quantitative Analysis

Genomic DNA was extracted from a pure bacterial culture of isolate SARSHI1 using the QIAamp DNA Mini Kit (Qiagen), following the manufacturer’s protocol. The quality and quantity of the genomic DNA were then assessed to ensure integrity and purity. The purity of the DNA was evaluated by determining the A260/280 ratio using a NanoDrop spectrophotometer (ThermoFisher Scientific). To further verify the integrity of the genomic DNA, a portion of the extracted DNA was resolved on a 1% agarose gel with a ladder[ 33 ]. The absence of smearing or degradation bands on the gel confirmed the integrity of the extracted genomic DNA and its suitability for downstream analysis[ 34 ][ 35 ].

2.3 Molecular characterization using 16S rRNA Sequencing

The isolated DNA was amplified using universal 16S primers, with the forward primer sequence GGATGAGCCCGCGGCCTA and the reverse primer sequence CGGTGTGTACAAGGCCCGG. The PCR mix contained template DNA, primers, Taq DNA polymerase, 10X buffer, MgCl2, and dNTPs. The PCR conditions were: initial denaturation at 95°C for 5 minutes, followed by 30 cycles of denaturation at 95°C for 40 seconds, primer annealing at 65°C for 1 minute, and extension at 72°C for 2 minutes, with a final extension step at 65°C for 1 minute[ 36 ]. A single discrete PCR amplicon band was observed on agarose gel electrophoresis [ 37 ][ 38 ].
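As a small illustration of the sequence-composition statistics used in this kind of workflow, GC content is the fraction of G and C bases in a sequence. The sketch below is not the authors' pipeline; the function name `gc_content` is hypothetical, and the calculation is applied only to the two primer sequences quoted above.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


# Primer sequences as quoted in the text.
FORWARD = "GGATGAGCCCGCGGCCTA"
REVERSE = "CGGTGTGTACAAGGCCCGG"

for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
    print(f"{name}: {len(primer)} nt, GC = {gc_content(primer):.2f}%")
```

Both primers are GC-rich, which is in line with the relatively high annealing temperature reported for the PCR.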
The amplified DNA fragments were purified using the GeneJet Gel Extraction PCR Purification Kit (GeneJet Gel Extraction Kit, 2015) [ 39 ]. The PCR amplicon was sequenced using Sanger dideoxy sequencing. The forward and reverse chromatogram files were assembled and analyzed with DNA Baser v5.15 (DNA Baser v5.15, 2022) [ 40 ]. The sequence nucleotide composition, molecular weight, and GC content were determined using EMBOSS software[ 41 ][ 42 ].

2.4 Phylogenetic Analysis and rRNA Structure Prediction

The top 20 sequences with over 95% similarity to the query sequence were retrieved in FASTA format for the phylogenetic analysis. These sequences were imported into MEGA XI and aligned using the MUSCLE algorithm with up to 16 iterations and UPGMA clustering[ 43 ][ 44 ]. The alignment was exported in MEGA format, and a phylogenetic tree was generated using the neighbor-joining method with 1,000 bootstrap replicates. The evolutionary relationships were evaluated using the Maximum Composite Likelihood model[ 45 ][ 46 ]. The rRNA secondary structure was predicted using UNAfold. The Mfold server was used to predict the minimum free energy (MFE) structure based on the nearest-neighbor model, which provides insight into the thermodynamic stability of the predicted secondary structures[ 47 ][ 48 ].

2.5 Preparation of 2×150 WGS Library

The paired-end sequencing library was prepared from the extracted DNA using the Illumina TruSeq Nano DNA Library Prep Kit (TruSeq, DNA Library Prep Kits, 2012). 100 ng of DNA was fragmented using the Covaris M220 system, resulting in an average fragment size of 350 bp. Covaris shearing generates double-stranded DNA fragments with 3′ or 5′ overhangs, which were subjected to end repair to produce blunt ends. The repaired DNA fragments were ligated with adapters, followed by size selection using AMPure XP beads to ensure uniformity. The size-selected fragments were PCR-amplified using an index-specific primers kit, integrating indexing adapters.
These adapters facilitated the hybridization of DNA fragments onto the flow cell for sequencing. The PCR-enriched library was assessed for quality and fragment distribution using the Agilent 4200 TapeStation system with High Sensitivity D1000 ScreenTape, following the manufacturer’s protocol[ 49 ].

2.6 Cluster Generation and Sequencing

The paired-end (PE) Illumina sequencing library was quantified using Qubit, assessed for fragment size with the Agilent TapeStation, and then loaded onto the NovaSeq 6000 for cluster generation and high-throughput sequencing. PE sequencing captures DNA fragments in both directions, enhancing accuracy and facilitating the detection of structural variants and repeats. Cluster generation involves hybridizing the prepared library to adapter-bound oligos on the flow cell, enabling selective forward-strand cleavage after reverse-strand synthesis. This process ensures bidirectional sequencing with high-quality data and improved genome assembly [ 34 ][ 50 ].

2.7 Preparation of Nanopore Library

The Nanopore sequencing library was prepared from the quality-controlled genomic DNA sample using the Ligation Sequencing-gDNA Native Barcoding Kit. Initially, 800 ng of the DNA sample was subjected to DNA repair, followed by purification using AMPure XP beads and adaptor ligation. The samples were pooled according to the kit protocol. The pooled library was further purified using AMPure XP beads to maintain high-quality preparation. The concentration of the purified pooled library was quantified using a Qubit fluorometer. Finally, the purified library was loaded onto an Oxford Nanopore PromethION P2 Solo flow cell for sequencing [ 51 ]. Nanopore sequencing facilitates the real-time selective sequencing of single DNA molecules by dynamically reversing the voltage across individual nanopores.
2.8 Genome Assembly and Quality Control

Raw sequencing reads from the Illumina and Oxford Nanopore platforms underwent rigorous quality control and preprocessing to ensure high-fidelity assembly. Quality control of Illumina paired-end reads was conducted using FastQC (v0.11.9) (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), with reads achieving a Phred quality score > 30 prioritized for downstream analysis[ 52 ]. The Oxford Nanopore reads were evaluated by read length distribution and per-base quality scores. Low-quality bases were trimmed and adapters were removed using Trimmomatic v0.39 (https://github.com/timflutre/trimmomatic) [ 53 ]. A sliding window approach (window size = 4, minimum average quality score = 20) was applied to refine read quality. Bases at the ends of reads with quality scores below 3 were removed, and reads shorter than 36 bp were discarded to ensure high-quality data. Error correction was performed using SPAdes v3.15 (https://ablab.github.io/spades/), employing a k-mer-based strategy (k = 21–51)[ 54 ]. The Oxford Nanopore reads were filtered using Filtlong (https://github.com/rrwick/Filtlong), retaining only reads exceeding 1,000 bp with a mean quality score of at least 10[ 55 ], followed by adapter removal using Porechop (https://github.com/rrwick/Porechop)[ 56 ]. Hybrid genome assembly was performed using Unicycler (https://github.com/rrwick/Unicycler) in bold mode (--mode bold) to maximize scaffold continuity. Short-read error correction was conducted with SPAdes (k = 21, 33, 55), while long reads were assembled using miniasm, followed by two rounds of polishing with Racon. A depth filter (--depth_filter 2.5× median depth) was applied to exclude low-confidence contigs, and sequences shorter than 1,000 bp were removed to retain only high-quality contigs.
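The short-read trimming rules stated above (sliding window of 4 with a minimum mean quality of 20, removal of end bases below quality 3, and a 36 bp minimum length) can be sketched in a few lines. This is a simplified illustration, not Trimmomatic's own implementation: the function names are hypothetical, the order of the steps is an assumption, and qualities are taken as already-decoded Phred integers.

```python
from statistics import mean


def sliding_window_cut(quals, window=4, min_avg=20):
    """Return how many leading bases to keep: the read is cut at the start
    of the first window whose mean quality drops below min_avg."""
    for start in range(0, len(quals) - window + 1):
        if mean(quals[start:start + window]) < min_avg:
            return start
    return len(quals)


def trim_read(seq, quals, window=4, min_avg=20, end_min=3, min_len=36):
    """Trim one read; return (seq, quals) or None if it becomes too short."""
    # Step 1 (assumed order): strip leading/trailing bases below end_min.
    lo, hi = 0, len(seq)
    while lo < hi and quals[lo] < end_min:
        lo += 1
    while hi > lo and quals[hi - 1] < end_min:
        hi -= 1
    seq, quals = seq[lo:hi], quals[lo:hi]
    # Step 2: sliding-window quality cut from the 5' end.
    keep = sliding_window_cut(quals, window, min_avg)
    seq, quals = seq[:keep], quals[:keep]
    # Step 3: discard reads shorter than the minimum length.
    if len(seq) < min_len:
        return None
    return seq, quals


# Example: 40 good bases followed by a low-quality tail.
trimmed = trim_read("A" * 50, [30] * 40 + [10] * 10)
print(len(trimmed[0]) if trimmed else "discarded")
```

The real tool handles paired reads, adapter clipping, and encoded quality strings, none of which are modelled here.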
Structural variations and repeat regions were resolved through automatic bridging (--bridging auto), leveraging long reads to enhance assembly completeness[ 57 ]. Genome assembly quality was assessed using QUAST 5.3 (https://github.com/ablab/quast), evaluating key metrics such as N50, total genome size, and misassembly rates[ 58 ]. Contiguity was determined based on total assembly length and N50, while genome completeness was quantified by the genome fraction representing reference-mapped base coverage. Additionally, genome circularity and structural coherence were examined using Bandage (https://github.com/rrwick/Bandage), employing hierarchical layouts to detect potential assembly discrepancies[ 59 ].

2.9 Functional Annotation

Comprehensive genome annotation was performed using integrated bioinformatics tools to ensure high-resolution structural and functional characterization. The genome was initially annotated using BAKTA v1.10.3 (https://github.com/oschwengers/bakta) [ 60 ], followed by the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) (https://github.com/ncbi/pgap) and Blast2GO (https://www.blast2go.com/). PGAP annotation was performed using the best-placed reference methodology, while gene prediction was conducted with GeneMarkS-2+ v6.9 (https://github.com/karlgem/GeneMarkS-2), (https://genemark.bme.gatech.edu/). EggNOG-mapper (https://github.com/eggnogdb/eggnog-mapper) was utilized for functional annotation, incorporating orthology-based assignments, Gene Ontology (GO) terms, KEGG pathway mapping, and Clusters of Orthologous Groups (COG) classifications [ 61 ]. Genome assembly completeness and contamination were assessed using CheckM (https://github.com/Ecogenomics/CheckM) and CheckM2 (https://github.com/chklovski/CheckM2) [ 62 ][ 63 ]. CheckM v1.0.2 evaluates genome integrity based on lineage-specific marker genes, while CheckM2 leverages machine-learning models for enhanced accuracy.
The quality assessment process includes gene identification, metadata calculation, and DIAMOND-based genome annotation, followed by machine-learning-driven prediction of genome completeness and contamination. Pathway analysis was performed using Blast2GO to elucidate the biological functions of annotated genes. This encompasses pathway classifications, species mapping, and statistical evaluations based on Fisher’s Exact Test and Gene Set Enrichment Analysis (GSEA). Metabolic annotation, including the identification of key biogeochemical cycle genes, was performed with DRAM (Distilled and Refined Annotation of Metabolism)[ 64 ] (https://github.com/WrightonLabCSU/DRAM). The analysis was conducted with default parameters. Protein domains and motifs were identified using InterProScan (https://github.com/ebi-pf-team/interproscan). The completeness and accuracy of the SARSHI1 genome assembly were further validated using BUSCO v5 (https://github.com/metashot/busco)[ 65 ]. The assembled genome was compared against the actinobacteria_phylum_odb10 dataset, comprising 292 orthologous genes from 893 actinobacteria genomes. The analysis was performed in genome mode to quantify genome integrity based on complete, fragmented, or missing genes.

2.10 Comparative Genomics and Taxonomic Analysis

Average Nucleotide Identity (ANI) analysis was performed using FastANI (https://github.com/ParBLiSS/FastANI) with default parameters [ 66 ]. The SARSHI1 genome was compared bidirectionally with Rhodococcus indonesiensis CSLK01-03. FastANI's fragment-based k-mer mapping strategy enables efficient and accurate ANI estimation, yielding key metrics such as the ANI percentage based on fragment matches and total comparisons. To further refine taxonomic classification, digital DNA-DNA hybridization (dDDH) analysis was performed against the closest reference genome (GCA_030360185.1) using the Genome-to-Genome Distance Calculator (GGDC 3.0) (https://ggdc.dsmz.de/).
The analysis leverages three distinct formulas (length-normalized intergenomic distance, high-scoring segment pair-based alignment, and a sum-of-bits approach) to calculate distance. Bootstrapping was applied to determine confidence intervals, and species delineation was performed against the 70% dDDH threshold, a critical microbial taxonomic criterion. The G + C content differences and evolutionary distances further validated genomic congruence[ 67 ] [ 68 ]. Whole-genome-based taxonomic classification was performed using the Type (Strain) Genome Server (TYGS) (https://tygs.dsmz.de/). The ten closest type strains were selected using the MASH distance algorithm and 16S rRNA gene-based comparisons. The intergenomic distances were computed under the trimming algorithm with 100 replicates using the Genome BLAST Distance Phylogeny (GBDP) approach. The digital DNA-DNA hybridization (dDDH) values were determined, followed by phylogenomic tree construction using the FASTME algorithm with subtree pruning and regrafting (SPR). Phylogenetic trees were visualized using PhyD3, with species clustering based on the 70% dDDH threshold and subspecies delineation at 79% dDDH[ 69 ].

2.11 Annotation of Hydrocarbon Degradation Genes

2.11.1 CANT-HYD Analysis

The aerobic and anaerobic hydrocarbon degradation genes were annotated using CANT-HYD (https://github.com/dgittins/CANT-HYD-HydrocarbonBiodegradation) analysis. The genomic sequences were processed using HMMER 3.4 (http://hmmer.org/) to detect hydrocarbon-degrading enzymes. Hidden Markov Models (HMMs) were built from multiple sequence alignments (MSAs) of known enzymes, including alkane 1-monooxygenase (AlkB), nitrate and sulfite reductases (AhyA), and flavin-containing monooxygenases (AlmA_GroupI). The hmmscan algorithm was employed to search against the curated HMM database, using an E-value threshold of < 1e-5.
The KEGG and UniProt databases were used to map the identified genes to aerobic hydrocarbon degradation pathways [ 70 ].

2.11.2 HADEG Analysis

The HADEG (Hydrocarbon Aerobic Degradation Enzymes and Genes) database was utilized to characterize aerobic hydrocarbon degradation genes. The alignment was performed in BLASTp mode using DIAMOND v2.0 (https://github.com/bbuchfink/diamond) against the HADEG database. The orthologous gene clusters were detected and classified using Proteinortho 6 (https://gitlab.com/paulklemm_PHD/proteinortho)[ 71 ].

2.11.3 HMDB Analysis

The hydrocarbon monooxygenase genes were annotated using DIAMOND-based sequence searches against the Hydrocarbon Monooxygenase Gene Database (HMDB) [ 72 ]. A homology-based functional annotation was performed using DIAMOND with an E-value threshold of ≤ 1e-5 and a minimum identity threshold of 25% to ensure high-confidence hits. The identified sequences were then filtered based on sequence identity and bit score to retain only significant matches. Further, the results were validated based on sequence conservation, domain architecture, and functional relevance with well-characterized hydrocarbon monooxygenases, including alkane 1-monooxygenase (alkB) and rubredoxin electron transfer systems (prmA, prmB, prmC, and prmD).

2.11.4 antiSMASH Analysis: Secondary Metabolite Biosynthesis

The antiSMASH (Antibiotics & Secondary Metabolite Analysis Shell) (https://github.com/antismash) analysis, with ClusterBlast and KnownClusterBlast enabled, was performed to identify biosynthetic gene clusters (BGCs)[ 73 ]. The analysis facilitates comparative genomic analysis by aligning predicted BGCs with known clusters. The Procluster Region Analysis approach was used with relaxed strictness to ensure the comprehensive detection and annotation of secondary metabolite gene clusters.

3. Results

3.1 Screening and Isolation of Bacteria

Comprehensive biochemical and microbial tests were performed to characterize the strain SARSHI1. The bacterium tested negative for the indole and Methyl Red (MR) tests, indicating the absence of tryptophan metabolism and mixed acid fermentation pathways. The positive Voges-Proskauer (VP) test confirms its ability to produce neutral end products from glucose fermentation. Further assays confirmed that SARSHI1 was Gram-positive and oxidase-negative. The gelatin liquefaction test indicated the presence of extracellular proteolytic activity. The motility test showed a negative result, confirming the non-motile nature of the strain (Fig. 1, Supplementary Information 1). SARSHI1 was further checked for growth at different crude oil concentrations, and optical density (OD) measurements at 600 nm were recorded at 24-hour intervals. The growth dynamics of SARSHI1 under varying crude oil concentrations as the sole carbon source are presented in Fig. 2. A pronounced lag phase was observed at a crude oil concentration of 15% (w/v), indicating delayed adaptation to higher hydrocarbon levels.

3.2 DNA Quality Control and Quantification

3.2.1 QC of extracted DNA on Agarose Gel

The quality of extracted DNA was checked using agarose gel electrophoresis. As shown in Fig. 2, Supplementary Information 1, Lane M contains a molecular weight marker (a DNA ladder) with distinct bands. Lane 1 displays a single, high-molecular-weight DNA band with no visible smearing, indicating a high-quality, intact sample. The absence of degraded fragments suggests minimal DNA fragmentation or contamination.

3.2.2 Quantification of extracted DNA using NanoDrop

The DNA concentration quantified by NanoDrop was 54.2 ng/µL. The OD ratio at A260/280 was 1.83, within the standard range for pure DNA, indicating minimal protein contamination. The OD ratio of 1.71 at A260/230 implies reasonable purity.
However, a value closer to 2.0 would be ideal for minimal organic contaminant interference (e.g., phenol or other solvents). The sample passed the quality control (QC) criteria for downstream analysis (Table 1).

Table 1 Quantification of extracted DNA sample on NanoDrop
Sr No | Sample ID | NanoDrop Readings (ng/µL) | NanoDrop OD A260/280 | NanoDrop OD A260/230 | QC Status
1 | SARSHI1 | 54.2 | 1.83 | 1.71 | Pass

3.3 Molecular Characterization by Ribotyping

The forward and reverse ABI trace files obtained from Sanger dideoxy sequencing were assembled using DNA Baser software and saved in FASTA format. Comparative sequence analysis was performed using NCBI-BLAST (Basic Local Alignment Search Tool), employing an e-value threshold of 0.0 and a coverage cut-off of 200%. Based on morphological, biochemical, and 16S rRNA sequence analyses, the strain SARSHI1 was identified as a Gram-positive bacterium, Rhodococcus indonesiensis. The 16S rRNA sequence of Rhodococcus indonesiensis strain SARSHI1 has been submitted to the NCBI GenBank database under the accession number PV034287. The 16S rRNA sequence exhibited a GC content of 60.69%, a molecular weight of 439,033 Daltons, and a total length of 720 bp. Further sequence comparisons revealed that SARSHI1 shared a high degree of similarity with various Rhodococcus species, displaying 98.47% identity with the partial sequences of Rhodococcus ruber strain JC435 and Rhodococcus ruber strain DSM 43338, as well as 98.33% similarity with the partial sequences of Rhodococcus aetherivorans strain DSM44752 and strain 10bc312. The strain also exhibited 97.22% sequence identity with Rhodococcus nanhaiensis strain SCSIO10187. Notably, the strain 10bc312 was isolated from a petrochemical sludge bioreactor. The top 20 BLAST hits with > 95% sequence similarity were retrieved for phylogenetic tree construction. The phylogenetic tree shown in Fig. 3 was generated using the neighbor-joining method, applying the Maximum Composite Likelihood model.
Evolutionary parameters, including base composition bias, Disparity Index, average substitution rates per site, and Transition/Transversion bias, were estimated using the Kimura 2-parameter model (Table 2). The substitution patterns were computed based on the Tamura-Nei model. The summary statistics and Maximum Likelihood (ML) scores for tree topology are provided in Table 1, Supplementary Information 1.

Table 2 Evolutionary Analysis using MEGA XI
Parameter | Value
Conserved Sites | 172
Variable Sites | 672
Parsim-info Sites | 552
Singleton Sites | 120
Average Disparity Index | 0.946420
Average Composition distance | 1.047846
Average Pairwise Distance | 4.313008
Overall Mean Distance | 4.31
Transition/Transversion Bias | 0.63

A detailed structural analysis of the rRNA sequence was conducted to identify conserved functional motifs essential for ribosomal activity. Secondary structure predictions were generated using UNAfold under specific conditions, including a window size of 12, a folding temperature of 37°C, and an ionic concentration of 1 M NaCl. The predicted RNA structure, along with its respective free energy value (ΔG = -256.00 kcal/mol), is illustrated in Fig. 3, Supplementary Information 1. Functional annotations highlighted key regions, and thermodynamic properties were summarized in Table 2, Supplementary Information 1. Furthermore, an entropy assessment was performed using RNAfold and is illustrated as a hill plot in Fig. 4.

3.4 Library QC using TapeStation

The electropherogram depicted in Fig. 5 provides a detailed fragment size distribution analysis of a nucleic acid sample, obtained through TapeStation. The data indicate a primary fragment size range between 197 bp and 963 bp, with an average fragment length of 409 bp. The sample exhibits a high concentration of nucleic acid fragments, quantified at 26.7 ng/µl, and a region molarity of 110 nmol/l, indicating sufficient nucleic acid yield for downstream analysis.
Notably, the detected fragment population accounts for 95.85% of the total sample, indicating minimal degradation and contamination. The well-defined lower and upper markers support a high-integrity sample, suitable for sequencing.

3.5 Data Generation and Analytics
The raw sequencing reads of SARSHI1 underwent comprehensive quality control and preprocessing using Trimmomatic v0.39. Adapter sequences, reads containing more than 5% ambiguous nucleotides ("N"), and low-quality reads in which over 10% of bases had a Phred quality score below 25 were systematically removed. A stringent filtering strategy was implemented, incorporating sliding-window trimming with a 10 bp window, in which bases were discarded once the average quality within the window fell below Phred 25. Additionally, leading and trailing bases with quality scores below 25 were removed to ensure sequence integrity. Post-trimming, only reads exceeding a minimum length threshold of 100 nucleotides were retained, resulting in high-confidence data comprising approximately 12.89 million high-quality reads (Table 3).

Table 3 Read statistics (Illumina PE Short reads)
Sample Name | Raw Reads | Raw Total bases | Raw Data in GB | HQ Reads | HQ total bases | HQ data in GB
SARSHI1 | 13,900,477 | 4,182,022,022 | 4.18 | 13,169,190 | 3,904,771,108 | 3.9

3.5.1 ONT Read Statistics
For sample SARSHI1, single-end sequencing was employed, and the raw data were processed in FASTQ format to ensure compatibility. The sequencing achieved an approximate genome coverage of 800X, underscoring the depth and reliability of the generated data. The read statistics are summarized in Table 4, with read lengths ranging from a minimum of 68 bp to a maximum of 936,949 bp, highlighting the Nanopore sequencer's capability to produce both short and ultra-long reads. The total high-quality read data comprises 4.83 GB.
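The approximate coverage quoted above can be reproduced by dividing the sequenced bases by the genome length. The following is a minimal sketch, assuming the 5,695,289 bp assembly length reported in Section 3.6 as the genome size and taking the base counts from Tables 3 and 4; the function and variable names are illustrative and not part of the original pipeline.

```python
def fold_coverage(total_bases: int, genome_length_bp: int) -> float:
    """Mean sequencing depth: total sequenced bases divided by genome length."""
    return total_bases / genome_length_bp


# Assumption: the hybrid assembly length (Section 3.6) is used as the genome size.
GENOME_LENGTH_BP = 5_695_289

ont_raw = fold_coverage(4_826_079_667, GENOME_LENGTH_BP)      # all ONT bases (Table 4)
ont_hq = fold_coverage(4_112_035_435, GENOME_LENGTH_BP)       # HQ ONT bases (Table 4)
illumina_hq = fold_coverage(3_904_771_108, GENOME_LENGTH_BP)  # HQ Illumina bases (Table 3)

print(f"ONT raw: {ont_raw:.0f}x, ONT HQ: {ont_hq:.0f}x, Illumina HQ: {illumina_hq:.0f}x")
```

Under these assumptions the raw ONT data correspond to a depth in the region of the reported 800X, with the filtered ONT and Illumina datasets somewhat lower.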
Table 4 ONT read statistics
Sample Name | No of reads | Total number of bases | HQ reads | HQ Total Bases | Min_length | Max_length | Data in GB
SARSHI1 | 2,539,063 | 4,826,079,667 | 1,567,736 | 4,112,035,435 | 68 | 936949 | 4.83

3.6 Hybrid De-novo Genome Assembly
The hybrid genome assembly of high-quality Illumina paired-end reads and Oxford Nanopore Technology (ONT) reads for sample SARSHI1 was performed using Unicycler v0.4.8 (Fig. 6). This assembly successfully reconstructed one chromosome and one plasmid, achieving an N50 value of 5,536,171 bp (Table 5). The assembly robustness was further validated using QUAST and Bandage, with comprehensive metrics detailed in Table 1, Supplementary Information 2. The genome assembly consisted of only two contigs, reflecting minimal fragmentation and high continuity. The total genome length was 5,695,289 bp, with the largest contig spanning 5,536,171 bp, demonstrating that the majority of the genome was assembled into a single dominant contig. The N50 and N90 values of 5,536,171 bp further confirm high contiguity and structural integrity. The GC content of 70.08% was consistent with known bacterial genomes, reinforcing assembly reliability. Moreover, L50 and L90 values of 1 indicate that a single contig contains at least 50% and 90% of the genome, respectively. The absence of Ns per 100 kbp (0.00) confirmed a gap-free assembly, while the auN value of 5,385,944 further supports genome completeness and continuity (Figs. 1 and 2, Supplementary Information 2). The raw sequencing data for Rhodococcus indonesiensis SARSHI1 have been deposited in the Sequence Read Archive (SRA) under Accession SRX27520007 (Illumina data) and Accession SRX27520006 (ONT data), as part of BioProject PRJNA1217105 and BioSample SAMN46479200. The genome assembly visualization using Bandage revealed a nearly circular contig representing the complete genome, and a small circular structure corresponding to the plasmid (Fig. 3, Supplementary Information 2).
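The contiguity metrics reported above (N50, N90, L50, L90, and auN) follow directly from the contig lengths. The sketch below recomputes them for a two-contig assembly; it is an illustration rather than the QUAST implementation, the function name is hypothetical, and the plasmid length is not stated explicitly in the text but is derived here as the total length minus the largest contig.

```python
def contiguity_stats(contig_lengths):
    """Return N50/L50, N90/L90 and auN for a list of contig lengths (bp)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)

    def nx_lx(fraction):
        # Walk contigs from longest to shortest until the requested
        # fraction of the assembly is covered.
        running = 0
        for rank, length in enumerate(lengths, start=1):
            running += length
            if running >= fraction * total:
                return length, rank
        raise ValueError("empty contig list")

    n50, l50 = nx_lx(0.5)
    n90, l90 = nx_lx(0.9)
    # auN: length-weighted mean contig length, sum(L_i^2) / sum(L_i).
    aun = sum(length * length for length in lengths) / total
    return {"total": total, "N50": n50, "L50": l50,
            "N90": n90, "L90": l90, "auN": aun}


chromosome = 5_536_171
plasmid = 5_695_289 - chromosome  # derived assumption: total length minus largest contig
stats = contiguity_stats([chromosome, plasmid])
print(stats)
```

With these two lengths the recomputed auN rounds to the reported 5,385,944, which suggests the derived plasmid length is consistent with the published assembly statistics.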
A minor discontinuity was observed, possibly indicating an unresolved repeat region. The polygonal and jagged shape of the contig, rather than a perfectly smooth circle, likely reflects sequencing depth variations or unresolved repeats.

Table 5 Unicycler Genome Assembly Statistics
Description | SARSHI1
Sequence | 2
Total Length (bp) | 5,695,289
N50 | 5,536,171
Maximum Length of scaffold (bp) | 5,536,171
GC% | 70.08

3.7 Genome Representation and Comprehensive Annotation
The circular genomic map of Rhodococcus indonesiensis SARSHI1 (5.7 Mbp), illustrated in Fig. 7, provides a comprehensive visualization of its structural and functional organization. The circular genome representation facilitates the interpretation of gene distribution and evolutionary characteristics. The outermost ring highlights the distribution of protein-coding sequences (CDS) in grey, alongside key genetic elements such as transfer RNA (tRNA, green), ribosomal RNA (rRNA, red), and non-coding RNAs (ncRNA, shades of blue). The presence of CRISPR loci (dark green) suggests active defense mechanisms against phage and plasmid invasion, signifying adaptive immunity. The second ring represents GC content variation, which may indicate regions of high transcriptional activity or horizontal gene transfer. The innermost rings depict GC skew, where the positive skew (green) corresponds to the leading strand and the negative skew (red) to the lagging strand, reflecting replication dynamics. The complete genome of Rhodococcus indonesiensis strain SARSHI1 was annotated using the Prokaryotic Genome Annotation Pipeline (PGAP), revealing a genomic architecture comprising two contigs, corresponding to one chromosome and one plasmid. The complete genome sequence has been deposited in the GenBank database under accession numbers CP180630 (chromosome) and CP180631 (plasmid). A total of 5,220 genes were identified, including 5,150 coding sequences, of which 5,094 were classified as protein-coding genes.
Additionally, 70 RNA genes were annotated, comprising 12 ribosomal RNA (rRNA) genes (four copies each of 5S, 16S, and 23S rRNAs), 55 transfer RNAs (tRNAs), and a single non-coding RNA (ncRNA). Of the 56 pseudogenes, 28 contained frameshift mutations, 38 were incomplete, four exhibited internal stop codons, and nine displayed multiple disruptions. Notably, no pseudogenes with ambiguous residues were detected. Functional annotation revealed key genetic determinants involved in hydrocarbon metabolism, including alkane monooxygenases and oxidoreductases. The pRiA4b ORF-3 family and plasmid maintenance gene systems were identified, highlighting intrinsic mechanisms for plasmid stability and inheritance. The antibiotic biosynthesis and antimicrobial resistance genetic elements revealed potential regulatory pathways governing adaptive mechanisms in contaminated environments.

3.7.1 Functional Annotation using Bakta
The annotated genome contains a single chromosomal replication origin (oriC) and lacks plasmid replication (oriV) or transfer (oriT) origins, highlighting the absence of conjugative transfer elements. The high GC content (~70.08%) is consistent with its classification within high-GC bacterial lineages such as Actinobacteria. The genome encodes a robust translational and regulatory network with 18 ncRNA regions. The presence of a CRISPR array further suggests adaptive immunity against phages or plasmids. The genome harbors 5,169 coding sequences (CDSs), including 243 hypothetical proteins. Notably, a small open reading frame (sORF) was predicted to encode a putative small peptide, which may play a role in regulatory or metabolic processes. Each CDS was assigned a unique locus tag and mapped to specific genomic coordinates, with most genes exhibiting high query coverage (~1.0) and strong sequence identity with known proteins.
Several CDSs featured low e-values (approaching 0) and high bit scores, indicating well-characterized proteins with potential roles in metabolic regulation. Conversely, a subset of CDSs remained uncharacterized, highlighting the presence of novel or hypothetical proteins (Supplementary Information 3). Functional annotation using EggNOG provided insights into gene classifications, functions, associated pathways, and Gene Ontology (GO) terms. The analysis identified genes encoding metal-dependent hydrolases, ATP-binding proteins, and conserved hypothetical proteins with unknown functions. Furthermore, genes encoding alkane hydroxylases, monooxygenases, aromatic hydroxylases, and ortho-cleavage and meta-cleavage enzymes were detected, indicating the genetic potential for complete mineralization of aliphatic and aromatic hydrocarbons via ring-cleaving mechanisms. Genes associated with oxidative stress response and membrane transporters indicate adaptive mechanisms for survival in hydrocarbon-rich environments (Supplementary Information 4). The comprehensive genome annotation using Bakta, BLAST2GO, and NCBI PGAP confirms that contig_2 is a plasmid with 134 coding sequences (CDSs) and 2 pseudogenes. Notably, the absence of transfer RNAs (tRNAs), transfer-messenger RNAs (tmRNAs), ribosomal RNAs (rRNAs), non-coding RNAs (ncRNAs), regulatory non-coding RNAs, CRISPR arrays, origins of replication (oriCs/oriVs), and origins of transfer (oriTs), together with the presence of several plasmid-associated elements, indicates its extrachromosomal nature. The key hallmarks include the presence of mobile genetic elements, IS256 family transposases (ENBDON_05209), TnsA-like transposases (ENBDON_05215), and site-specific integrases (ENBDON_05214). The plasmid stability and inheritance-related toxin-antitoxin (TA) systems comprise the RelE/ParE toxin-antitoxin system (ENBDON_05204) and the HigA family antitoxin (ENBDON_05205).
Partitioning system components, ParA (ENBDON_05247) and ParB (ENBDON_05246) facilitate accurate plasmid segregation during bacterial cell division, further reinforcing the extra chromosomal nature of contig_2. The annotation also identified transporter proteins, including the ABC transporter ATP-binding protein (ENBDON_05239) and an MFS transporter (ENBDON_05230), which may imply adaptive responses and antibiotic resistance. The type IV secretion system (T4SS), including VirB4 and TraM facilitates conjugative transfer mechanisms. The presence of the TrwC relaxase, a pivotal component of conjugative plasmids, further substantiates this classification. BLAST2GO annotation corroborated these findings by identifying plasmid pRiA4b ORF-3 family homologs and proteins with Mu transposase C-terminal domains. The TraM domain-containing proteins play a pivotal role in horizontal gene transfer, potentially expediting the dissemination of antibiotic resistance genes within bacterial populations (Supplementary Information 4). 3.7.2 Blast2GO Annotation The assembly metrics, characterized by an N50 value of 5.5 Mbp, indicate a high-quality and contiguous genome assembly. Sequence alignment across varying coverage thresholds, including High-Scoring Segment Pair per Hit (HSP/Hit) and HSP per Query Sequence (HSP/Seq), exhibits a high degree of consistency, particularly at 100% coverage. Notably, the highest number of alignment hits was observed in Rhodococcus ruber BKS 20–38 (1929 hits) and Rhodococcus indonesiensis (1905 hits), suggesting a strong phylogenetic and functional association with these taxa ( Fig. 1 , Supplementary Information 5). A minor discrepancy exists between the total sequences identified by BLAST (5144) and those annotated with GO terms (4190), suggesting that some sequences remain unassigned to GO terms. 
The GO distribution graph highlights the predominance of catalytic and electron transfer activities in the MF category and detoxification and localization functions in the BP category (Fig. 2, Supplementary Information 5). The majority of sequences (803) were assigned a single Gene Ontology (GO) term, while only a small subset exhibited more than ten GO terms, suggesting a multifunctional role. The enzymatic classification highlights the dominance of oxidoreductases, indicating oxidation potential (Fig. 3, Supplementary Information 5). The SARSHI1 genome contains many sequences with significant similarity to known proteins associated with essential biological processes. For instance, sequence ENBDON_00001, comprising 548 amino acids, exhibits a high degree of similarity (91.89%) to known protein sequences in the Non-Redundant (NR) protein database. BLAST analysis identified 20 significant hits with an E-value of 0.00E+00, demonstrating 100% query coverage and, in several instances, 100% sequence identity. Similarly, ENBDON_00007 corresponds to an alpha/beta fold hydrolase consisting of 300 amino acids and exhibited 95.44% sequence similarity to known hydrolases in the NR protein database. BLAST analysis produced multiple significant hits (E-value 0.00E+00), reinforcing its strong homology to characterized hydrolases. Moreover, the genome encodes diverse gene families associated with hydrocarbon metabolism (Supplementary Information 6).

3.8 Genome Assembly and Quality Assessment
The SARSHI1.fna_assembly genome belongs to the order Actinomycetales, exhibiting 100% completeness without contamination. The 570 single-copy marker genes with minimal duplication indicate a high-quality genome. Most marker genes are present in single copies, with ≥ 90% amino acid identity (AAI), emphasizing the completeness and absence of contamination (Fig. 8A).
The unimodal distribution observed in the GC content plot validates genomic homogeneity, while the percent coding density plot corroborates the high proportion of functional genes. The ΔTD plot highlights a well-assembled genome with low ΔTD values (~0.1–0.2), indicating relative homogeneity, whereas higher ΔTD values (~0.3–0.6) in a small fraction of sequences suggest horizontal gene transfer (HGT). Most sequences cluster near 0.07 ΔTD, aligning well with the expected genome profile, while a few deviations (~0.12–0.14 ΔTD) may correspond to plasmids. Notably, a large contig (~5,000 kbp) with a very low ΔTD suggests the presence of a dominant genome with minimal variation (Fig. 8B). Further assessment using CheckM2 with a Neural Network-Specific model demonstrated 100% completeness, with no missing sequences. The genome predominantly comprises functional protein-coding sequences with a coding density of 91.4%. The assembly consists of only two contigs, with the largest spanning 5.5 million base pairs and an N50 value of 5,536,171, signifying high contiguity and minimal fragmentation. The genome contains 5,170 protein-coding genes with an average gene length of 333.85 amino acids, suggesting a functionally rich genetic composition. Overall, the SARSHI1.fna_assembly genome represents a high-quality, well-assembled complete genome with minimal contamination and high contiguity.

3.9 Genomic Insights into Metabolism
The integrated Gene Ontology (GO) annotation framework provides a structured and hierarchical representation of sequence localization and molecular activities within cellular components (CC), molecular functions (MF), and biological processes (BP). The hierarchical organization of GO terms highlights functional interdependencies, where deeper nodes denote increasingly specific molecular functions.
CC ontology analysis reveals significant sequences associated with cellular anatomical structures (GO: 0110165) and intracellular components (GO: 0005622), including cytoplasm (GO: 0005737) and cytosol (GO: 0005829), suggesting enriched intracellular processes. Moreover, the abundance of protein complexes (GO: 0032991) signifies structural integrity and enriched enzymatic activity (Fig. 1, Supplementary Information 7). The MF ontology predominantly features binding (GO: 0005488) and catalytic (GO: 0003824) activities. The dominance of critical subcategories such as nucleic acid binding (GO: 0003676), ion binding (GO: 0043167), and small molecule binding (GO: 0036094) points to rich regulatory mechanisms. The dominance of oxidoreductase (GO: 0016491), hydrolase (GO: 0016787), and transferase (GO: 0016740) activities reflects robust oxidative metabolism (Fig. 2, Supplementary Information 7). The BP ontology highlights functional domains associated with primary and secondary metabolic pathways. A significant presence of sequences involved in response to external stimuli (GO: 0050896), stress conditions (GO: 0006950), and signal transduction pathways (GO: 0007165) highlights adaptation to diverse environmental changes. Moreover, the augmented oxidoreductase and hydrolase activities substantiate robust hydrocarbon metabolism (Fig. 3, Supplementary Information 7). The BUSCO analysis further confirms the complete genome, with no fragmented or missing genes. All BUSCO genes (100%) were identified, including 98.97% single-copy complete genes (289 genes) and 1.03% duplicated complete genes (3 genes) (Fig. 9). The presence of all expected orthologous genes suggests a well-assembled genome with minimal duplication.

3.10 Functional Annotation of Metabolism
The SARSHI1 genome encodes an array of metabolic pathways facilitating various biochemical transformations.
It possesses a fully functional glycolytic pathway (Embden-Meyerhof pathway) and a complete tricarboxylic acid (TCA) cycle (Krebs cycle), complemented by a fully integrated pentose phosphate pathway. However, alternative carbon fixation mechanisms, including the 3-hydroxypropionate bi-cycle and the Reductive Acetyl-CoA pathway (Wood-Ljungdahl pathway), are incomplete, implying limited autotrophic potential. The genome encodes key components of the electron transport chain (ETC), including Complex I (NADH dehydrogenase), Complex III (cytochrome bc1 ), and Complex IV (various cytochrome oxidases), supporting aerobic respiration. The absence of certain subunits denotes metabolic flexibility and enables the exploration of alternative electron acceptors. Methanogenesis pathways are partially encoded, particularly those mediating CO₂ reduction to methane and acetate conversion to methane, suggesting a possible role in anaerobic carbon cycling. Nevertheless, the lack of a fully assembled methanogenesis pathway indicates constrained methane biosynthesis ( Fig. 10 A ). Furthermore, the genome harbors an extensive repertoire of nitrogen metabolism genes, encompassing nitrate and nitrite reduction, nitrogen fixation, and ammonia oxidation, underscoring its potential role in nitrogen cycling. However, the absence of pathways facilitating the conversion of nitrous oxide to dinitrogen gas delineates incomplete denitrification. The genetic potential for metal resistance and detoxification is evident through the presence of arsenate and mercury reduction genes, suggesting a capacity for heavy metal bioremediation. Notably, the absence of genes encoding photosynthetic components, including Photosystem I and II, infers a lack of oxygenic photosynthesis. Likewise, sulfur metabolism pathways, including sulfate reduction and thiosulfate oxidation, are incomplete, reflecting constraints in sulfur cycling (Fig. 10 B ). 
SARSHI1 represents a metabolically versatile organism with active carbon, nitrogen, and electron transport processes, yet with notable limitations in complex carbohydrate degradation, sulfur metabolism, and complete methanogenesis.

3.10.1 Pathway Analysis
The SARSHI1 genome encodes an extensive array of hydrocarbon metabolism pathways, encompassing xylene (ko00622), toluene (ko00623), nitrotoluene (ko00633), naphthalene (ko00626), aminobenzoate (ko00627), ethylbenzene (ko00642), styrene (ko00643), and chloroalkane/chloroalkene degradation (ko00625) pathways (Supplementary Information 8). Moreover, the polycyclic aromatic hydrocarbon (PAH) degradation pathway (ko00624), along with the key aromatic degradation enzyme 3-phenylpropionate/trans-cinnamate dioxygenase ferredoxin reductase (K00529), provides mechanistic insights into detoxifying carcinogenic PAHs. The alkane degradation pathway (ko00071), encoding alkane hydroxylases (alkB1_2, alkM; K00496), 3-hydroxyacyl-CoA dehydrogenase (EC:1.1.1.35), and short-chain acyl-CoA dehydrogenase (K00248), points to a well-adapted system for aliphatic hydrocarbon catabolism. Furthermore, partial gene sets encoding adenosylcobalamin (ADO-CBL) biosynthesis within the cobalamin biosynthesis pathway, alongside dye-decolorizing peroxidase (K15733, K00485), demonstrate the genome's potential for xenobiotic detoxification (Supplementary Information 9).

3.10.2 Antibiotic Biosynthesis and Antimicrobial Resistance
KEGG annotation unveiled a comprehensive metabolic framework governing antibiotic biosynthesis and antimicrobial resistance in SARSHI1. The genome encodes biosynthetic pathways for vancomycin (ko01055), type II polyketide products (ko01057), ansamycins (ko01051), and streptomycin (ko00521), demonstrating its potential for secondary metabolite production.
Simultaneously, resistance determinants, including beta-lactam resistance (ko01501), vancomycin resistance (ko01502), CAMP resistance (ko01503), and drug metabolic enzymes such as dimethylaniline monooxygenase (N-oxide forming)/hypotaurine monooxygenase (K00485), delineate adaptive mechanisms that enhance survival in contaminated environments (Supplementary Information 9).

3.11 Comprehensive Taxonomic Analysis

3.11.1 ANI Analysis
The ANI estimates were highly consistent across both comparisons, reinforcing the reliability of the similarity assessment. A slight variation (98.5% vs. 98.6061%) was observed, which is expected due to differences in homologous genome fragment alignment depending on the query-reference roles. Furthermore, the number of matched fragments (1578–1598) relative to the total fragments compared (1706–1898) indicates a significant proportion of shared genomic content (Table 6). This high degree of similarity confirms that SARSHI1 and Rhodococcus indonesiensis CSLK01-03 are closely related at the genomic level.

Table 6 ANI Analysis
Query Genome | Reference Genome | ANI Estimate (%) | Matches | Total Comparisons
SARSHI1.gff3_genome.assembly | Rhodococcus_indonesiensis_CSLK01-03_genomic.fna_assembly | 98.5 | 1578 | 1898
Rhodococcus_indonesiensis_CSLK01-03_genomic.fna_assembly | SARSHI1.gff3_genome.assembly | 98.6061 | 1598 | 1706

3.11.2 dDDH Analysis
The digital DNA-DNA hybridization (dDDH) analysis was performed against the closest reference genome, GCA_030360185.1. The dDDH values were calculated using three distinct formulas, all surpassing the 70% species delineation threshold, thus confirming species-level relatedness (Table 7). Using Formula 1, SARSHI1 exhibited a dDDH value of 89.2%, with a model confidence interval (C.I.) of 85.9% – 91.9% and an evolutionary distance of 0.0851. Similarly, Formula 2 generated a dDDH value of 88.2% (C.I.: 85.7% – 90.3%) with a lower evolutionary distance (0.0141), indicating a strong genomic similarity.
The probability of dDDH being ≥ 70% under Formula 2 was 95.15%, further supporting species-level classification. Among the three, Formula 3 yielded the highest dDDH value (91.7%, C.I.: 89.2% – 93.7%), with an evolutionary distance of 0.098 and a 99.61% probability of species-level relatedness. Additionally, the G + C content difference between SARSHI1 and the reference genome was 0.07, further reinforcing their taxonomic congruence and confirming that SARSHI1 belongs to the same species as the reference genome GCA_030360185.1.

Table 7 dDDH Analysis
Query genome: SARSHI1 | Reference genome: GCA_030360185.1_ASM3036018v1_genomic | G + C difference: 0.07
Formula | DDH | Model C.I. | Distance | Prob. DDH >= 70%
Formula 1 | 89.2 | [85.9–91.9%] | 0.0851 | 97
Formula 2 | 88.2 | [85.7–90.3%] | 0.0141 | 95.15
Formula 3 | 91.7 | [89.2–93.7%] | 0.098 | 99.61

3.11.3 TYGS Analysis
A comprehensive comparative genomic analysis was conducted using the Type Strain Genome Server (TYGS) to determine the taxonomic position of SARSHI1. The highest genomic congruence was observed with Rhodococcus indonesiensis CSLK01-03, exhibiting a digital DNA-DNA hybridization (dDDH) value (d₀) of 89.2% (confidence interval: 85.9–91.9%). Additional similarity metrics, including d₄ (alignment-based similarity) and d₆ (tetranucleotide-based similarity), yielded values of 88.2% and 91.7%, respectively, reinforcing its phylogenetic proximity. Phylogenomic analyses, visualized through TYGS-generated trees, positioned SARSHI1 within the R. indonesiensis clade (Fig. 11 and Fig. 1, Supplementary Information 10). In contrast, other Rhodococcus species exhibited lower dDDH values, with Rhodococcus electrodiphilus LMG 29881 displaying 82.8% similarity, followed by Rhodococcus ruber strains (81.1% for NBRC 15591 and 80.2% for DSM 43338).
More distantly related species, such as Rhodococcus phenolicus and Rhodococcus yananensis , demonstrated d₀ values below 30%, underscoring substantial phylogenetic divergence. The established dDDH species delineation threshold of 70% unequivocally classifies SARSHI1 as a member of R. indonesiensis, further corroborated by its minimal G + C content variation (0.09%) (Table 1 Supplementary Information 10). SARSHI1 is categorized under Species Cluster 9, exhibiting a strong genetic affiliation with R. indonesiensis within the Rhodococcus genus. Its genome, approximately 5.7 Mbp in size with a GC content of 70.08%, aligns with known Rhodococcus strains. Notably, the genome encodes 5,170 protein-coding genes, indicative of an expansive metabolic repertoire. This extensive genetic framework implies a potential for specialized enzymatic functions, further emphasizing its adaptive capabilities (Table 2 , Supplementary Information 10). 3.12 Protein Domain and Motifs Analysis InterProScan analysis identified functional domains across the majority of protein sequences (4,884 out of 5,170), with a substantial subset (3,393) further annotated with Gene Ontology (GO) terms ( Fig. 2 Supplementary Information 10 ). Comparative domain and repeat analyses utilizing the SMART, SuperFamily, and PANTHER databases uncovered a diverse repertoire of conserved motifs and protein families associated with hydrocarbon metabolism ( Figs. 3 – 5 , Supplementary Information 10). The identified repeat motifs, including Hexapeptide, Pentapeptide, Ankyrin, and Tetratricopeptide, were prevalent. Notably, pyrrolo-quinoline quinone and WD40-like beta-propeller repeats, implicated in hydrocarbon degradation, were identified. 
Additionally, key enzymes of the meta-cleavage pathway were identified, including 4-Hydroxy-2-Oxovalerate Aldolase (MF_01656), which facilitates aromatic ring cleavage, and acetaldehyde dehydrogenase (MF_01657), responsible for the oxidation of aldehydes into carboxylic acids, a critical step in both aliphatic and aromatic hydrocarbon catabolism. The efficient processing of aldehyde intermediates is a hallmark of hydrocarbon-degrading bacterial species, underscoring the metabolic specialization of SARSHI1. Functional annotation using the HAMAP database further confirmed the presence of key enzyme families involved in hydrocarbon metabolism and broader metabolic processes. A significant fraction of the dataset (87.17%) was categorized as "others," reflecting the extensive functional diversity of the identified domains (Fig. 12 ). Furthermore, the repeat regions with unknown biochemical functions, such as DUF308 and DUF349, suggest potential novel functionalities that warrant further experimental investigation (Fig. 13 ). These findings collectively underscore the functional complexity and biochemical versatility encoded within the SARSHI1 genome, particularly in the context of hydrocarbon degradation networks. 3.13 Annotation of Hydrocarbon Degradation Genes 3.13.1 CANT_HYD Analysis The CANT_HYD analysis identifies a diverse repertoire of monooxygenases, dioxygenases, dehydrogenases, and reductases, highlighting the metabolic versatility of SARSHI1 in both aerobic and anaerobic hydrocarbon degradation. Key gene families, including AlkB, AhyA, AlmA_GroupI, LadAB, NdoBC, BmoXXY , and TmoAE , play crucial roles in alkane oxidation, aromatic hydrocarbon catabolism, and sulfur/nitrogen-containing hydrocarbon metabolism. 
Among the most prominent findings, the AlkB gene family encoding alkane 1-monooxygenases, emerged as a principal enzymatic group, with high-confidence homologs such as ENBDON_05321 (E-value: 6.60E-219, Score: 725.9) and ENBDON_05322 (E-value: 2.10E-190, Score: 631.9). These enzymes facilitate the initial oxidation of alkanes to alcohols, a critical step in hydrocarbon degradation. The co-occurrence of rubredoxin systems (ENBDON_04732, E-value: 2.50E-11, Score: 41.3) and AlmA_ GroupI genes, including FAD-containing monooxygenase EthA (ENBDON_05147, E-value: 3.90E-205, Score: 680.3), further supports metabolic adaptability under varying environmental conditions. Additionally, propane monooxygenase components ( PrmA, PrmC, sBmoX, TmoA_BmoA, TomA1, TomA3 ) define a well-established propane oxidation pathway, initiating hydroxylation to propanol, followed by downstream metabolism. The LadA_alpha and LadA_beta enzymes, containing luciferase-like domains and flavin-dependent oxidoreductases, indicate the capability for long-chain alkane metabolism, a characteristic of thermophilic bacterial systems. Genes encoding Rieske-type oxygenases ( NdoB, NdoC ) and benzoate 1,2-dioxygenases delineate putative pathways for the degradation of naphthalene, benzene, and other aromatic hydrocarbons. These enzymes catalyze oxygenation reactions, converting aromatic compounds into catechols or dihydroxylated intermediates, which are further metabolized through the β-ketoadipate and ring-cleavage pathways. The identification of vanillate O-demethylase indicates a broader substrate spectrum for hydrocarbon catabolism. DszC (acyl-CoA dehydrogenase) and EbdA -associated reductases contribute to dibenzothiophene (DBT) and organosulfur compound degradation, emphasizing their role in bio-desulfurization. Nitrate reductases ( EbdA, CmdA , and putative nitrate/sulfite reductases) highlight hydrocarbon oxidation in oxygen-limited conditions. 
The dimethyl sulfoxide (DMSO) and trimethylamine N-oxide (TMAO) reductases suggest pathways for organosulfur compound utilization, with potential applications in petroleum bioremediation. Several enzymes rely on flavin (FAD, FMN), molybdenum (Mo), iron-sulfur clusters (Fe-S), and NAD(P)H as cofactors, promoting electron transfer in oxidation-reduction reactions. The characterization of F420-dependent oxidoreductases and glucose-6-phosphate dehydrogenase (coenzyme-F420) highlights their role in hydrocarbon oxidation via electron shuttling mechanisms. These findings illustrate a highly versatile enzymatic system capable of degrading a wide range of hydrocarbons, including alkanes (short- to long-chain), aromatics (benzene, naphthalene, vanillate), and organosulfur compounds. The extensive enzymatic network integrates hydroxylation, oxidation, ring-cleavage, and anaerobic respiration, ensuring efficient hydrocarbon mineralization under diverse environmental conditions (Supplementary Information 11). 3.13.2 HADEG Analysis The HADEG analysis elucidated a diverse repertoire of genes implicated in the aerobic degradation of aromatic and aliphatic hydrocarbons, underscoring the metabolic versatility of Rhodococcus indonesiensis SARSHI1. Genes involved in benzoate ( benA , benB , benC , benD ), biphenyl ( bphF , bphI , bphX2 ), and toluene ( xylC ) degradation exemplify the genomic potential for aromatic hydrocarbon mineralization. Furthermore, genes associated with the catechol ( catA , catC , pcaJ ) and protocatechuate ( pcaG , pcaH , pcaR ) pathways facilitate ortho-cleavage, and the gentisate pathway ( nagK , nagL , xlnE ) delineates the meta-cleavage essential for the complete mineralization of aromatic compounds. Moreover, the abundance of 4-hydroxyphenylacetate degradation genes ( hpaB ) enhances metabolic adaptability, enabling the catabolism of a broad range of aromatic substrates. 
The presence of genes encoding alkane monooxygenases (alkB, alkG_rubA3_rdx) and Baeyer–Villiger monooxygenases (BVMO, Q9I3H5) establishes a robust metabolic framework for alkane oxidation. The identified pathways define both terminal oxidation (almA, ladA) and subterminal oxidation (prmA, prmB, prmC, prmD) as principal mechanisms for aliphatic hydrocarbon degradation. Furthermore, genes associated with the Finnerty pathway (AeAB_ahpC, AeAB_ahpF) substantiate bacterial adaptation for ω-oxidation, a characteristic metabolic strategy of Rhodococcus species. Additionally, the ssuD gene, implicated in sulfur oxidation, represents an auxiliary factor in hydrocarbon metabolism, potentially enhancing survival in sulfur-rich environments. The presence of multiple copies of key degradation genes demonstrates a well-developed and highly redundant genetic framework for hydrocarbon catabolism (Supplementary Information 12 and Fig. 1, Supplementary Information 13). Moreover, gene distribution patterns indicate terminal oxidation as the predominant mechanism employed by SARSHI1 for hydrocarbon degradation (Fig. 14).

3.13.3 HMDB Analysis
The Hydrocarbon Monooxygenase Database (HMDB) analysis identified multiple genes encoding hydrocarbon monooxygenases essential for alkane oxidation. Sequence similarity analysis revealed homologs of alkane 1-monooxygenase (alkB) and components of the propane monooxygenase system (prmA, prmB, prmC, and prmD), with high sequence identity. For instance, ENBDON_05051 demonstrated 100% identity to prmA (B5D5P6), while ENBDON_05050, ENBDON_05049, and ENBDON_05048 exhibited 94.81%, 95.66%, and 92.79% identity to prmB, prmC, and prmD, respectively. The high sequence identity and low E-values (≤ 9.27e-76) further support the functional relevance of these genes, suggesting that SARSHI1 harbors a complete hydrocarbon monooxygenase system.
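The percent identity values quoted for the HMDB hits can be recomputed from the alignment length, mismatch, and gap columns listed in Table 8. The sketch below is an illustration under stated assumptions: the function and variable names are hypothetical, and the "Gaps" column is treated as a count of gapped alignment columns (in standard BLAST tabular output this field counts gap openings, which is equivalent here only because the single gap reported spans one column).

```python
# (query locus tag, alignment length, mismatches, gaps, reported % identity)
# Values transcribed from Table 8.
HMDB_HITS = [
    ("ENBDON_03995", 383, 10, 0, 97.389),
    ("ENBDON_04319", 407, 7, 0, 98.280),
    ("ENBDON_05048", 111, 8, 0, 92.793),
    ("ENBDON_05049", 369, 15, 1, 95.664),
    ("ENBDON_05050", 347, 18, 0, 94.813),
    ("ENBDON_05051", 542, 0, 0, 100.0),
]


def percent_identity(alignment_length: int, mismatches: int, gaps: int) -> float:
    """Identical columns as a percentage of all alignment columns."""
    matches = alignment_length - mismatches - gaps
    return 100.0 * matches / alignment_length


for locus, aln_len, mism, gaps, reported in HMDB_HITS:
    recomputed = percent_identity(aln_len, mism, gaps)
    print(f"{locus}: recomputed {recomputed:.3f}%, reported {reported:.3f}%")
```

All six recomputed values agree with the reported local identities to three decimal places, which is a quick way to confirm that the table was transcribed without column shifts.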
The key statistics, including top HMDB hits, sequence identity, alignment statistics, and functional annotations, are provided in Table 8.

Table 8 HMDB Analysis

| Query Seq-id | Subject Seq-id | % Identity | Alignment Length | Mismatches | Gaps | Query Start | Query End | Subject Start | Subject End | E-value | Bit Score | Global % Identity |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ENBDON_03995 | A0A098BFN3-A0A098BFN3_9NOCA-alkB | 97.389 | 383 | 10 | 0 | 1 | 383 | 1 | 383 | 0.0 | 761 | 97.38 |
| ENBDON_04319 | A0A098BST3-A0A098BST3_9NOCA-alkB | 98.280 | 407 | 7 | 0 | 1 | 407 | 1 | 407 | 0.0 | 827 | 98.28 |
| ENBDON_05048 | A0A866VUU8-A0A866VUU8_9NOCA-prmD | 92.793 | 111 | 8 | 0 | 3 | 113 | 1 | 111 | 9.27e-76 | 218 | 92.79 |
| ENBDON_05049 | A0A866W1M4-A0A866W1M4_9NOCA-prmC | 95.664 | 369 | 15 | 1 | 1 | 368 | 1 | 369 | 0.0 | 739 | 95.93 |
| ENBDON_05050 | A0A866W2B3-A0A866W2B3_9NOCA-prmB | 94.813 | 347 | 18 | 0 | 1 | 347 | 1 | 347 | 0.0 | 680 | 94.81 |
| ENBDON_05051 | B5D5P6-B5D5P6_9NOCA-prmA | 100 | 542 | 0 | 0 | 1 | 542 | 1 | 542 | 0.0 | 1135 | 100 |

3.14 Aerobic Degradation of Aliphatic and Aromatic Hydrocarbons

Petroleum hydrocarbons comprise a complex mixture of long-chain alkanes, benzene, toluene, xylene, and biphenyl derivatives. The microbial degradation of these hydrocarbons follows a sequential process involving uptake, oxidation, and cleavage, ultimately channeling metabolic intermediates into the tricarboxylic acid (TCA) cycle [74]. The initial uptake of these hydrophobic compounds is facilitated by biosurfactant production or specialized transport mechanisms that enhance solubility and membrane translocation. The degradation of aliphatic hydrocarbons predominantly occurs via terminal and subterminal oxidation pathways. In terminal oxidation, key genes such as alkB, almA, and ladA encode alkane hydroxylases that catalyze the hydroxylation of terminal carbon atoms, yielding primary alcohols that are subsequently oxidized to carboxylic acids before entering the β-oxidation pathway.
For short-chain alkanes, the prmABCD gene cluster encodes propane monooxygenase, which facilitates hydroxylation via a subterminal oxidation mechanism, producing secondary alcohols that are subsequently converted to ketones and metabolized through β-oxidation [75]. Additionally, Baeyer-Villiger monooxygenases (BVMOs) play a crucial role in metabolizing ketones and other cyclic intermediates. Gram-positive bacteria such as Rhodococcus employ an alternative ω-oxidation (Finnerty) pathway for long-chain alkane degradation, generating dicarboxylic acids as intermediates.

Aromatic hydrocarbon degradation follows structurally distinct metabolic pathways. Benzene, toluene, xylene, and biphenyl compounds undergo initial hydroxylation catalyzed by dioxygenases, generating dihydroxylated intermediates such as catechol, protocatechuate, and gentisate, which subsequently undergo ring cleavage [76]. The ben gene cluster (benA, benB, benC) encodes benzoate dioxygenase, which hydroxylates benzoate to catechol or protocatechuate. Similarly, the bph gene cluster (bphA, bphB, bphC, bphD) encodes biphenyl dioxygenase and associated enzymes, facilitating the oxidation of biphenyl into hydroxylated intermediates. The xylC gene encodes benzyl alcohol dehydrogenase, a key enzyme in toluene and xylene degradation, catalyzing the oxidation of benzyl alcohol derivatives into their corresponding aldehydes. The central pathways for aromatic hydrocarbon catabolism include the catechol and protocatechuate pathways, which proceed via two primary mechanisms: ortho-cleavage (β-ketoadipate pathway) and meta-cleavage. In the ortho-cleavage pathway, catechol (catABC) and protocatechuate (pcaG, pcaH, pcaC) are metabolized via the β-ketoadipate pathway before their subsequent assimilation into the TCA cycle.
Alternatively, the gentisate pathway involves the hydroxylation of aromatic compounds into gentisate, followed by ring cleavage mediated by gentisate 1,2-dioxygenase, ultimately directing intermediates into central metabolism (Fig. 15). The genome of Rhodococcus indonesiensis SARSHI1 thus encodes a highly versatile hydrocarbon degradation capacity, with key enzymatic systems for the complete mineralization of both aliphatic and aromatic hydrocarbons. A detailed list of hydrocarbon metabolism-associated genes is provided in Supplementary Information 14 and 15.

3.15 Secondary Metabolite Analysis

A comprehensive antiSMASH analysis identified 19 distinct biosynthetic gene clusters (BGCs), underscoring the genomic potential for diverse secondary metabolite production (Fig. 16). These BGCs encompass non-ribosomal peptide synthetases (NRPS), polyketide synthases (PKS), terpenes, ectoine, redox cofactors, and other metabolites, suggesting a role in antibiotic biosynthesis, stress adaptation, and metabolic flexibility. Only one cluster, NAPAA in Region 1.3, was identified with high similarity; it is associated with the biosynthesis of ε-Poly-L-lysine, an antimicrobial compound. The presence of multiple NRPS and PKS clusters (Regions 1.1, 1.5, 1.10, 1.11, 1.13, 1.15, and 1.18) suggests the potential biosynthesis of antibiotics or bioactive compounds, with Region 1.11 (PKS) specifically linked to stenothricin, a known antimicrobial agent. In Region 1.2, the gene order and color patterns of the identified betalactone BGC are distinct from existing BGCs in Rhodococcus ruber, suggesting the synthesis of a betalactone variant. A similar trend was observed in polyketide biosynthetic clusters, particularly Region 1.6, which harbors multiple genes with low sequence similarity (< 60%) to previously characterized lasso peptide BGCs, indicating the potential for novel structural variant biosynthesis.
The four distinct terpene clusters (Regions 1.7, 1.9, 1.14, and 1.17) suggest possible involvement in antimicrobial activity. Notably, Regions 1.9, 1.14, and 1.17 exhibit minimal sequence similarity (< 50%) to known terpene BGCs, implying the presence of a divergent or modified terpene biosynthesis pathway. Region 1.16 contains redox cofactor-related genes with moderate homology (0.40–0.49) to characterized BGCs, suggesting a unique oxidative metabolic function. Furthermore, Region 1.19 exhibits low similarity (0.35–0.41) to known clusters, potentially indicating the biosynthesis of a butyrolactone-like compound. The consistently low sequence homology observed across multiple BGCs highlights the potential of SARSHI1 to synthesize structurally distinct metabolites or novel bioactive compounds.

4. Discussion

Oil, as a non-renewable resource, plays a crucial role in global economic stability and development. However, its extraction, transportation, refining, and disposal pose significant environmental and health risks, primarily due to contamination by toxic petroleum hydrocarbons, including polycyclic aromatic hydrocarbons (PAHs), resins, and asphaltenes [77]. These pollutants accumulate in ecosystems, bioaccumulate through food chains, and pose risks to humans. Traditional oil remediation methods and emerging technologies such as incineration, solvent extraction, electrical remediation, and chemical leaching mitigate visible spills but are limited by high costs and secondary pollution [78]. The rate of petroleum degradation in the environment depends on factors such as oil composition, concentration, environmental conditions, and microbial community structure. For example, the half-life of low molecular weight PAHs ranges from 1.5 to 5.5 weeks in soils with 1–2% hydrocarbon content but increases to 2.5 to 52 weeks in soils with higher contamination levels [79].
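To give a feel for what the half-life ranges quoted above imply, the following sketch converts a half-life into the fraction of a compound remaining after a given time. It assumes simple first-order decay, which is a common simplification and is not a kinetic model stated in the source; the 12-week horizon and the function name `fraction_remaining` are hypothetical choices made for illustration.

```python
import math


def fraction_remaining(t_weeks, half_life_weeks):
    """Fraction of the initial amount left after t_weeks.

    Assumption: first-order decay, so k = ln(2) / half-life and
    C(t) / C(0) = exp(-k * t).
    """
    k = math.log(2) / half_life_weeks
    return math.exp(-k * t_weeks)


# Half-life bounds (weeks) quoted in the text for low molecular weight PAHs.
HALF_LIVES = {
    "1-2% hydrocarbon content, lower bound": 1.5,
    "1-2% hydrocarbon content, upper bound": 5.5,
    "higher contamination, lower bound": 2.5,
    "higher contamination, upper bound": 52.0,
}

HORIZON_WEEKS = 12  # arbitrary illustration horizon, not from the source

for label, half_life in HALF_LIVES.items():
    left = fraction_remaining(HORIZON_WEEKS, half_life)
    print(f"{label}: {left:.1%} remaining after {HORIZON_WEEKS} weeks")
```

The contrast between the shortest and longest half-lives is the point of the example: under this simplified model, a compound with a 52-week half-life is still largely present after a period in which one with a 1.5-week half-life has almost disappeared.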
Microbial bioremediation has emerged as an effective and sustainable approach. Microorganisms can utilize hydrocarbons as a sole carbon and energy source and generate non-toxic end products. Advancements in multi-omics technologies have provided deeper insights into microbial metabolic pathways and the genetic adaptations that enable bacteria to degrade petroleum hydrocarbons efficiently. Several studies have identified diverse hydrocarbon-degrading bacteria with significant metabolic capabilities. For instance, Hossain et al. isolated 26 bacterial strains capable of utilizing polycyclic aromatic hydrocarbons (PAHs) and petroleum hydrocarbons as carbon sources. Notably, Pseudomonas citronellolis and Comamonas thiooxydans harbored the highest number of hydrocarbon-degrading enzymes, including dioxygenases, monooxygenases, hydroxylases, and dehydrogenases [26]. Delegan et al. conducted a complete genome analysis of the previously isolated Rhodococcus opacus S8, identifying genes involved in alkane degradation, surfactant biosynthesis, and low-temperature adaptation. A key discovery was the strain's ability to degrade hexadecane under oxygen-limited conditions, facilitated by the formation of bacterial microconglomerates [80].

In the present study, we isolated a novel petroleum hydrocarbon-degrading strain, Rhodococcus indonesiensis SARSHI1, from Nacharam, Hyderabad, and performed whole-genome sequencing using a hybrid (Illumina and ONT) approach. The resulting complete genome assembly, free from gaps or missing fragments, represents the first complete genome of R. indonesiensis to our knowledge. The assembled genome is 5.7 Mbp and comprises a circular chromosome and a plasmid. The genome quality analysis confirmed the predominance of single-copy functional protein-coding genes, with minimal duplication.
Comprehensive genome annotation revealed an abundance of functional genes, particularly those encoding monooxygenases, dioxygenases, and metal resistance determinants, which are crucial for hydrocarbon degradation and for strain adaptability in petroleum-contaminated environments. Functional annotation revealed a limited carbohydrate metabolism and constrained methanogenesis, suggesting a genome tailored for hydrocarbon metabolism rather than broad-spectrum nutrient assimilation. Notably, SARSHI1 exhibits a broad metabolic potential for degrading a wide spectrum of aromatic hydrocarbons, including xylene, toluene, aminobenzoate, chloroalkane, and polycyclic aromatic hydrocarbons (PAHs). The CANT-HYD analysis revealed an extensive repertoire of hydrocarbon-degrading gene families, including AlkB, AhyA, AlmA_GroupI, LadAB, benABCD, bphFIX2, NdoBC, BmoXXY, and TmoAE, exhibiting high sequence similarity and significant abundance. The presence of the catechol (catA, catC, pcaJ) and protocatechuate (pcaG, pcaH, pcaR) ortho-cleavage clusters indicates the potential for complete mineralization of aromatic hydrocarbons. In addition, an abundance of alkane hydroxylases, alkane monooxygenases (alkG_rubA3_rdx), DNA repair-associated alkane metabolism, terminal oxidation (almA, ladA), subterminal oxidation (prmA, prmB, prmC, prmD), and Finnerty pathway (ahpC, ahpF) gene clusters points to terminal oxidation as the primary metabolic pathway in aliphatic hydrocarbon degradation. Beyond hydrocarbon degradation, the genome harbors multiple hypothetical proteins and functionally uncharacterized domains, indicating potential novel metabolic functions. The genome encodes genes associated with antibiotic biosynthesis and resistance, including β-lactam, vancomycin, and drug-metabolizing enzymes, suggesting a competitive ecological advantage in hydrocarbon-rich environments.
Furthermore, the genome possesses biosynthetic gene clusters (BGCs) encoding secondary metabolites, which may contribute to bacterial adaptation, stress tolerance, and antimicrobial activity. Several identified BGCs exhibit low sequence similarity to known clusters, suggesting the potential for the biosynthesis of novel structural variants of existing bioactive compounds and pointing to adaptive mechanisms for survival in hydrocarbon-rich environments. This study presents the first complete genome of Rhodococcus indonesiensis SARSHI1, offering comprehensive insights into its extensive hydrocarbon degradation potential, genomic resilience, and adaptive mechanisms in petroleum-contaminated environments. These findings establish SARSHI1 as a promising candidate for microbial bioremediation, reinforcing its significance in sustainable solutions.

5. Conclusion

The study presents the first complete genome sequence of Rhodococcus indonesiensis SARSHI1, elucidating its extensive capacity for hydrocarbon degradation. A comprehensive genomic analysis reveals a diverse repertoire of monooxygenases, dioxygenases, and key gene clusters involved in the degradation of aliphatic and aromatic hydrocarbons. The SARSHI1 genome harbors bph, ben, and xyl gene clusters for aromatic hydrocarbon degradation and alkB, ladA, almA, and prm clusters for aliphatic hydrocarbon degradation. The predominance of terminal oxidation pathways, coupled with multiple metal resistance and detoxification mechanisms, underscores the strain's ecological adaptability to hydrocarbon-rich environments. Furthermore, the gene clusters associated with antibiotic production and stress tolerance suggest a competitive advantage in extreme environmental conditions. Notably, the genome harbors several functionally uncharacterized domains, along with biosynthetic clusters exhibiting variations that may encode novel structural variants of secondary metabolites, further enhancing its biotechnological potential.
The complete genome sequence has been deposited in GenBank under accession numbers CP180630 (chromosome) and CP180631 (plasmid). The raw sequencing reads have been submitted to the Sequence Read Archive (SRA), NCBI, under accession numbers SRX27520007 (Illumina) and SRX27520006 (ONT).

Declarations

CONFLICTS OF INTEREST/COMPETING INTERESTS: The authors affirm that there are no known competing financial interests or personal relationships that could have influenced the work reported in this paper.

ETHICS APPROVAL AND CONSENT TO PARTICIPATE: Not applicable.

HUMAN AND ANIMAL RIGHTS: No animals or humans were used in the studies that were the basis of this research.

ACKNOWLEDGMENTS: The authors are thankful to Eminent Biosciences and LeGene Biosciences Pvt Ltd, Indore, India for 16S rRNA sequencing, whole-genome sequencing, and de novo assembly of the bacterium.

FUNDING: The authors did not receive any funding for this study.

AUTHOR CONTRIBUTIONS: S.A.U.Z.: Contributed to the conceptualization, investigation, methodology, sample collection, NGS data analysis, validation, visualization, and original draft writing of the manuscript. S.A.U.Z. also participated in reviewing and editing the manuscript. K.S.: Participated in composition of the initial draft and was also involved in reviewing and editing the manuscript. A.N.: Involved in conceptualization, investigation, methodology, project administration, supervision, and writing – review & editing. K.M.K. and R.B.: Conducted the investigation, provided supervision, and contributed to the review and editing of the manuscript.

Availability of data and materials: This whole-genome project has been deposited in GenBank under the accessions CP180630 (chromosome) and CP180631 (plasmid). The read sequences have been deposited under BioProject Accession: PRJNA1217105, BioSample Accession: SAMN46479200, and SRA Accession: SRX27520007 and SRX27520006.
Queries about the study can be directed to the corresponding author.
WGS URL: https://www.ncbi.nlm.nih.gov/nuccore/CP180630
Plasmid URL: https://www.ncbi.nlm.nih.gov/nuccore/CP180631
BioProject URL: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1217105
BioSample URL: https://www.ncbi.nlm.nih.gov/biosample/SAMN46479200
SRA URL (Illumina): https://www.ncbi.nlm.nih.gov/sra/?term=SRX27520007
SRA URL (ONT): https://www.ncbi.nlm.nih.gov/sra/?term=SRX27520006

CONSENT FOR PUBLICATION: Not applicable.

References

A. Imam, P. K. Kanaujia, A. Ray, and S. K. Suman, “Removal of Petroleum Contaminants Through Bioremediation with Integrated Concepts of Resource Recovery: A Review,” Indian J Microbiol, vol. 61, no. 3, pp. 250–261, Sep. 2021, doi: 10.1007/s12088-021-00928-4. B. Narayan Thorat and R. Kumar Sonwani, “Current technologies and future perspectives for the treatment of complex petroleum refinery wastewater: A review,” Bioresource Technology, vol. 355, p. 127263, Jul. 2022, doi: 10.1016/j.biortech.2022.127263. S. Kuppusamy, N. R. Maddela, M. Megharaj, and K. Venkateswarlu, “Ecological Impacts of Total Petroleum Hydrocarbons,” in Total Petroleum Hydrocarbons, Cham: Springer International Publishing, 2020, pp. 95–138. doi: 10.1007/978-3-030-24035-6_5. Y. Wei, D. Ding, K. Qu, J. Sun, and Z. Cui, “Ecological risk assessment of heavy metal pollutants and total petroleum hydrocarbons in sediments of the Bohai Sea, China,” Marine Pollution Bulletin, vol. 184, p. 114218, Nov. 2022, doi: 10.1016/j.marpolbul.2022.114218. S. Kuppusamy, N. R. Maddela, M. Megharaj, and K. Venkateswarlu, “Fate of Total Petroleum Hydrocarbons in the Environment,” in Total Petroleum Hydrocarbons, Cham: Springer International Publishing, 2020, pp. 57–77. doi: 10.1007/978-3-030-24035-6_3. D. Pal and S.
Sen, “Emerging Petroleum Pollutants and Their Adverse Effects on the Environment,” in Impact of Petroleum Waste on Environmental Pollution and its Sustainable Management Through Circular Economy , I. D. Behera and A. P. Das, Eds., in Environmental Science and Engineering. , Cham: Springer Nature Switzerland, 2023, pp. 103–137. doi: 10.1007/978-3-031-48220-5_5. H. Gao, M. Wu, H. Liu, Y. Xu, and Z. Liu, “Effect of petroleum hydrocarbon pollution levels on the soil microecosystem and ecological function,” Environmental Pollution , vol. 293, p. 118511, Jan. 2022, doi: 10.1016/j.envpol.2021.118511. S. Adipah, “Introduction of Petroleum Hydrocarbons Contaminants and its Human Effects,” Journal of Environmental Science and Public Health , vol. 3, no. 1, pp. 1–9, Jan. 2019. A. K. Pandey et al. , “Multipronged evaluation of genotoxicity in Indian petrol‐pump workers,” Environ and Mol Mutagen , vol. 49, no. 9, pp. 695–707, Dec. 2008, doi: 10.1002/em.20419. H. I. Abdel-Shafy and M. S. M. Mansour, “A review on polycyclic aromatic hydrocarbons: Source, environmental impact, effect on human health and remediation,” Egyptian Journal of Petroleum , vol. 25, no. 1, pp. 107–123, Mar. 2016, doi: 10.1016/j.ejpe.2015.03.011. I. C. Ossai, A. Ahmed, A. Hassan, and F. S. Hamid, “Remediation of soil and water contaminated with petroleum hydrocarbon: A review,” Environmental Technology & Innovation , vol. 17, p. 100526, Feb. 2020, doi: 10.1016/j.eti.2019.100526. N. Das and P. Chandran, “Microbial Degradation of Petroleum Hydrocarbon Contaminants: An Overview,” Biotechnology Research International , vol. 2011, pp. 1–13, Sep. 2011, doi: 10.4061/2011/941810. S. Varjani, A. Pandey, and V. N. Upasani, “Petroleum sludge polluted soil remediation: Integrated approach involving novel bacterial consortium and nutrient application,” Science of The Total Environment , vol. 763, p. 142934, Apr. 2021, doi: 10.1016/j.scitotenv.2020.142934. A. K. Bej, D. Saul, and J. 
Aislabie, “Cold-tolerant alkane-degrading Rhodococcus species from Antarctica,” Polar Biology , vol. 23, no. 2, pp. 100–105, Jan. 2000, doi: 10.1007/s003000050014. J. A. Viesser, M. H. Sugai-Guerios, L. C. Malucelli, M. R. Pincerati, S. G. Karp, and L. T. Maranho, “Petroleum-Tolerant Rhizospheric Bacteria: Isolation, Characterization and Bioremediation Potential,” Sci Rep , vol. 10, no. 1, p. 2060, Feb. 2020, doi: 10.1038/s41598-020-59029-9. M. S. Kuyukina and I. B. Ivshina, “Bioremediation of Contaminated Environments Using Rhodococcus,” in Biology of Rhodococcus , vol. 16, H. M. Alvarez, Ed., in Microbiology Monographs, vol. 16. , Cham: Springer International Publishing, 2019, pp. 231–270. doi: 10.1007/978-3-030-11461-9_9. X. Chen, G. Shan, J. Shen, F. Zhang, Y. Liu, and C. Cui, “In situ bioremediation of petroleum hydrocarbon–contaminated soil: isolation and application of a Rhodococcus strain,” Int Microbiol , vol. 26, no. 2, pp. 411–421, Dec. 2022, doi: 10.1007/s10123-022-00305-1. M. T. Nazari et al. , “Rhodococcus: A promising genus of actinomycetes for the bioremediation of organic and inorganic contaminants,” Journal of Environmental Management , vol. 323, p. 116220, Dec. 2022, doi: 10.1016/j.jenvman.2022.116220. M. Kaur, V. Singh, A. Khan, K. Sharma, F. J. B. Mendoonca Junior, and A. Nayarisseri, “Navigating the genomic landscape: A deep dive into clinical genetics with deep learning,” in Deep Learning in Genetics and Genomics , Elsevier, 2025, pp. 185–224. doi: 10.1016/B978-0-443-27574-6.00006-0. A. Nayarisseri et al. , “Impact of Next-Generation Whole-Exome sequencing in molecular diagnostics,” Drug Invention Today , vol. 5, no. 4, pp. 327–334, Dec. 2013, doi: 10.1016/j.dit.2013.07.005. H. Tilgner et al. , “Comprehensive transcriptome analysis using synthetic long-read sequencing reveals molecular co-association of distant splicing events,” Nat Biotechnol , vol. 33, no. 7, pp. 736–742, Jul. 2015, doi: 10.1038/nbt.3242. S. Oikonomopoulos, Y. C. Wang, H. 
Djambazian, D. Badescu, and J. Ragoussis, “Benchmarking of the Oxford Nanopore MinION sequencing for quantitative and qualitative assessment of cDNA populations,” Sci Rep , vol. 6, no. 1, p. 31602, Aug. 2016, doi: 10.1038/srep31602. N. De Maio et al. , “Comparison of long-read sequencing technologies in the hybrid assembly of complex bacterial genomes,” Microbial Genomics , vol. 5, no. 9, Sep. 2019, doi: 10.1099/mgen.0.000294. Y. Lu et al. , “Hybrid Clustering of Long and Short-read for Improved Metagenome Assembly,” Jan. 26, 2021. doi: 10.1101/2021.01.25.428115. D. Das et al. , “Complete genome sequence analysis of Pseudomonas aeruginosa N002 reveals its genetic adaptation for crude oil degradation,” Genomics , vol. 105, no. 3, pp. 182–190, Mar. 2015, doi: 10.1016/j.ygeno.2014.12.006. M. S. Hossain, B. Iken, and R. Iyer, “Whole genome analysis of 26 bacterial strains reveals aromatic and hydrocarbon degrading enzymes from diverse environmental soil samples,” Sci Rep , vol. 14, no. 1, p. 30685, Dec. 2024, doi: 10.1038/s41598-024-78564-3. A. Nayarisseri, P. Singh, and S. K. Singh, “Screening, isolation and characterization of biosurfactant-producing Bacillus tequilensis strain ANSKLAB04 from brackish river water,” Int. J. Environ. Sci. Technol. , vol. 16, no. 11, pp. 7103–7112, Nov. 2019, doi: 10.1007/s13762-018-2089-9. Krishnan and A. Nayarisseri, “Biodegradation effects of o-cresol by Pseudomonas monteilii SHY on mustard seed germination,” Bioinformation , vol. 14, no. 06, pp. 271–278, Jun. 2018, doi: 10.6026/97320630014271. A. Nayarisseri, R. Khandelwal, and S. K. Singh, “Identification and Characterization of Lipopeptide Biosurfactant Producing Microbacterium sp Isolated from Brackish River Water,” CTMC , vol. 20, no. 24, pp. 2221–2234, Nov. 2020, doi: 10.2174/1568026620666200628144716. M. Mohan, S. Kozhithodi, and A. Nayarisseri, “Screening, Purification and Characterization of Protease Inhibitor from Capsicum frutescens,” Bioinformation , vol. 14, no. 06, pp. 
285–293, Jun. 2018, doi: 10.6026/97320630014285. A. Nayarisseri et al. , “IDENTIFICATION AND CHARACTERIZATION OF NEUTRAL PROTEASE PRODUCING Paenibacillus Polymyxa SPECIES EMBS024 BY 16S rRNA GENE SEQUENCING,” Int J of Micr Res , vol. 4, no. 5, pp. 236–239, Jun. 2012, doi: 10.9735/0975-5276.4.5.236-239. A. Ravi et al. , “Characterization of petroleum degrading bacteria and its optimization conditions on effective utilization of petroleum hydrocarbons,” Microbiological Research , vol. 265, p. 127184, Dec. 2022, doi: 10.1016/j.micres.2022.127184. A. Nayarisseri, A. Suppahia, A. G. Nadh, and A. S. Nair, “Identification and Characterization of a Pesticide Degrading Flavobacterium Species EMBS0145 by 16S rRNA Gene Sequencing,” Interdiscip Sci Comput Life Sci , vol. 7, no. 2, pp. 93–99, Jun. 2015, doi: 10.1007/s12539-015-0016-z. A. Nayarisseri and S. K. Singh, “Genome analysis of biosurfactant producing bacterium, Bacillus tequilensis,” PLoS ONE , vol. 18, no. 6, p. e0285994, Jun. 2023, doi: 10.1371/journal.pone.0285994. Qiagen, “Qiagen. (2020). QIAamp DNA Mini Kit (Catalog No. 51304). Qiagen. Available at:,” 2020, [Online]. Available: https://www.qiagen.com/us/products/dna-analysis/dna-purification/qiamp-dna-mini-kit Date,L.E, “Date, L. E. Thermo Scientific CloneJET PCR Cloning Kit.,” 2017. P. Amareshwari et al. , “Isolation and characterization of a novel chlorpyrifos degrading flavobacterium species EMBS0145 by 16S rRNA gene sequencing,” Interdiscip Sci Comput Life Sci , vol. 7, no. 1, pp. 1–6, Mar. 2015, doi: 10.1007/s12539-012-0207-9. A. Ns, S. Mk, M. Yadav, and J. K, “IDENTIFICATION AND CHARACTERIZATION OF PROTEASES AND AMYLASES PRODUCING Bacillus licheniformis STRAIN EMBS026 BY 16S rRNA GENE SEQUENCING,” Int J of Micr Res , vol. 4, no. 5, pp. 231–235, Jun. 2012, doi: 10.9735/0975-5276.4.5.231-235. “GeneJET Gel Extraction Kit,” 2015, [Online]. Available: https://www.thermofisher.com/order/catalog/product/K0691 “DNA Baser v5.15(),” 2022, [Online]. 
Available: DNA Baser v5.15(2022), SciVance Technologies, www.DnaBaser.com H. Chandok et al. , “Screening, Isolation and Identification of Probiotic Producing Lactobacillus acidophilus Strains EMBS081 & EMBS082 by 16S rRNA Gene Sequencing,” Interdiscip Sci Comput Life Sci , vol. 7, no. 3, pp. 242–248, Sep. 2015, doi: 10.1007/s12539-015-0002-5. P. Rice, I. Longden, and A. Bleasby, “EMBOSS: The European Molecular Biology Open Software Suite,” Trends in Genetics , vol. 16, no. 6, pp. 276–277, Jun. 2000, doi: 10.1016/S0168-9525(00)02024-2. M. Bhatia, A. Girdhar, A. Tiwari, and A. Nayarisseri, “Implications of a novel Pseudomonas species on low density polyethylene biodegradation: an in vitro to in silico approach,” SpringerPlus , vol. 3, no. 1, p. 497, Dec. 2014, doi: 10.1186/2193-1801-3-497. A. G. Nadh et al. , “Identification of Azo Dye Degrading Sphingomonas Strain EMBS022 and EMBS023 Using 16S rRNA Gene Sequencing,” CBIO , vol. 10, no. 5, pp. 599–605, Nov. 2015, doi: 10.2174/1574893610666151008012312. A. N. Pyde et al. , “Identification and characterization of foodborne pathogen Listeria monocytogenes strain Pyde1 and Pyde2 using 16S rRNA gene sequencing,” Journal of Pharmacy Research , vol. 6, no. 7, pp. 736–741, Jul. 2013, doi: 10.1016/j.jopr.2013.07.009. K. P. Shah, K. H. Chandok, P. Rathore, M. V. Sharma, M. Yadav, and S. A. Nayarisseri, “Screening, Isolation and Identification of Polygalacturonase Producing Bacillus tequilensis Strain EMBS083 Using 16S rRNA Gene Sequencing,” 2013. K. Sharma, A. Nayarisseri, and S. K. Singh, “Biodegradation of plasticizers by novel strains of bacteria isolated from plastic waste near Juhu Beach, Mumbai, India,” Sci Rep , vol. 14, no. 1, p. 30824, Dec. 2024, doi: 10.1038/s41598-024-81239-8. K. Venkatesh, D. Lajwanti, Sandhya. P. Kiran, D. V. Raje, and A. Nayarisseri, “Differentially expressed genes in tumors of prostate cancer in American patients with European and African origin,” Journal of Pharmacy Research , vol. 6, no. 
5, pp. 583–588, May 2013, doi: 10.1016/j.jopr.2013.04.036. Agilent Technologies, “4200 TapeStation System,” 2015, [Online]. Available: https://www.agilent.com/en/product/automated-electrophoresis/tapestation-systems/tapestation-instruments/4200-tapestation-system-228263 S. K. Pradhan et al. , “Illumina MiSeq based assessment of bacterial community structure and diversity along the heavy metal concentration gradient in Sukinda chromite mine area soils, India,” Ecological Genetics and Genomics , vol. 15, p. 100054, May 2020, doi: 10.1016/j.egg.2020.100054. N. Versmessen et al. , “Average Nucleotide Identity and Digital DNA-DNA Hybridization Analysis Following PromethION Nanopore-Based Whole Genome Sequencing Allows for Accurate Prokaryotic Typing,” Diagnostics , vol. 14, no. 16, p. 1800, Aug. 2024, doi: 10.3390/diagnostics14161800. ANDREWS, S, “Andrews, S. FastQC: A Quality Control Tool for High Throughput Sequence Data.,” 2010, [Online]. Available: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ A. M. Bolger, M. Lohse, and B. Usadel, “Trimmomatic: a flexible trimmer for Illumina sequence data,” Bioinformatics , vol. 30, no. 15, pp. 2114–2120, Aug. 2014, doi: 10.1093/bioinformatics/btu170. A. Bankevich et al. , “SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing,” Journal of Computational Biology , vol. 19, no. 5, pp. 455–477, May 2012, doi: 10.1089/cmb.2012.0021. Wick, R, “Wick, R. R. Filtlong: Read Trimming and Filtering Tool for Long Reads. GitHub Repository.,” 2021, [Online]. Available: https://github.com/rrwick/Filtlong R. Wick, “Wick, R. R. Porechop: Adapter Trimmer for Oxford Nanopore Reads. GitHub Repository.,” 2021, [Online]. Available: https://github.com/rrwick/Porechop. R. R. Wick, L. M. Judd, C. L. Gorrie, and K. E. Holt, “Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads,” PLoS Comput Biol , vol. 13, no. 6, p. e1005595, Jun. 2017, doi: 10.1371/journal.pcbi.1005595. A. 
Gurevich, V. Saveliev, N. Vyahhi, and G. Tesler, “QUAST: quality assessment tool for genome assemblies,” Bioinformatics, vol. 29, no. 8, pp. 1072–1075, Apr. 2013, doi: 10.1093/bioinformatics/btt086.
R. R. Wick, M. B. Schultz, J. Zobel, and K. E. Holt, “Bandage: interactive visualization of de novo genome assemblies,” Bioinformatics, vol. 31, no. 20, pp. 3350–3352, Oct. 2015, doi: 10.1093/bioinformatics/btv383.
O. Schwengers, L. Jelonek, M. A. Dieckmann, S. Beyvers, J. Blom, and A. Goesmann, “Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification,” Microbial Genomics, vol. 7, no. 11, Nov. 2021, doi: 10.1099/mgen.0.000685.
C. P. Cantalapiedra, A. Hernández-Plaza, I. Letunic, P. Bork, and J. Huerta-Cepas, “eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale,” Molecular Biology and Evolution, vol. 38, no. 12, pp. 5825–5829, Dec. 2021, doi: 10.1093/molbev/msab293.
D. H. Parks, M. Imelfort, C. T. Skennerton, P. Hugenholtz, and G. W. Tyson, “CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes,” Genome Res., vol. 25, no. 7, pp. 1043–1055, Jul. 2015, doi: 10.1101/gr.186072.114.
A. Chklovski, D. H. Parks, B. J. Woodcroft, and G. W. Tyson, “CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning,” Nat Methods, vol. 20, no. 8, pp. 1203–1212, Aug. 2023, doi: 10.1038/s41592-023-01940-w.
M. Shaffer et al., “DRAM for distilling microbial metabolism to automate the curation of microbiome function,” Nucleic Acids Research, vol. 48, no. 16, pp. 8883–8900, Sep. 2020, doi: 10.1093/nar/gkaa621.
F. A. Simão, R. M. Waterhouse, P. Ioannidis, E. V. Kriventseva, and E. M. Zdobnov, “BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs,” Bioinformatics, vol. 31, no. 19, pp. 3210–3212, Oct. 2015, doi: 10.1093/bioinformatics/btv351.
C. Jain, L. M. Rodriguez-R, A. M. Phillippy, K. T. Konstantinidis, and S. Aluru, “High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries,” Nat Commun, vol. 9, no. 1, p. 5114, Nov. 2018, doi: 10.1038/s41467-018-07641-9.
J. P. Meier-Kolthoff, A. F. Auch, H.-P. Klenk, and M. Göker, “Genome sequence-based species delimitation with confidence intervals and improved distance functions,” BMC Bioinformatics, vol. 14, no. 1, p. 60, Dec. 2013, doi: 10.1186/1471-2105-14-60.
J. P. Meier-Kolthoff, J. S. Carbasse, R. L. Peinado-Olarte, and M. Göker, “TYGS and LPSN: a database tandem for fast and reliable genome-based classification and nomenclature of prokaryotes,” Nucleic Acids Research, vol. 50, no. D1, pp. D801–D807, Jan. 2022, doi: 10.1093/nar/gkab902.
J. P. Meier-Kolthoff and M. Göker, “TYGS is an automated high-throughput platform for state-of-the-art genome-based taxonomy,” Nat Commun, vol. 10, no. 1, p. 2182, May 2019, doi: 10.1038/s41467-019-10210-3.
V. Khot et al., “CANT-HYD: A Curated Database of Phylogeny-Derived Hidden Markov Models for Annotation of Marker Genes Involved in Hydrocarbon Degradation,” Front. Microbiol., vol. 12, p. 764058, Jan. 2022, doi: 10.3389/fmicb.2021.764058.
J. Rojas-Vargas, H. G. Castelán-Sánchez, and L. Pardo-López, “HADEG: A Curated Hydrocarbon Aerobic Degradation Enzymes and Genes Database,” Sep. 01, 2022, Bioinformatics. doi: 10.1101/2022.08.30.505856.
S. Wang et al., “HMDB: A curated database of genes involved in hydrocarbon monooxygenation reaction with homologous genes as background,” Journal of Hazardous Materials, vol. 460, p. 132397, Oct. 2023, doi: 10.1016/j.jhazmat.2023.132397.
K. Blin et al., “antiSMASH 5.0: updates to the secondary metabolite genome mining pipeline,” Nucleic Acids Research, vol. 47, no. W1, pp. W81–W87, Jul. 2019, doi: 10.1093/nar/gkz310.
M. Binazadeh, I. A. Karimi, and Z. Li, “Fast biodegradation of long chain n-alkanes and crude oil at high concentrations with Rhodococcus sp. Moj-3449,” Enzyme and Microbial Technology, vol. 45, no. 3, pp. 195–202, Sep. 2009, doi: 10.1016/j.enzmictec.2009.06.001.
T. Kawagoe, K. Kubota, K. S. Araki, and M. Kubo, “Analysis of the Alkane Hydroxylase Gene and Long-Chain Cyclic Alkane Degradation in Rhodococcus,” AiM, vol. 09, no. 03, pp. 151–163, 2019, doi: 10.4236/aim.2019.93012.
A. Krivoruchko, M. Kuyukina, T. Peshkur, C. J. Cunningham, and I. Ivshina, “Rhodococcus Strains from the Specialized Collection of Alkanotrophs for Biodegradation of Aromatic Compounds,” Molecules, vol. 28, no. 5, p. 2393, Mar. 2023, doi: 10.3390/molecules28052393.
D. Cerqueda-García, J. Q. García-Maldonado, L. Aguirre-Macedo, and U. García-Cruz, “A succession of marine bacterial communities in batch reactor experiments during the degradation of five different petroleum types,” Marine Pollution Bulletin, vol. 150, p. 110775, Jan. 2020, doi: 10.1016/j.marpolbul.2019.110775.
B. Z. Fathepure, “Recent studies in microbial degradation of petroleum hydrocarbons in hypersaline environments,” Front. Microbiol., vol. 5, Apr. 2014, doi: 10.3389/fmicb.2014.00173.
M. I. Roslund et al., “Endocrine disruption and commensal bacteria alteration associated with gaseous and soil PAH contamination among daycare children,” Environment International, vol. 130, p. 104894, Sep. 2019, doi: 10.1016/j.envint.2019.06.004.
Y. Delegan et al., “Complete Genome Analysis of Rhodococcus opacus S8 Capable of Degrading Alkanes and Producing Biosurfactant Reveals Its Genetic Adaptation for Crude Oil Decomposition,” Microorganisms, vol. 10, no. 6, p. 1172, Jun. 2022, doi: 10.3390/microorganisms10061172.
Additional Declarations
No competing interests reported.
Supplementary Files
SupplementaryInformation1.docx, SupplementaryInformation2.docx, SupplementaryInformation3.xlsx, Supplementaryinformation4.xlsx, SupplementaryInformation5.docx, SupplementaryInformation6.xlsx, SupplementaryInformation7.docx, SupplementaryInformation8.xlsx, SupplementaryInformation10.docx, SupplementaryInformation11.xlsx, SupplementaryInformation12.xlsx, SupplementaryInformation13.docx, SupplementaryInformation14.docx, SupplementaryInformation15.docx
Status: Published
Journal publication published (02 Dec, 2025), Scientific Reports
Version 1 posted
Editorial decision: Revision requested (01 Oct, 2025)
Reviews received at journal (27 Sep, 2025)
Reviews received at journal (27 Sep, 2025)
Reviewers agreed at journal (18 Sep, 2025)
Reviewers agreed at journal (17 Sep, 2025)
Reviewers agreed at journal (17 Sep, 2025)
Reviewers agreed at journal (17 Sep, 2025)
Reviewers invited by journal (17 Sep, 2025)
Editor assigned by journal (26 Mar, 2025)
Submission checks completed at journal (26 Mar, 2025)
First submitted to journal (26 Mar, 2025)
Research Square, ISSN 2693-5015 (online)
Authors: Syed Arshi Uz Zaman (Banasthali Vidyapith); Khushboo Sharma (Eminent Biosciences); Anuraj Nayarisseri (Eminent Biosciences; corresponding author); Kamal A. Khazanehdari (Molecular Biology & Genomics Centre, Zabeel 2); Rajabrata Bhuyan (Banasthali Vidyapith)
Preprint DOI: https://doi.org/10.21203/rs.3.rs-6309542/v1
Published version (02 Dec, 2025): https://doi.org/10.1038/s41598-025-28934-2
Figure 1. Petroleum Pollution: Isolation of novel petroleum degrading bacteria using Hybrid Sequencing.
Figure 2. Crude Oil degradation by SARSHI1
Figure 3. Phylogenetic analysis of Rhodococcus indonesiensis SARSHI1 against other species of Rhodococcus
Figure 4. Hill Plot for SARSHI1
Figure 5. Library Profile of sample SARSHI1 on Agilent TapeStation
Figure 6. Genome Sequencing and Annotation Workflow
Figure 7. Genome Map of Rhodococcus indonesiensis SARSHI1
Figure 8. (A) CheckM bin_qa Plot; (B) CheckM Ref_dist Plot
Figure 9. BUSCO Analysis
Figure 10. (A) DRAM Metabolic Annotations; (B) DRAM SubPathway Analysis
Figure 11. Genome Tree of Rhodococcus indonesiensis SARSHI1
Figure 12. HAMAP-based functional annotation
Figure 13. Repeats Distribution for
SARSHI1\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"13.png","url":"https://assets-eu.researchsquare.com/files/rs-6309542/v1/bd887ca4a20c4e839de76e55.png"},{"id":92277120,"identity":"11464092-5c35-4332-9faf-97cf78429445","added_by":"auto","created_at":"2025-09-26 15:38:04","extension":"png","order_by":14,"title":"Figure 14","display":"","copyAsset":false,"role":"figure","size":43547,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSubPathway Analysis: Bubble Plot\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"14.png","url":"https://assets-eu.researchsquare.com/files/rs-6309542/v1/04feeb363cb68176c3ff86ec.png"},{"id":92276210,"identity":"50770ce3-7f4e-428e-9e30-18f8fc7c73ad","added_by":"auto","created_at":"2025-09-26 15:30:14","extension":"png","order_by":15,"title":"Figure 15","display":"","copyAsset":false,"role":"figure","size":179286,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAerobic Biodegradation of Aromatic and Aliphatic Hydrocarbons in \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003eRhodococcus indonesiensis \u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003eSARSHI1\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"15.png","url":"https://assets-eu.researchsquare.com/files/rs-6309542/v1/f50778d7e1de55f7181fd769.png"},{"id":97724834,"identity":"149eb66a-7e9a-48fe-9ede-9b3ec7408e55","added_by":"auto","created_at":"2025-12-08 16:13:44","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":4933940,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6309542/v1/c805f023-f559-4cbc-af5c-8ff8bf73fb82.pdf"},{"id":92276233,"identity":"33e1bf01-9493-4bbb-b97c-582f1ee0e133","added_by":"auto","created_at":"2025-09-26 
15:30:20","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":5030624,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryInformation1.docx","url":"https://assets-eu.researchsquare.com/files/rs-6309542/v1/b6a00c89decb497df57ec647.docx"},{"id":92276156,"identity":"c26bb567-5107-4a57-9c29-9036eb6c31c7","added_by":"auto","created_at":"2025-09-26 15:30:07","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":101791,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryInformation2.docx","url":"https://assets-eu.researchsquare.com/files/rs-6309542/v1/81c38d3ff8a6f2e92fde3d1d.docx"},{"id":92276160,"identity":"47f09d99-ddff-4715-ae7f-16e3d684b00f","added_by":"auto","created_at":"2025-09-26 15:30:08","extension":"xlsx","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":421484,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryInformation3.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6309542/v1/63776467659ce76922327152.xlsx"},{"id":92276304,"identity":"2ef05108-f686-4b4c-a064-52cec3da0d30","added_by":"auto","created_at":"2025-09-26 15:30:28","extension":"xlsx","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":906055,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementaryinformation4.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6309542/v1/53a6c88ff04f61d3a70cc7ca.xlsx"},{"id":92276186,"identity":"a17126c5-c3f0-47e5-ace1-ab2725abff23","added_by":"auto","created_at":"2025-09-26 
15:30:11","extension":"docx","order_by":5,"title":"","display":"","copyAsset":false,"role":"supplement","size":188863,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryInformation5.docx","url":"https://assets-eu.researchsquare.com/files/rs-6309542/v1/4dbeb897dafba524c38bb612.docx"},{"id":92276222,"identity":"1132393a-76dc-4b05-8746-dc8ed3278907","added_by":"auto","created_at":"2025-09-26 15:30:17","extension":"xlsx","order_by":6,"title":"","display":"","copyAsset":false,"role":"supplement","size":34116081,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryInformation6.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6309542/v1/a0cb53bb7ae18dc5c6e50cf6.xlsx"},{"id":92276159,"identity":"6e1a90df-7705-4f60-8cbf-76e8be222558","added_by":"auto","created_at":"2025-09-26 15:30:08","extension":"docx","order_by":7,"title":"","display":"","copyAsset":false,"role":"supplement","size":914296,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryInformation7.docx","url":"https://assets-eu.researchsquare.com/files/rs-6309542/v1/314b777acdf38655f18d509c.docx"},{"id":92275990,"identity":"97d664c4-5716-402a-88fc-a91b30b89ca8","added_by":"auto","created_at":"2025-09-26 15:30:02","extension":"xlsx","order_by":8,"title":"","display":"","copyAsset":false,"role":"supplement","size":357251,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryInformation8.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6309542/v1/2e64e70fdad8076fcf07c864.xlsx"},{"id":92276384,"identity":"abc57221-e949-4d25-a79d-9d3ae18d0520","added_by":"auto","created_at":"2025-09-26 
15:30:35","extension":"docx","order_by":10,"title":"","display":"","copyAsset":false,"role":"supplement","size":600086,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryInformation10.docx","url":"https://assets-eu.researchsquare.com/files/rs-6309542/v1/1124e6c990af3dd5eebd88b3.docx"},{"id":92276218,"identity":"27a4f144-7b3c-40e2-b126-512063778d4a","added_by":"auto","created_at":"2025-09-26 15:30:16","extension":"xlsx","order_by":11,"title":"","display":"","copyAsset":false,"role":"supplement","size":37395,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryInformation11.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6309542/v1/ba5135efa4fbd633eb403f2b.xlsx"},{"id":92276231,"identity":"c40fbeff-2a56-4759-a17c-f3538fce7c60","added_by":"auto","created_at":"2025-09-26 15:30:20","extension":"xlsx","order_by":12,"title":"","display":"","copyAsset":false,"role":"supplement","size":11796,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryInformation12.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-6309542/v1/45c38283bdc849d3949787af.xlsx"},{"id":92276194,"identity":"b29cb999-1150-4b61-b4b4-a5e8d76c4ad6","added_by":"auto","created_at":"2025-09-26 15:30:12","extension":"docx","order_by":13,"title":"","display":"","copyAsset":false,"role":"supplement","size":82015,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryInformation13.docx","url":"https://assets-eu.researchsquare.com/files/rs-6309542/v1/ad67abeca238f3848532b899.docx"},{"id":92276262,"identity":"ab18bf2d-a494-443a-80a4-195cab7cadab","added_by":"auto","created_at":"2025-09-26 
15:30:23","extension":"docx","order_by":14,"title":"","display":"","copyAsset":false,"role":"supplement","size":127108,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryInformation14.docx","url":"https://assets-eu.researchsquare.com/files/rs-6309542/v1/4922f21c7727c94779fbbdb8.docx"},{"id":92276383,"identity":"9bf54f7c-fbb5-47ba-8d9f-a084e759a8cd","added_by":"auto","created_at":"2025-09-26 15:30:34","extension":"docx","order_by":15,"title":"","display":"","copyAsset":false,"role":"supplement","size":128594,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryInformation15.docx","url":"https://assets-eu.researchsquare.com/files/rs-6309542/v1/a1463c3ab17da927f6dc038e.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Whole Genome of Petroleum Hydrocarbon Degrading Rhodococcus indonesiensis isolated from Nacharam, Hyderabad, India","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003ePetroleum-derived products, including fuels and petrochemicals, are indispensable to modern economics and daily life. Since the 19th century, global crude oil consumption has surged, with projections indicating a demand of 121.5\u0026nbsp;million barrels per day (BPD) by 2050 [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. India, a major consumer, recorded 137.6\u0026nbsp;million metric tonnes of petroleum consumption between April and October 2023, reflecting a 3% year-on-year increase [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. However, petroleum extraction, refining, and transportation pose significant environmental and health risks, particularly due to accidental spills and industrial emissions. 
In India, petroleum refineries are major contributors to air, soil, and water pollution, exacerbating greenhouse gas emissions and environmental degradation[\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. The persistence of petroleum hydrocarbons (PHs) in ecosystems leads to extensive ecological disruptions[\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. In aquatic environments, PHs integrate into sediments, bioaccumulate in organisms, and disrupt trophic interactions, ultimately compromising the food chain[\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e][\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. In terrestrial ecosystems, PH contamination at high concentrations (\u0026gt;\u0026thinsp;20,000 mg/kg) significantly reduces microbial diversity, impairing essential biogeochemical cycles. Conversely, moderate contamination (4,000\u0026ndash;20,000 mg/kg) can enhance microbial species coexistence[\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. The toxicological impact of PH exposure in humans varies with concentration, duration, and route of exposure. Chronic exposure has been linked to carcinogenicity, immunosuppression, and developmental toxicity, while acute exposure can lead to dermatological reactions, respiratory distress, and ocular infections[\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. 
A study conducted on Indian petroleum workers in 2013 reported significant DNA damage even at low exposure levels, with benzene and its metabolites identified as key contributors [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eConventional petroleum remediation strategies, including physical methods such as skimming and chemical dispersants, are often inefficient for large-scale spills and may induce secondary ecological disturbances [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e][\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. Bioremediation is a cost-effective, sustainable approach that leverages microbial metabolic pathways to degrade hydrocarbons into non-toxic byproducts. However, the degradation is influenced by various environmental factors, including temperature, pH, oxygen availability, nutrient concentrations, and hydrocarbon bioavailability. Among microbial degraders, bacteria exhibit exceptional adaptability and metabolic versatility, making them prime candidates for bioremediation[\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. Notable genera include \u003cem\u003ePseudomonas\u003c/em\u003e, \u003cem\u003eBurkholderia\u003c/em\u003e, \u003cem\u003eGordonia\u003c/em\u003e, and \u003cem\u003eRhodococcus\u003c/em\u003e, all of which utilize hydrocarbons as sole carbon and energy sources[\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. For instance, \u003cem\u003eRhodococcus\u003c/em\u003e strains isolated from Antarctic oil-contaminated soils have demonstrated hydrocarbon metabolism at sub-zero temperatures (-2\u0026deg;C)[\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. 
Likewise, recent research has identified three petroleum-degrading bacterial species (\u003cem\u003eBacillus thuringiensis, Bacillus pumilus\u003c/em\u003e, and \u003cem\u003eRhodococcus hoagii\u003c/em\u003e) isolated from the rhizosphere of \u003cem\u003ePanicum aquaticum Poir.\u003c/em\u003e, a plant that thrives in oil-contaminated soils[\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e].\u003c/p\u003e\u003cp\u003e\u003cem\u003eRhodococcus\u003c/em\u003e species have garnered significant interest due to their metabolic plasticity and resilience under extreme environmental conditions, such as high salinity, low temperatures, and nutrient scarcity[\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. These bacteria exhibit an extraordinary ability to degrade a wide spectrum of petroleum hydrocarbons and xenobiotic compounds[\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. However, despite their recognized potential, genomic insights into the enzymatic pathways and regulatory networks governing hydrocarbon degradation remain limited. Recent advancements in whole-genome sequencing (WGS) have facilitated the identification of key genetic determinants and metabolic pathways involved in hydrocarbon degradation, enhancing bacterial applications for large-scale environmental remediation[\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]. WGS enables comprehensive genomic analysis, pinpointing critical genetic elements and protein-coding sequences involved in petroleum biodegradation [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e][\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]. 
However, traditional next-generation sequencing (NGS), relying primarily on short-read sequencing, encounters limitations in resolving complex genomic architectures, particularly repetitive sequences, structural variations, and large insertions often present in bacterial genomes [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eHybrid sequencing, which integrates short-read and long-read sequencing technologies, effectively addresses these challenges. While short-read sequencing provides high precision in detecting small-scale genetic variations, long-read sequencing resolves large genomic rearrangements, repetitive regions, and complex structural variations[\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]. This synergistic approach offers an unprecedentedly detailed view of bacterial genomes, facilitating the identification of novel genes, enzymatic pathways, and functional networks essential for hydrocarbon metabolism [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]. This approach provides insights into the metabolic networks and regulatory pathways that enable bacterial resilience in polluted environments[\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eIn the present study, we have isolated a novel petroleum-degrading bacterial strain and performed an in-depth genomic characterization using hybrid sequencing [Figure \u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e]. By integrating advanced sequencing platforms, we achieved high-resolution insights into the strain\u0026rsquo;s genetic framework, elucidating the metabolic pathways and enzymatic systems involved in hydrocarbon degradation. 
These findings provide a deeper understanding of the molecular mechanisms underpinning petroleum biodegradation and underscore the potential of the strain for application in environmental bioremediation strategies. Our research highlights the transformative role of genomic approaches in discovering novel bacterial resources for sustainable and eco-friendly remediation of petroleum-contaminated environments.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e"},{"header":"2. Methodology","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003e2.1 Sample Collection and Isolation of Petroleum Hydrocarbon Degrading Bacteria\u003c/h2\u003e\u003cp\u003eOil-contaminated soil samples were obtained from polluted sites near service and gas stations in Nacharam, Hyderabad, by excavating surface layers at a depth of 10\u0026ndash;12 inches. The collected samples were homogenized and stored at 4\u0026deg;C before enrichment in nutrient agar (NA) containing crude oil. The soil samples were inoculated into minimal salt medium (MSM) supplemented with 2% (w/v) crude oil, while an additional flask containing nutrient media with crude oil served as a control. The flasks were incubated at 30\u0026deg;C for 24 hours, and bacterial growth was monitored by assessing turbidity. All experiments were performed in triplicate. The cultures were serially diluted and spread onto MSM plates containing 2% (w/v) crude oil to isolate petroleum-degrading bacterial strains. The plates were incubated at 30\u0026deg;C for 24\u0026ndash;48 hours[\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e][\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]. Among the isolates, only SARSHI1 exhibited significant hydrocarbon degradation. 
The characterization of SARSHI1 was performed following Bergey\u0026rsquo;s Manual of Determinative Bacteriology[\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e][\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]. Biochemical and microbial analyses included motility assays, Gram staining, and metabolic tests such as indole, Methyl Red (MR), and Voges-Proskauer (VP) assays to evaluate glucose oxidation and non-acidic end-product formation[\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e][\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eThe biodegradation efficiency of SARSHI1 was further assessed using crude oil concentrations of 5%, 10%, and 15% (w/v) over a seven-day incubation period. The experiment was performed in replicates. Following incubation, cultures were centrifuged at 3,000\u0026ndash;5,000 rpm for 5 minutes to pellet bacterial cells. The supernatant was discarded, and the pellet was washed with 0.9% physiological saline. Bacterial growth was quantified spectrophotometrically at OD600[\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e][\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e][\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\u003ch2\u003e2.2 Genomic DNA Extraction, Qualitative and Quantitative Analysis\u003c/h2\u003e\u003cp\u003eGenomic DNA was extracted from a pure bacterial culture of isolate SARSHI1 using the QIAamp DNA Mini Kit (Qiagen), following the manufacturer\u0026rsquo;s protocol. Further, the quality and quantity of the genomic DNA were assessed to ensure integrity and purity. The purity of the DNA was evaluated by determining the A260/280 ratio using a NanoDrop spectrophotometer (ThermoFisher Scientific). 
To further verify the integrity of the genomic DNA, a portion of the extracted DNA was resolved on a 1% agarose gel with a ladder[\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]. The absence of smearing or degradation bands on the gel confirmed the integrity of the extracted genomic DNA and its suitability for downstream analysis[\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e][\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\u003ch2\u003e2.3 Molecular characterization using 16S rRNA Sequencing\u003c/h2\u003e\u003cp\u003eThe isolated DNA was amplified using universal 16S primers, with the forward primer sequence GGATGAGCCCGCGGCCTA and the reverse primer sequence CGGTGTGTACAAGGCCCGG. The PCR mix contained template DNA, primers, Taq DNA polymerase, 10X buffer, MgCl2, and dNTPs. The PCR conditions were as follows: initial denaturation at 95\u0026deg;C for 5 minutes, followed by 30 cycles of denaturation at 95\u0026deg;C for 40 seconds, primer annealing at 65\u0026deg;C for 1 minute, and extension at 72\u0026deg;C for 2 minutes, and a final extension step at 65\u0026deg;C for 1 minute[\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e]. A single discrete PCR amplicon band was observed on agarose gel electrophoresis [\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e][\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e]. The amplified DNA fragments were purified using the GeneJet Gel Extraction PCR Purification Kit (\u003cem\u003eGeneJet Gel Extraction Kit\u003c/em\u003e, 2015) [\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e]. The PCR amplicon was sequenced using Sanger dideoxy sequencing. 
The forward and reverse chromatogram files were assembled and analyzed with DNA Baser v5.15 (\u003cem\u003eDNA Baser v5.15\u003c/em\u003e, 2022) [\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e]. The sequence nucleotide composition, molecular weight, and GC content were determined using EMBOSS software[\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e][\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e\u003ch2\u003e2.4 Phylogenetic Analysis and rRNA Structure Prediction\u003c/h2\u003e\u003cp\u003eThe top 20 sequences with over 95% similarity to the query sequence were retrieved in FASTA format for the phylogenetic analysis. These sequences were imported into MEGA XI and aligned using the MUSCLE algorithm with up to 16 iterations and UPGMA clustering[\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e][\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e]. The alignment was exported in MEGA format, and a phylogenetic tree was generated using the neighbor-joining method with 1,000 bootstrap replicates. The evolutionary relationships were evaluated using the Maximum Composite Likelihood model[\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e][\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e]. The rRNA secondary structure was predicted using UNAfold. 
The Mfold server was incorporated to predict minimum free energy (MFE) structure based on the nearest-neighbor model that provides insight into the thermodynamic stability of the predicted secondary structures[\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e][\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\u003ch2\u003e2.5 Preparation of 2X150 WGS Library\u003c/h2\u003e\u003cp\u003eThe paired-end sequencing library was prepared from the extracted DNA using the Illumina TruSeq Nano DNA Library Prep Kit (TruSeq, DNA Library Prep Kits, 2012). 100 ng of DNA was fragmented using the Covaris M220 system, resulting in an average fragment size of 350 bp. Covaris shearing generates double-stranded DNA fragments with 3\u0026prime; or 5\u0026prime; overhangs, subjected to end-repair to produce blunt ends. The repaired DNA fragments were ligated with adapters, followed by size selection using AMPure XP beads to ensure uniformity. The size-selected fragments were PCR-amplified using an index-specific primers kit, integrating indexing adapters. These adapters facilitated the hybridization of DNA fragments onto the flow cell for sequencing. The PCR-enriched library was assessed for quality and fragment distribution using the Agilent 4200 TapeStation system with High Sensitivity D1000 Screen Tape, following the manufacturer\u0026rsquo;s protocol[\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\u003ch2\u003e2.6 Cluster Generation and Sequencing\u003c/h2\u003e\u003cp\u003eThe paired-end (PE) Illumina sequencing library was quantified using Qubit, assessed for fragment size with the Agilent TapeStation, and then loaded onto the NovaSeq 6000 for cluster generation and high-throughput sequencing. 
PE sequencing captures DNA fragments in both directions, enhancing accuracy and facilitating the detection of structural variants and repeats. Cluster generation involves hybridizing the prepared library to adapter-bound oligos on the flow cell, enabling selective forward-strand cleavage after reverse-strand synthesis. This process ensures bidirectional sequencing with high-quality data and improved genome assembly [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e][\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\u003ch2\u003e2.7 Preparation of Nanopore Library\u003c/h2\u003e\u003cp\u003eThe Nanopore sequencing library was prepared from the quality-controlled genomic DNA sample using the Ligation Sequencing-gDNA Native Barcoding Kit. Initially, 800 ng of the DNA sample was subjected to DNA repair, followed by purification using AMPure XP beads and adaptor ligation. The samples were pooled according to the kit protocol. The pooled library was further purified using AMPure XP beads to maintain high-quality preparation. The concentration of the purified pooled library was quantified using a Qubit fluorometer. Finally, the purified library was loaded onto an Oxford Nanopore PromethION P2 Solo flow cell for sequencing [\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e]. Nanopore sequencing facilitates the real-time selective sequencing of single DNA molecules by dynamically reversing the voltage across individual nanopores.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e\u003ch2\u003e2.8 Genome Assembly and Quality Control\u003c/h2\u003e\u003cp\u003eRaw sequencing reads from the Illumina and Oxford Nanopore platforms underwent rigorous quality control and preprocessing to ensure high-fidelity assembly. 
Quality control of Illumina paired-end reads was conducted using FastQC (v0.11.9) (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.bioinformatics.babraham.ac.uk/projects/fastqc/\u003c/span\u003e\u003cspan address=\"http://www.bioinformatics.babraham.ac.uk/projects/fastqc/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e), with reads achieving a Phred quality score\u0026thinsp;\u0026gt;\u0026thinsp;30 prioritized for downstream analysis[\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e]. The Oxford Nanopore reads were evaluated by read length distribution and per-base quality scores. The low-quality bases were trimmed and adapters were removed using Trimmomatic v0.39 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/timflutre/trimmomatic\u003c/span\u003e\u003cspan address=\"https://github.com/timflutre/trimmomatic\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) [\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e]. A sliding window approach (window size\u0026thinsp;=\u0026thinsp;4, minimum average quality score\u0026thinsp;=\u0026thinsp;20) was applied to refine read quality. The bases at the ends of reads with quality scores below 3 were removed, and reads shorter than 36 bp were discarded to ensure high-quality data. The error correction was performed using SPAdes v3.15 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://ablab.github.io/spades/\u003c/span\u003e\u003cspan address=\"https://ablab.github.io/spades/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e), employing a k-mer-based strategy (k\u0026thinsp;=\u0026thinsp;21\u0026ndash;51)[\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e]. 
The Oxford Nanopore reads were filtered using Filtlong (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/rrwick/Filtlong\u003c/span\u003e\u003cspan address=\"https://github.com/rrwick/Filtlong\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e), retaining only reads exceeding 1,000 bp with a mean quality score of at least 10 [\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e], followed by adapter removal using Porechop (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/rrwick/Porechop\u003c/span\u003e\u003cspan address=\"https://github.com/rrwick/Porechop\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) [\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eHybrid genome assembly was performed using Unicycler (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/rrwick/Unicycler\u003c/span\u003e\u003cspan address=\"https://github.com/rrwick/Unicycler\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) in \u003cem\u003ebold\u003c/em\u003e mode (--mode bold) to maximize scaffold continuity. Short-read error correction was conducted with SPAdes (k\u0026thinsp;=\u0026thinsp;21, 33, 55), while long reads were assembled using miniasm, followed by two rounds of polishing with Racon. A depth filter (--depth_filter 2.5\u0026times; median depth) was applied to exclude low-confidence contigs, and sequences shorter than 1,000 bp were removed to retain only high-quality contigs. Structural variations and repeat regions were resolved through automatic bridging (--bridging auto), leveraging long reads to enhance assembly completeness [\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e]. 
Genome assembly quality was assessed using QUAST-5.3(\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/ablab/quast\u003c/span\u003e\u003cspan address=\"https://github.com/ablab/quast\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e), evaluating key metrics such as N50, total genome size, and misassembly rates[\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e]. Contiguity was determined based on total assembly length and N50, while genome completeness was quantified by the genome fraction representing reference-mapped base coverage. Additionally, genome circularity and structural coherence were examined using Bandage(\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/rrwick/Bandage\u003c/span\u003e\u003cspan address=\"https://github.com/rrwick/Bandage\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e), by employing hierarchical layouts to detect potential assembly discrepancies[\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\u003ch2\u003e2.9 Functional Annotation\u003c/h2\u003e\u003cp\u003eComprehensive genome annotation was performed using integrated bioinformatics tools to ensure high-resolution structural and functional characterization. 
The genome was initially annotated using Bakta v1.10.3 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/oschwengers/bakta\u003c/span\u003e\u003cspan address=\"https://github.com/oschwengers/bakta\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) [\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e], followed by the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/ncbi/pgap\u003c/span\u003e\u003cspan address=\"https://github.com/ncbi/pgap\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) and Blast2GO (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.blast2go.com/\u003c/span\u003e\u003cspan address=\"https://www.blast2go.com/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). PGAP annotation was performed using the best-placed reference methodology, while gene prediction was conducted with GeneMarkS-2+ v6.9 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/karlgem/GeneMarkS-2\u003c/span\u003e\u003cspan address=\"https://github.com/karlgem/GeneMarkS-2\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://genemark.bme.gatech.edu/\u003c/span\u003e\u003cspan address=\"https://genemark.bme.gatech.edu/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). 
EggNOG-mapper(\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/eggnogdb/eggnog-mapper\u003c/span\u003e\u003cspan address=\"https://github.com/eggnogdb/eggnog-mapper\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) was utilized for functional annotation, incorporating orthology-based assignments, Gene Ontology (GO) terms, KEGG pathway mapping, and Clusters of Orthologous Groups (COG) classifications [\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e]. Genome assembly completeness and contamination were assessed using CheckM \u003cb\u003e(\u003c/b\u003e\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/Ecogenomics/CheckM\u003c/span\u003e\u003cspan address=\"https://github.com/Ecogenomics/CheckM\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) and CheckM2 \u003cb\u003e(\u003c/b\u003e\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/chklovski/CheckM2\u003c/span\u003e\u003cspan address=\"https://github.com/chklovski/CheckM2\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) [\u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e62\u003c/span\u003e][\u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e]. CheckM v1.0.2 evaluates genome integrity based on lineage-specific marker genes, while CheckM2 leverages machine-learning models for enhanced accuracy. The quality assessment process includes gene identification, metadata calculation, and DIAMOND-based genome annotation, followed by machine-learning-driven prediction of genome completeness and contamination.\u003c/p\u003e\u003cp\u003ePathway analysis was performed using Blast2GO to elucidate the biological functions of annotated genes. 
This analysis encompassed pathway classifications, species mapping, and statistical evaluations based on Fisher\u0026rsquo;s Exact Test and Gene Set Enrichment Analysis (GSEA). Metabolic annotation, including the identification of key biogeochemical cycle genes, was performed with DRAM (Distilled and Refined Annotation of Metabolism) [\u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e64\u003c/span\u003e] (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/WrightonLabCSU/DRAM\u003c/span\u003e\u003cspan address=\"https://github.com/WrightonLabCSU/DRAM\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). The analysis was conducted with default parameters. Protein domains and motifs were identified using InterProScan (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/ebi-pf-team/interproscan\u003c/span\u003e\u003cspan address=\"https://github.com/ebi-pf-team/interproscan\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). The completeness and accuracy of the SARSHI1 genome assembly were further validated using BUSCO v5 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/metashot/busco\u003c/span\u003e\u003cspan address=\"https://github.com/metashot/busco\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) [\u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e65\u003c/span\u003e]. The assembled genome was compared against the actinobacteria_phylum_odb10 dataset, comprising 292 orthologous genes from 893 \u003cem\u003eActinobacteria\u003c/em\u003e genomes. 
The analysis was performed in \u003cem\u003egenome\u003c/em\u003e mode to quantify genome integrity based on complete, fragmented, or missing genes.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\u003ch2\u003e2.10 Comparative Genomics and Taxonomic Analysis\u003c/h2\u003e\u003cp\u003eAverage Nucleotide Identity (ANI) analysis was performed using FastANI (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/ParBLiSS/FastANI\u003c/span\u003e\u003cspan address=\"https://github.com/ParBLiSS/FastANI\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) with default parameters [\u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e]. The SARSHI1 genome was compared bidirectionally with \u003cem\u003eRhodococcus indonesiensis\u003c/em\u003e CSLK01-03. FastANI\u0026rsquo;s fragment-based k-mer mapping strategy enables efficient and accurate ANI estimation, yielding key metrics such as the ANI percentage, the number of fragment matches, and the total number of fragment comparisons. To further refine taxonomic classification, digital DNA-DNA hybridization (dDDH) analysis was performed against the closest reference genome (GCA_030360185.1) using the Genome-to-Genome Distance Calculator (GGDC 3.0) (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://ggdc.dsmz.de/\u003c/span\u003e\u003cspan address=\"https://ggdc.dsmz.de/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). The analysis leverages three distinct formulas (length-normalized intergenomic distance, high-scoring segment pair-based alignment, and a sum-of-bits approach) to calculate distance. Bootstrapping was applied to determine confidence intervals, and species delineation was performed against the 70% dDDH threshold, a critical microbial taxonomic criterion. 
The G\u0026thinsp;+\u0026thinsp;C content differences and evolutionary distances further validated genomic congruence [\u003cspan citationid=\"CR67\" class=\"CitationRef\"\u003e67\u003c/span\u003e] [\u003cspan citationid=\"CR68\" class=\"CitationRef\"\u003e68\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eWhole-genome-based taxonomic classification was performed using the Type (Strain) Genome Server (TYGS) (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://tygs.dsmz.de/\u003c/span\u003e\u003cspan address=\"https://tygs.dsmz.de/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). The ten closest type strains were selected using the MASH distance algorithm and 16S rRNA gene-based comparisons. The intergenomic distances were computed under the trimming algorithm with 100 replicates using the Genome BLAST Distance Phylogeny (GBDP) approach. The digital DNA-DNA hybridization (dDDH) values were determined, followed by phylogenomic tree construction using the FastME algorithm with subtree pruning and regrafting (SPR). Phylogenetic trees were visualized using PhyD3, with species clustering based on the 70% dDDH threshold and subspecies delineation at 79% dDDH [\u003cspan citationid=\"CR69\" class=\"CitationRef\"\u003e69\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\u003ch2\u003e2.11 Annotation of Hydrocarbon Degradation Genes\u003c/h2\u003e\u003c/div\u003e\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\u003ch2\u003e2.11.1 CANT_HYD Analysis\u003c/h2\u003e\u003cp\u003eThe aerobic and anaerobic hydrocarbon degradation genes were annotated using CANT_HYD (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/dgittins/CANT-HYD-HydrocarbonBiodegradation\u003c/span\u003e\u003cspan address=\"https://github.com/dgittins/CANT-HYD-HydrocarbonBiodegradation\" targettype=\"URL\" 
class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) analysis. The genomic sequences were processed using HMMER 3.4 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://hmmer.org/\u003c/span\u003e\u003cspan address=\"http://hmmer.org/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) to detect hydrocarbon-degrading enzymes. Hidden Markov Models (HMMs) were built from multiple sequence alignments (MSAs) of known enzymes, including alkane 1-monooxygenase (AlkB), nitrate and sulfite reductases (AhyA), and flavin-containing monooxygenases (AlmA_GroupI). The \u003cem\u003ehmmscan\u003c/em\u003e algorithm was employed to search against the curated HMM database, using an E-value threshold of \u0026lt;\u0026thinsp;1e-5. The KEGG and UniProt databases were used to map the identified genes to aerobic hydrocarbon degradation pathways [\u003cspan citationid=\"CR70\" class=\"CitationRef\"\u003e70\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\u003ch2\u003e2.11.2 HADEG Analysis\u003c/h2\u003e\u003cp\u003eThe HADEG (Hydrocarbon Aerobic Degradation Enzymes and Genes) database was utilized to characterize aerobic hydrocarbon degradation genes. The alignment was performed in BLASTp mode using DIAMOND v2.0 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/bbuchfink/diamond\u003c/span\u003e\u003cspan address=\"https://github.com/bbuchfink/diamond\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) against the HADEG database. 
The orthologous gene clusters were detected and classified using Proteinortho 6 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://gitlab.com/paulklemm_PHD/proteinortho\u003c/span\u003e\u003cspan address=\"https://gitlab.com/paulklemm_PHD/proteinortho\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e)[\u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e71\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e\u003ch2\u003e2.11.3 HMDB Analysis\u003c/h2\u003e\u003cp\u003eThe hydrocarbon monooxygenase genes were annotated using DIAMOND-based sequence searches against the Hydrocarbon Monooxygenase Gene Database (HMDB) [\u003cspan citationid=\"CR72\" class=\"CitationRef\"\u003e72\u003c/span\u003e]. A homology-based functional annotation was performed using DIAMOND with an E-value threshold of \u0026le;\u0026thinsp;1e-5 and a minimum identity threshold of 25% to ensure high-confidence hits. The identified sequences were then filtered based on sequence identity and bit score to retain only significant matches. 
Further, the results were validated based on sequence conservation, domain architecture, and functional relevance with well-characterized hydrocarbon monooxygenases, including alkane 1-monooxygenase (\u003cem\u003ealkB\u003c/em\u003e) and rubredoxin electron transfer systems (\u003cem\u003eprmA, prmB, prmC, and prmD\u003c/em\u003e).\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec17\" class=\"Section2\"\u003e\u003ch2\u003e2.11.4 antiSMASH Analysis: Secondary Metabolite Biosynthesis\u003c/h2\u003e\u003cp\u003eThe antiSMASH (Antibiotics \u0026amp; Secondary Metabolite Analysis Shell) (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/antismash\u003c/span\u003e\u003cspan address=\"https://github.com/antismash\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) analysis, with \u003cem\u003eClusterBlast\u003c/em\u003e and \u003cem\u003eKnownClusterBlast\u003c/em\u003e enabled, was performed to identify biosynthetic gene clusters (BGCs) [\u003cspan citationid=\"CR73\" class=\"CitationRef\"\u003e73\u003c/span\u003e]. The analysis facilitates comparative genomic analysis by aligning predicted BGCs with known clusters. The \u003cem\u003eProcluster Region\u003c/em\u003e Analysis approach was used with \u003cem\u003erelaxed\u003c/em\u003e strictness to ensure the comprehensive detection and annotation of secondary metabolite gene clusters.\u003c/p\u003e\u003c/div\u003e"},{"header":"3. Results","content":"\u003cdiv id=\"Sec19\" class=\"Section2\"\u003e\n \u003ch2\u003e3.1 Screening and Isolation of Bacteria\u003c/h2\u003e\n \u003cp\u003eComprehensive biochemical and microbial tests were performed to characterize the strain SARSHI1. The bacterium tested negative for the indole and Methyl Red (MR) tests, indicating the absence of tryptophan metabolism and mixed acid fermentation pathways. The positive Voges-Proskauer (VP) test confirms its ability to produce neutral end products from glucose fermentation. 
Further assays confirmed that SARSHI1 was Gram-positive and oxidase-negative. The gelatin liquefaction test indicated the presence of extracellular proteolytic activity. The motility test showed a negative result, confirming the non-motile nature of the strain (Fig. \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e \u003cstrong\u003eSupplementary Information 1).\u003c/strong\u003e Strain SARSHI1 was further assessed for growth at different crude oil concentrations, and optical density (OD) measurements at 600 nm were recorded at 24-hour intervals. The growth dynamics of SARSHI1 under varying crude oil concentrations as the sole carbon source are presented in Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e. A pronounced lag phase was observed at a crude oil concentration of 15% (w/v), indicating delayed adaptation to higher hydrocarbon levels.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec20\" class=\"Section2\"\u003e\n \u003ch2\u003e3.2 DNA Quality Control and Quantification\u003c/h2\u003e\n \u003cdiv id=\"Sec21\" class=\"Section3\"\u003e\n \u003ch2\u003e3.2.1 QC of extracted DNA on Agarose Gel\u003c/h2\u003e\n \u003cp\u003eThe quality of the extracted DNA was checked using agarose gel electrophoresis. As shown in Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e \u003cstrong\u003eSupplementary Information 1\u003c/strong\u003e, Lane M contains a molecular weight marker (a DNA ladder) with distinct bands. Lane 1 displays a single, high-molecular-weight DNA band with no visible smearing, indicating a high-quality, intact sample. The absence of degraded fragments suggests minimal DNA fragmentation or contamination.\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec22\" class=\"Section3\"\u003e\n \u003ch2\u003e3.2.2 Quantification of extracted DNA using NanoDrop\u003c/h2\u003e\n \u003cp\u003eThe DNA concentration measured by NanoDrop was 54.2 ng/\u0026micro;L. 
The OD ratio at A260/280 was 1.83, within the standard range for pure DNA, indicating minimal protein contamination. The OD ratio of 1.71 at A260/230 implies reasonable purity. However, a value closer to 2.0 would be ideal for minimal organic contaminant interference (e.g., phenol or other solvents). The sample passed the quality control (QC) criteria for downstream analysis (Table\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e\u003cstrong\u003e).\u003c/strong\u003e\u003c/p\u003e\n \u003cdiv class=\"gridtable\"\u003e\n \u003ctable id=\"Tab1\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eQuantification of extracted DNA sample on NanoDrop\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSr No\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSample ID\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eNanoDrop Readings\u003c/p\u003e\n \u003cp\u003e(ng/ul)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eNanoDrop\u003c/p\u003e\n \u003cp\u003eOD \u003csub\u003eA260/280\u003c/sub\u003e\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eNanoDrop\u003c/p\u003e\n \u003cp\u003eOD \u003csub\u003eA260/230\u003c/sub\u003e\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eQC Status\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSARSHI1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e54.2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003e1.83\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1.71\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePass\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec23\" class=\"Section2\"\u003e\n \u003ch2\u003e3.3 Molecular Characterization by Ribotyping\u003c/h2\u003e\n \u003cp\u003eThe forward and reverse abi trace files obtained from Sanger dideoxy sequencing were assembled using DNA Baser software and saved into FASTA format. Comparative sequence analysis was performed using the NCBI-BLAST (Basic Local Alignment Search Tool), employing an e-value threshold of 0.0 and a coverage cut-off of 200%. Based on morphological, biochemical, and 16S rRNA sequence analyses, the strain SARSHI1 was identified as a Gram-positive bacterium, \u003cem\u003eRhodococcus indonesiensis\u003c/em\u003e. The 16S rRNA sequence of strain \u003cem\u003eRhodococcus indonesiensis\u003c/em\u003e SARSHI1 has been submitted to the NCBI GenBank database under the accession number PV034287. The 16S rRNA sequence exhibited a GC content of 60.69%, a molecular weight of 439,033 Daltons, and a total length of 720 bp. Further sequence comparisons revealed that SARSHI1 shared a high degree of similarity with various \u003cem\u003eRhodococcus\u003c/em\u003e species, displaying 98.47% identity with the partial sequences of \u003cem\u003eRhodococcus ruber\u003c/em\u003e strain JC435 and \u003cem\u003eRhodococcus ruber\u003c/em\u003e strain DSM 43338, as well as 98.33% similarity with the partial sequences of \u003cem\u003eRhodococcus aetherivorans\u003c/em\u003e strain DSM44752 and strain 10bc312. The strain also exhibited 97.22% sequence identity with \u003cem\u003eRhodococcus nanhaiensis\u003c/em\u003e strain SCSIO10187. 
Notably, the strain 10bc312 was isolated from a petrochemical sludge bioreactor.\u003c/p\u003e\n \u003cp\u003eThe top 20 BLAST hits with \u0026gt;\u0026thinsp;95% sequence similarity were retrieved for phylogenetic tree construction. The phylogenetic tree shown in Fig. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e was generated using the neighbor-joining method, applying the Maximum Composite Likelihood model. Evolutionary parameters, including base composition bias, Disparity Index, average substitution rates per site, and Transition/Transversion bias, were estimated using the Kimura 2-parameter model (Table \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e\u003cstrong\u003e).\u003c/strong\u003e The substitution patterns were computed based on the Tamura-Nei model. The summary statistics and maximum likelihood (ML) scores for tree topology are provided in Table \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e, \u003cstrong\u003eSupplementary Information 1.\u003c/strong\u003e\u003c/p\u003e\n \u003cdiv class=\"gridtable\"\u003e\n \u003ctable id=\"Tab2\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eEvolutionary Analysis using MEGA XI\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eConserved Sites\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e172\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eVariable Sites\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e672\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003eParsim-info Sites\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e552\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSingleton Sites\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e120\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAverage Disparity Index\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.946420\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAverage Composition distance\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e1.047846\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAverage Pairwise Distance\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e4.313008\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eOverall Mean Distance\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e4.31\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eTransition /Transversion Bias\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003e0.63\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003cp\u003eA detailed structural analysis of the rRNA sequence was conducted to identify conserved functional motifs essential for ribosomal activity. 
Secondary structure predictions were generated using UNAFold under specific conditions, including a window size of 12, a folding temperature of 37\u0026deg;C, and an ionic concentration of 1 M NaCl. The predicted RNA structure, along with its respective free energy value (\u0026Delta;G = -256.00 kcal/mol), is illustrated in Fig. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e, \u003cstrong\u003eSupplementary Information 1.\u003c/strong\u003e Functional annotations highlighted key regions, and thermodynamic properties are summarized in Table \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e, \u003cstrong\u003eSupplementary Information 1.\u003c/strong\u003e Furthermore, an entropy assessment performed using RNAfold is illustrated as a hill plot in Fig. \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec24\" class=\"Section2\"\u003e\n \u003ch2\u003e3.4 Library QC using TapeStation\u003c/h2\u003e\n \u003cp\u003eThe electropherogram depicted in Fig. \u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e provides a detailed fragment size distribution analysis of a nucleic acid sample, obtained through TapeStation. The data indicate a primary fragment size range between 197 bp and 963 bp, with an average fragment length of 409 bp. The sample exhibits a high concentration of nucleic acid fragments, quantified at 26.7 ng/\u0026micro;l, and a region molarity of 110 nmol/l, indicating sufficient nucleic acid yield for downstream analysis. Notably, the detected fragment population accounts for 95.85% of the total sample, indicating minimal degradation and contamination. 
The well-defined lower and upper markers ensure a high-integrity sample, suitable for sequencing.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec25\" class=\"Section2\"\u003e\n \u003ch2\u003e3.5 Data Generation and Analytics\u003c/h2\u003e\n \u003cp\u003eThe raw sequencing reads of SARSHI1 underwent comprehensive quality control and preprocessing using Trimmomatic v0.39. Adapter sequences, reads containing more than 5% ambiguous nucleotides (\u0026quot;N\u0026quot;), and low-quality reads with over 10% of bases exhibiting a Phred quality score below 25 were systematically removed. A stringent filtering strategy was implemented, incorporating sliding-window trimming with a 10 bp window, wherein bases were discarded when the average quality within the window fell below the threshold (Phred\u0026thinsp;\u0026lt;\u0026thinsp;25). Additionally, leading and trailing bases with quality scores below 25 were removed to ensure sequence integrity. After trimming, only reads exceeding a minimum length threshold of 100 nucleotides were retained, resulting in high-confidence data comprising approximately 12.89 million high-quality reads \u003cstrong\u003e(\u003c/strong\u003eTable\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e\u003cstrong\u003e).\u003c/strong\u003e\u003c/p\u003e\n \u003cdiv class=\"gridtable\"\u003e\n \u003ctable id=\"Tab3\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eRead statistics (Illumina PE Short reads)\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSample Name\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eRaw Reads\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eRaw Total bases\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eRaw Data in 
GB\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eHQ\u003c/p\u003e\n \u003cp\u003eReads\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eHQ total bases\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eHQ data in GB\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSARSHI1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e13,900,477\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4,182,022,022\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4.18\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e13,169,190\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e3,904,771,108\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e3.9\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec26\" class=\"Section3\"\u003e\n \u003ch2\u003e3.5.1 ONT Read Statistics\u003c/h2\u003e\n \u003cp\u003eFor sample SARSHI1, single-end sequencing was employed, and the raw data were processed in FASTQ format to ensure compatibility. The sequencing achieved an approximate genome coverage of 800X, underscoring the depth and reliability of the generated data. The read statistics are summarized in Table\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e, with read lengths ranging from a minimum of 68 bp to a maximum of 936,949 bp, highlighting the Nanopore sequencer\u0026rsquo;s capability to produce both short and ultra-long reads. 
The total read data comprise 4.83 GB, of which 4.11 GB (4,112,035,435 bases) are high-quality.\u003c/p\u003e\n \u003cdiv class=\"gridtable\"\u003e\n \u003ctable id=\"Tab4\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eONT read statistics\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSample Name\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eNo of reads\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eTotal number of bases\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eHQ reads\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eHQ Total Bases\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eMin_length\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eMax_length\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eData in GB\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSARSHI1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2,539,063\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4,826,079,667\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1,567,736\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4,112,035,435\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e68\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e936,949\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e4.83\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n 
\u003c/div\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec27\" class=\"Section2\"\u003e\n \u003ch2\u003e3.6 Hybrid De-novo Genome Assembly\u003c/h2\u003e\n \u003cp\u003eThe hybrid genome assembly of high-quality Illumina paired-end reads and Oxford Nanopore Technologies (ONT) reads for sample SARSHI1 was performed using Unicycler v0.4.8 (Fig. \u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003e\u003cstrong\u003e).\u003c/strong\u003e This assembly successfully reconstructed one chromosome and one plasmid, achieving an N50 value of 5,536,171 bp \u003cstrong\u003e(\u003c/strong\u003eTable \u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e). The assembly robustness was further validated using QUAST and Bandage, with comprehensive metrics detailed in Table \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e, \u003cstrong\u003eSupplementary Information 2\u003c/strong\u003e. The genome assembly consisted of only two contigs, reflecting minimal fragmentation and high continuity. The total genome length was 5,695,289 bp, with the largest contig spanning 5,536,171 bp, demonstrating that the majority of the genome was assembled into a single dominant contig. The N50 and N90 values of 5,536,171 bp further confirm high contiguity and structural integrity. The GC content of 70.08% falls within the range expected for high-GC bacterial genomes, reinforcing assembly reliability. Moreover, L50 and L90 values of 1 indicate that a single contig contains at least 50% and 90% of the genome, respectively.\u003c/p\u003e\n \u003cp\u003eThe absence of Ns per 100 kbp (0.00) confirmed a gap-free assembly, while the auN value of 5,385,944 bp reflects high assembly contiguity \u003cstrong\u003e(\u003c/strong\u003eFigs. 
\u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e and \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e, \u003cstrong\u003eSupplementary Information 2).\u003c/strong\u003e The raw sequencing data for \u003cem\u003eRhodococcus indonesiensis\u003c/em\u003e SARSHI1 have been deposited in the Sequence Read Archive (SRA) under Accession \u003cstrong\u003eSRX27520007\u003c/strong\u003e (Illumina data) and Accession \u003cstrong\u003eSRX27520006\u003c/strong\u003e (ONT data), as part of BioProject \u003cstrong\u003ePRJNA1217105\u003c/strong\u003e and BioSample \u003cstrong\u003eSAMN46479200.\u003c/strong\u003e The genome assembly visualization using Bandage revealed a nearly circular contig representing the chromosome and a small circular structure corresponding to the plasmid \u003cstrong\u003e(\u003c/strong\u003eFig. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e,\u0026nbsp;\u003cstrong\u003eSupplementary Information 2)\u003c/strong\u003e. A minor discontinuity was observed, possibly indicating an unresolved repeat region. 
The polygonal and jagged shape of the contig, rather than a perfectly smooth circle, may reflect sequencing depth variations or unresolved repeats.\u003c/p\u003e\n \u003cdiv class=\"gridtable\"\u003e\n \u003ctable id=\"Tab5\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eUnicycler Genome Assembly Statistics\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eDescription\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSARSHI1\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSequence\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eTotal Length (bp)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e5,695,289\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eN50 (bp)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e5,536,171\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMaximum Length of scaffold (bp)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e5,536,171\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eGC%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e70.08\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec28\" class=\"Section2\"\u003e\n \u003ch2\u003e3.7 Genome 
Representation and Comprehensive Annotation\u003c/h2\u003e\n \u003cp\u003eThe circular genomic map of \u003cstrong\u003eRhodococcus indonesiensis\u003c/strong\u003e \u003cstrong\u003eSARSHI1\u003c/strong\u003e (5.7 Mbp), illustrated in Fig. \u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003e, provides a comprehensive visualization of its structural and functional organization. The circular genome representation facilitates the interpretation of gene distribution and evolutionary characteristics. The outermost ring highlights the distribution of protein-coding sequences (CDS) in grey, alongside key genetic elements such as transfer RNA (tRNA, green), ribosomal RNA (rRNA, red), and non-coding RNAs (ncRNA, shades of blue). The presence of CRISPR loci (dark green) suggests a potential adaptive immune defense against phage and plasmid invasion. The second ring represents GC content variation, in which local deviations may point to regions of high transcriptional activity or horizontal gene transfer. The innermost rings depict GC skew, where the positive skew (green) corresponds to the leading strand and the negative skew (red) to the lagging strand, reflecting replication dynamics.\u003c/p\u003e\n \u003cp\u003eThe complete genome of \u003cem\u003eRhodococcus indonesiensis\u003c/em\u003e strain SARSHI1 was annotated using the Prokaryotic Genome Annotation Pipeline (PGAP), revealing a genomic architecture comprising two contigs, corresponding to one chromosome and one plasmid. The complete genome sequence has been deposited in the GenBank database under accession numbers CP180630 (chromosome) and CP180631 (plasmid). A total of 5,220 genes were identified, including 5,150 coding sequences, of which 5,094 were classified as protein-coding genes. Additionally, 70 RNA genes were annotated, comprising 12 ribosomal RNA (rRNA) genes (four copies each of 5S, 16S, and 23S rRNAs), 55 transfer RNAs (tRNAs), and a single non-coding RNA (ncRNA). 
The 56 pseudogenes showed various structural impairments: 28 contained frameshift mutations, 38 were incomplete, four exhibited internal stop codons, and nine displayed multiple disruptions. Notably, no pseudogenes with ambiguous residues were detected. Functional annotation revealed key genetic determinants involved in hydrocarbon metabolism, including alkane monooxygenases and oxidoreductases.\u003c/p\u003e\n \u003cp\u003eThe pRiA4b ORF-3 family and plasmid maintenance gene systems were identified, highlighting intrinsic mechanisms for plasmid stability and inheritance. Genetic elements associated with antibiotic biosynthesis and antimicrobial resistance point to potential adaptive mechanisms in contaminated environments.\u003c/p\u003e\n \u003cdiv id=\"Sec29\" class=\"Section3\"\u003e\n \u003ch2\u003e3.7.1 Functional Annotation using Bakta\u003c/h2\u003e\n \u003cp\u003eThe annotated genome contains a single chromosomal replication origin (\u003cem\u003eoriC\u003c/em\u003e) and lacks plasmid replication (\u003cem\u003eoriV\u003c/em\u003e) or transfer (\u003cem\u003eoriT\u003c/em\u003e) origins, highlighting the absence of conjugative transfer elements. The high GC content (~\u0026thinsp;70.08%) is consistent with its classification within high-GC bacterial lineages, likely \u003cem\u003eActinobacteria\u003c/em\u003e. The genome encodes a robust translational and regulatory network with 18 ncRNA regions. The presence of a CRISPR array further suggests adaptive immunity against phages or plasmids. The genome harbors 5,169 coding sequences (CDSs), including 243 hypothetical proteins. Notably, a small open reading frame (sORF) was predicted to encode a putative small peptide, which may play a role in regulatory or metabolic processes. Each CDS was assigned a unique locus tag and mapped to specific genomic coordinates, with most genes exhibiting high query coverage (~\u0026thinsp;1.0) and strong sequence identity with known proteins. 
Several CDSs featured low e-values (approaching 0) and high bit scores, indicating strong homology to well-characterized proteins with potential roles in metabolic regulation.\u003c/p\u003e\n \u003cp\u003eConversely, a subset of CDSs remained uncharacterized, highlighting the presence of novel or hypothetical proteins \u003cstrong\u003e(Supplementary Information 3)\u003c/strong\u003e. Functional annotation using EggNOG provided insights into gene classifications, functions, associated pathways, and Gene Ontology (GO) terms. The analysis identified genes encoding metal-dependent hydrolases, ATP-binding proteins, and conserved hypothetical proteins with unknown functions. Furthermore, genes encoding alkane hydroxylases, monooxygenases, aromatic hydroxylases, and ortho- and meta-cleavage enzymes were detected, suggesting the potential for mineralization of aliphatic and aromatic hydrocarbons, the latter via ring-cleaving mechanisms. Genes associated with oxidative stress response and membrane transporters indicate adaptive mechanisms for survival in hydrocarbon-rich environments \u003cstrong\u003e(Supplementary Information 4).\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003eThe comprehensive genome annotation using Bakta, BLAST2GO, and NCBI PGAP confirms that contig_2 is a plasmid with 134 coding sequences (CDSs) and 2 pseudogenes. Notably, the absence of transfer RNAs (tRNAs), transfer-messenger RNAs (tmRNAs), ribosomal RNAs (rRNAs), non-coding RNAs (ncRNAs), regulatory non-coding RNAs, CRISPR arrays, origins of replication (oriCs/oriVs), and origins of transfer (oriTs), together with the presence of several plasmid-associated elements, supports its extrachromosomal nature. Key hallmarks include mobile genetic elements such as IS256 family transposases (ENBDON_05209), \u003cem\u003eTnsA-\u003c/em\u003elike transposases (ENBDON_05215), and site-specific integrases (ENBDON_05214). 
The toxin-antitoxin (TA) systems related to plasmid stability and inheritance comprise the \u003cem\u003eRelE/ParE\u003c/em\u003e toxin-antitoxin system (ENBDON_05204) and the \u003cem\u003eHigA\u003c/em\u003e family antitoxin (ENBDON_05205). The partitioning system components \u003cem\u003eParA\u003c/em\u003e (ENBDON_05247) and \u003cem\u003eParB\u003c/em\u003e (ENBDON_05246) facilitate accurate plasmid segregation during bacterial cell division, further reinforcing the extrachromosomal nature of contig_2.\u003c/p\u003e\n \u003cp\u003eThe annotation also identified transporter proteins, including the ABC transporter ATP-binding protein (ENBDON_05239) and an \u003cem\u003eMFS\u003c/em\u003e transporter (ENBDON_05230), which may contribute to adaptive responses and antibiotic resistance. Type IV secretion system (T4SS) components, including \u003cem\u003eVirB4\u003c/em\u003e and \u003cem\u003eTraM\u003c/em\u003e, point to a capacity for conjugative transfer. The presence of the \u003cem\u003eTrwC\u003c/em\u003e relaxase, a pivotal component of conjugative plasmids, further substantiates this classification. BLAST2GO annotation corroborated these findings by identifying plasmid \u003cem\u003epRiA4b\u003c/em\u003e ORF-3 family homologs and proteins with Mu transposase C-terminal domains. The TraM domain-containing proteins play a pivotal role in horizontal gene transfer, potentially expediting the dissemination of antibiotic resistance genes within bacterial populations \u003cstrong\u003e(Supplementary Information 4).\u003c/strong\u003e\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec30\" class=\"Section3\"\u003e\n \u003ch2\u003e3.7.2 Blast2GO Annotation\u003c/h2\u003e\n \u003cp\u003eThe assembly metrics, characterized by an N50 value of 5.5 Mbp, indicate a high-quality and contiguous genome assembly. 
Sequence alignment across varying coverage thresholds, including High-Scoring Segment Pair per Hit (HSP/Hit) and HSP per Query Sequence (HSP/Seq), exhibits a high degree of consistency, particularly at 100% coverage. Notably, the highest number of alignment hits was observed in \u003cem\u003eRhodococcus ruber\u003c/em\u003e BKS 20\u0026ndash;38 (1929 hits) and \u003cem\u003eRhodococcus indonesiensis\u003c/em\u003e (1905 hits), suggesting a strong phylogenetic and functional association with these taxa \u003cstrong\u003e(\u003c/strong\u003eFig. \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e, \u003cstrong\u003eSupplementary Information 5).\u003c/strong\u003e A gap exists between the total sequences identified by BLAST (5144) and those annotated with GO terms (4190), indicating that 954 sequences remain unassigned to GO terms. The GO distribution graph highlights the predominance of catalytic and electron transfer activities in the MF category and detoxification and localization functions in the BP category \u003cstrong\u003e(\u003c/strong\u003eFig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e, \u003cstrong\u003eSupplementary Information 5).\u003c/strong\u003e A total of 803 sequences were assigned a single Gene Ontology (GO) term, while only a small subset carried more than ten GO terms, suggesting multifunctional roles. The enzymatic classification highlights the dominance of oxidoreductases, underscoring the strain\u0026rsquo;s oxidative potential \u003cstrong\u003e(\u003c/strong\u003eFig. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e \u003cstrong\u003eSupplementary Information 5).\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003eThe SARSHI1 genome contains many sequences with significant similarity to known proteins associated with essential biological processes. 
For instance, sequence ENBDON_00001, comprising 548 amino acids, exhibits a high degree of similarity (91.89%) to known protein sequences in the Non-Redundant (NR) protein database. BLAST analysis identified 20 significant hits with an E-value of 0.00E\u0026thinsp;+\u0026thinsp;00, demonstrating 100% query coverage and, in several instances, 100% sequence identity. Similarly, ENBDON_00007 corresponds to an alpha/beta fold hydrolase consisting of 300 amino acids and exhibits 95.44% sequence similarity to known hydrolases in the NR protein database. BLAST analysis produced multiple significant hits (E-value 0.00E\u0026thinsp;+\u0026thinsp;00), reinforcing its strong homology to characterized hydrolases. Moreover, the genome encodes diverse gene families associated with hydrocarbon metabolism (\u003cstrong\u003eSupplementary Information 6).\u003c/strong\u003e\u003c/p\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec31\" class=\"Section2\"\u003e\n \u003ch2\u003e3.8 Genome Assembly and Quality Assessment\u003c/h2\u003e\n \u003cp\u003eThe SARSHI1.fna_assembly genome belongs to the order \u003cem\u003eActinomycetales\u003c/em\u003e, exhibiting 100% completeness without contamination. The 570 single-copy marker genes with minimal duplication indicate a high-quality genome. Most marker genes are present in single copies, with \u0026ge;\u0026thinsp;90% amino acid identity (AAI), emphasizing the completeness and absence of contamination \u003cstrong\u003e(\u003c/strong\u003eFig. \u003cspan class=\"InternalRef\"\u003e8\u003c/span\u003eA). The unimodal distribution observed in the GC content plot validates genomic homogeneity, while the percent coding density plot corroborates the high proportion of functional genes. 
The \u0026Delta;TD plot highlights a well-assembled genome with low \u0026Delta;TD values (~\u0026thinsp;0.1\u0026ndash;0.2), indicating relative homogeneity, whereas higher \u0026Delta;TD values (~\u0026thinsp;0.3\u0026ndash;0.6) in a small fraction of sequences suggest horizontal gene transfer (HGT). Most sequences cluster near 0.07 \u0026Delta;TD, aligning well with the expected genome profile, while a few deviations (~\u0026thinsp;0.12\u0026ndash;0.14 \u0026Delta;TD) may correspond to plasmids. Notably, a large contig (~\u0026thinsp;5000 kbp) with a very low \u0026Delta;TD suggests the presence of a dominant genome with minimal variation (Fig. \u003cspan class=\"InternalRef\"\u003e8\u003c/span\u003eB\u003cstrong\u003e).\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003eFurther assessment using CheckM2 with a Neural Network-Specific model demonstrated 100% completeness, with no missing sequences. The genome predominantly comprises functional protein-coding sequences with a coding density of 91.4%. The assembly consists of only two contigs, with the largest spanning 5.5 million base pairs and an N50 value of 5,536,171 bp, signifying high contiguity and minimal fragmentation. The genome contains 5,170 protein-coding genes with an average gene length of 333.85 amino acids, consistent with a gene-dense, functionally rich genome. Overall, the SARSHI1.fna_assembly genome represents a high-quality, well-assembled complete genome with minimal contamination and high contiguity.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec32\" class=\"Section2\"\u003e\n \u003ch2\u003e3.9 Genomic Insights into Metabolism\u003c/h2\u003e\n \u003cp\u003eThe integrated Gene Ontology (GO) annotation framework provides a structured and hierarchical representation of sequence localization and molecular activities within cellular components (CC), molecular functions (MF), and biological processes (BP). 
The hierarchical organization of GO terms highlights functional interdependencies, where deeper nodes denote increasingly specific molecular functions. CC ontology analysis reveals a substantial number of sequences associated with cellular anatomical structures (GO: 0110165) and intracellular components (GO: 0005622), including cytoplasm (GO: 0005737) and cytosol (GO: 0005829), suggesting enriched intracellular processes. Moreover, the abundance of protein complexes (GO: 0032991) signifies structural integrity and enriched enzymatic activity \u003cstrong\u003e(\u003c/strong\u003eFig. \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e \u003cstrong\u003eSupplementary Information 7).\u003c/strong\u003e The MF ontology predominantly features binding (GO: 0005488) and catalytic (GO: 0003824) activities. The prominence of subcategories such as nucleic acid binding (GO: 0003676), ion binding (GO: 0043167), and small molecule binding (GO: 0036094) points to extensive regulatory mechanisms. The dominance of oxidoreductase (GO: 0016491), hydrolase (GO: 0016787), and transferase (GO: 0016740) activities reflects robust oxidative metabolism \u003cstrong\u003e(\u003c/strong\u003eFig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e \u003cstrong\u003eSupplementary Information 7).\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003eThe BP ontology highlights functional domains associated with primary and secondary metabolic pathways. A significant presence of sequences involved in response to external stimuli (GO: 0050896), stress conditions (GO: 0006950), and signal transduction pathways (GO: 0007165) highlights adaptation to diverse environmental changes. Moreover, the augmented oxidoreductase and hydrolase activities substantiate robust hydrocarbon metabolism \u003cstrong\u003e(\u003c/strong\u003eFig. 
\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e, \u003cstrong\u003eSupplementary Information 7).\u003c/strong\u003e The BUSCO analysis further confirms the completeness of the genome, with no fragmented or missing genes. All BUSCO genes (100%) were identified, including 98.97% single-copy complete genes (289 genes) and 1.03% duplicated complete genes (3 genes) \u003cstrong\u003e(\u003c/strong\u003eFig. \u003cspan class=\"InternalRef\"\u003e9\u003c/span\u003e\u003cstrong\u003e)\u003c/strong\u003e. The presence of all expected orthologous genes suggests a well-assembled genome with minimal duplication.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec33\" class=\"Section2\"\u003e\n \u003ch2\u003e3.10 Functional Annotation of Metabolism\u003c/h2\u003e\n \u003cp\u003eThe SARSHI1 genome encodes an array of metabolic pathways facilitating various biochemical transformations. It possesses a fully functional glycolytic pathway (Embden-Meyerhof pathway) and a complete tricarboxylic acid (TCA) cycle (Krebs cycle), complemented by a fully integrated pentose phosphate pathway. However, alternative carbon fixation mechanisms, including the 3-hydroxypropionate bi-cycle and the Reductive Acetyl-CoA pathway (Wood-Ljungdahl pathway), are incomplete, implying limited autotrophic potential. The genome encodes key components of the electron transport chain (ETC), including Complex I (NADH dehydrogenase), Complex III (cytochrome \u003cem\u003ebc1\u003c/em\u003e), and Complex IV (various cytochrome oxidases), supporting aerobic respiration. The absence of certain subunits may reflect metabolic flexibility, including the possible use of alternative electron acceptors. Methanogenesis pathways are partially encoded, particularly those mediating CO₂ reduction to methane and acetate conversion to methane, suggesting a possible role in anaerobic carbon cycling. 
Nevertheless, the lack of a complete methanogenesis pathway indicates constrained methane biosynthesis \u003cstrong\u003e(\u003c/strong\u003eFig. \u003cspan class=\"InternalRef\"\u003e10\u003c/span\u003eA\u003cstrong\u003e).\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003eFurthermore, the genome harbors an extensive repertoire of nitrogen metabolism genes, encompassing nitrate and nitrite reduction, nitrogen fixation, and ammonia oxidation, underscoring its potential role in nitrogen cycling. However, the absence of pathways facilitating the conversion of nitrous oxide to dinitrogen gas indicates incomplete denitrification. The genetic potential for metal resistance and detoxification is evident through the presence of arsenate and mercury reduction genes, suggesting a capacity for heavy metal bioremediation. Notably, the absence of genes encoding photosynthetic components, including Photosystem I and II, implies a lack of oxygenic photosynthesis. Likewise, sulfur metabolism pathways, including sulfate reduction and thiosulfate oxidation, are incomplete, reflecting constraints in sulfur cycling (Fig. 
\u003cspan class=\"InternalRef\"\u003e10\u003c/span\u003eB\u003cstrong\u003e).\u003c/strong\u003e SARSHI1 represents a \u003cstrong\u003emetabolically versatile organism\u003c/strong\u003e with active \u003cstrong\u003ecarbon, nitrogen, and electron transport processes\u003c/strong\u003e, yet with notable limitations in complex \u003cstrong\u003ecarbohydrate degradation, sulfur metabolism\u003c/strong\u003e, and complete \u003cstrong\u003emethanogenesis.\u003c/strong\u003e\u003c/p\u003e\n \u003cdiv id=\"Sec34\" class=\"Section3\"\u003e\n \u003ch2\u003e3.10.1 Pathway Analysis\u003c/h2\u003e\n \u003cp\u003eThe SARSHI1 genome encodes an extensive array of hydrocarbon metabolism pathways, encompassing xylene (ko00622), toluene (ko00623), nitrotoluene (ko00633), naphthalene (ko00626), aminobenzoate (ko00627), ethylbenzene (ko00642), styrene (ko00643), and chloroalkane/chloroalkene degradation (ko00625) pathways \u003cstrong\u003e(Supplementary Information 8)\u003c/strong\u003e. Moreover, the polycyclic aromatic hydrocarbon (PAH) degradation pathway (ko00624), along with the key aromatic degradation enzyme 3-phenylpropionate/trans-cinnamate dioxygenase ferredoxin reductase (K00529), provides mechanistic insights into detoxifying carcinogenic PAHs.\u003c/p\u003e\n \u003cp\u003eThe alkane degradation pathway (ko00071), encoding alkane hydroxylases (\u003cem\u003ealkB1_2\u003c/em\u003e, \u003cem\u003ealkM\u003c/em\u003e; K00496), 3-hydroxy acyl-CoA dehydrogenase (EC:1.1.1.35), and short-chain acyl-CoA dehydrogenase (K00248), points to a well-adapted system for aliphatic hydrocarbon catabolism. 
Furthermore, partial gene sets encoding adenosylcobalamin (ADO-CBL) biosynthesis within the cobalamin biosynthesis pathway, alongside dye-decolorizing peroxidase (K15733, K00485), indicate the genome\u0026rsquo;s potential for xenobiotic detoxification \u003cstrong\u003e(Supplementary Information 9)\u003c/strong\u003e.\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec35\" class=\"Section3\"\u003e\n \u003ch2\u003e3.10.2 Antibiotic Biosynthesis and Antimicrobial Resistance\u003c/h2\u003e\n \u003cp\u003eKEGG annotation unveiled a comprehensive metabolic framework governing antibiotic biosynthesis and antimicrobial resistance in SARSHI1. The genome encodes biosynthetic pathways for vancomycin (ko01055), type II polyketide products (ko01057), ansamycins (ko01051), and streptomycin (ko00521), demonstrating its potential for secondary metabolite production. Simultaneously, resistance determinants, including beta-lactam resistance (ko01501), vancomycin resistance (ko01502), CAMP resistance (ko01503), and drug metabolic enzymes such as dimethylaniline monooxygenase (N-oxide forming)/hypotaurine monooxygenase (K00485), delineate adaptive mechanisms that enhance survival in contaminated environments \u003cstrong\u003e(Supplementary Information 9)\u003c/strong\u003e.\u003c/p\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec36\" class=\"Section2\"\u003e\n \u003ch2\u003e3.11 Comprehensive Taxonomic Analysis\u003c/h2\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec37\" class=\"Section2\"\u003e\n \u003ch2\u003e3.11.1 ANI Analysis\u003c/h2\u003e\n \u003cp\u003eThe ANI estimates were highly consistent across both comparisons, reinforcing the reliability of the similarity assessment. A slight variation (98.5% vs. 98.6061%) was observed, which is expected due to differences in homologous genome fragment alignment depending on the query-reference roles. 
Furthermore, the number of matched fragments (1578\u0026ndash;1598) relative to the total fragments compared (1706\u0026ndash;1898) indicates a significant proportion of shared genomic content (Table \u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003e\u003cstrong\u003e).\u003c/strong\u003e This high degree of similarity confirms that SARSHI1 and \u003cem\u003eRhodococcus indonesiensis\u003c/em\u003e CSLK01-03 are closely related at the genomic level.\u003c/p\u003e\n \u003cdiv class=\"gridtable\"\u003e\n \u003ctable id=\"Tab6\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 6\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eANI Analysis\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eQuery Genome\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eReference Genome\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eANI Estimate (%)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eMatches\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eTotal Comparisons\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSARSHI1.gff3_genome.assembly\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cem\u003eRhodococcus\u003c/em\u003e_indonesiensis_CSLK01-03_genomic.fna_assembly\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e98.5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1578\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1898\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003e\u003cem\u003eRhodococcus\u003c/em\u003e_indonesiensis_CSLK01-03_genomic.fna_assembly\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSARSHI1.gff3_genome.assembly\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e98.6061\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1598\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1706\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec38\" class=\"Section2\"\u003e\n \u003ch2\u003e3.11.2 dDDH Analysis\u003c/h2\u003e\n \u003cp\u003eThe digital DNA-DNA hybridization (dDDH) analysis was performed against the closest reference genome, GCA_030360185.1. The dDDH values were calculated using three distinct formulas, all surpassing the 70% species delineation threshold, thus confirming species-level relatedness (Table \u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003e\u003cstrong\u003e).\u003c/strong\u003e Using Formula 1, SARSHI1 exhibited a dDDH value of 89.2%, with a model confidence interval (C.I.) of 85.9% \u0026ndash; 91.9%, an evolutionary distance of 0.0851, and a 97% probability of dDDH being \u0026ge;\u0026thinsp;70%. Similarly, Formula 2 generated a dDDH value of 88.2% (C.I.: 85.7% \u0026ndash; 90.3%) with a lower evolutionary distance (0.0141), indicating a strong genomic similarity. The probability of dDDH being \u0026ge;\u0026thinsp;70% under Formula 2 was 95.15%, further supporting species-level classification. Of the three, Formula 3 yielded the highest dDDH value (91.7%, C.I.: 89.2% \u0026ndash; 93.7%), with an evolutionary distance of 0.098 and a 99.61% probability of species-level relatedness. 
Additionally, the G\u0026thinsp;+\u0026thinsp;C content difference between SARSHI1 and the reference genome was 0.07, further reinforcing their taxonomic congruence, confirming that SARSHI1 belongs to the same species as the reference genome GCA_030360185.1.\u003c/p\u003e\n \u003cdiv class=\"gridtable\"\u003e\n \u003ctable id=\"Tab7\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 7\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003edDDH Analysis\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\u0026nbsp;\u003c/th\u003e\n \u003cth align=\"left\"\u003e\u0026nbsp;\u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eFormula 1\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\u0026nbsp;\u003c/th\u003e\n \u003cth align=\"left\"\u003e\u0026nbsp;\u003c/th\u003e\n \u003cth align=\"left\"\u003e\u0026nbsp;\u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eFormula 2\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\u0026nbsp;\u003c/th\u003e\n \u003cth align=\"left\"\u003e\u0026nbsp;\u003c/th\u003e\n \u003cth align=\"left\"\u003e\u0026nbsp;\u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eFormula 3\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\u0026nbsp;\u003c/th\u003e\n \u003cth align=\"left\"\u003e\u0026nbsp;\u003c/th\u003e\n \u003cth align=\"left\"\u003e\u0026nbsp;\u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eG\u0026thinsp;+\u0026thinsp;C difference\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eQuery genome\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eReference genome\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDDH\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd align=\"left\"\u003e\n \u003cp\u003eModel C.I.\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDistance\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eProb. DDH\u0026thinsp;\u0026gt;\u0026thinsp;=\u0026thinsp;70%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDDH\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eModel C.I.\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDistance\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eProb. DDH\u0026thinsp;\u0026gt;\u0026thinsp;=\u0026thinsp;70%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDDH\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eModel C.I.\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDistance\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eProb. 
DDH\u0026thinsp;\u0026gt;\u0026thinsp;=\u0026thinsp;70%\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSARSHI1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eGCA_030360185.1_ASM3036018v1_genomic\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e89.2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e[85.9\u0026ndash;91.9%]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.0851\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e97\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e88.2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e[85.7\u0026ndash;90.3%]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.0141\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e95.15\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e91.7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e[89.2\u0026ndash;93.7%]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.098\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e99.61\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.07\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec39\" class=\"Section2\"\u003e\n \u003ch2\u003e3.11.3 TYGS Analysis\u003c/h2\u003e\n \u003cp\u003eA comprehensive comparative genomic analysis was conducted using the Type Strain Genome Server (TYGS) to determine the taxonomic position of SARSHI1. 
The highest genomic congruence was observed with \u003cem\u003eRhodococcus indonesiensis\u003c/em\u003e CSLK01-03, exhibiting a digital DNA-DNA hybridization (dDDH) value (d₀) of 89.2% (confidence interval: 85.9\u0026ndash;91.9%). The additional dDDH formulas, d₄ (identities relative to the length of the aligned high-scoring segment pairs) and d₆ (identities relative to total genome length), yielded values of 88.2% and 91.7%, respectively, reinforcing its phylogenetic proximity. Phylogenomic analyses, visualized through TYGS-generated trees, positioned SARSHI1 within the R. indonesiensis clade (Fig. \u003cspan class=\"InternalRef\"\u003e11\u003c/span\u003e \u003cstrong\u003e\u0026amp;\u003c/strong\u003e Fig. \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e \u003cstrong\u003eSupplementary Information 10).\u003c/strong\u003e In contrast, other \u003cem\u003eRhodococcus\u003c/em\u003e species exhibited lower dDDH values, with \u003cem\u003eRhodococcus electrodiphilus\u003c/em\u003e LMG 29881 displaying 82.8% similarity, followed by \u003cem\u003eRhodococcus ruber\u003c/em\u003e strains (81.1% for NBRC 15591 and 80.2% for DSM 43338). More distantly related species, such as \u003cem\u003eRhodococcus phenolicus\u003c/em\u003e and \u003cem\u003eRhodococcus yananensis\u003c/em\u003e, demonstrated d₀ values below 30%, underscoring substantial phylogenetic divergence. The established dDDH species delineation threshold of 70% unequivocally classifies SARSHI1 as a member of R. indonesiensis, further corroborated by its minimal G\u0026thinsp;+\u0026thinsp;C content variation (0.09%) (Table \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e \u003cstrong\u003eSupplementary Information 10).\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003eSARSHI1 is categorized under Species Cluster 9, exhibiting a strong genetic affiliation with \u003cem\u003eR. indonesiensis\u003c/em\u003e within the \u003cem\u003eRhodococcus\u003c/em\u003e genus. 
Its genome, approximately 5.7 Mbp in size with a GC content of 70.08%, aligns with known \u003cem\u003eRhodococcus\u003c/em\u003e strains. Notably, the genome encodes 5,170 protein-coding genes, indicative of an expansive metabolic repertoire. This extensive genetic framework implies a potential for specialized enzymatic functions, further emphasizing its adaptive capabilities (Table \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e, \u003cstrong\u003eSupplementary Information 10).\u003c/strong\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec40\" class=\"Section2\"\u003e\n \u003ch2\u003e3.12 Protein Domain and Motifs Analysis\u003c/h2\u003e\n \u003cp\u003eInterProScan analysis identified functional domains across the majority of protein sequences (4,884 out of 5,170), with a substantial subset (3,393) further annotated with Gene Ontology (GO) terms \u003cstrong\u003e(\u003c/strong\u003eFig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e \u003cstrong\u003eSupplementary Information 10\u003c/strong\u003e). Comparative domain and repeat analyses utilizing the SMART, SuperFamily, and PANTHER databases uncovered a diverse repertoire of conserved motifs and protein families associated with hydrocarbon metabolism \u003cstrong\u003e(\u003c/strong\u003eFigs. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e\u0026ndash;\u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e, \u003cstrong\u003eSupplementary Information 10).\u003c/strong\u003e The identified repeat motifs, including Hexapeptide, Pentapeptide, Ankyrin, and Tetratricopeptide, were prevalent. Notably, pyrrolo-quinoline quinone and WD40-like beta-propeller repeats, implicated in hydrocarbon degradation, were identified. 
Additionally, key enzymes of the meta-cleavage pathway were identified, including 4-hydroxy-2-oxovalerate aldolase (MF_01656), which acts downstream of aromatic ring cleavage by splitting 4-hydroxy-2-oxovalerate into pyruvate and acetaldehyde, and acetaldehyde dehydrogenase (MF_01657), which oxidizes the resulting acetaldehyde to acetyl-CoA, a critical step in channeling hydrocarbon-derived intermediates into central metabolism.\u003c/p\u003e\n \u003cp\u003eThe efficient processing of aldehyde intermediates is a hallmark of hydrocarbon-degrading bacterial species, underscoring the metabolic specialization of SARSHI1. Functional annotation using the HAMAP database further confirmed the presence of key enzyme families involved in hydrocarbon metabolism and broader metabolic processes. A significant fraction of the dataset (87.17%) was categorized as \u0026quot;others,\u0026quot; reflecting the extensive functional diversity of the identified domains (Fig. \u003cspan class=\"InternalRef\"\u003e12\u003c/span\u003e). Furthermore, the repeat regions with unknown biochemical functions, such as DUF308 and DUF349, suggest potential novel functionalities that warrant further experimental investigation (Fig. \u003cspan class=\"InternalRef\"\u003e13\u003c/span\u003e\u003cstrong\u003e).\u003c/strong\u003e These findings collectively underscore the functional complexity and biochemical versatility encoded within the SARSHI1 genome, particularly in the context of hydrocarbon degradation networks.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec41\" class=\"Section2\"\u003e\n \u003ch2\u003e3.13 Annotation of Hydrocarbon Degradation Genes\u003c/h2\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec42\" class=\"Section2\"\u003e\n \u003ch2\u003e3.13.1 CANT_HYD Analysis\u003c/h2\u003e\n \u003cp\u003eThe CANT_HYD analysis identifies a diverse repertoire of monooxygenases, dioxygenases, dehydrogenases, and reductases, highlighting the metabolic versatility of SARSHI1 in both aerobic and anaerobic hydrocarbon degradation. 
Key gene families, including \u003cem\u003eAlkB, AhyA, AlmA_GroupI, LadAB, NdoBC, BmoXXY\u003c/em\u003e, and \u003cem\u003eTmoAE\u003c/em\u003e, play crucial roles in alkane oxidation, aromatic hydrocarbon catabolism, and sulfur/nitrogen-containing hydrocarbon metabolism. Among the most prominent findings, the \u003cem\u003eAlkB\u003c/em\u003e gene family, encoding alkane 1-monooxygenases, emerged as a principal enzymatic group, with high-confidence homologs such as ENBDON_05321 (E-value: 6.60E-219, Score: 725.9) and ENBDON_05322 (E-value: 2.10E-190, Score: 631.9). These enzymes facilitate the initial oxidation of alkanes to alcohols, a critical step in hydrocarbon degradation. The co-occurrence of rubredoxin systems (ENBDON_04732, E-value: 2.50E-11, Score: 41.3) and \u003cem\u003eAlmA_GroupI\u003c/em\u003e genes, including FAD-containing monooxygenase \u003cem\u003eEthA\u003c/em\u003e (ENBDON_05147, E-value: 3.90E-205, Score: 680.3), further supports metabolic adaptability under varying environmental conditions. Additionally, propane monooxygenase components (\u003cem\u003ePrmA, PrmC, sBmoX, TmoA_BmoA, TomA1, TomA3\u003c/em\u003e) define a well-established propane oxidation pathway, initiating hydroxylation to propanol, followed by downstream metabolism.\u003c/p\u003e\n \u003cp\u003eThe \u003cem\u003eLadA_alpha\u003c/em\u003e and \u003cem\u003eLadA_beta\u003c/em\u003e enzymes, containing luciferase-like domains and flavin-dependent oxidoreductases, indicate the capability for long-chain alkane metabolism, a characteristic of thermophilic bacterial systems. Genes encoding Rieske-type oxygenases (\u003cem\u003eNdoB, NdoC\u003c/em\u003e) and benzoate 1,2-dioxygenases delineate putative pathways for the degradation of naphthalene, benzene, and other aromatic hydrocarbons. 
These enzymes catalyze oxygenation reactions, converting aromatic compounds into catechols or dihydroxylated intermediates, which are further metabolized through the \u0026beta;-ketoadipate and ring-cleavage pathways. The identification of vanillate O-demethylase indicates a broader substrate spectrum for hydrocarbon catabolism. \u003cem\u003eDszC\u003c/em\u003e (acyl-CoA dehydrogenase) and \u003cem\u003eEbdA\u003c/em\u003e-associated reductases contribute to dibenzothiophene (DBT) and organosulfur compound degradation, emphasizing their role in bio-desulfurization. Nitrate reductases (\u003cem\u003eEbdA, CmdA\u003c/em\u003e, and putative nitrate/sulfite reductases) highlight hydrocarbon oxidation in oxygen-limited conditions. The dimethyl sulfoxide (DMSO) and trimethylamine N-oxide (TMAO) reductases suggest pathways for organosulfur compound utilization, with potential applications in petroleum bioremediation.\u003c/p\u003e\n \u003cp\u003eSeveral enzymes rely on flavin (FAD, FMN), molybdenum (Mo), iron-sulfur clusters (Fe-S), and NAD(P)H as cofactors, promoting electron transfer in oxidation-reduction reactions. The characterization of F420-dependent oxidoreductases and glucose-6-phosphate dehydrogenase (coenzyme-F420) highlights their role in hydrocarbon oxidation via electron shuttling mechanisms. These findings illustrate a highly versatile enzymatic system capable of degrading a wide range of hydrocarbons, including alkanes (short- to long-chain), aromatics (benzene, naphthalene, vanillate), and organosulfur compounds. 
The extensive enzymatic network integrates hydroxylation, oxidation, ring-cleavage, and anaerobic respiration, ensuring efficient hydrocarbon mineralization under diverse environmental conditions \u003cstrong\u003e(Supplementary Information 11).\u003c/strong\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec43\" class=\"Section2\"\u003e\n \u003ch2\u003e3.13.2 HADEG Analysis\u003c/h2\u003e\n \u003cp\u003eThe HADEG analysis elucidated a diverse repertoire of genes implicated in the aerobic degradation of aromatic and aliphatic hydrocarbons, underscoring the metabolic versatility of \u003cem\u003eRhodococcus indonesiensis\u003c/em\u003e SARSHI1. Genes involved in benzoate (\u003cem\u003ebenA\u003c/em\u003e, \u003cem\u003ebenB\u003c/em\u003e, \u003cem\u003ebenC\u003c/em\u003e, \u003cem\u003ebenD\u003c/em\u003e), biphenyl (\u003cem\u003ebphF\u003c/em\u003e, \u003cem\u003ebphI\u003c/em\u003e, \u003cem\u003ebphX2\u003c/em\u003e), and toluene (\u003cem\u003exylC\u003c/em\u003e) degradation exemplify the genomic potential for aromatic hydrocarbon mineralization. Furthermore, genes associated with the catechol (\u003cem\u003ecatA\u003c/em\u003e, \u003cem\u003ecatC\u003c/em\u003e, \u003cem\u003epcaJ\u003c/em\u003e) and protocatechuate (\u003cem\u003epcaG\u003c/em\u003e, \u003cem\u003epcaH\u003c/em\u003e, \u003cem\u003epcaR\u003c/em\u003e) pathways facilitate ortho-cleavage, and the gentisate pathway (\u003cem\u003enagK\u003c/em\u003e, \u003cem\u003enagL\u003c/em\u003e, \u003cem\u003exlnE\u003c/em\u003e) delineates the meta-cleavage essential for the complete mineralization of aromatic compounds. Moreover, the abundance of 4-hydroxyphenylacetate degradation genes (\u003cem\u003ehpaB\u003c/em\u003e) enhances metabolic adaptability, enabling the catabolism of a broad range of aromatic substrates. 
The presence of genes encoding alkane monooxygenases (\u003cem\u003ealkB\u003c/em\u003e, \u003cem\u003ealkG_rubA3_rdx\u003c/em\u003e) and Baeyer\u0026ndash;Villiger monooxygenases (BVMO, \u003cem\u003eQ9I3H5\u003c/em\u003e) establishes a robust metabolic framework for alkane oxidation.\u003c/p\u003e\n \u003cp\u003eThe identified pathways define both terminal oxidation (\u003cem\u003ealmA\u003c/em\u003e, \u003cem\u003eladA\u003c/em\u003e) and subterminal oxidation (\u003cem\u003eprmA\u003c/em\u003e, \u003cem\u003eprmB\u003c/em\u003e, \u003cem\u003eprmC\u003c/em\u003e, \u003cem\u003eprmD\u003c/em\u003e) as principal mechanisms for aliphatic hydrocarbon degradation. Furthermore, genes associated with the Finnerty pathway (\u003cem\u003eAeAB_ahpC\u003c/em\u003e, \u003cem\u003eAeAB_ahpF\u003c/em\u003e) substantiate bacterial adaptation for \u0026omega;-oxidation, a characteristic metabolic strategy of \u003cem\u003eRhodococcus\u003c/em\u003e species. Additionally, the \u003cem\u003essuD\u003c/em\u003e gene, implicated in sulfur oxidation, represents an auxiliary factor in hydrocarbon metabolism, potentially enhancing survival in sulfur-rich environments. The presence of multiple copies of key degradation genes demonstrates a well-developed and highly redundant genetic framework for hydrocarbon catabolism \u003cstrong\u003e(Supplementary Information 12 \u0026amp;\u003c/strong\u003e Fig. \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e, \u003cstrong\u003eSupplementary Information 13).\u003c/strong\u003e Moreover, gene distribution patterns indicate terminal oxidation as the predominant mechanism employed by SARSHI1 for hydrocarbon degradation \u003cstrong\u003e(\u003c/strong\u003eFig. 
\u003cspan class=\"InternalRef\"\u003e14\u003c/span\u003e\u003cstrong\u003e).\u003c/strong\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec44\" class=\"Section2\"\u003e\n \u003ch2\u003e3.13.3 HMDB Analysis\u003c/h2\u003e\n \u003cp\u003eThe Hydrocarbon Monooxygenase Database (HMDB) analysis identified multiple genes encoding hydrocarbon monooxygenases essential for alkane oxidation. Sequence similarity analysis revealed homologs of alkane 1-monooxygenase (\u003cstrong\u003ealkB\u003c/strong\u003e\u003cstrong\u003e)\u003c/strong\u003e and components of the propane monooxygenase system (\u003cstrong\u003eprmA, prmB, prmC, and prmD\u003c/strong\u003e), with high sequence identity. For instance, ENBDON_05051 demonstrated 100% identity to \u003cstrong\u003eprmA\u003c/strong\u003e (B5D5P6), while ENBDON_05050, ENBDON_05049, and ENBDON_05048 exhibited 94.81%, 95.66%, and 92.79% identity to \u003cstrong\u003eprmB\u003c/strong\u003e, \u003cstrong\u003eprmC\u003c/strong\u003e, and \u003cstrong\u003eprmD\u003c/strong\u003e, respectively. The high sequence identity and low E-values \u003cstrong\u003e(\u0026le;\u0026thinsp;9.27e-76)\u003c/strong\u003e further support the functional relevance of these genes, indicating that SARSHI1 harbors a complete and likely functional hydrocarbon monooxygenase system. 
The key statistics, including top HMDB hits, sequence identity, alignment statistics, and functional annotations, are provided in Table \u003cspan class=\"InternalRef\"\u003e8\u003c/span\u003e.\u003c/p\u003e\n \u003cdiv class=\"gridtable\"\u003e\n \u003ctable id=\"Tab8\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 8\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eHMDB Analysis\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eQuery Seq-id\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eSubject Seq-id\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003e% Identity\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eAlignment Length\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eMismatches\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eGaps\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003eQuery\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colspan=\"2\"\u003e\n \u003cp\u003eSubject\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eE-value\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eBit Score\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eGlobal % Identity\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colspan=\"6\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eStart\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eEnd\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eStart\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
align=\"left\"\u003e\n \u003cp\u003eEnd\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colspan=\"3\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eENBDON_03995\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eA0A098BFN3-A0A098BFN3_9NOCA-alkB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e97.389\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e383\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e383\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e383\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e761\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e97.38\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eENBDON_04319\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eA0A098BST3-A0A098BST3_9NOCA-alkB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e98.280\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e407\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003e407\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e407\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e827\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e98.28\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eENBDON_05048\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eA0A866VUU8-A0A866VUU8_9NOCA-prmD\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e92.793\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e111\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e8\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e113\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e111\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e9.27e-76\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e218\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e92.79\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eENBDON_05049\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eA0A866W1M4-A0A866W1M4_9NOCA-prmC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e95.664\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003e369\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e15\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e368\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e369\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e739\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e95.93\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eENBDON_05050\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eA0A866W2B3-A0A866W2B3_9NOCA-prmB\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e94.813\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e347\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e18\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e347\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e347\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e680\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e94.81\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd 
align=\"left\"\u003e\n \u003cp\u003eENBDON_05051\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eB5D5P6-B5D5P6_9NOCA-prmA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e100\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e542\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e542\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e542\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1135\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e100\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec45\" class=\"Section2\"\u003e\n \u003ch2\u003e3.14 Aerobic Degradation of Aliphatic and Aromatic Hydrocarbons\u003c/h2\u003e\n \u003cp\u003ePetroleum hydrocarbons comprise a complex mixture of long-chain alkanes, benzene, toluene, xylene, and biphenyl derivatives. The microbial degradation of these hydrocarbons follows a sequential process involving uptake, oxidation, and cleavage, ultimately channeling metabolic intermediates into the tricarboxylic acid (TCA) cycle[\u003cspan class=\"CitationRef\"\u003e74\u003c/span\u003e]. The initial uptake of these hydrophobic compounds is facilitated by biosurfactant production or specialized transport mechanisms that enhance solubility and membrane translocation. 
The degradation of aliphatic hydrocarbons predominantly occurs via terminal and subterminal oxidation pathways. In terminal oxidation, key genes such as \u003cem\u003ealkB\u003c/em\u003e, \u003cem\u003ealmA\u003c/em\u003e, and \u003cem\u003eladA\u003c/em\u003e encode alkane hydroxylases that catalyze the hydroxylation of terminal carbon atoms, yielding primary alcohols that are subsequently oxidized to carboxylic acids before entering the \u0026beta;-oxidation pathway. For short-chain alkanes, the \u003cem\u003eprmABCD\u003c/em\u003e gene cluster encodes propane monooxygenase, which facilitates hydroxylation via a subterminal oxidation mechanism, producing secondary alcohols that are subsequently converted to ketones and metabolized through \u0026beta;-oxidation[\u003cspan class=\"CitationRef\"\u003e75\u003c/span\u003e]. Additionally, Baeyer-Villiger monooxygenases (BVMOs) play a crucial role in metabolizing ketones and other cyclic intermediates. Gram-positive bacteria such as \u003cem\u003eRhodococcus\u003c/em\u003e employ an alternative \u0026omega;-oxidation (Finnerty) pathway for long-chain alkane degradation, generating dicarboxylic acids as intermediates.\u003c/p\u003e\n \u003cp\u003eAromatic hydrocarbon degradation follows structurally distinct metabolic pathways. Benzene, toluene, xylene, and biphenyl compounds undergo initial hydroxylation catalyzed by dioxygenases, generating dihydroxylated intermediates such as catechol, protocatechuate, and gentisate, which subsequently undergo ring cleavage[\u003cspan class=\"CitationRef\"\u003e76\u003c/span\u003e]. The \u003cem\u003eben\u003c/em\u003e gene cluster (\u003cem\u003ebenA\u003c/em\u003e, \u003cem\u003ebenB\u003c/em\u003e, \u003cem\u003ebenC\u003c/em\u003e) encodes benzoate dioxygenase, which hydroxylates benzoate to catechol or protocatechuate. 
Similarly, the \u003cem\u003ebph\u003c/em\u003e gene cluster (\u003cem\u003ebphA\u003c/em\u003e, \u003cem\u003ebphB\u003c/em\u003e, \u003cem\u003ebphC\u003c/em\u003e, \u003cem\u003ebphD\u003c/em\u003e) encodes biphenyl dioxygenase and associated enzymes, facilitating the oxidation of biphenyl into hydroxylated intermediates. The \u003cem\u003exylC\u003c/em\u003e gene encodes benzyl alcohol dehydrogenase, a key enzyme in toluene and xylene degradation, catalyzing the oxidation of benzyl alcohol derivatives into their corresponding aldehydes. The central pathways for aromatic hydrocarbon catabolism include the catechol and protocatechuate pathways, which proceed via two primary mechanisms: ortho-cleavage (\u0026beta;-ketoadipate pathway) and meta-cleavage. In the ortho-cleavage pathway, catechol (\u003cem\u003ecatABC\u003c/em\u003e) and protocatechuate (\u003cem\u003epcaG\u003c/em\u003e, \u003cem\u003epcaH\u003c/em\u003e, \u003cem\u003epcaC\u003c/em\u003e) are metabolized via the \u0026beta;-ketoadipate pathway before their subsequent assimilation into the TCA cycle. Alternatively, the gentisate pathway involves the hydroxylation of aromatic compounds into gentisate, followed by ring cleavage mediated by gentisate 1,2-dioxygenase, ultimately directing intermediates into central metabolism \u003cstrong\u003e(\u003c/strong\u003eFig. \u003cspan class=\"InternalRef\"\u003e15\u003c/span\u003e\u003cstrong\u003e).\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003eThe \u003cem\u003eRhodococcus indonesiensis\u003c/em\u003e SARSHI1 strain exhibits a highly versatile hydrocarbon degradation capacity, possessing key enzymatic systems for the complete mineralization of both aliphatic and aromatic hydrocarbons. 
A detailed list of hydrocarbon metabolism-associated genes is provided in \u003cstrong\u003eSupplementary Information 14 \u0026amp; 15.\u003c/strong\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec46\" class=\"Section2\"\u003e\n \u003ch2\u003e3.15 Secondary Metabolite Analysis\u003c/h2\u003e\n \u003cp\u003eA comprehensive antiSMASH analysis identified 19 distinct biosynthetic gene clusters (BGCs), underscoring the genomic potential for diverse secondary metabolite production \u003cstrong\u003e(\u003c/strong\u003eFig. \u003cspan class=\"InternalRef\"\u003e16\u003c/span\u003e). These BGCs encompass non-ribosomal peptide synthetases (NRPS), polyketide synthases (PKS), terpenes, ectoine, redox cofactors, and other metabolites, suggesting a role in antibiotic biosynthesis, stress adaptation, and metabolic flexibility. In Region 1.3, only one cluster, NAPAA, was identified with high similarity and is associated with the biosynthesis of \u0026epsilon;-Poly-L-lysine, an antimicrobial compound. The presence of multiple NRPS and PKS clusters (Regions 1.1, 1.5, 1.10, 1.11, 1.13, 1.15, and 1.18) suggests the potential biosynthesis of antibiotics or bioactive compounds, with Region 1.11 (PKS) specifically linked to stenothricin, a known antimicrobial agent.\u003c/p\u003e\n \u003cp\u003eIn Region 1.2, the identified betalactone BGC gene order and color patterns are distinct from existing BGCs in \u003cem\u003eRhodococcus ruber\u003c/em\u003e, suggesting the synthesis of a betalactone variant. A similar trend was observed in polyketide biosynthetic clusters, particularly Region 1.6, which harbors multiple genes with low sequence similarity (\u0026lt;\u0026thinsp;60%) to previously characterized lasso peptide BGCs, indicating the potential for novel structural variant biosynthesis. The four distinct terpene clusters (Regions 1.7, 1.9, 1.14, and 1.17) suggest possible involvement in antimicrobial activity. 
Notably, Regions 1.9, 1.14, and 1.17 exhibit minimal sequence similarity (\u0026lt;\u0026thinsp;50%) to known terpene BGCs, implying the presence of a divergent or modified terpene biosynthesis pathway. Region 1.16 contains redox cofactor-related genes with moderate homology (0.40\u0026ndash;0.49) to characterized BGCs, suggesting a unique oxidative metabolic function. Furthermore, Region 1.19 exhibits low similarity (0.35\u0026ndash;0.41) to known clusters, potentially indicating the biosynthesis of a butyrolactone-like compound. The consistently low sequence homology observed across multiple BGCs highlights the potential of SARSHI1 to synthesize structurally distinct metabolites or novel bioactive compounds.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"4. Discussion","content":"\u003cp\u003eOil, as a non-renewable resource, plays a crucial role in global economic stability and development. However, its extraction, transportation, refining, and disposal pose significant environmental and health risks, primarily due to contamination by toxic petroleum hydrocarbons, including polycyclic aromatic hydrocarbons (PAHs), resins, and asphaltenes [\u003cspan citationid=\"CR77\" class=\"CitationRef\"\u003e77\u003c/span\u003e]. These pollutants accumulate in ecosystems, bioaccumulate through food chains, and pose risks to humans. Traditional oil remediation methods and emerging technologies such as incineration, solvent extraction, electrical remediation, and chemical leaching mitigate visible spills but are constrained by high costs and secondary pollution [\u003cspan citationid=\"CR78\" class=\"CitationRef\"\u003e78\u003c/span\u003e]. The rate of petroleum degradation in the environment depends on factors such as oil composition, concentration, environmental conditions, and microbial community structure. 
For example, the half-life of low-molecular-weight PAHs ranges from 1.5 to 5.5 weeks in soils with 1\u0026ndash;2% hydrocarbon content but increases to 2.5 to 52 weeks in soils with higher contamination levels [\u003cspan citationid=\"CR79\" class=\"CitationRef\"\u003e79\u003c/span\u003e]. Microbial bioremediation has emerged as an effective and sustainable alternative. Hydrocarbon-degrading microorganisms can use these compounds as their sole carbon and energy source, converting them into non-toxic end products. Advancements in multi-omics technologies have provided deeper insights into microbial metabolic pathways and the genetic adaptations that enable bacteria to degrade petroleum hydrocarbons efficiently.\u003c/p\u003e\u003cp\u003eSeveral studies have identified diverse hydrocarbon-degrading bacteria with significant metabolic capabilities. For instance, Hossain et al. isolated 26 bacterial strains capable of utilizing PAHs and petroleum hydrocarbons as carbon sources. Notably, \u003cem\u003ePseudomonas citronellolis\u003c/em\u003e and \u003cem\u003eComamonas thiooxydans\u003c/em\u003e harbored the highest number of hydrocarbon-degrading enzymes, including dioxygenases, monooxygenases, hydroxylases, and dehydrogenases [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]. Delegan et al. conducted a complete genome analysis of the previously isolated \u003cem\u003eRhodococcus opacus\u003c/em\u003e S8, identifying genes involved in alkane degradation, surfactant biosynthesis, and low-temperature adaptation. 
A key discovery was the strain\u0026rsquo;s ability to degrade hexadecane under oxygen-limited conditions, facilitated by the formation of bacterial micro conglomerates [\u003cspan citationid=\"CR80\" class=\"CitationRef\"\u003e80\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eIn the present study, we isolated a novel petroleum hydrocarbon-degrading strain, \u003cem\u003eRhodococcus indonesiensis\u003c/em\u003e SARSHI1, from Nacharam, Hyderabad, and performed whole-genome sequencing using a hybrid (Illumina and ONT) approach. The resulting complete genome assembly, free from gaps or missing fragments, represents, to our knowledge, the first complete genome of \u003cem\u003eR. indonesiensis\u003c/em\u003e. The assembled genome is 5.7 Mbp and comprises a circular chromosome and a plasmid. The genome quality analysis confirmed the predominance of single-copy functional protein-coding genes, with minimal duplication. Comprehensive genome annotation revealed an abundance of functional genes, particularly those encoding monooxygenases, dioxygenases, and metal resistance determinants, which are crucial for hydrocarbon degradation and for the strain\u0026rsquo;s adaptability in petroleum-contaminated environments. Functional annotation revealed a limited carbohydrate metabolism and constrained methanogenesis, suggesting a genome tailored for hydrocarbon metabolism rather than broad-spectrum nutrient assimilation. Notably, SARSHI1 exhibits a broad metabolic potential for degrading a wide spectrum of aromatic and related compounds, including xylene, toluene, aminobenzoate, chloroalkane, and PAHs.\u003c/p\u003e\u003cp\u003eThe CANT-HYD analysis unveiled an extensive repertoire of hydrocarbon-degrading gene families, including \u003cem\u003eAlkB, AhyA, AlmA_GroupI, LadAB, benABCD, bphFIX2, NdoBC\u003c/em\u003e, \u003cem\u003eBmoXXY\u003c/em\u003e, and \u003cem\u003eTmoAE\u003c/em\u003e, exhibiting high sequence similarity and significant abundance. 
The presence of catechol (\u003cem\u003ecatA, catC, pcaJ\u003c/em\u003e) and protocatechuate (\u003cem\u003epcaG, pcaH, pcaR\u003c/em\u003e) ortho-cleavage clusters indicates the potential for complete mineralization of aromatic hydrocarbons. In addition, the abundance of alkane hydroxylases, alkane monooxygenases (\u003cem\u003ealkG_rubA3_rdx\u003c/em\u003e), DNA repair-associated alkane metabolism, terminal oxidation (\u003cem\u003ealmA, LadA\u003c/em\u003e), subterminal oxidation (\u003cem\u003eprmA, prmB, prmC, prmD\u003c/em\u003e), and Finnerty pathway (\u003cem\u003eahpC, ahpF\u003c/em\u003e) gene clusters points to terminal oxidation as the primary metabolic pathway for aliphatic hydrocarbon degradation. Beyond hydrocarbon degradation, the genome encodes multiple hypothetical proteins and functionally uncharacterized domains, indicating potential novel metabolic functions. The genome also carries genes associated with antibiotic biosynthesis and resistance, including β-lactam, vancomycin, and drug-metabolizing enzymes, suggesting a competitive ecological advantage in hydrocarbon-rich environments. Furthermore, the genome possesses biosynthetic gene clusters (BGCs) encoding secondary metabolites, which may contribute to bacterial adaptation, stress tolerance, and antimicrobial activity. Several identified BGCs exhibit low sequence similarity to known clusters, suggesting the potential for the biosynthesis of novel structural variants of existing bioactive compounds and pointing to adaptive mechanisms for survival in hydrocarbon-rich environments.\u003c/p\u003e\u003cp\u003eThis study presents the first complete genome of \u003cem\u003eRhodococcus indonesiensis\u003c/em\u003e SARSHI1, offering comprehensive insights into its extensive hydrocarbon degradation potential, genomic resilience, and adaptive mechanisms in petroleum-contaminated environments. 
These findings identify SARSHI1 as a promising candidate for microbial bioremediation and sustainable cleanup of petroleum-contaminated sites.\u003c/p\u003e"},{"header":"5. Conclusion","content":"\u003cp\u003eThe study presents the first complete genome sequence of \u003cem\u003eRhodococcus indonesiensis\u003c/em\u003e SARSHI1, elucidating its extensive capacity for hydrocarbon degradation. A comprehensive genomic analysis reveals a diverse repertoire of monooxygenases, dioxygenases, and key gene clusters involved in the degradation of aliphatic and aromatic hydrocarbons. The SARSHI1 genome harbors \u003cem\u003ebph\u003c/em\u003e, \u003cem\u003eben\u003c/em\u003e, and \u003cem\u003exyl\u003c/em\u003e gene clusters for aromatic hydrocarbon degradation and \u003cem\u003ealkB\u003c/em\u003e, \u003cem\u003eLadA\u003c/em\u003e, \u003cem\u003eAlmA\u003c/em\u003e, and \u003cem\u003eprm\u003c/em\u003e clusters for aliphatic hydrocarbon degradation. The predominance of terminal oxidation pathways, coupled with multiple metal resistance and detoxification mechanisms, underscores the strain\u0026rsquo;s ecological adaptability to hydrocarbon-rich environments. Furthermore, the gene clusters associated with antibiotic production and stress tolerance suggest a competitive advantage in extreme environmental conditions. Notably, the genome harbors several functionally uncharacterized domains, along with biosynthetic clusters exhibiting variations that may encode novel structural variants of secondary metabolites, further enhancing its biotechnological potential. The complete genome sequence has been deposited in GenBank under accession numbers CP180630 (chromosome) and CP180631 (plasmid). 
The raw sequencing reads have been submitted to the Sequence Read Archive (SRA), NCBI, under accession numbers SRX27520007 (Illumina) and SRX27520006 (ONT).\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eCONFLICTS OF INTEREST/COMPETING INTERESTS:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors affirm that there are no known competing financial interests or personal relationships that could have influenced the work reported in this paper.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eETHICS APPROVAL AND CONSENT TO PARTICIPATE:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eHUMAN AND ANIMAL RIGHTS:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNo animals or humans were used in the studies that were the basis of this research.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eACKNOWLEDGMENTS:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors are thankful to Eminent Biosciences and LeGene Biosciences Pvt Ltd, Indore, India, for 16S rRNA sequencing, whole-genome sequencing, and de novo assembly of the bacterial genome.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFUNDING:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe did not receive any funding for this study.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAUTHOR CONTRIBUTIONS:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eS.A.U.Z.:\u003c/strong\u003e Contributed to conceptualization, investigation, methodology, sample collection, NGS data analysis, validation, visualization, and writing of the original draft. S.A.U.Z. also participated in reviewing and editing the manuscript. 
\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eK.S.:\u003c/strong\u003e Participated in composing the initial draft and was also involved in reviewing and editing the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eA.N.:\u003c/strong\u003e Involved in conceptualization, investigation, methodology, project administration, supervision, and writing (review \u0026amp; editing).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eK.M.K.\u003c/strong\u003e and \u003cstrong\u003eR.B.:\u003c/strong\u003e Conducted the investigation, provided supervision, and contributed to the review and editing of the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAvailability of data and materials:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis whole-genome project has been deposited in GenBank under accession numbers CP180630 (chromosome) and CP180631 (plasmid). The read sequences have been deposited under BioProject accession PRJNA1217105, BioSample accession SAMN46479200, and SRA accessions SRX27520007 and SRX27520006. 
Queries about the study can be directed to the corresponding author.\u003c/p\u003e\n\u003cp\u003eWGS URL: https://www.ncbi.nlm.nih.gov/nuccore/CP180630\u003c/p\u003e\n\u003cp\u003ePlasmid URL: https://www.ncbi.nlm.nih.gov/nuccore/CP180631\u003c/p\u003e\n\u003cp\u003eBioProject URL: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1217105\u003c/p\u003e\n\u003cp\u003eBioSample URL: https://www.ncbi.nlm.nih.gov/biosample/SAMN46479200\u003c/p\u003e\n\u003cp\u003eSRA URL (Illumina): https://www.ncbi.nlm.nih.gov/sra/?term=SRX27520007\u003c/p\u003e\n\u003cp\u003eSRA URL (ONT): https://www.ncbi.nlm.nih.gov/sra/?term=SRX27520006\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCONSENT FOR PUBLICATION\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCOMPETING INTERESTS:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eA. Imam, P. K. Kanaujia, A. Ray, and S. K. Suman, \u0026ldquo;Removal of Petroleum Contaminants Through Bioremediation with Integrated Concepts of Resource Recovery: A Review,\u0026rdquo; \u003cem\u003eIndian J Microbiol\u003c/em\u003e, vol. 61, no. 3, pp. 250\u0026ndash;261, Sep. 2021, doi: 10.1007/s12088-021-00928-4.\u003c/li\u003e\n\u003cli\u003eB. Narayan Thorat and R. Kumar Sonwani, \u0026ldquo;Current technologies and future perspectives for the treatment of complex petroleum refinery wastewater: A review,\u0026rdquo; \u003cem\u003eBioresource Technology\u003c/em\u003e, vol. 355, p. 127263, Jul. 2022, doi: 10.1016/j.biortech.2022.127263.\u003c/li\u003e\n\u003cli\u003eS. Kuppusamy, N. R. Maddela, M. Megharaj, and K. 
Venkateswarlu, \u0026ldquo;Ecological Impacts of Total Petroleum Hydrocarbons,\u0026rdquo; in \u003cem\u003eTotal Petroleum Hydrocarbons\u003c/em\u003e, Cham: Springer International Publishing, 2020, pp. 95\u0026ndash;138. doi: 10.1007/978-3-030-24035-6_5.\u003c/li\u003e\n\u003cli\u003eY. Wei, D. Ding, K. Qu, J. Sun, and Z. Cui, \u0026ldquo;Ecological risk assessment of heavy metal pollutants and total petroleum hydrocarbons in sediments of the Bohai Sea, China,\u0026rdquo; \u003cem\u003eMarine Pollution Bulletin\u003c/em\u003e, vol. 184, p. 114218, Nov. 2022, doi: 10.1016/j.marpolbul.2022.114218.\u003c/li\u003e\n\u003cli\u003eS. Kuppusamy, N. R. Maddela, M. Megharaj, and K. Venkateswarlu, \u0026ldquo;Fate of Total Petroleum Hydrocarbons in the Environment,\u0026rdquo; in \u003cem\u003eTotal Petroleum Hydrocarbons\u003c/em\u003e, Cham: Springer International Publishing, 2020, pp. 57\u0026ndash;77. doi: 10.1007/978-3-030-24035-6_3.\u003c/li\u003e\n\u003cli\u003eD. Pal and S. Sen, \u0026ldquo;Emerging Petroleum Pollutants and Their Adverse Effects on the Environment,\u0026rdquo; in \u003cem\u003eImpact of Petroleum Waste on Environmental Pollution and its Sustainable Management Through Circular Economy\u003c/em\u003e, I. D. Behera and A. P. Das, Eds., in Environmental Science and Engineering. , Cham: Springer Nature Switzerland, 2023, pp. 103\u0026ndash;137. doi: 10.1007/978-3-031-48220-5_5.\u003c/li\u003e\n\u003cli\u003eH. Gao, M. Wu, H. Liu, Y. Xu, and Z. Liu, \u0026ldquo;Effect of petroleum hydrocarbon pollution levels on the soil microecosystem and ecological function,\u0026rdquo; \u003cem\u003eEnvironmental Pollution\u003c/em\u003e, vol. 293, p. 118511, Jan. 2022, doi: 10.1016/j.envpol.2021.118511.\u003c/li\u003e\n\u003cli\u003eS. Adipah, \u0026ldquo;Introduction of Petroleum Hydrocarbons Contaminants and its Human Effects,\u0026rdquo; \u003cem\u003eJournal of Environmental Science and Public Health\u003c/em\u003e, vol. 3, no. 1, pp. 1\u0026ndash;9, Jan. 
2019.\u003c/li\u003e\n\u003cli\u003eA. K. Pandey \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Multipronged evaluation of genotoxicity in Indian petrol‐pump workers,\u0026rdquo; \u003cem\u003eEnviron and Mol Mutagen\u003c/em\u003e, vol. 49, no. 9, pp. 695\u0026ndash;707, Dec. 2008, doi: 10.1002/em.20419.\u003c/li\u003e\n\u003cli\u003eH. I. Abdel-Shafy and M. S. M. Mansour, \u0026ldquo;A review on polycyclic aromatic hydrocarbons: Source, environmental impact, effect on human health and remediation,\u0026rdquo; \u003cem\u003eEgyptian Journal of Petroleum\u003c/em\u003e, vol. 25, no. 1, pp. 107\u0026ndash;123, Mar. 2016, doi: 10.1016/j.ejpe.2015.03.011.\u003c/li\u003e\n\u003cli\u003eI. C. Ossai, A. Ahmed, A. Hassan, and F. S. Hamid, \u0026ldquo;Remediation of soil and water contaminated with petroleum hydrocarbon: A review,\u0026rdquo; \u003cem\u003eEnvironmental Technology \u0026amp; Innovation\u003c/em\u003e, vol. 17, p. 100526, Feb. 2020, doi: 10.1016/j.eti.2019.100526.\u003c/li\u003e\n\u003cli\u003eN. Das and P. Chandran, \u0026ldquo;Microbial Degradation of Petroleum Hydrocarbon Contaminants: An Overview,\u0026rdquo; \u003cem\u003eBiotechnology Research International\u003c/em\u003e, vol. 2011, pp. 1\u0026ndash;13, Sep. 2011, doi: 10.4061/2011/941810.\u003c/li\u003e\n\u003cli\u003eS. Varjani, A. Pandey, and V. N. Upasani, \u0026ldquo;Petroleum sludge polluted soil remediation: Integrated approach involving novel bacterial consortium and nutrient application,\u0026rdquo; \u003cem\u003eScience of The Total Environment\u003c/em\u003e, vol. 763, p. 142934, Apr. 2021, doi: 10.1016/j.scitotenv.2020.142934.\u003c/li\u003e\n\u003cli\u003eA. K. Bej, D. Saul, and J. Aislabie, \u0026ldquo;Cold-tolerant alkane-degrading Rhodococcus species from Antarctica,\u0026rdquo; \u003cem\u003ePolar Biology\u003c/em\u003e, vol. 23, no. 2, pp. 100\u0026ndash;105, Jan. 2000, doi: 10.1007/s003000050014.\u003c/li\u003e\n\u003cli\u003eJ. A. Viesser, M. H. Sugai-Guerios, L. C. 
Malucelli, M. R. Pincerati, S. G. Karp, and L. T. Maranho, \u0026ldquo;Petroleum-Tolerant Rhizospheric Bacteria: Isolation, Characterization and Bioremediation Potential,\u0026rdquo; \u003cem\u003eSci Rep\u003c/em\u003e, vol. 10, no. 1, p. 2060, Feb. 2020, doi: 10.1038/s41598-020-59029-9.\u003c/li\u003e\n\u003cli\u003eM. S. Kuyukina and I. B. Ivshina, \u0026ldquo;Bioremediation of Contaminated Environments Using Rhodococcus,\u0026rdquo; in \u003cem\u003eBiology of Rhodococcus\u003c/em\u003e, vol. 16, H. M. Alvarez, Ed., in Microbiology Monographs, vol. 16. , Cham: Springer International Publishing, 2019, pp. 231\u0026ndash;270. doi: 10.1007/978-3-030-11461-9_9.\u003c/li\u003e\n\u003cli\u003eX. Chen, G. Shan, J. Shen, F. Zhang, Y. Liu, and C. Cui, \u0026ldquo;In situ bioremediation of petroleum hydrocarbon\u0026ndash;contaminated soil: isolation and application of a Rhodococcus strain,\u0026rdquo; \u003cem\u003eInt Microbiol\u003c/em\u003e, vol. 26, no. 2, pp. 411\u0026ndash;421, Dec. 2022, doi: 10.1007/s10123-022-00305-1.\u003c/li\u003e\n\u003cli\u003eM. T. Nazari \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Rhodococcus: A promising genus of actinomycetes for the bioremediation of organic and inorganic contaminants,\u0026rdquo; \u003cem\u003eJournal of Environmental Management\u003c/em\u003e, vol. 323, p. 116220, Dec. 2022, doi: 10.1016/j.jenvman.2022.116220.\u003c/li\u003e\n\u003cli\u003eM. Kaur, V. Singh, A. Khan, K. Sharma, F. J. B. Mendoonca Junior, and A. Nayarisseri, \u0026ldquo;Navigating the genomic landscape: A deep dive into clinical genetics with deep learning,\u0026rdquo; in \u003cem\u003eDeep Learning in Genetics and Genomics\u003c/em\u003e, Elsevier, 2025, pp. 185\u0026ndash;224. doi: 10.1016/B978-0-443-27574-6.00006-0.\u003c/li\u003e\n\u003cli\u003eA. 
Nayarisseri \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Impact of Next-Generation Whole-Exome sequencing in molecular diagnostics,\u0026rdquo; \u003cem\u003eDrug Invention Today\u003c/em\u003e, vol. 5, no. 4, pp. 327\u0026ndash;334, Dec. 2013, doi: 10.1016/j.dit.2013.07.005.\u003c/li\u003e\n\u003cli\u003eH. Tilgner \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Comprehensive transcriptome analysis using synthetic long-read sequencing reveals molecular co-association of distant splicing events,\u0026rdquo; \u003cem\u003eNat Biotechnol\u003c/em\u003e, vol. 33, no. 7, pp. 736\u0026ndash;742, Jul. 2015, doi: 10.1038/nbt.3242.\u003c/li\u003e\n\u003cli\u003eS. Oikonomopoulos, Y. C. Wang, H. Djambazian, D. Badescu, and J. Ragoussis, \u0026ldquo;Benchmarking of the Oxford Nanopore MinION sequencing for quantitative and qualitative assessment of cDNA populations,\u0026rdquo; \u003cem\u003eSci Rep\u003c/em\u003e, vol. 6, no. 1, p. 31602, Aug. 2016, doi: 10.1038/srep31602.\u003c/li\u003e\n\u003cli\u003eN. De Maio \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Comparison of long-read sequencing technologies in the hybrid assembly of complex bacterial genomes,\u0026rdquo; \u003cem\u003eMicrobial Genomics\u003c/em\u003e, vol. 5, no. 9, Sep. 2019, doi: 10.1099/mgen.0.000294.\u003c/li\u003e\n\u003cli\u003eY. Lu \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Hybrid Clustering of Long and Short-read for Improved Metagenome Assembly,\u0026rdquo; Jan. 26, 2021. doi: 10.1101/2021.01.25.428115.\u003c/li\u003e\n\u003cli\u003eD. Das \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Complete genome sequence analysis of Pseudomonas aeruginosa N002 reveals its genetic adaptation for crude oil degradation,\u0026rdquo; \u003cem\u003eGenomics\u003c/em\u003e, vol. 105, no. 3, pp. 182\u0026ndash;190, Mar. 2015, doi: 10.1016/j.ygeno.2014.12.006.\u003c/li\u003e\n\u003cli\u003eM. S. Hossain, B. Iken, and R. 
Iyer, \u0026ldquo;Whole genome analysis of 26 bacterial strains reveals aromatic and hydrocarbon degrading enzymes from diverse environmental soil samples,\u0026rdquo; \u003cem\u003eSci Rep\u003c/em\u003e, vol. 14, no. 1, p. 30685, Dec. 2024, doi: 10.1038/s41598-024-78564-3.\u003c/li\u003e\n\u003cli\u003eA. Nayarisseri, P. Singh, and S. K. Singh, \u0026ldquo;Screening, isolation and characterization of biosurfactant-producing Bacillus tequilensis strain ANSKLAB04 from brackish river water,\u0026rdquo; \u003cem\u003eInt. J. Environ. Sci. Technol.\u003c/em\u003e, vol. 16, no. 11, pp. 7103\u0026ndash;7112, Nov. 2019, doi: 10.1007/s13762-018-2089-9.\u003c/li\u003e\n\u003cli\u003eKrishnan and A. Nayarisseri, \u0026ldquo;Biodegradation effects of o-cresol by Pseudomonas monteilii SHY on mustard seed germination,\u0026rdquo; \u003cem\u003eBioinformation\u003c/em\u003e, vol. 14, no. 06, pp. 271\u0026ndash;278, Jun. 2018, doi: 10.6026/97320630014271.\u003c/li\u003e\n\u003cli\u003eA. Nayarisseri, R. Khandelwal, and S. K. Singh, \u0026ldquo;Identification and Characterization of Lipopeptide Biosurfactant Producing Microbacterium sp Isolated from Brackish River Water,\u0026rdquo; \u003cem\u003eCTMC\u003c/em\u003e, vol. 20, no. 24, pp. 2221\u0026ndash;2234, Nov. 2020, doi: 10.2174/1568026620666200628144716.\u003c/li\u003e\n\u003cli\u003eM. Mohan, S. Kozhithodi, and A. Nayarisseri, \u0026ldquo;Screening, Purification and Characterization of Protease Inhibitor from Capsicum frutescens,\u0026rdquo; \u003cem\u003eBioinformation\u003c/em\u003e, vol. 14, no. 06, pp. 285\u0026ndash;293, Jun. 2018, doi: 10.6026/97320630014285.\u003c/li\u003e\n\u003cli\u003eA. Nayarisseri \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;IDENTIFICATION AND CHARACTERIZATION OF NEUTRAL PROTEASE PRODUCING Paenibacillus Polymyxa SPECIES EMBS024 BY 16S rRNA GENE SEQUENCING,\u0026rdquo; \u003cem\u003eInt J of Micr Res\u003c/em\u003e, vol. 4, no. 5, pp. 236\u0026ndash;239, Jun. 
2012, doi: 10.9735/0975-5276.4.5.236-239.\u003c/li\u003e\n\u003cli\u003eA. Ravi \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Characterization of petroleum degrading bacteria and its optimization conditions on effective utilization of petroleum hydrocarbons,\u0026rdquo; \u003cem\u003eMicrobiological Research\u003c/em\u003e, vol. 265, p. 127184, Dec. 2022, doi: 10.1016/j.micres.2022.127184.\u003c/li\u003e\n\u003cli\u003eA. Nayarisseri, A. Suppahia, A. G. Nadh, and A. S. Nair, \u0026ldquo;Identification and Characterization of a Pesticide Degrading Flavobacterium Species EMBS0145 by 16S rRNA Gene Sequencing,\u0026rdquo; \u003cem\u003eInterdiscip Sci Comput Life Sci\u003c/em\u003e, vol. 7, no. 2, pp. 93\u0026ndash;99, Jun. 2015, doi: 10.1007/s12539-015-0016-z.\u003c/li\u003e\n\u003cli\u003eA. Nayarisseri and S. K. Singh, \u0026ldquo;Genome analysis of biosurfactant producing bacterium, Bacillus tequilensis,\u0026rdquo; \u003cem\u003ePLoS ONE\u003c/em\u003e, vol. 18, no. 6, p. e0285994, Jun. 2023, doi: 10.1371/journal.pone.0285994.\u003c/li\u003e\n\u003cli\u003eQiagen, \u0026ldquo;QIAamp DNA Mini Kit (Catalog No. 51304),\u0026rdquo; 2020, [Online]. Available: https://www.qiagen.com/us/products/dna-analysis/dna-purification/qiamp-dna-mini-kit\u003c/li\u003e\n\u003cli\u003eL. E. Date, \u0026ldquo;Thermo Scientific CloneJET PCR Cloning Kit,\u0026rdquo; 2017.\u003c/li\u003e\n\u003cli\u003eP. Amareshwari \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Isolation and characterization of a novel chlorpyrifos degrading flavobacterium species EMBS0145 by 16S rRNA gene sequencing,\u0026rdquo; \u003cem\u003eInterdiscip Sci Comput Life Sci\u003c/em\u003e, vol. 7, no. 1, pp. 1\u0026ndash;6, Mar. 2015, doi: 10.1007/s12539-012-0207-9.\u003c/li\u003e\n\u003cli\u003eA. Ns, S. Mk, M. Yadav, and J. 
K, \u0026ldquo;IDENTIFICATION AND CHARACTERIZATION OF PROTEASES AND AMYLASES PRODUCING Bacillus licheniformis STRAIN EMBS026 BY 16S rRNA GENE SEQUENCING,\u0026rdquo; \u003cem\u003eInt J of Micr Res\u003c/em\u003e, vol. 4, no. 5, pp. 231\u0026ndash;235, Jun. 2012, doi: 10.9735/0975-5276.4.5.231-235.\u003c/li\u003e\n\u003cli\u003e\u0026ldquo;GeneJET Gel Extraction Kit,\u0026rdquo; 2015, [Online]. Available: https://www.thermofisher.com/order/catalog/product/K0691\u003c/li\u003e\n\u003cli\u003e\u0026ldquo;DNA Baser v5.15(),\u0026rdquo; 2022, [Online]. Available: DNA Baser v5.15(2022), SciVance Technologies, www.DnaBaser.com\u003c/li\u003e\n\u003cli\u003eH. Chandok \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Screening, Isolation and Identification of Probiotic Producing Lactobacillus acidophilus Strains EMBS081 \u0026amp; EMBS082 by 16S rRNA Gene Sequencing,\u0026rdquo; \u003cem\u003eInterdiscip Sci Comput Life Sci\u003c/em\u003e, vol. 7, no. 3, pp. 242\u0026ndash;248, Sep. 2015, doi: 10.1007/s12539-015-0002-5.\u003c/li\u003e\n\u003cli\u003eP. Rice, I. Longden, and A. Bleasby, \u0026ldquo;EMBOSS: The European Molecular Biology Open Software Suite,\u0026rdquo; \u003cem\u003eTrends in Genetics\u003c/em\u003e, vol. 16, no. 6, pp. 276\u0026ndash;277, Jun. 2000, doi: 10.1016/S0168-9525(00)02024-2.\u003c/li\u003e\n\u003cli\u003eM. Bhatia, A. Girdhar, A. Tiwari, and A. Nayarisseri, \u0026ldquo;Implications of a novel Pseudomonas species on low density polyethylene biodegradation: an in vitro to in silico approach,\u0026rdquo; \u003cem\u003eSpringerPlus\u003c/em\u003e, vol. 3, no. 1, p. 497, Dec. 2014, doi: 10.1186/2193-1801-3-497.\u003c/li\u003e\n\u003cli\u003eA. G. Nadh \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Identification of Azo Dye Degrading Sphingomonas Strain EMBS022 and EMBS023 Using 16S rRNA Gene Sequencing,\u0026rdquo; \u003cem\u003eCBIO\u003c/em\u003e, vol. 10, no. 5, pp. 599\u0026ndash;605, Nov. 
2015, doi: 10.2174/1574893610666151008012312.\u003c/li\u003e\n\u003cli\u003eA. N. Pyde \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Identification and characterization of foodborne pathogen Listeria monocytogenes strain Pyde1 and Pyde2 using 16S rRNA gene sequencing,\u0026rdquo; \u003cem\u003eJournal of Pharmacy Research\u003c/em\u003e, vol. 6, no. 7, pp. 736\u0026ndash;741, Jul. 2013, doi: 10.1016/j.jopr.2013.07.009.\u003c/li\u003e\n\u003cli\u003eK. P. Shah, K. H. Chandok, P. Rathore, M. V. Sharma, M. Yadav, and S. A. Nayarisseri, \u0026ldquo;Screening, Isolation and Identification of Polygalacturonase Producing Bacillus tequilensis Strain EMBS083 Using 16S rRNA Gene Sequencing,\u0026rdquo; 2013.\u003c/li\u003e\n\u003cli\u003eK. Sharma, A. Nayarisseri, and S. K. Singh, \u0026ldquo;Biodegradation of plasticizers by novel strains of bacteria isolated from plastic waste near Juhu Beach, Mumbai, India,\u0026rdquo; \u003cem\u003eSci Rep\u003c/em\u003e, vol. 14, no. 1, p. 30824, Dec. 2024, doi: 10.1038/s41598-024-81239-8.\u003c/li\u003e\n\u003cli\u003eK. Venkatesh, D. Lajwanti, Sandhya. P. Kiran, D. V. Raje, and A. Nayarisseri, \u0026ldquo;Differentially expressed genes in tumors of prostate cancer in American patients with European and African origin,\u0026rdquo; \u003cem\u003eJournal of Pharmacy Research\u003c/em\u003e, vol. 6, no. 5, pp. 583\u0026ndash;588, May 2013, doi: 10.1016/j.jopr.2013.04.036.\u003c/li\u003e\n\u003cli\u003eAgilent Technologies, \u0026ldquo;4200 TapeStation System,\u0026rdquo; 2015, [Online]. Available: https://www.agilent.com/en/product/automated-electrophoresis/tapestation-systems/tapestation-instruments/4200-tapestation-system-228263\u003c/li\u003e\n\u003cli\u003eS. K. 
Pradhan \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Illumina MiSeq based assessment of bacterial community structure and diversity along the heavy metal concentration gradient in Sukinda chromite mine area soils, India,\u0026rdquo; \u003cem\u003eEcological Genetics and Genomics\u003c/em\u003e, vol. 15, p. 100054, May 2020, doi: 10.1016/j.egg.2020.100054.\u003c/li\u003e\n\u003cli\u003eN. Versmessen \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Average Nucleotide Identity and Digital DNA-DNA Hybridization Analysis Following PromethION Nanopore-Based Whole Genome Sequencing Allows for Accurate Prokaryotic Typing,\u0026rdquo; \u003cem\u003eDiagnostics\u003c/em\u003e, vol. 14, no. 16, p. 1800, Aug. 2024, doi: 10.3390/diagnostics14161800.\u003c/li\u003e\n\u003cli\u003eS. Andrews, \u0026ldquo;FastQC: A Quality Control Tool for High Throughput Sequence Data,\u0026rdquo; 2010, [Online]. Available: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/\u003c/li\u003e\n\u003cli\u003eA. M. Bolger, M. Lohse, and B. Usadel, \u0026ldquo;Trimmomatic: a flexible trimmer for Illumina sequence data,\u0026rdquo; \u003cem\u003eBioinformatics\u003c/em\u003e, vol. 30, no. 15, pp. 2114\u0026ndash;2120, Aug. 2014, doi: 10.1093/bioinformatics/btu170.\u003c/li\u003e\n\u003cli\u003eA. Bankevich \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing,\u0026rdquo; \u003cem\u003eJournal of Computational Biology\u003c/em\u003e, vol. 19, no. 5, pp. 455\u0026ndash;477, May 2012, doi: 10.1089/cmb.2012.0021.\u003c/li\u003e\n\u003cli\u003eR. R. Wick, \u0026ldquo;Filtlong: Read Trimming and Filtering Tool for Long Reads,\u0026rdquo; GitHub repository, 2021, [Online]. Available: https://github.com/rrwick/Filtlong\u003c/li\u003e\n\u003cli\u003eR. R. Wick, \u0026ldquo;Porechop: Adapter Trimmer for Oxford Nanopore Reads,\u0026rdquo; GitHub repository, 2021, [Online]. 
Available: https://github.com/rrwick/Porechop.\u003c/li\u003e\n\u003cli\u003eR. R. Wick, L. M. Judd, C. L. Gorrie, and K. E. Holt, \u0026ldquo;Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads,\u0026rdquo; \u003cem\u003ePLoS Comput Biol\u003c/em\u003e, vol. 13, no. 6, p. e1005595, Jun. 2017, doi: 10.1371/journal.pcbi.1005595.\u003c/li\u003e\n\u003cli\u003eA. Gurevich, V. Saveliev, N. Vyahhi, and G. Tesler, \u0026ldquo;QUAST: quality assessment tool for genome assemblies,\u0026rdquo; \u003cem\u003eBioinformatics\u003c/em\u003e, vol. 29, no. 8, pp. 1072\u0026ndash;1075, Apr. 2013, doi: 10.1093/bioinformatics/btt086.\u003c/li\u003e\n\u003cli\u003eR. R. Wick, M. B. Schultz, J. Zobel, and K. E. Holt, \u0026ldquo;Bandage: interactive visualization of \u003cem\u003ede novo\u003c/em\u003e genome assemblies,\u0026rdquo; \u003cem\u003eBioinformatics\u003c/em\u003e, vol. 31, no. 20, pp. 3350\u0026ndash;3352, Oct. 2015, doi: 10.1093/bioinformatics/btv383.\u003c/li\u003e\n\u003cli\u003eO. Schwengers, L. Jelonek, M. A. Dieckmann, S. Beyvers, J. Blom, and A. Goesmann, \u0026ldquo;Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification,\u0026rdquo; \u003cem\u003eMicrobial Genomics\u003c/em\u003e, vol. 7, no. 11, Nov. 2021, doi: 10.1099/mgen.0.000685.\u003c/li\u003e\n\u003cli\u003eC. P. Cantalapiedra, A. Hern\u0026aacute;ndez-Plaza, I. Letunic, P. Bork, and J. Huerta-Cepas, \u0026ldquo;eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale,\u0026rdquo; \u003cem\u003eMolecular Biology and Evolution\u003c/em\u003e, vol. 38, no. 12, pp. 5825\u0026ndash;5829, Dec. 2021, doi: 10.1093/molbev/msab293.\u003c/li\u003e\n\u003cli\u003eD. H. Parks, M. Imelfort, C. T. Skennerton, P. Hugenholtz, and G. W. 
Tyson, \u0026ldquo;CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes,\u0026rdquo; \u003cem\u003eGenome Res.\u003c/em\u003e, vol. 25, no. 7, pp. 1043\u0026ndash;1055, Jul. 2015, doi: 10.1101/gr.186072.114.\u003c/li\u003e\n\u003cli\u003eA. Chklovski, D. H. Parks, B. J. Woodcroft, and G. W. Tyson, \u0026ldquo;CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning,\u0026rdquo; \u003cem\u003eNat Methods\u003c/em\u003e, vol. 20, no. 8, pp. 1203\u0026ndash;1212, Aug. 2023, doi: 10.1038/s41592-023-01940-w.\u003c/li\u003e\n\u003cli\u003eM. Shaffer \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;DRAM for distilling microbial metabolism to automate the curation of microbiome function,\u0026rdquo; \u003cem\u003eNucleic Acids Research\u003c/em\u003e, vol. 48, no. 16, pp. 8883\u0026ndash;8900, Sep. 2020, doi: 10.1093/nar/gkaa621.\u003c/li\u003e\n\u003cli\u003eF. A. Sim\u0026atilde;o, R. M. Waterhouse, P. Ioannidis, E. V. Kriventseva, and E. M. Zdobnov, \u0026ldquo;BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs,\u0026rdquo; \u003cem\u003eBioinformatics\u003c/em\u003e, vol. 31, no. 19, pp. 3210\u0026ndash;3212, Oct. 2015, doi: 10.1093/bioinformatics/btv351.\u003c/li\u003e\n\u003cli\u003eC. Jain, L. M. Rodriguez-R, A. M. Phillippy, K. T. Konstantinidis, and S. Aluru, \u0026ldquo;High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries,\u0026rdquo; \u003cem\u003eNat Commun\u003c/em\u003e, vol. 9, no. 1, p. 5114, Nov. 2018, doi: 10.1038/s41467-018-07641-9.\u003c/li\u003e\n\u003cli\u003eJ. P. Meier-Kolthoff, A. F. Auch, H.-P. Klenk, and M. G\u0026ouml;ker, \u0026ldquo;Genome sequence-based species delimitation with confidence intervals and improved distance functions,\u0026rdquo; \u003cem\u003eBMC Bioinformatics\u003c/em\u003e, vol. 14, no. 1, p. 60, Dec. 
2013, doi: 10.1186/1471-2105-14-60.\u003c/li\u003e\n\u003cli\u003eJ. P. Meier-Kolthoff, J. S. Carbasse, R. L. Peinado-Olarte, and M. G\u0026ouml;ker, \u0026ldquo;TYGS and LPSN: a database tandem for fast and reliable genome-based classification and nomenclature of prokaryotes,\u0026rdquo; \u003cem\u003eNucleic Acids Research\u003c/em\u003e, vol. 50, no. D1, pp. D801\u0026ndash;D807, Jan. 2022, doi: 10.1093/nar/gkab902.\u003c/li\u003e\n\u003cli\u003eJ. P. Meier-Kolthoff and M. G\u0026ouml;ker, \u0026ldquo;TYGS is an automated high-throughput platform for state-of-the-art genome-based taxonomy,\u0026rdquo; \u003cem\u003eNat Commun\u003c/em\u003e, vol. 10, no. 1, p. 2182, May 2019, doi: 10.1038/s41467-019-10210-3.\u003c/li\u003e\n\u003cli\u003eV. Khot \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;CANT-HYD: A Curated Database of Phylogeny-Derived Hidden Markov Models for Annotation of Marker Genes Involved in Hydrocarbon Degradation,\u0026rdquo; \u003cem\u003eFront. Microbiol.\u003c/em\u003e, vol. 12, p. 764058, Jan. 2022, doi: 10.3389/fmicb.2021.764058.\u003c/li\u003e\n\u003cli\u003eJ. Rojas-Vargas, H. G. Castel\u0026aacute;n-S\u0026aacute;nchez, and L. Pardo-L\u0026oacute;pez, \u0026ldquo;HADEG: A Curated Hydrocarbon Aerobic Degradation Enzymes and Genes Database,\u0026rdquo; Sep. 01, 2022, \u003cem\u003eBioinformatics\u003c/em\u003e. doi: 10.1101/2022.08.30.505856.\u003c/li\u003e\n\u003cli\u003eS. Wang \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;HMDB: A curated database of genes involved in hydrocarbon monooxygenation reaction with homologous genes as background,\u0026rdquo; \u003cem\u003eJournal of Hazardous Materials\u003c/em\u003e, vol. 460, p. 132397, Oct. 2023, doi: 10.1016/j.jhazmat.2023.132397.\u003c/li\u003e\n\u003cli\u003eK. Blin \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;antiSMASH 5.0: updates to the secondary metabolite genome mining pipeline,\u0026rdquo; \u003cem\u003eNucleic Acids Research\u003c/em\u003e, vol. 47, no. W1, pp. 
W81\u0026ndash;W87, Jul. 2019, doi: 10.1093/nar/gkz310.\u003c/li\u003e\n\u003cli\u003eM. Binazadeh, I. A. Karimi, and Z. Li, \u0026ldquo;Fast biodegradation of long chain n-alkanes and crude oil at high concentrations with Rhodococcus sp. Moj-3449,\u0026rdquo; \u003cem\u003eEnzyme and Microbial Technology\u003c/em\u003e, vol. 45, no. 3, pp. 195\u0026ndash;202, Sep. 2009, doi: 10.1016/j.enzmictec.2009.06.001.\u003c/li\u003e\n\u003cli\u003eT. Kawagoe, K. Kubota, K. S. Araki, and M. Kubo, \u0026ldquo;Analysis of the Alkane Hydroxylase Gene and Long-Chain Cyclic Alkane Degradation in \u0026amp;lt;i\u0026amp;gt;Rhodococcus\u0026amp;lt;/i\u0026amp;gt;,\u0026rdquo; \u003cem\u003eAiM\u003c/em\u003e, vol. 09, no. 03, pp. 151\u0026ndash;163, 2019, doi: 10.4236/aim.2019.93012.\u003c/li\u003e\n\u003cli\u003eA. Krivoruchko, M. Kuyukina, T. Peshkur, C. J. Cunningham, and I. Ivshina, \u0026ldquo;Rhodococcus Strains from the Specialized Collection of Alkanotrophs for Biodegradation of Aromatic Compounds,\u0026rdquo; \u003cem\u003eMolecules\u003c/em\u003e, vol. 28, no. 5, p. 2393, Mar. 2023, doi: 10.3390/molecules28052393.\u003c/li\u003e\n\u003cli\u003eD. Cerqueda-Garc\u0026iacute;a, J. Q. Garc\u0026iacute;a-Maldonado, L. Aguirre-Macedo, and U. Garc\u0026iacute;a-Cruz, \u0026ldquo;A succession of marine bacterial communities in batch reactor experiments during the degradation of five different petroleum types,\u0026rdquo; \u003cem\u003eMarine Pollution Bulletin\u003c/em\u003e, vol. 150, p. 110775, Jan. 2020, doi: 10.1016/j.marpolbul.2019.110775.\u003c/li\u003e\n\u003cli\u003eB. Z. Fathepure, \u0026ldquo;Recent studies in microbial degradation of petroleum hydrocarbons in hypersaline environments,\u0026rdquo; \u003cem\u003eFront. Microbiol.\u003c/em\u003e, vol. 5, Apr. 2014, doi: 10.3389/fmicb.2014.00173.\u003c/li\u003e\n\u003cli\u003eM. I. 
Roslund \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Endocrine disruption and commensal bacteria alteration associated with gaseous and soil PAH contamination among daycare children,\u0026rdquo; \u003cem\u003eEnvironment International\u003c/em\u003e, vol. 130, p. 104894, Sep. 2019, doi: 10.1016/j.envint.2019.06.004.\u003c/li\u003e\n\u003cli\u003eY. Delegan \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Complete Genome Analysis of Rhodococcus opacus S8 Capable of Degrading Alkanes and Producing Biosurfactant Reveals Its Genetic Adaptation for Crude Oil Decomposition,\u0026rdquo; \u003cem\u003eMicroorganisms\u003c/em\u003e, vol. 10, no. 6, p. 1172, Jun. 2022, doi: 10.3390/microorganisms10061172.\u003cstrong\u003e\u003c/strong\u003e\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Petroleum hydrocarbon-degrading bacteria, Bioremediation, and degradation, Whole Genome Sequencing, Oxford Nanopore, Illumina, Genome Annotation, Petroleum hydrocarbon-degrading 
genes","lastPublishedDoi":"10.21203/rs.3.rs-6309542/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6309542/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003ePetroleum contamination presents a significant environmental challenge, contributing to soil and water pollution. Bioremediation provides a sustainable and cost-effective approach. In this study, we isolated and characterized a novel petroleum-degrading strain, \u003cem\u003eRhodococcus indonesiensis\u003c/em\u003e SARSHI1. Whole-genome sequencing of SARSHI1 was conducted using a hybrid sequencing approach, integrating Oxford Nanopore Technologies (ONT) (PromethION) and Illumina (NovaSeq 6000) platforms. The complete genome of SARSHI1 comprises 5.7 Mbp, along with a plasmid of 159,118 bp, encoding a total of 5,150 coding sequences (CDS). The genome consists of 5,695,289 base pairs, with 5,220 identified genes comprising 5,094 protein-coding genes. Additionally, it contains 12 ribosomal RNA (rRNA) genes, 55 transfer RNA (tRNA) genes, one non-coding RNA, one CRISPR array, 56 pseudogenes, and 243 hypothetical proteins. The raw reads obtained were 13,900,477 from Illumina and 2,539,063 from ONT, with processed reads of 13,169,190 and 1,567,736, respectively. Genome assembly achieved 100% completeness, confirming the reconstruction of a fully intact genome without missing sequences. A total of 570 single-copy marker genes were identified, resulting in a coding density of 91.4%. 
Functional annotation and comparative genomic analysis revealed key genes associated with hydrocarbon degradation, including \u003cem\u003ealkB\u003c/em\u003e, \u003cem\u003eahyA\u003c/em\u003e, and \u003cem\u003ealmA\u003c/em\u003e (Group I) families for long-chain alkane degradation, as well as \u003cem\u003ebph\u003c/em\u003e, \u003cem\u003eben\u003c/em\u003e, and \u003cem\u003exylC\u003c/em\u003e clusters for aromatic hydrocarbon degradation under aerobic conditions. Additionally, multiple antibiotic resistance genes, including those conferring resistance to beta-lactams, were identified. Secondary metabolite analysis identified 19 distinct biosynthetic gene clusters (BGCs), encoding variants of known compounds, highlighting the genomic potential for diverse secondary metabolite production. The complete genome sequence has been deposited in GenBank under accession numbers CP180630 (chromosome) and CP180631 (plasmid). The raw sequencing reads have been submitted to the Sequence Read Archive (SRA), NCBI, under accession numbers SRX27520007 (Illumina) and SRX27520006 (ONT).\u003c/p\u003e","manuscriptTitle":"Whole Genome of Petroleum Hydrocarbon Degrading Rhodococcus indonesiensis isolated from Nacharam, Hyderabad, India","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-09-26 15:19:39","doi":"10.21203/rs.3.rs-6309542/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision 
requested","date":"2025-10-01T14:59:43+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-09-27T13:09:56+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-09-27T07:19:15+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"302785479970294721280440913358389795656","date":"2025-09-18T14:38:10+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"201081764611319316471686762994656304279","date":"2025-09-17T16:08:58+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"237961423381617417705115907578494997233","date":"2025-09-17T10:06:42+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"19707826083423662864354326221773939747","date":"2025-09-17T10:00:36+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-09-17T09:38:39+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-03-26T11:43:11+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-03-26T11:42:48+00:00","index":"","fulltext":""},{"type":"submitted","content":"Scientific Reports","date":"2025-03-26T07:04:54+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"a955aa0a-fa53-4439-b6e7-ea58752ecd22","owner":[],"postedDate":"September 26th, 
2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":55070119,"name":"Biological sciences/Biological techniques"},{"id":55070120,"name":"Biological sciences/Biotechnology"},{"id":55070121,"name":"Biological sciences/Computational biology and bioinformatics"},{"id":55070122,"name":"Biological sciences/Microbiology"}],"tags":[],"updatedAt":"2025-12-08T16:11:25+00:00","versionOfRecord":{"articleIdentity":"rs-6309542","link":"https://doi.org/10.1038/s41598-025-28934-2","journal":{"identity":"scientific-reports","isVorOnly":false,"title":"Scientific Reports"},"publishedOn":"2025-12-02 15:57:12","publishedOnDateReadable":"December 2nd, 2025"},"versionCreatedAt":"2025-09-26 15:19:39","video":"","vorDoi":"10.1038/s41598-025-28934-2","vorDoiUrl":"https://doi.org/10.1038/s41598-025-28934-2","workflowStages":[]},"version":"v1","identity":"rs-6309542","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6309542","identity":"rs-6309542","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}




Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00