Annotated Quranic Qira’at Dataset: AQQD v1.0

2026 · doi:10.21203/rs.3.rs-8804884/v1

Linda Smail (Zayed University; corresponding author), Mohammed Lataifeh (University of Sharjah), Md Sohazur Islam Sozib (Zayed University), Arthur Diniz De Souza (Zayed University)

Research Square preprint, Version 1 (posted). This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-8804884/v1. This work is licensed under a CC BY 4.0 License.

Abstract

AQQD v1.0 is an open dataset of Quranic recitations annotated across multiple Qira’at, drawn exclusively from the ten canonical reading styles of the Quran. The dataset is developed to support a wide range of machine learning and audio analysis frameworks by providing carefully selected audio samples with rich, structured annotations suitable for classification, representation learning, and interpretability-focused modeling. The dataset encompasses recordings from 308 reciters, covering 70 Quranic Surahs segmented into representative verses, with each segment recited in 6–8 distinct Qira’at styles.
Each audio file is accompanied by structured metadata embedded within its filename, indicating the reciter, recitation style, surah, ayah, and clip number. This first version, AQQD v1.0, fills a critical gap in computational Quranic studies by bridging traditional Qira’at scholarship with modern machine learning. It provides a high-quality and interpretable resource for researchers in audio processing, Islamic studies, and educational technology, contributing to the analysis and preservation of the diverse recitation heritage.

Figures

Figure 1. The Ten Canonical Qirāʾāt of the Qurʾān: Transmission and Classification According to Ibn al-Jazari.

Figure 2. Filename structure in AQQD v1.0, illustrating the encoded metadata fields within each audio file name. The example R023_Q05_S036_A040_C02.wav represents Reciter 23, Qira’at style 05, Surah 36 (Ya-Sin), Ayah 40, Clip 2.

Background & Summary

While the Holy Quran is commonly encountered today in written form, the science of qira’at is fundamentally grounded in oral transmission. The Quran was revealed to the Prophet Muhammad (peace be upon him) as recited speech, and its preservation and propagation have historically depended on direct auditory transmission (mushafahah) from teacher to student through rigorously authenticated chains of narration (isnad) (Denny, 1989). The written muṣḥaf serves primarily as a mnemonic and referential aid; authoritative knowledge of recitation resides in sound rather than script. As emphasized by Nelson (1985), the Quran is not fully realized as Quran unless it is heard, since its acoustic realization (intonation, rhythm, articulation, and timing) is integral to both its meaning and religious function.

This oral foundation gave rise to multiple valid modes of recitation, known as qira’at, which represent distinct but canonically accepted schools of pronunciation, vocalization, and limited textual variation. These recitation styles trace their origins to the earliest period of Islam and are understood within the tradition as divinely sanctioned accommodations to the linguistic diversity of early Arabic-speaking communities (Nelson, 1985; Nasser, 2013; Al-Imam, 2006). Over time, the system evolved from an early phase of individual pedagogical choice (ikhtiyar) into a formalized scholarly discipline.
Ibn Mujahid (d. 936 CE) canonized seven readings in Kitab al-Sabʿa, establishing rigorous criteria for authenticity, including conformity with the rasm (consonantal text), sound linguistic basis, and reliable transmission (Shah, 2020). This canon was later expanded by Ibn al-Jazari (d. 1429 CE), who validated three additional readings meeting the same criteria, resulting in the ten canonical qira’at recognized in Sunni orthodoxy today (Tlili, 2021), as shown in Fig. 1.

From a technical perspective, differences among the qira’at are traditionally classified into two categories: usul (general principles) and farsh (specific lexical variants). Usul refer to consistent phonological rules applied throughout the Quran that define the acoustic “dialect” of a given reading. These include: vowel inclination (imala), such as the tilting of /a/ toward /i/ employed prominently in the readings of Ḥamza and al-Kisaʾi (Nasser, 2013); consonant assimilation (idgham), notably idgham kabir in the reading of Abu ʿAmr, where consonants merge across word boundaries (Nasser, 2013); systematic variation in vowel lengthening (madd), with reciters such as Warsh and Ḥamza employing longer durations (ṭūl) (Nelson, 1985); and differences in the realization of the glottal stop (hamzah), which is frequently softened or elided in the Warsh tradition (Tlili, 2021).

Farsh, by contrast, concerns discrete word-level variations that occur at specific locations in the text and may affect grammatical structure or nuance without altering theological meaning. For example, in Qurʾan 2:37, Ibn Kathir reads “Adam” in the accusative and “kalimat” in the nominative, reversing the syntactic roles found in other readings. Other cases involve subtle differences in letter realization, such as reading nunsheruha (“We resurrect them”) versus nunshizuha (“We raise them up”), reflecting dialectal variation preserved within the canonical tradition (Nasser, 2013).
Crucially, both usul and farsh distinctions are defined, preserved, and authenticated through sound, not through written representation. Their accurate transmission depends on direct auditory instruction and memorization, reinforcing the fundamentally oral nature of the qira’at tradition. The critical value of this work therefore lies in providing a standardized, public, and comprehensive dataset covering eight of the ten canonical recitation styles, which are fundamentally tied to the oral nature of the Quran.

To the best of our knowledge, AQQD v1.0 is the first comprehensive annotated dataset featuring multiple canonical Quranic Qira’at. No prior open dataset has captured these authentic style variations at this scale (cf. Lataifeh & Elnagar, 2020), filling a vital gap for both Islamic scholarship and speech technology research. The dataset allows researchers to train and evaluate machine learning models for recitation style classification, study acoustic and phonetic differences between Qira’at, and develop educational tools (e.g., applications that identify a recitation style or correct style-specific mistakes). It thus opens the door for data-driven analysis of Qira’at that was previously only possible through expert knowledge.

Each recording is accompanied by structured metadata embedded in its filename, specifying the reciter, recitation style, surah, ayah, and clip number. This organization enables systematic identification and comparison of Qira’at variations, supporting interpretable feature extraction and Bayesian modeling in future stages. With 308 reciters of various backgrounds, the dataset captures intra-style variation and speaker diversity. This diversity helps AI models generalize and helps ensure that conclusions drawn (for example, which features are characteristic of a style) are not tied to a single voice or recording.
The breadth of content (70 surahs across different lengths and themes) further ensures wide coverage of linguistic contexts, enhancing the dataset’s utility for numerous applications.

Methods

Audio data were collected from publicly available, trusted online sources, including official pages of certified Quran reciters, the Midad Quran Audio repository (Midad, 2025), the Holy Quran Recitation Archive - MP3Quran website (MP3Quran, 2025), and verified reciters’ YouTube channels. Recordings were curated, segmented, and standardized to ensure consistent quality and coverage across Qira’at styles. All selections were manually reviewed to confirm authenticity and correct labeling of each recitation style. Primary data sourcing, updating, and comparative validation were carried out at Zayed University, United Arab Emirates, which served as the central hub for collecting, organizing, and verifying recordings from all sources.

The audio clips were uniformly down-sampled or up-sampled as needed to a 44.1 kHz sampling rate and 16-bit depth, and converted to single-channel (mono) format to eliminate any channel-dependent variability arising from heterogeneous recording conditions. Each recording was then trimmed or padded to ensure a duration between 6 and 17 seconds, eliminating overly short or excessively long segments to maintain consistency across samples. All audio segmentation and editing were performed using Audacity software (Audacity, 2025) for precise control over timing and quality. No additional post-processing (such as noise reduction or normalization beyond the initial standardization) was applied, to preserve the natural characteristics of each recitation. The methods described here focused on curating the raw audio data; no separate transcription or acoustic feature extraction is included in this first version of the dataset.
Data Record

The AQQD v1.0 dataset has been deposited in the Harvard Dataverse repository (Dataverse “AQQD”) under the CC0 1.0 Public Domain license and is publicly accessible via its permanent DOI https://doi.org/10.7910/DVN/A8GM5Y (Smail, 2026). The repository contains the full set of 23,111 audio clips in WAV format, along with accompanying documentation such as a metadata description. All data files are organized in a single top-level directory.

Data Overview

The dataset is structured conceptually into three content tiers based on the selection strategy of Quranic passages, but these tier labels are not reflected in the directory structure or filenames. In total, AQQD v1.0 covers 70 surahs (chapters) of the Quran, each represented by one or more specific verse segments, with each selected segment recited in multiple canonical Qira’at styles. This three-tier structure ensures inclusion of both frequently recited passages and those that exemplify characteristic Qira’at differences:

Tier 1 – Core Short Surahs: Complete short chapters of the Quran that are foundational in daily recitation (e.g., Al-Fatiha 1:1–7, Al-Ikhlas 112:1–4, An-Nas 114:1–6). These 19 surahs are fully included in all available Qira’at styles and serve as a consistent baseline for comparative analysis across recitation traditions.

Tier 2 – Representative Long/Medium Surahs: Selected segments from 30 surahs of medium to long length, each typically represented by an opening, middle, and closing passage (e.g., Al-Kahf 18:1, 18:50, 18:100, last; Ya-Sin 36:1, 36:36–40, last). This structure captures variety in linguistic context and melodic cadence without requiring recording of entire lengthy chapters.

Tier 3 – Special Qira’at Points: Verses or short passages (21 in total) known for Qira’at-specific variation, such as As-Sajdah 32:15, Al-Ahzab 33:56, and Al-Qalam 68:1–4.
These selections highlight subtle phonetic or textual differences, such as imala (vowel tilting), handling of hamzah, or madd (vowel lengthening), across canonical reading styles, providing high-value material for detailed comparative analysis.

Across all tiers, the dataset spans 70 unique surahs out of the 114 in the Quran, encompassing both Makki and Madani chapters and a broad range of lengths and themes. Each selected passage has between six and eight Qira’at renditions, depending on the availability of qualified reciters. Canonical readings represented include Ḥafṣ ʿan ʿAṣim, Warsh ʿan Nafiʿ, Qalūn ʿan Nafiʿ, and Ad-Dūri ʿan Abi ʿAmr, among others. (Some less common styles not present in v1.0 will be added in future dataset versions.) Table 1 provides an overview of the tiered content selection strategy in AQQD v1.0.

Table 1. Overview of the tiered content-selection strategy in AQQD v1.0.

| Tier | Description | Number of Surahs/Passages (Coverage) | Purpose |
| Tier 1 | Complete short surahs commonly used in daily recitation (e.g., Al-Fatiḥa, Al-Ikhlaṣ, An-Nas) | 19 complete surahs | Baseline comparison across Qira’at |
| Tier 2 | Selected segments from medium and long surahs (opening, middle, closing passages) | 30 surahs (partial coverage) | Linguistic and melodic diversity |
| Tier 3 | Verses with known Qira’at-specific variation (e.g., imala, hamzah, madd) | 21 targeted passages | Fine-grained Qira’at analysis |

Each recorded recitation in AQQD v1.0 is provided as an individual audio file in uncompressed WAV format (44.1 kHz sampling rate, 16-bit depth, mono). Mono audio was used to ensure consistent acoustic features across sources, as spatial/stereo information is not relevant to Qira’at classification, phonetic analysis, or the modeling of traditional oral transmission. The duration of clips ranges between 6 and 17 seconds, depending on the length of the passage.
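The stated file properties (44.1 kHz, 16-bit, mono, 6 to 17 seconds) can be checked on a local copy of the dataset. The following is a minimal sketch using only Python's standard `wave` module. The function names `check_clip` and `check_directory` are hypothetical and are not taken from the scripts released with the dataset, and the sketch assumes all clips sit directly in one directory.

```python
import wave
from pathlib import Path

# Format stated in the Data Record section.
EXPECTED_RATE_HZ = 44_100
EXPECTED_SAMPLE_WIDTH_BYTES = 2   # 16-bit samples
EXPECTED_CHANNELS = 1             # mono
MIN_SECONDS, MAX_SECONDS = 6.0, 17.0


def check_clip(path):
    """Return a list of problems found in one WAV file (empty if it conforms)."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sampling rate is {rate} Hz")
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width is {8 * width} bits")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels")
    # Duration in seconds is the frame count divided by the sampling rate.
    duration = frames / rate if rate else 0.0
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(f"duration is {duration:.2f} s")
    return problems


def check_directory(root):
    """Map each non-conforming .wav file directly under `root` to its problems."""
    report = {}
    for path in sorted(Path(root).glob("*.wav")):
        problems = check_clip(path)
        if problems:
            report[path.name] = problems
    return report
```

An empty dictionary from `check_directory` would indicate that every clip in that directory matches the documented format and duration range.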
Filenames encode key metadata following a structured convention that identifies the reciter, recitation style, surah, ayah, and clip number. Each file name follows a fixed schema:

R[ReciterID]_Q[QiraatID]_S[SurahNumber]_A[AyahNumber]_C[ClipNumber].wav

This structured naming convention allows users to extract metadata directly from filenames using automated scripts, without requiring auxiliary metadata files (Figure 2). For example, the filename R023_Q05_S036_A040_C02.wav represents Reciter 23, Qira’at style 05, Surah 36 (Ya-Sin), Ayah 40, Clip 2. This organization enables flexible data retrieval, such as filtering all samples for a specific reciter, Qira’at style, or surah. It also ensures compatibility with standard data loaders and preprocessing tools in machine learning pipelines. No separate annotation files, transcripts, or pre-computed feature files are included in this version of the dataset; those will be added in subsequent releases.

All clips were curated from trusted online repositories, standardized in format, and quality-checked to ensure clarity and consistency across sources. The current data organization emphasizes clarity and interoperability: researchers can parse the structured filenames to group or filter samples by surah, by recitation style, or by reciter as needed.

Technical Validation

To ensure integrity and reliability, all audio files underwent a multi-stage validation and cleaning process:

Source Verification: Each recording was obtained from trusted online repositories (e.g., official reciter channels, Midad, and MP3Quran), ensuring that all sources are recognized and publicly available. Recordings were cross-checked for authenticity by confirming reciter identity and consistency with known Qira’at styles.
Quality Control: Audio files were standardized to 44.1 kHz sampling rate, 16-bit depth, and mono format to eliminate channel-dependent variability from heterogeneous recording conditions and to ensure consistent acoustic feature extraction (spatial stereo information is not relevant to Qira’at analysis). Files containing significant background noise, echo, or distortion were either cleaned or excluded. Clip duration was normalized to fall between 6 and 17 seconds, eliminating overly short or excessively long segments to maintain consistency across samples.

Structural Consistency Check: Automated scripts verified full compliance of each filename with the naming schema and validated that each combination of (Reciter, Qira’at, Surah, Ayah) is unique within the dataset, ensuring the absence of duplicate or conflicting identifiers.

Human Review: A subset of recordings, sampled across all tiers and Qira’at styles, was manually reviewed by members of the research team to confirm labeling accuracy, clarity of pronunciation, and adherence to expected Qira’at characteristics.

These validation steps collectively guarantee the technical quality of AQQD v1.0. By combining automated consistency checks with expert human auditing, we minimized the chance of mis-labeled styles or poor-quality audio in the released dataset. The rigorous standardization in format and duration further ensures that the dataset is immediately usable for machine learning pipelines without additional cleaning.

Usage Notes

AQQD v1.0 is designed to support a broad range of applications in computational analysis of Quranic recitation. Researchers can use the dataset to train classifiers that automatically identify the Qira’at style of a given recitation, to study acoustic and phonetic variations among canonical styles, or to develop educational software that provides feedback on recitation style and pronunciation.
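The filename schema R[ReciterID]_Q[QiraatID]_S[SurahNumber]_A[AyahNumber]_C[ClipNumber].wav and the structural consistency check described under Technical Validation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the released validation script. The names `ClipInfo`, `parse_filename`, `validate`, and `styles_for_verse` are hypothetical. Field widths are not specified in the text, so any run of digits is accepted. The duplicate check keys on all five fields, on the assumption that the clip number distinguishes multiple clips of the same ayah.

```python
import re
from collections import Counter
from typing import NamedTuple, Optional

# Schema from the Data Record section:
#   R[ReciterID]_Q[QiraatID]_S[SurahNumber]_A[AyahNumber]_C[ClipNumber].wav
# Assumption: each ID is a run of decimal digits of unspecified width.
FILENAME_RE = re.compile(r"^R(\d+)_Q(\d+)_S(\d+)_A(\d+)_C(\d+)\.wav$")


class ClipInfo(NamedTuple):
    reciter: int
    qiraat: int
    surah: int
    ayah: int
    clip: int


def parse_filename(name: str) -> Optional[ClipInfo]:
    """Return the decoded fields, or None if `name` does not match the schema."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    return ClipInfo(*(int(group) for group in match.groups()))


def validate(names):
    """Split filenames into (parsed, malformed, duplicated).

    `parsed` maps each well-formed name to its ClipInfo, `malformed` lists
    names that violate the schema, and `duplicated` lists well-formed names
    whose decoded fields collide with those of another file.
    """
    parsed, malformed = {}, []
    for name in names:
        info = parse_filename(name)
        if info is None:
            malformed.append(name)
        else:
            parsed[name] = info
    counts = Counter(parsed.values())
    duplicated = sorted(name for name, info in parsed.items() if counts[info] > 1)
    return parsed, malformed, duplicated


def styles_for_verse(parsed, surah, ayah):
    """Sorted Qira'at IDs with at least one clip of the given surah and ayah."""
    return sorted(
        {info.qiraat for info in parsed.values()
         if info.surah == surah and info.ayah == ayah}
    )
```

With this sketch, `parse_filename("R023_Q05_S036_A040_C02.wav")` yields `ClipInfo(reciter=23, qiraat=5, surah=36, ayah=40, clip=2)`, matching the worked example in Figure 2. `styles_for_verse` shows one way to list which Qira’at are available for a given verse before a cross-style comparison.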
The structured metadata in filenames makes it straightforward to subset the data by reciter, style, or passage for specific analyses or training scenarios. For instance, one could isolate all clips of a particular Qira’at to analyze its characteristic acoustic features, or compare how different Qira’at recite the same verse.

The current dataset version (v1.0) has certain limitations that users should be aware of. Notably, no textual transcripts or phonetic annotations are included with the audio files. This means that any analysis linking the audio to the Quranic text must rely on external Quran text sources, and researchers focusing on phonetics will need to manually derive or add annotations for now. Additionally, because recordings come from varied sources, there may be subtle differences in recording environments; we have standardized the audio format (sampling rate, bit depth, and channel count) and removed obviously noisy samples (as described in Technical Validation) to mitigate this. Users should also note that the dataset focuses on the eight most prevalent Qira’at styles in this release, so some of the ten canonical styles are underrepresented or absent in v1.0 (these will be addressed in future versions).

Future Enhancements

AQQD v1.0 is the first release in an ongoing effort. Plans are well underway for AQQD v2.0, which will significantly expand the dataset’s scope and address some limitations of the initial version. In upcoming versions (AQQD v2.0 and beyond), the dataset will incorporate controlled studio recordings with unified acoustic conditions (to complement the diverse live recordings and eliminate remaining background noise differences). In addition, we aim to include more Surahs of the Quran. This will increase the diversity of linguistic content and ensure that even more Qira’at differences (some of which might only appear in certain chapters) are captured. While v1.0 focused on the eight most prevalent styles, v2.0 will strive to cover all ten canonical Qira’at recitations comprehensively.
We may also include auxiliary data such as recordings of the same reciter reading the same passage in different styles (for those rare experts who can do so), to directly compare the exact same voice in two styles, a powerful demonstration for interpretability. These enhancements will extend AQQD from a curated audio collection into a fully annotated benchmark for computational Qira’at research and classification studies.

AQQD v2.0 is planned for release within this year as a complement to v1.0, with versioning to distinguish the two. All users of v1.0 can thus look forward to an even richer dataset, and backward compatibility (in terms of data format and ease of merging the new data with the old) will be maintained. Our continuous expansion underscores a commitment to making this the definitive dataset for Quranic Qira’at in the AI era, useful for research, education, and preservation of the oral traditions. We welcome collaboration and feedback as we build v2.0 and beyond. Ultimately, AQQD will evolve into a living corpus, updated and improved over time to serve the community’s needs.

Declarations

Code Availability

The metadata parsing scripts and audio processing scripts are publicly available alongside the dataset in the Harvard Dataverse repository. No proprietary or closed-source software is required to use the dataset. The provided scripts, structured metadata, and audio files are sufficient to reproduce the data preparation steps and to support basic exploration and reuse of the dataset.

Data Availability

The metadata parsing scripts and audio processing scripts are publicly available in the Harvard Dataverse repository alongside the AQQD v1.0 dataset (https://dataverse.harvard.edu/dataverse/AQQD). The repository provides the complete set of curated audio recordings and structured metadata files required to map reciters, Qira’at styles, surahs, and ayahs to the corresponding audio samples.
No proprietary or closed-source software is required. The released scripts, metadata, and audio files are sufficient to reproduce the data preparation steps and to enable basic inspection and reuse of the dataset. The dataset is released under the CC0 1.0 Public Domain license, permitting unrestricted use, distribution, and reproduction. Detailed documentation is provided in the accompanying README file.

Acknowledgements

The authors gratefully acknowledge the support of the Dubai Future Foundation (DFF) under the Research and Development Initiative Grant No. 2024ZUNIV-SMA-047, a project led by Zayed University. Special thanks are extended to the research assistants and students at Zayed University who contributed to data verification, segmentation, and quality control. The authors also express appreciation to the Quranic recitation community and to the creators of the Ar-DAD Arabic Diversified Audio Dataset (Lataifeh & Elnagar, 2020) for laying the groundwork that inspired this initiative.

Ethics Statement

This research was conducted in compliance with institutional and national ethical guidelines. All recordings in AQQD v1.0 were obtained from publicly available, verified sources featuring certified Quranic reciters. These recordings are distributed as Islamic endowment materials (Waqf) intended for educational and research use. No personal or sensitive data were collected.

Competing Interests

The authors declare that they have no known financial or personal relationships that could have influenced the work reported in this article.

Funding

This work was supported by the Dubai Future Foundation (DFF) under the Research and Development Initiative Grant No. 2024ZUNIV-SMA-047, led by Zayed University in collaboration with the University of Sharjah. The funder had no role in the design, data collection, analysis, interpretation, or decision to publish this dataset.

References

Lataifeh, M. & Elnagar, A. Arabic Diversified Audio Dataset (Ar-DAD) [Dataset].
Data in Brief 33, 106503 (2020). https://doi.org/10.1016/j.dib.2020.106503

Shah, M. “The Corpus of Qur’anic Readings (Qirāʾāt): History, Synthesis and Authentication.” In The Oxford Handbook of Qur’anic Studies, edited by M. Shah and M. Abdel-Haleem, 194–216. Oxford: Oxford University Press (2020). https://doi.org/10.1093/oxfordhb/9780199698646.013.47

Al-Imam, A. A. Variant Readings of the Qurʾan: A Critical Study of Their Historical and Linguistic Origins. Herndon, VA: International Institute of Islamic Thought (2006).

Nelson, K. The Art of Reciting the Qur’an. Austin, TX: University of Texas Press (1985).

Ayoub, M. “The Qur’an Recited.” Middle East Studies Association Bulletin 27(2), 169–171 (1993).

Denny, F. M. “Qur’an Recitation: A Tradition of Oral Performance and Transmission.” In The Oral Tradition in Islam, edited by G. S. Colin. Columbus, OH: Slavica Publishers (1989).

Nasser, S. H. “The Transmission of the Variant Readings of the Qurʾān: The Problem of Tawātur and the Emergence of Shawādhdh.” In The Transmission of the Variant Readings of the Qurʾān, i–xi. Leiden: Brill (2013). https://doi.org/10.1163/9789004241794_001

Tlili, V. Uṣūl al-Qirāʾāt: A Brief Overview of the Science of Qurʾān Recitations and Its Formation from the Perspective of Traditional Qirāʾāt Literature. Plzeň: Západočeská univerzita v Plzni (2021). http://hdl.handle.net/11025/46465

Midad Quran Audio Repository. https://midad.com (accessed 2025).

Holy Quran Recitation Archive - MP3Quran website. https://mp3quran.net (accessed 2025).

Audacity Team. Audacity®: Free Audio Editor and Recorder (Version 3.7.6). https://www.audacityteam.org.

Smail, L. AQQD v1.0 [Dataset]. Harvard Dataverse (2026). https://doi.org/10.7910/DVN/A8GM5Y

Additional Declarations

No competing interests reported.
Cite Share Download PDF Status: Posted Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-8804884","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":588705913,"identity":"880b1844-6f18-4a9b-a773-c409c38d9e77","order_by":0,"name":"Linda Smail","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA50lEQVRIiWNgGAWjYBACPmYGBgkwi5m58QFRWtgQWhibDYjTwgDTwsDYJkGcFnbegzd+MNRF87cztlX+bGOQ529gfojXhWzMfMmWPQyHc2ccZmy7IdnGYDjjAJsxXheyMfOYSfAwHMhtAGkxbGNg3MDAYIbXhSAtkn8Y6nLnA7UUJLYx2G9gYP/+g5AWaR4G5twNQC0MB9sYEjcw8Jjh0wHSYmwtY3A4d+NhxmbJhnMSyTMO8xTjdRg//xnDm28q6nLnnT988OOPMhvb/vb2jR/wWgMGiBACms9MWP0oGAWjYBSMAgIAAPTCPSVXw2zfAAAAAElFTkSuQmCC","orcid":"","institution":"Zayed University","correspondingAuthor":true,"prefix":"","firstName":"Linda","middleName":"","lastName":"Smail","suffix":""},{"id":588705914,"identity":"5774ee3c-046c-4b88-ab30-6323e7ce4bec","order_by":1,"name":"Mohammed 
Lataifeh","email":"","orcid":"","institution":"University of Sharjah","correspondingAuthor":false,"prefix":"","firstName":"Mohammed","middleName":"","lastName":"Lataifeh","suffix":""},{"id":588705915,"identity":"97857ddf-a7aa-4665-8262-ff5740cae5c4","order_by":2,"name":"Md Sohazur Islam Sozib","email":"","orcid":"","institution":"Zayed University","correspondingAuthor":false,"prefix":"","firstName":"Md","middleName":"Sohazur Islam","lastName":"Sozib","suffix":""},{"id":588705917,"identity":"ff9a2157-5a45-45e2-ab93-e0f222d7568c","order_by":3,"name":"Arthur Diniz De Souza","email":"","orcid":"","institution":"Zayed University","correspondingAuthor":false,"prefix":"","firstName":"Arthur","middleName":"Diniz","lastName":"De Souza","suffix":""}],"badges":[],"createdAt":"2026-02-06 09:08:44","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-8804884/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8804884/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":102948195,"identity":"a66b5dc6-2047-40b4-a19b-012cfc97e644","added_by":"auto","created_at":"2026-02-18 19:33:35","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":369485,"visible":true,"origin":"","legend":"\u003cp\u003eThe Ten Canonical Qirāʾāt of the Qurʾān: Transmission and Classification According to Ibn al-Jazari.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-8804884/v1/d3e6cf92c3d77402d9931fe2.png"},{"id":102948196,"identity":"bd137891-ea64-4db7-9b86-9edf576846eb","added_by":"auto","created_at":"2026-02-18 19:33:35","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":49737,"visible":true,"origin":"","legend":"\u003cp\u003eFilename structure in AQQD v1.0, illustrating the encoded metadata fields within each audio file name. 
The example \u003ccode\u003eR023_Q05_S036_A040_C02.wav\u003c/code\u003e represents Reciter 23, Qira’at style 05, Surah 36 (\u003cem\u003eYa-Sin\u003c/em\u003e), Ayah 40, Clip 2.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-8804884/v1/30efd4a3663d31dc08966786.png"},{"id":109168355,"identity":"fef43203-f063-476d-915a-86f094534d4e","added_by":"auto","created_at":"2026-05-13 08:33:29","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":564658,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8804884/v1/0633b600-d1ae-4d05-b615-effd35158821.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Annotated Quranic Qira’at Dataset: AQQD v1.0","fulltext":[{"header":"Background \u0026 Summary","content":"\u003cp\u003eWhile the Holy Quran is commonly encountered today in written form, the science of qira\u0026rsquo;at is fundamentally grounded in oral transmission. The Quran was revealed to the Prophet Muhammad (peace be upon him) as recited speech, and its preservation and propagation have historically depended on direct auditory transmission (mushafahah) from teacher to student through rigorously authenticated chains of narration (isnad) (Denny, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e1989\u003c/span\u003e). The written muṣḥaf serves primarily as a mnemonic and referential aid; authoritative knowledge of recitation resides in sound rather than script. 
As emphasized by Nelson (\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e1985\u003c/span\u003e), the Quran is not fully realized as Quran unless it is heard, since its acoustic realization, intonation, rhythm, articulation, and timing, is integral to both its meaning and religious function.\u003c/p\u003e \u003cp\u003eThis oral foundation gave rise to multiple valid modes of recitation, known as qira\u0026rsquo;at, which represent distinct but canonically accepted schools of pronunciation, vocalization, and limited textual variation. These recitation styles trace their origins to the earliest period of Islam and are understood within the tradition as divinely sanctioned accommodations to the linguistic diversity of early Arabic-speaking communities (Nelson, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e1985\u003c/span\u003e; Nasser, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2013\u003c/span\u003e; Al-Imam, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2006\u003c/span\u003e). Over time, the system evolved from an early phase of individual pedagogical choice (ikhtiyar) into a formalized scholarly discipline. Ibn Mujahid (d. 936 CE) canonized seven readings in \u003cem\u003eKitab al-Sabʿa\u003c/em\u003e, establishing rigorous criteria for authenticity, including conformity with the rasm (consonantal text), sound linguistic basis, and reliable transmission (Shah, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). This canon was later expanded by Ibn al-Jazari (d. 
1429 CE), who validated three additional readings meeting the same criteria, resulting in the ten canonical qira\u0026rsquo;at recognized in Sunni orthodoxy today (Tlili, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2021\u003c/span\u003e), as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFrom a technical perspective, differences among the qira\u0026rsquo;at are traditionally classified into two categories: usul (general principles) and farsh (specific lexical variants). \u003cem\u003eUsul\u003c/em\u003e refer to consistent phonological rules applied throughout the Quran that define the acoustic \u0026ldquo;dialect\u0026rdquo; of a given reading. These include:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003evowel inclination (\u003cem\u003eimala\u003c/em\u003e), such as the tilting of /a/ toward /i/ employed prominently in the readings of Ḥamza and al-Kisaʾi (Nasser, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2013\u003c/span\u003e);\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003econsonant assimilation (\u003cem\u003eidgham\u003c/em\u003e), notably \u003cem\u003eidgham kabir\u003c/em\u003e in the reading of Abu ʿAmr, where consonants merge across word boundaries (Nasser, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2013\u003c/span\u003e);\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003esystematic variation in vowel lengthening (\u003cem\u003emadd\u003c/em\u003e), with reciters such as Warsh and Ḥamza employing longer durations (\u003cem\u003eṭūl\u003c/em\u003e) (Nelson, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e1985\u003c/span\u003e);\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003edifferences in the realization of the glottal stop (\u003cem\u003ehamzah\u003c/em\u003e), which is frequently softened or elided in the Warsh tradition (Tlili, 
\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2021\u003c/span\u003e).\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003e \u003cem\u003eFarsh\u003c/em\u003e, by contrast, concerns discrete word-level variations that occur at specific locations in the text and may affect grammatical structure or nuance without altering theological meaning. For example, in Qurʾan 2:37, Ibn Kathir reads \u0026ldquo;Adam\u0026rdquo; in the accusative and \u0026ldquo;kalimat\u0026rdquo; in the nominative, reversing the syntactic roles found in other readings. Other cases involve subtle differences in letter realization, such as reading \u003cem\u003enunsheruha\u003c/em\u003e (\u0026ldquo;We resurrect them\u0026rdquo;) versus \u003cem\u003enunshizuha\u003c/em\u003e (\u0026ldquo;We raise them up\u0026rdquo;), reflecting dialectal variation preserved within the canonical tradition (Nasser, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2013\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eCrucially, both \u003cem\u003eusul\u003c/em\u003e and \u003cem\u003efarsh\u003c/em\u003e distinctions are defined, preserved, and authenticated through sound, not through written representation. Their accurate transmission depends on direct auditory instruction and memorization, reinforcing the fundamentally oral nature of the qira\u0026rsquo;at tradition. Therefore, the critical value of this work is to provide a standardized, public, and comprehensive dataset to cover eight of the ten recitation styles linked fundamentally to the oral nature of the Quran.\u003c/p\u003e \u003cp\u003eTo the best of our knowledge, AQQD v1.0 is the first comprehensive annotated dataset featuring multiple canonical Quranic Qira\u0026rsquo;at. No prior open dataset has captured these authentic style variations at this scale (cf. 
Lataifeh \u0026amp; Elnagar, \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2020\u003c/span\u003e), filling a vital gap for both Islamic scholarship and speech technology research. The dataset allows researchers to train and evaluate machine learning models for recitation style classification, study acoustic and phonetic differences between Qira\u0026rsquo;at, and develop educational tools (e.g., applications that identify a recitation style or correct style-specific mistakes). It thus opens the door for data-driven analysis of Qira\u0026rsquo;at that was previously only possible through expert knowledge.\u003c/p\u003e \u003cp\u003eEach recording is accompanied by structured metadata embedded in its filename, specifying the reciter, recitation style, surah, ayah, and clip number. This organization enables systematic identification and comparison of Qira\u0026rsquo;at variations, supporting interpretable feature extraction and Bayesian modeling in future stages. With 308 reciters of various backgrounds, the dataset captures intra-style variation and speaker diversity. This diversity helps AI models generalize and ensures that conclusions drawn (for example, which features are characteristic of a style) are not tied to a single voice or recording. The breadth of content (70 surahs across different lengths and themes) further provides wide coverage of linguistic contexts, enhancing the dataset\u0026rsquo;s utility for numerous applications.\u003c/p\u003e"},{"header":"Methods","content":"\u003cp\u003eAudio data were collected from publicly available, trusted online sources, including official pages of certified Quran reciters, the Midad Quran Audio repository (Midad, 2025), the Holy Quran Recitation Archive-MP3Quran website (MP3Quran, 2025), and verified reciters\u0026rsquo; YouTube channels. Recordings were curated, segmented, and standardized to ensure consistent quality and coverage across Qira\u0026rsquo;at styles. 
All selections were manually reviewed to confirm authenticity and correct labeling of each recitation style. Primary data sourcing, updating, and comparative validation were carried out at Zayed University, United Arab Emirates, which served as the central hub for collecting, organizing, and verifying recordings from all sources.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe audio clips were uniformly down-sampled or up-sampled as needed to a 44.1 kHz sampling rate and 16-bit depth, and converted to single-channel (mono) format to eliminate any channel-dependent variability arising from heterogeneous recording conditions. Each recording was then trimmed or padded to ensure a duration between 6 and 17 seconds, eliminating overly short or excessively long segments to maintain consistency across samples.\u0026nbsp;Importantly, all audio segmentation and editing were performed using Audacity software (Audacity, 2025) for precise control over timing and quality. No additional post-processing (such as noise reduction or normalization beyond the initial standardization) was applied, to preserve the natural characteristics of each recitation. The methods described here focused on curating the raw audio data; no separate transcription or acoustic feature extraction is included in this first version of the dataset.\u0026nbsp;\u003c/p\u003e\n\u003ch2\u003e\u003cstrong\u003eData Record\u003c/strong\u003e\u003c/h2\u003e\n\u003cp\u003eThe AQQD v1.0 dataset has been deposited in the Harvard Dataverse repository (Dataverse \u0026ldquo;AQQD\u0026rdquo;) under the CC0 1.0 Public Domain license and is publicly accessible via its permanent DOI\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003e\u003cstrong\u003ehttps://doi.org/10.7910/DVN/A8GM5Y\u003c/strong\u003e (Smail, 2026).\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003eThe repository contains the full set of 23,111 audio clips in WAV format, along with accompanying documentation such as metadata description. 
All data files are organized in a single top-level directory.\u0026nbsp;\u003c/p\u003e\n\u003ch2\u003e\u003cstrong\u003eData Overview\u003c/strong\u003e\u003c/h2\u003e\n\u003cp\u003eThe dataset is structured conceptually into three\u0026nbsp;\u003cem\u003econtent tiers\u003c/em\u003e based on the selection strategy of Quranic passages, but these tier labels are not reflected in the directory structure or filenames. In total, AQQD v1.0 covers 70 surahs (chapters) of the Quran, each represented by one or more specific verse segments, with each selected segment recited in multiple canonical Qira\u0026rsquo;at styles. This three-tier structure ensures inclusion of both frequently recited passages and those that exemplify characteristic Qira\u0026rsquo;at differences:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003e\u003cstrong\u003eTier\u0026nbsp;1 \u0026ndash; Core Short Surahs:\u003c/strong\u003e Complete short chapters of the Quran that are foundational in daily recitation (e.g., Al-Fatiha 1:1\u0026ndash;7, Al-Ikhlas 112:1\u0026ndash;4, An-Nas 114:1\u0026ndash;6). These 19 surahs are fully included in all available Qira\u0026rsquo;at styles and serve as a consistent baseline for comparative analysis across recitation traditions.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eTier\u0026nbsp;2 \u0026ndash; Representative Long/Medium Surahs:\u003c/strong\u003e Selected segments from 30 surahs of medium to long length, each represented by typically an opening, middle, and closing passage (e.g., Al-Kahf 18:1, 18:50, 18:100, last; Ya-Sin 36:1, 36:36\u0026ndash;40, last). 
This structure captures variety in linguistic context and melodic cadence without requiring recording of entire lengthy chapters.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eTier\u0026nbsp;3 \u0026ndash; Special Qira\u0026rsquo;at Points:\u003c/strong\u003e Verses or short passages (21 in total) known for Qira\u0026rsquo;at-specific variation, such as As-Sajdah 32:15, Al-Ahzab 33:56, and Al-Qalam 68:1\u0026ndash;4. These selections highlight subtle phonetic or textual differences\u0026mdash;e.g.,\u0026nbsp;\u003cem\u003eimala\u003c/em\u003e (vowel tilting), handling of\u0026nbsp;\u003cem\u003ehamzah\u003c/em\u003e\u003cem\u003e,\u003c/em\u003e or\u003cem\u003e\u0026nbsp;\u003c/em\u003e\u003cem\u003emadd\u003c/em\u003e\u003cem\u003e\u0026nbsp;\u003c/em\u003e(vowel lengthening)\u0026mdash;across canonical reading styles, providing high-value material for detailed comparative analysis.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eAcross all tiers, the dataset spans 70 unique surahs out of the 114 in the Quran, encompassing both Makki and Madani chapters and a broad range of lengths and themes. Each selected passage has between six and eight Qira\u0026rsquo;at renditions, depending on the availability of qualified reciters. Canonical readings represented include\u0026nbsp;\u003cstrong\u003eḤafṣ ʿan ʿAṣim\u003c/strong\u003e\u003cstrong\u003e,\u0026nbsp;\u003c/strong\u003e\u003cstrong\u003eWarsh ʿan Nafiʿ\u003c/strong\u003e\u003cstrong\u003e,\u0026nbsp;\u003c/strong\u003e\u003cstrong\u003eQalūn ʿan Nafiʿ\u003c/strong\u003e\u003cstrong\u003e,\u0026nbsp;\u003c/strong\u003e\u003cstrong\u003eAd-Dūri ʿan Abi ʿAmr\u003c/strong\u003e, among others. 
(Some less common styles not present in v1.0 will be expanded in future dataset versions.)\u0026nbsp;\u003cem\u003eTable\u0026nbsp;1\u003c/em\u003e provides an overview of the tiered content selection strategy in AQQD v1.0.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 1.\u003c/strong\u003e Overview of the tiered content-selection strategy in AQQD v1.0.\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 56px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eTier\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 265px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eDescription\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 146px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eNumber of Surahs/Passages\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 156px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eCoverage Purpose\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 56px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eTier\u0026nbsp;1\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 265px;\"\u003e\n \u003cp\u003eComplete short surahs commonly used in daily recitation (e.g., Al-Fatiḥa, Al-Ikhlaṣ, An-Nas)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 146px;\"\u003e\n \u003cp\u003e19 complete surahs\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 156px;\"\u003e\n \u003cp\u003eBaseline comparison across Qira\u0026rsquo;at\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 56px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eTier\u0026nbsp;2\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 265px;\"\u003e\n \u003cp\u003eSelected segments from medium and long surahs (opening, middle, closing passages)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
style=\"width: 146px;\"\u003e\n \u003cp\u003e30 surahs (partial coverage)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 156px;\"\u003e\n \u003cp\u003eLinguistic and melodic diversity\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 56px;\"\u003e\n \u003cp\u003e\u003cstrong\u003eTier\u0026nbsp;3\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 265px;\"\u003e\n \u003cp\u003eVerses with known Qira\u0026rsquo;at-specific variation (e.g., \u003cem\u003eimala\u003c/em\u003e, \u003cem\u003ehamzah\u003c/em\u003e, \u003cem\u003emadd\u003c/em\u003e)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 146px;\"\u003e\n \u003cp\u003e21 targeted passages\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 156px;\"\u003e\n \u003cp\u003eFine-grained Qira\u0026rsquo;at analysis\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eEach recorded recitation in AQQD v1.0 is provided as an individual audio file in uncompressed WAV format (44.1 kHz sampling rate, 16-bit depth, mono). Mono audio was used to ensure consistent acoustic features across sources, as spatial/stereo information is not relevant to Qira\u0026rsquo;at classification, phonetic analysis, or modeling of traditional oral transmission. The duration of clips ranges between 6 and 17 seconds, depending on the length of the passage. Filenames encode key metadata following a structured convention that identifies the reciter, recitation style, surah, ayah, and clip number (e.g.,\u0026nbsp;R023_Q05_S036_A040_C02.wav). Each file name follows a fixed schema:\u003c/p\u003e\n\u003cp\u003eR[ReciterID]_Q[QiraatID]_S[SurahNumber]_A[AyahNumber]_C[ClipNumber].wav\u003c/p\u003e\n\u003cp\u003eThis structured naming convention allows users to extract metadata directly from filenames using automated scripts, without requiring auxiliary metadata files (Figure 2). 
For example, the filename R023_Q05_S036_A040_C02.wav represents \u003cstrong\u003eReciter\u0026nbsp;23\u003c/strong\u003e\u003cstrong\u003e,\u0026nbsp;\u003c/strong\u003e\u003cstrong\u003eQira\u0026rsquo;at style\u0026nbsp;05\u003c/strong\u003e\u003cstrong\u003e,\u0026nbsp;\u003c/strong\u003e\u003cstrong\u003eSurah\u0026nbsp;36\u003c/strong\u003e\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003e(\u003cem\u003eYa-Sin\u003c/em\u003e),\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003e\u003cstrong\u003eAyah\u0026nbsp;40\u003c/strong\u003e, Clip 2. This organization enables flexible data retrieval, such as filtering all samples for a specific reciter, Qira\u0026rsquo;at style, or surah. It also ensures compatibility with standard data loaders and preprocessing tools in machine learning pipelines.\u003c/p\u003e\n\u003cp\u003eNo separate annotation files, transcripts, or pre-computed feature files are included in this version of the dataset; those will be added in subsequent releases. All clips were curated from trusted online repositories, standardized in format, and quality-checked to ensure clarity and consistency across sources. The current data organization emphasizes clarity and interoperability: researchers can parse the structured filenames to group or filter samples by surah, by recitation style, or by reciter as needed.\u003c/p\u003e\n\u003ch2\u003e\u003cstrong\u003eTechnical Validation\u003c/strong\u003e\u003c/h2\u003e\n\u003cp\u003eTo ensure integrity and reliability, all audio files underwent a multi-stage validation and cleaning process:\u003c/p\u003e\n\u003cul class=\"decimal_type\"\u003e\n \u003cli\u003e\u003cstrong\u003eSource Verification:\u003c/strong\u003e Each recording was obtained from trusted online repositories (e.g., official reciter channels, Midad, and MP3Quran), ensuring that all sources are recognized and publicly available. 
Recordings were cross-checked for authenticity by confirming reciter identity and consistency with known Qira\u0026rsquo;at styles.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eQuality Control:\u003c/strong\u003e Audio files were standardized to 44.1 kHz sampling rate, 16-bit depth, and mono format to eliminate channel-dependent variability from heterogeneous recording conditions and to ensure consistent acoustic feature extraction (spatial stereo information is not relevant to Qira\u0026rsquo;at analysis). Files containing significant background noise, echo, or distortion were either cleaned or excluded. Clip duration was normalized to fall between 6 and 17 seconds, eliminating overly short or excessively long segments to maintain consistency across samples.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eStructural Consistency Check:\u003c/strong\u003e Automated scripts verified full compliance of each filename with the naming schema and validated that each combination of (Reciter, Qira\u0026rsquo;at, Surah, Ayah) is unique within the dataset, ensuring the absence of duplicate or conflicting identifiers.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eHuman Review:\u003c/strong\u003e A subset of recordings, sampled across all tiers and Qira\u0026rsquo;at styles, was manually reviewed by members of the research team to confirm labeling accuracy, clarity of pronunciation, and adherence to expected Qira\u0026rsquo;at characteristics.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThese validation steps collectively guarantee the technical quality of AQQD v1.0. By combining automated consistency checks with expert human auditing, we minimized the chance of mis-labeled styles or poor-quality audio in the released dataset. 
The rigorous standardization in format and duration further ensures that the dataset is immediately usable for machine learning pipelines without additional cleaning.\u003c/p\u003e\n\u003ch2\u003e\u003cstrong\u003eUsage Notes\u003c/strong\u003e\u003c/h2\u003e\n\u003cp\u003eAQQD v1.0 is designed to support a broad range of applications in computational analysis of Quranic recitation. Researchers can use the dataset to train classifiers that automatically identify the Qira\u0026rsquo;at style of a given recitation, to study acoustic and phonetic variations among canonical styles, or to develop educational software that provides feedback on recitation style and pronunciation. The structured metadata in filenames makes it straightforward to subset the data by reciter, style, or passage for specific analyses or training scenarios. For instance, one could isolate all clips of a particular Qira\u0026rsquo;at to analyze its characteristic acoustic features, or compare how different Qira\u0026rsquo;at recite the same verse.\u003c/p\u003e\n\u003cp\u003eThe current dataset version (v1.0) has certain limitations that users should be aware of. Notably,\u0026nbsp;\u003cstrong\u003eno textual transcripts or phonetic annotations are included\u003c/strong\u003e with the audio files. This means that any analysis linking the audio to the Quranic text must rely on external Quran text sources, and researchers focusing on phonetics will need to manually derive or add annotations for now. Additionally, because recordings come from varied sources, there may be subtle differences in recording environments; we have standardized bitrates and removed obviously noisy samples (as described in Technical Validation) to mitigate this. 
Users should also note that the dataset focuses on the eight most prevalent Qira\u0026rsquo;at styles in this release, so some of the ten canonical styles are underrepresented or absent in v1.0 (these will be addressed in future versions).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFuture Enhancements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAQQD v1.0 is the first release in an ongoing effort. Plans are well underway for AQQD v2.0, which will significantly expand the dataset\u0026rsquo;s scope and address some limitations of the initial version. In upcoming versions (AQQD v2.0 and beyond), the dataset will incorporate controlled studio recordings with unified acoustic conditions (to complement the diverse live recordings and eliminate remaining background noise differences).\u003c/p\u003e\n\u003cp\u003eIn addition, we aim to include more Surahs of the Quran. This will increase the diversity of linguistic content and ensure that even more Qira\u0026rsquo;at differences (some of which might only appear in certain chapters) are captured. While v1.0 focused on the eight most prevalent styles, v2.0 will strive to cover\u0026nbsp;\u003cstrong\u003eall ten\u003c/strong\u003e canonical Qira\u0026rsquo;at recitations comprehensively. We may also include auxiliary data such as recordings of the same reciter reading the same passage in different styles (for those rare experts who can do so), to directly compare the exact same voice in two styles, a powerful demonstration for interpretability. This will be possible with the\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThese enhancements will extend AQQD from a curated audio collection into a fully annotated benchmark for computational Qira\u0026rsquo;at research and classification studies. AQQD v2.0 is planned for release within this year as a complement to v1.0, with versioning to distinguish the two. 
All users of v1.0 can thus look forward to an even richer dataset, and backward compatibility (in terms of data format and ease of merging the new data with the old) will be maintained. Our continuous expansion underscores a commitment to making this the definitive dataset for Quranic Qira\u0026rsquo;at in the AI era, useful for research, education, and preservation of the oral traditions. We welcome collaboration and feedback as we build v2.0 and beyond. Ultimately, AQQD will evolve into a living corpus, updated and improved over time to serve the community\u0026rsquo;s needs.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eCode Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe metadata parsing scripts and audio processing scripts are publicly available alongside the dataset in the Harvard Dataverse repository. No proprietary or closed-source software is required to use the dataset. The provided scripts, structured metadata, and audio files are sufficient to reproduce the data preparation steps and to support basic exploration and reuse of the dataset.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData Availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe metadata parsing scripts and audio processing scripts are publicly available in the Harvard Dataverse repository alongside the AQQD v1.0 dataset. \u003cstrong\u003e(\u003c/strong\u003ehttps://dataverse.harvard.edu/dataverse/AQQD\u003cstrong\u003e).\u003c/strong\u003e The repository provides the complete set of curated audio recordings and structured metadata files required to map reciters, Qira\u0026rsquo;at styles, surahs, and ayahs to the corresponding audio samples. No proprietary or closed-source software is required. The released scripts, metadata, and audio files are sufficient to reproduce the data preparation steps and to enable basic inspection and reuse of the dataset. 
The dataset is released under the CC0 1.0 Public Domain license, permitting unrestricted use, distribution, and reproduction. Detailed documentation is provided in the accompanying README file.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors gratefully acknowledge the support of the Dubai Future Foundation (DFF) under the Research and Development Initiative Grant No. 2024ZUNIV-SMA-047, a project led by Zayed University. Special thanks are extended to the research assistants and students at Zayed University who contributed to data verification, segmentation, and quality control. The authors also express appreciation to the Quranic recitation community and to the creators of the Ar-DAD Arabic Diversified Audio Dataset (Lataifeh \u0026amp; Elnagar, 2020) for laying the groundwork that inspired this initiative.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics Statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis research was conducted in compliance with institutional and national ethical guidelines. All recordings in AQQD v1.0 were obtained from publicly available, verified sources featuring certified Quranic reciters. These recordings are distributed as Islamic endowment materials (\u003cem\u003eWaqf\u003c/em\u003e) intended for educational and research use. No personal or sensitive data were collected.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting Interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no known financial or personal relationships that could have influenced the work reported in this article.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis work was supported by the Dubai Future Foundation (DFF) under the Research and Development Initiative Grant No. 2024ZUNIV-SMA-047, led by Zayed University in collaboration with the University of Sharjah. 
The funder had no role in the design, data collection, analysis, interpretation, or decision to publish this dataset.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eLataifeh, M. \u0026amp; Elnagar, A. \u003cstrong\u003eArabic Diversified Audio Dataset (Ar-DAD)\u003c/strong\u003e [Dataset]. \u003cem\u003eData in Brief\u003c/em\u003e\u003cstrong\u003e33\u003c/strong\u003e, 106503 (2020). https://doi.org/10.1016/j.dib.2020.106503\u003c/li\u003e\n\u003cli\u003eShah, M. \u0026ldquo;The Corpus of Qur\u0026rsquo;anic Readings (Qirāʾāt): History, Synthesis and Authentication.\u0026rdquo; In \u003cem\u003eThe Oxford Handbook of Qur\u0026rsquo;anic Studies\u003c/em\u003e, edited by M. Shah and M. Abdel-Haleem, 194\u0026ndash;216. Oxford: Oxford University Press (2020). https://doi.org/10.1093/oxfordhb/9780199698646.013.47\u003c/li\u003e\n\u003cli\u003eAl-Imam, A. A. \u003cem\u003eVariant Readings of the Qurʾan: A Critical Study of Their Historical and Linguistic Origins\u003c/em\u003e. Herndon, VA: International Institute of Islamic Thought (2006).\u003c/li\u003e\n\u003cli\u003eNelson, K. \u003cem\u003eThe Art of Reciting the Qur\u0026rsquo;an\u003c/em\u003e. Austin, TX: University of Texas Press (1985).\u003c/li\u003e\n\u003cli\u003eAyoub, M. \u0026ldquo;The Qur\u0026rsquo;an Recited.\u0026rdquo; \u003cem\u003eMiddle East Studies Association Bulletin\u003c/em\u003e 27(2), 169\u0026ndash;171 (1993). \u003c/li\u003e\n\u003cli\u003eDenny, F. M. \u0026ldquo;Qur\u0026rsquo;an Recitation: A Tradition of Oral Performance and Transmission.\u0026rdquo; In \u003cem\u003eThe Oral Tradition in Islam\u003c/em\u003e, edited by G. S. Colin. Columbus, OH: Slavica Publishers (1989).\u003c/li\u003e\n\u003cli\u003eNasser, S. H. 
\u0026ldquo;The Transmission of the Variant Readings of the Qurʾān: The Problem of Tawātur and the Emergence of Shawādhdh.\u0026rdquo; In \u003cem\u003eThe Transmission of the Variant Readings of the Qurʾān\u003c/em\u003e, i\u0026ndash;xi. Leiden: Brill (2013). https://doi.org/10.1163/9789004241794_001\u003c/li\u003e\n\u003cli\u003eTlili, V.\u003cem\u003e Uṣūl al-Qirāʾāt: A Brief Overview of the Science of Qurʾān Recitations and Its Formation from the Perspective of Traditional Qirāʾāt Literature. Plzeň: Z\u0026aacute;padočesk\u0026aacute; univerzita v Plzni (2021). \u003c/em\u003e\u003cem\u003ehttp://hdl.handle.net/11025/46465\u003c/em\u003e\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eMidad Quran Audio Repository.\u003c/strong\u003e https://midad.com (accessed 2025).\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003eHoly Quran Recitation Archive- MP3Quran website.\u003c/strong\u003e https://mp3quran.net (accessed 2025).\u003c/li\u003e\n\u003cli\u003eAudacity Team. \u003cem\u003eAudacity\u0026reg;: Free Audio Editor and Recorder\u003c/em\u003e (Version 3.7.6). https://www.audacityteam.org.\u003c/li\u003e\n\u003cli\u003eSmail, L. \u003cem\u003eAQQD v1.0\u003c/em\u003e [Dataset]. Harvard Dataverse (2026). 
https://doi.org/10.7910/DVN/A8GM5Y\u003cstrong\u003e\u003cbr\u003e \u003c/strong\u003e\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-8804884/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8804884/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eAQQD v1.0 is an open dataset of Quranic recitations annotated across multiple Qira\u0026rsquo;at to exclusively cover the ten canonical reading styles of the Quran. The dataset is developed to support a wide range of machine learning and audio analysis frameworks by providing carefully selected audio samples with rich, structured annotations suitable for classification, representation learning, and interpretability-focused modeling. The dataset encompasses recordings from 308 reciters, covering 70 Quranic Surahs segmented into representative verses, with each segment recited in 6\u0026ndash;8 distinct Qira\u0026rsquo;at styles. 
Each audio file is accompanied by structured metadata embedded within its filename, indicating the reciter, recitation style, surah, ayah, and clip number. The first version of AQQD v1.0 fills a critical gap in computational Quranic studies by bridging traditional Qira\u0026rsquo;at scholarship with modern machine learning. It provides a high-quality and interpretable resource for researchers in audio processing, Islamic studies, and educational technology, contributing to the analysis and preservation of the diverse recitation heritage.\u003c/p\u003e","manuscriptTitle":"Annotated Quranic Qira’at Dataset: AQQD v1.0","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-02-18 19:33:30","doi":"10.21203/rs.3.rs-8804884/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"ee34f37f-9c29-4a26-b6bf-6e0ea132bd01","owner":[],"postedDate":"February 18th, 2026","published":true,"recentEditorialEvents":[{"type":"decision","content":"Withdrawn","date":"2026-05-13T08:21:33+00:00","index":"","fulltext":""}],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2026-05-13T08:31:56+00:00","versionOfRecord":[],"versionCreatedAt":"2026-02-18 
19:33:30","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8804884","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8804884","identity":"rs-8804884","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
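The Data Overview section states that metadata can be extracted directly from filenames, and the Technical Validation section mentions automated checks for schema compliance and duplicate identifiers. The following is a minimal sketch of how such parsing might look, not the authors' released scripts. The names `AQQD_NAME`, `ClipInfo`, `parse_clip_name`, and `duplicate_keys` are hypothetical, the pattern assumes each field is a plain run of digits of any width, and uniqueness is checked on the full five-field identifier (including the clip number), which is also an assumption.

```python
import re
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional

# Mirrors the schema quoted in the paper:
#   R[ReciterID]_Q[QiraatID]_S[SurahNumber]_A[AyahNumber]_C[ClipNumber].wav
# Assumption: every field is a run of decimal digits (width not enforced).
AQQD_NAME = re.compile(r"^R(\d+)_Q(\d+)_S(\d+)_A(\d+)_C(\d+)\.wav$")


class ClipInfo(NamedTuple):
    """Metadata decoded from one filename (hypothetical helper type)."""
    reciter: int
    qiraat: int
    surah: int
    ayah: int
    clip: int


def parse_clip_name(filename: str) -> Optional[ClipInfo]:
    """Return the decoded fields, or None if the name does not fit the schema."""
    match = AQQD_NAME.match(filename)
    if match is None:
        return None
    return ClipInfo(*(int(group) for group in match.groups()))


def duplicate_keys(filenames: Iterable[str]) -> List[ClipInfo]:
    """List identifiers that appear more than once among schema-conformant names."""
    parsed = (parse_clip_name(name) for name in filenames)
    counts = Counter(info for info in parsed if info is not None)
    return [info for info, n in counts.items() if n > 1]
```

For the paper's own example, `parse_clip_name("R023_Q05_S036_A040_C02.wav")` yields reciter 23, Qira'at 5, surah 36, ayah 40, clip 2. A name that omits the `Q` field is rejected rather than guessed at, so non-conformant files surface early when filtering by reciter, style, or surah.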

