Literacy Enhances Lexical Variation, Not Quantity, in Adult Oral Production

Tan Arda Gedik

Research Article (Research Square preprint). This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-7134036/v1. This work is licensed under a CC BY 4.0 License. Status: Published. Journal publication published 07 Nov, 2025 in Journal of Cultural Cognitive Science.

Abstract

Adult language use varies substantially across speakers, with literacy experience emerging as a crucial but understudied factor in creating this variation. While written language exposes speakers to broader, more diverse vocabulary than speech alone, most psycholinguistic research focuses on highly literate populations, leaving gaps in our understanding of how literacy shapes oral production.
This study addresses a critical question: Does literacy acquisition affect lexical diversity in spontaneous oral narrative production in Turkish? We compared lexical diversity patterns between semi-literate and fully literate adult Turkish speakers during a structured storytelling task. Using Root Type-Token Ratio analyses across six parts of speech, we found that literate speakers consistently demonstrated significantly higher lexical diversity than semi-literate speakers (d = 1.18–2.08 for most categories). Crucially, this occurred without increased word production, indicating that literacy enhances vocabulary variation rather than quantity. The largest effects emerged for elaborative categories: conjunctions, adverbs, and adjectives. These findings reveal that literacy fundamentally affects lexical organization and deployment in oral production.

Keywords: literacy, frog story narration, lexical diversity, Turkish, oral production

1. Introduction

Despite being native speakers of the same language, adult individuals vary substantially in their linguistic knowledge and language use. A growing body of research suggests that one major source of this variation is literacy experience, that is, the extent to which individuals have been exposed to and engaged with written language throughout their lives (Dąbrowska, 2012; Author year). Among the areas most affected by literacy, vocabulary knowledge stands out: speakers show particularly large individual differences in their lexical repertoire. This variation stems in part from differences in input. Written language tends to contain a far richer and more diverse vocabulary than everyday speech. For instance, Hayes and Ahrens (1988) found that children’s books contained significantly more rare words than even the speech of college graduates (see Table 1 below).
Similarly, Cunningham and Stanovich (1998) demonstrate that written texts consistently surpass spoken ones in lexical richness, suggesting that non-basic vocabulary is often acquired incidentally through print exposure, especially in adulthood (see Dąbrowska, 2009). Since there is only so much vocabulary that can be acquired through oral input alone, sustained engagement with written language becomes a key driver of vocabulary expansion.

Table 1. Richness of vocabulary across selected written and spoken modalities (adapted from Hayes & Ahrens, 1988)

Source | Proportion of text from 5,000 basic lexicon | Rank of median word | Number of rare words per 1,000 tokens
College graduates in conversation with friends and spouses | .94 | 496 | 17.3
Popular prime time TV | .94 | 490 | 22.7
Children’s books | .92 | 627 | 30.9
Adult books | .88 | 1,058 | 52.7
Newspapers | .84 | 1,690 | 68.3
Scientific articles | .70 | 4,389 | 128.0

Indeed, numerous studies report robust correlations between reading experience and vocabulary knowledge, with coefficients ranging from .40 to .80, even after controlling for reading comprehension and nonverbal IQ (Dąbrowska, 2018; Nation, 2013; Cunningham & Stanovich, 1998; Mol & Bus, 2011; Author year). These findings suggest that reading itself contributes directly to vocabulary growth, rather than being merely a byproduct of other cognitive abilities such as inference-making.

Yet literacy shapes more than just vocabulary size. Research in cognitive neuroscience and psycholinguistics shows that literacy acquisition restructures multiple domains of linguistic and cognitive processing, including phonological awareness, memory, syntactic parsing, and lexical access (Dehaene et al., 2010; Dąbrowska, 2012; Huettig & Mishra, 2014; Kosmidis et al., 2006; Morais et al., 1979).
Despite these wide-ranging effects, most psycholinguistic research is based on highly literate speakers from WEIRD populations (Henrich et al., 2010; Blasi et al., 2022), overlooking the fact that limited literacy remains widespread: UNESCO estimates that over 770 million adults lack basic reading and writing skills worldwide. This gap in representation is especially relevant when studying productive vocabulary. Although prior studies have shown that literacy and education support receptive vocabulary knowledge, far less is known about how literacy shapes speakers’ ability to deploy diverse lexical items in real-time communication.

To date, only two studies have examined the effect of literacy on receptive vocabulary knowledge, broadly speaking (Kim et al., 2013; Kosmidis et al., 2006). Kim and colleagues (2013) tested L1 Korean speakers of varying literacy levels (illiterate, semi-literate, literate) using the Boston Naming Test. They found that the purely illiterate group performed worst and that literacy level was the best predictor of performance. Kosmidis and colleagues (2006) took a slightly different approach and focused on receptive vocabulary knowledge via a lexical decision task. In this task, semi-literate speakers with little to no education, literates with low education, and literates with many years of education were presented with pseudowords and real words in Greek and were asked to decide whether each item was a real word. High education-literate speakers outperformed low education-literate speakers, who in turn outperformed semi-literate speakers. Since deciding which words are real depends on one’s vocabulary size, Kosmidis and colleagues conclude from these results that education is the main predictor of performance, over and above literacy.
While such tasks tap into recognition and provide an important basis for related future research, they do not reflect the demands of spontaneous speech, where speakers must retrieve and select words dynamically depending on the context. In light of this, productive vocabulary offers a complementary and arguably more dynamic measure of lexical competence, as it captures not just recognition but the ability to retrieve and use words appropriately in real-time language production. This can be systematically assessed using metrics that quantify the diversity of vocabulary used in spontaneous speech (i.e., lexical diversity). One such widely used metric is the type–token ratio (TTR), which has been suggested [1] to correlate with receptive vocabulary (Hess et al., 1984) and arguably reflects how flexibly speakers draw on their lexical resources. For example, studies on Turkish speakers with agrammatic aphasia have found that while total verb production remains intact, verb diversity (as measured by TTR) declines significantly (Arslan, Bamyacı, & Bastiaanse, 2016; Maviş et al., 2014). This suggests that reduced lexical diversity can reflect constraints in access and selection—not just in receptive knowledge. Crucially, such constraints may arise not only from brain injury but also from differences in developmental experience, including literacy. While some studies have begun to explore the effects of literacy on morphology and syntax (Dąbrowska et al., 2023; Author year), the impact of literacy on productive vocabulary use, particularly lexical diversity in oral production, remains poorly understood. To address this gap, the current study examines whether and how literacy influences lexical diversity in narrative speech. Since freestyle oral production introduces many confounding variables (i.e., length, discourse, topic) which can influence the qualities of the output, narrations emerge as a viable candidate to elicit oral production. 
Narratives require speakers to select and sequence events, manage referents, and use cohesive devices (Berman & Slobin, 1994). Narratives are often used in L1 acquisition studies with children or with L2 speakers to measure proficiency. Although previous research has linked print exposure to vocabulary breadth in literate populations (author year; Mol & Bus, 2011), little is known about how this knowledge is mobilized in oral production by individuals with different literacy profiles. To our knowledge, this is the first study to systematically compare lexical diversity in narrative production between semi-literate and literate adults. We ask: Are there measurable differences in lexical diversity between the two groups? If so, what form do these differences take? To answer this, we elicited narratives from age-matched, cognitively healthy adult Turkish speakers using a wordless picture book. The choice of Turkish speakers was based on availability, and the task was designed to be accessible across literacy levels, enabling a controlled yet naturalistic comparison of productive lexical diversity.

2. Methodology

2.1 Participants

We gathered data from 24 semi-literate adult native Turkish speakers (all female, mean age = 51.54, SD = 12.95) and 24 age-matched literate adults (all female, mean age = 45.33, SD = 12.44). The literate group included participants with at least a secondary school education, distributed as follows: 7 associate degree holders, 7 bachelor’s degree holders, 7 master’s degree holders, and 3 PhD holders. The semi-literate participants were attending literacy classes at an adult education center in Ankara, Turkey, where they had been enrolled for an average of 4 months (SD = 0.20). Some were repeating the course for a second or third time. The curriculum, covering around 80 teaching units of 40 minutes each, included basic literacy alongside mathematics, Turkish language, and introductory history.
Successful completion of the program grants a certificate equivalent to a primary school diploma in Turkey. Before enrollment, participants were screened by the literacy center in collaboration with health professionals from local hospitals for speech, communication, or cognitive impairments such as dyslexia. Individuals with such conditions were not admitted to the literacy program and, consequently, were excluded from this study. Ethical approval for the study was obtained from the Bilkent University Ethics Committee (approval number: 2022_12_21_01, dated December 22, 2022).

2.2 Materials

2.2.1 Narrations

Narratives were elicited using a modified edition of Frog, Where Are You? (Mayer, 1969), a 24-page wordless picture book. The Frog Story has been widely used in linguistic research because it offers several practical advantages for eliciting spoken narratives. Visual prompts allow researchers to avoid potential interference from a participant’s second language or literacy level, since no reading is required. The storybook format also encourages speakers to produce extended, connected speech rather than isolated sentences, making it suitable for examining discourse-level features. Its accessibility and ability to generate rich linguistic data make it especially useful in contexts where literacy cannot be assumed. To accommodate semi-literate participants, who may find black-and-white illustrations challenging to interpret, we presented a colored version of the story; aside from the addition of color, no other changes were made. This particular book was chosen because its clear narrative arc naturally encourages storytellers to reflect on the characters’ feelings and thoughts, creating a structured yet open-ended context in which to observe lexical choices. It has also been used in several previous studies, especially with children (e.g., Friend & Bates, 2014; Küntay & Nakamura, 2004).
Participants were first told that the book contains no words and follows the story of a boy, a dog, and a frog. They were then asked to examine each page in sequence and narrate the events to the experimenter in their own words while viewing the illustrations. The experimenter reminded them to describe not just what happened but also what the characters might be experiencing or thinking. Except for clarifying questions about unfamiliar characters (e.g., identifying animals like the deer or gopher), the experimenter refrained from guiding or interrupting the storytelling. Once the narratives were complete, participants were thanked for their time.

2.2.2 Transcriptions

All recordings of the Frog Story narrations were transcribed verbatim by two native Turkish speakers with degrees in linguistics. Each transcript was first prepared by one transcriber and then reviewed independently by the second. Inter-rater agreement between the two transcribers before reconciliation was assessed using Cohen’s kappa (κ = 0.96). Any disagreements were discussed and resolved collaboratively, based on an agreed-upon set of transcription conventions for marking pauses, self-corrections, and unintelligible segments. Words that were initially unintelligible were replayed repeatedly until they could be identified. The transcriptions captured all elements of spontaneous speech, including repetitions, hesitations, and fillers, in order to preserve the natural structure of the narratives. To ensure the transcripts focused solely on the participants’ storytelling, any interactions with the experimenter (such as clarifying questions or brief comments) were removed. While the transcripts reflected what speakers actually said, spelling and word forms followed the conventions of standard written Turkish, rather than phonetic transcription, to allow for consistent lexical analysis. All transcripts were anonymized by removing personally identifying information and saved in UTF-8 encoded plain text format for further analysis.

2.3 Extraction of linguistic indices and Type-Token Ratio

To extract lexical diversity measures from the narrative transcripts, we used Sketch Engine (Kilgarriff et al., 2014), a corpus analysis tool that provides automated part-of-speech tagging and frequency analysis capabilities for many languages. Sketch Engine was selected for several methodological advantages: (1) it offers robust morphological analysis specifically designed for Turkish, a morphologically rich language where accurate POS tagging requires sophisticated handling of complex inflectional and derivational processes; (2) it provides standardized tokenization procedures that ensure consistent identification of word boundaries across all transcripts; (3) its automated POS tagging reduces potential human coding errors and eliminates inter-rater reliability concerns that would arise from manual classification; and (4) it enables systematic extraction of both type (unique lexical items) and token (total word instances) counts across six major word classes: adjectives, adverbs, conjunctions, nouns, pronouns, and verbs. Finally, to our knowledge, Sketch Engine is the only publicly available tool that performs lemmatization and POS tagging on user-supplied Turkish corpora. All transcripts were uploaded to the Sketch Engine platform, where they underwent automatic morphological parsing and POS tagging using the Turkish language model. The resulting frequency counts were extracted for each participant and word class, providing the raw data for subsequent RootTTR calculations and statistical analyses.
To measure lexical diversity, we employed RootTTR (Root Type-Token Ratio), calculated as the number of unique words (types) divided by the square root of the total number of words (tokens). While more sophisticated measures such as MATTR (Moving Average Type-Token Ratio) or MTLD (Measure of Textual Lexical Diversity) theoretically provide more robust control for text length effects, these measures require specialized software implementations that are not currently available for Turkish morphological analysis. We also considered logTTR (logarithmic Type-Token Ratio), which applies a logarithmic transformation to reduce length dependency. We opted for RootTTR, which improves length normalization relative to simple TTR, while acknowledging that more sophisticated measures such as MTLD may offer superior length independence (Koizumi & In'nami, 2012; Fergadiotis, Wright, & West, 2013). RootTTR was selected as the most appropriate available measure for several reasons: (1) it provides better length normalization than simple TTR by reducing the mathematical dependency between type-token ratios and text length; (2) it has been successfully employed in cross-linguistic research and is widely recognized in the lexical diversity literature; (3) it can be reliably calculated from the frequency data extracted through Sketch Engine without requiring additional software dependencies; and (4) most importantly, the potential limitations of RootTTR's length sensitivity are minimal in our dataset since narrative lengths did not differ significantly between groups (Mean = 286.16 words for semi-literate vs. Mean = 338.66 words for literate participants, p = 0.28). This non-significant length difference makes it unlikely that the observed RootTTR differences between groups are text length artifacts rather than genuine differences in lexical diversity, which supports RootTTR as an appropriate measure for our comparative analysis.
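The RootTTR definition given above (types divided by the square root of tokens) can be illustrated with a minimal Python sketch. This is not the study's pipeline, which relied on Sketch Engine frequency counts and R. The function names, the (lemma, POS) input format, the toy Turkish lemmas, and the choice to return 0 for a word class with no tokens are all assumptions made for illustration.

```python
import math

def root_ttr(lemmas):
    """RootTTR = number of types / sqrt(number of tokens).

    `lemmas` is a list of already lemmatized, normalized word forms.
    Returning 0.0 for an empty list is an assumption of this sketch.
    """
    n_tokens = len(lemmas)
    if n_tokens == 0:
        return 0.0
    n_types = len(set(lemmas))
    return n_types / math.sqrt(n_tokens)

def root_ttr_by_pos(tagged):
    """Group hypothetical (lemma, pos) pairs by POS and score each class."""
    by_pos = {}
    for lemma, pos in tagged:
        by_pos.setdefault(pos, []).append(lemma)
    return {pos: root_ttr(items) for pos, items in by_pos.items()}

# Toy input (invented for illustration, not taken from the study's data).
toy_narrative = [
    ("kurbağa", "NOUN"), ("çocuk", "NOUN"), ("kurbağa", "NOUN"), ("köpek", "NOUN"),
    ("bak", "VERB"), ("git", "VERB"), ("bak", "VERB"), ("bak", "VERB"),
]
scores = root_ttr_by_pos(toy_narrative)
print(scores)  # nouns: 3 types / sqrt(4 tokens); verbs: 2 types / sqrt(4 tokens)
```

In this toy input both word classes have four tokens, but the nouns are more varied (three types against two), so the noun score is higher. This mirrors the contrast the study draws between variation and quantity.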
2.4 Statistical analyses

After preprocessing, we imported the data into RStudio (2024). We first obtained descriptive statistics. Then, to investigate the first research question, we coded group as literate vs. semi-literate and ran two-tailed t-tests using the t.test() function in R to test the reliability of the differences between the literate and semi-literate groups. We used the lsr package (Navarro, 2021) to calculate effect sizes. The t-tests were carried out on each variable individually so as to minimize statistical interference effects of simultaneous multiple comparisons. The data and the R code are available at the following link: https://osf.io/n725u/?view_only=fdde3783851244c3ac2cfe9d301c650f

2.5 Procedure

Semi-literate participants were tested individually in a quiet, familiar room within the adult education centers, and each session was audio recorded. Before starting, the experimenter explained the study to participants and ensured they felt at ease. For semi-literate participants, the experimenter read the consent form aloud, obtained oral consent, and marked the form with a plus sign in the appropriate space before beginning the recording. Literate participants were tested individually in a quiet office at the university; they read and signed the consent form themselves before recording began. Given that adults with limited literacy are considered a vulnerable population, special care was taken to support their comfort and well-being during the study. The experimenter provided frequent positive feedback, for example by saying "Yes, amazing" or "You're doing very well", and regularly checked whether participants were comfortable. Most participants were highly engaged and interacted actively with the experimenter, with nearly all reporting that they enjoyed the experience. A few even asked to take part again.
At the beginning of each session, demographic information was collected, including age, length of time spent learning to read (for literate participants, this referred to how long they had been literate), years of formal education, highest educational qualification, and any previous literacy instruction before joining the adult education program. Following this, participants completed the Frog Story task. Each testing session took around 10–15 minutes. During the task, the experimenter repeated prompts neutrally if participants hesitated or asked to hear them again and also entered participants’ responses into a separate computer for backup.

3. Results

Table 2 provides data on age, number of years spent in formal schooling, and the length of narrations. As can be seen, the two groups differ significantly only in years of formal education; they do not differ reliably in age or narrative length. The average educational attainment of the semi-literate speakers in this study was 0.04 years (roughly 4 months), while literate speakers had, on average, about 18 years of schooling. The difference in narrative length between the two groups was not statistically significant, with each group producing narrations of similar length (literate speakers: 338 words per narration on average; semi-literate speakers: 286 words per narration on average). The literate corpus consisted of 9,613 tokens while the semi-literate corpus had 8,935 tokens, totaling 18,548 tokens.

Table 2. Descriptive statistics for the variables per group, standard deviation in parentheses, t-tests and effect sizes (Cohen’s d)

Variable | Semi-literate Mean (SD) | Semi-literate Range | Literate Mean (SD) | Literate Range | Group comparison | Cohen’s d
Age | 51.54 (12.95) | 28–80 | 45.33 (12.44) | 21–68 | t(45.92) = 1.69, p = 0.09 | 0.48
Years spent in formal education | 0.04 (0.20) | 0–1 | 17.95 (4.50) | 12–27 | t(23.09) = −19.45, p < 0.001 | 5.61
Narrative length (in words) | 286.16 (145.89) | 128–812 | 338.66 (189.14) | 87–845 | t(43.21) = −1.07, p = 0.28 | 0.31

In Fig. 1, we analyzed the effect of literacy on lexical production by comparing semi-literate and literate participants in terms of both the number of types (unique lexical items) and tokens (total instances of word use) across six parts of speech: adjectives, adverbs, conjunctions, nouns, pronouns, and verbs. For each category, we report mean differences, statistical significance, and effect sizes using Cohen’s d.

In terms of types, literate speakers consistently produced a greater variety of words than semi-literate speakers. This difference was statistically significant for adjectives (p < .001, d = 1.07), adverbs (p < .001, d = 1.34), conjunctions (p < .001, d = 1.16), and nouns (p < .001, d = 1.15), with effect sizes indicating large differences in lexical diversity. The difference in verb types approached significance (p = .087) and showed a moderate effect size (d = 0.50). For pronouns, no difference was found between groups (p = 1, d = 0).

As for the token data, literate speakers produced significantly more adjective tokens (p = .0086, d = 0.79) and adverb tokens (p = .0098, d = 0.77), both with medium-to-large effect sizes. The groups did not differ reliably in the number of noun tokens (p = .17, d = 0.40), conjunction tokens (p = .47, d = −0.21), pronoun tokens (p = .19, d = −0.38), or verb tokens (p = .94, d = −0.02).
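The group comparisons reported in Table 2 can be roughly checked from the summary statistics alone. The following Python sketch is illustrative only: the study's own analysis used t.test() and the lsr package in R on the raw data. It assumes an unequal-variance (Welch) t-test, which is the default behavior of R's t.test(), and a pooled-SD Cohen's d. The function names are hypothetical.

```python
import math

def welch_t_from_summary(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite df from means, SDs and ns."""
    v1 = s1 ** 2 / n1  # squared standard error of group 1
    v2 = s2 ** 2 / n2  # squared standard error of group 2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

def cohens_d_from_summary(m1, s1, n1, m2, s2, n2):
    """Absolute Cohen's d using the pooled standard deviation (an assumption)."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return abs(m1 - m2) / math.sqrt(pooled_var)

# Age row of Table 2: semi-literate vs. literate, 24 participants per group.
age = (51.54, 12.95, 24, 45.33, 12.44, 24)
t, df = welch_t_from_summary(*age)
d = cohens_d_from_summary(*age)
print(round(t, 2), round(df, 1), round(d, 2))
```

With the age values from Table 2 this gives t ≈ 1.69 on about 45.9 degrees of freedom and d ≈ 0.49, close to the reported values. Small discrepancies are expected because the inputs are rounded summary statistics rather than the raw data.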
Table 3. Descriptive statistics for RootTTR values of parts-of-speech and overall narration per group, standard deviation in parentheses, t-tests and effect sizes (Cohen’s d)

RootTTR values | Semi-literate Mean (SD) | Semi-literate Range | Literate Mean (SD) | Literate Range | Group comparison | Cohen’s d
Adjectives | 1.82 (0.54) | 0.57–3.05 | 3.26 (1.32) | 1.15–6.12 | t(30.65) = −4.93, p < 0.001 | 1.42
Adverbs | 1.62 (0.61) | 0–2.75 | 3.19 (0.89) | 1.73–5.16 | t(40.62) = −7.07, p < 0.001 | 2.04
Verbs | 3.51 (0.57) | 2.60–4.83 | 4.24 (0.65) | 2.93–5.67 | t(45.31) = −4.11, p < 0.001 | 1.18
Pronouns | 1.09 (0.36) | 0.26–1.75 | 1.27 (0.31) | 0.70–1.78 | t(45.10) = −1.75, p = 0.08 | 0.50
Conjunctions | 0.82 (0.29) | 0.30–1.45 | 1.50 (0.35) | 1–2.26 | t(44.11) = −7.25, p < 0.001 | 2.08
Nouns | 3.49 (0.78) | 2.37–5.29 | 5.35 (1.02) | 3.79–8.15 | t(43.18) = −7.05, p < 0.001 | 2.03
Overall | 7.43 (1.33) | 5.38–9.68 | 7.45 (2.96) | 2.69–16.03 | t(31.92) = −0.03, p = 0.97 | 0.00

To evaluate the effect of literacy on lexical diversity, we compared RootTTR scores across six parts of speech (adjectives, adverbs, verbs, pronouns, conjunctions, and nouns) using independent-samples t-tests (Table 3). Across most categories, literate speakers demonstrated significantly higher RootTTR values than semi-literate speakers, indicating more diverse lexical production. The RootTTR for adjectives was significantly higher in the literate group (Mean = 3.26) than in the semi-literate group (Mean = 1.82), t(30.65) = −4.93, p < .001, with a large effect size (d = 1.42). A similar pattern emerged for adverbs, where literate speakers again outperformed semi-literate speakers (Means = 3.19 vs. 1.62), t(40.62) = −7.07, p < .001, with a very large effect (d = 2.04). For verbs, the literate group also showed higher RootTTR values (Mean = 4.24) compared to the semi-literate group (Mean = 3.51), t(45.31) = −4.11, p < .001, d = 1.18. The contrast in pronouns was less pronounced.
While the literate group had slightly higher RootTTR values (Mean = 1.27) than the semi-literate group (Mean = 1.09), the difference did not reach conventional levels of statistical significance, t(45.10) = −1.75, p = .08, with a moderate effect size (d = 0.50). For conjunctions, RootTTR was significantly greater in the literate group (Mean = 1.50) than in the semi-literate group (Mean = 0.82), t(44.11) = −7.25, p < .001, yielding a very large effect (d = 2.08). A similarly strong effect was observed for nouns, where literate speakers (Mean = 5.35) showed significantly higher RootTTR values than semi-literate speakers (Mean = 3.49), t(43.18) = −7.05, p < .001, with Cohen’s d = 2.03.

4. Discussion

The present study investigated how literacy acquisition affects lexical diversity in adult L1 Turkish speakers by comparing semi-literate and literate participants’ output in a narrative production task. To this end, we elicited narrations from semi-literate and literate L1 Turkish speakers on a wordless picture book (Frog Story), and conducted a lexical diversity analysis by comparing types and tokens as well as RootTTRs of various parts-of-speech (POS) using Sketch Engine. Our findings reveal systematic differences in lexical diversity that illuminate the impact of years of literacy experience on fine-grained differences in lexical usage in oral production.

The absence of a group difference in total word count indicates that both literate and semi-literate participants were equally willing or able to participate in the narrative task. Given that the Frog Story has been used with young children to elicit narrations (e.g., Ögel Balaban & Hohenberger, 2020), we contend that the task did not pose undue cognitive demands. Moreover, participants had ample time to familiarize themselves with the story before beginning their narration, and were allowed to look at the pictures while speaking, further supporting the accessibility of the task.
Nevertheless, the internal linguistic architecture of their stories diverged sharply between groups. The type-token comparison analysis revealed a consistent pattern across word classes: while both groups produced comparable quantities of words (tokens) per POS (except for adverbs and adjectives), literate participants demonstrated significantly greater lexical variety (types) in their vocabulary choices. This pattern was most pronounced for nouns, where despite using similar numbers of noun tokens (~ 140 each), literate participants employed significantly more distinct nouns (d = 1.15, p < .001), suggesting they varied their referential expressions rather than repeatedly using the same terms. A similar trend emerged for the rest of the POS categories except for pronouns, indicating that literate speakers consistently drew from a broader lexical repertoire across word classes. The fact that there was no difference in pronouns is expected, since there is only a small number of pronouns, which need to be used in both written and spoken language. Thus, there is no reason to expect literacy-related advantages in pronoun usage in terms of lexical diversity. Given the robust and widespread differences in lexical types across parts of speech, it was especially compelling to examine type–token ratios as an index of productive vocabulary. Our findings revealed a systematic advantage for literate speakers in RootTTR scores across nearly all categories. The most pronounced disparities were observed for conjunctions and adverbs—categories crucial for linking, elaborating, and structuring discourse. These are precisely the lexical classes that spoken interaction in oral cultures tends to underuse or convey through prosody, repetition, or parataxis (cf. Yıldız, 2006; Gökçe, 2016). 
Substantial differences in adjectives (d = 1.42) and verbs (d = 1.18)—the descriptive and predicative core of narrative—further suggest that literacy contributes to deeper lexical precision and semantic range. Of course, it is important to acknowledge that literacy is not an isolated variable. As will be discussed below, it is closely intertwined with education, cognitive development, and broader life experience—including opportunities for social interaction, autonomy in communication, and exposure to diverse linguistic registers (Fingeret, 1983; Gökçe & Yıldız, 2018 ). The differences observed in RootTTR may therefore reflect the cumulative impact of these intersecting factors. However, we interpret literacy as a central explanatory variable because it encapsulates many of these life experiences and reflects the availability of written language as a persistent linguistic resource. In this light, literacy appears to enhance productive vocabulary by reducing lexical repetition and facilitating access to more diverse and precise vocabulary. The consistent pattern across multiple POS categories supports a domain-general effect, in line with theories that posit richer, more abstract lexical representations among speakers with regular exposure to written language (e.g., Bybee, 2010). This interpretation also resonates with cognitive and neuropsychological work showing that lexical diversity is sensitive to constraints in lexical access and selection, rather than just storage (Arslan et al., 2016 ; Maviş et al., 2014 ). Overall, these findings reinforce the view that literacy—understood as both a linguistic and socio-cognitive resource—broadens the lexicon and shapes the way speakers access and deploy vocabulary in narrative discourse. 
Before turning to the discussion on why literacy may be an important predictor of lexical diversity, it is important to acknowledge a key methodological factor: the literate and semi-literate groups in this study differ not only in their literacy experience but also in broader life experience. These life experience factors include the amount and quality of social interaction (e.g., with family, peers, or strangers), the level and duration of formal education, habits of language use such as reading, writing, viewing, and speaking, work-related activities, hobbies and interests, and knowledge of other languages. In Turkey, semi-literate individuals often come from low socioeconomic backgrounds (Aktaş, 2007) and are unable to attend formal schooling due to structural and patriarchal barriers (Gökçe & Yıldız, 2018 ). Semi-literacy in this context functions as a cumulative disadvantage, affecting cognitive domains such as nonverbal reasoning and grammatical comprehension (author under review), and also undermining non-cognitive factors such as self-confidence, autonomy, and communicative independence (Fingeret, 1983; Gökçe, 2016). Women in these communities frequently lead restricted lives; they are perceived as incompetent by others, rely on close relatives to manage bureaucratic or medical appointments, and are often discouraged from engaging in public life without accompaniment (Gökçe & Yıldız, 2018 ). As a result, their interactions tend to occur within close-knit, esoteric communities where oral culture dominates and social exchanges are confined to familiar interlocutors (Yıldız, 2006; Wray & Grace, 2007; Gökçe, 2016). This limits not only opportunities for language development, but also exposure to varied communicative settings. In practice, literacy and education are deeply intertwined—those who spend more years in school are more exposed to written language and are more likely to engage in sustained reading. 
This relationship is reflected in our sample: group membership (literate vs. semi-literate) is strongly correlated with years of education (r = .94, p < .001), confirming that the two variables are tightly coupled. Prior research has also documented this association, showing that time spent in formal education predicts reading habits and literacy outcomes (Dąbrowska, 2018; author year). While we refer to literacy as the primary explanatory factor in the discussion below, the effects may also reflect, or be amplified by, broader differences in educational and experiential backgrounds. These interconnected influences, such as socioeconomic status, gendered restrictions on mobility, and the richness of communicative environments, deserve closer investigation in future work. However, because these factors are correlated with semi-literacy, and because literacy experience has theoretical implications for the structure and accessibility of linguistic knowledge, the present study focuses on literacy as a central variable in understanding individual differences in vocabulary use and representation. The semi-literate/literate distinction should therefore be read as also drawing on the other factors described above.
Why should literacy be an important predictor of lexical diversity in oral narrations? First, literacy acquisition has been shown to fundamentally restructure cognitive architecture. Dehaene and colleagues (2010) demonstrated that learning to read leads to measurable changes in brain function, including enhanced working memory, phonological processing, and abstraction. These general cognitive effects of literacy can make lexical access more efficient and flexible, especially under the demands of spontaneous speech while looking at pictures.
In the present study, literate participants used a wider range of lexical types despite producing a comparable number of tokens, suggesting that literacy supports not just language learning but also the dynamic retrieval and deployment of vocabulary during real-time language use.
Second, literacy changes the nature of linguistic input by exposing speakers to a lexically richer variety of language. Written texts, including children’s books, consistently contain more rare and diverse vocabulary than spoken conversation (Hayes & Ahrens, 1988; Cunningham & Stanovich, 1998). This enriched exposure enables the acquisition and entrenchment of low-frequency words that rarely occur in speech. Dąbrowska (2009) argues that most adult vocabulary is acquired through reading, and studies have repeatedly shown that print exposure is a strong predictor of receptive vocabulary size (e.g., Dąbrowska, 2018; author year; Mol & Bus, 2011). However, the effects observed in this study go beyond mere exposure to more types. If frequency alone explained the pattern, we would expect increases in both types and tokens across all categories. Instead, the selective enhancement of elaborative categories (e.g., adjectives, adverbs, conjunctions) suggests that written input not only expands the lexicon but also shapes its structure and usage in nuanced ways.
Third, sustained engagement with written language may promote discourse-level strategies that prioritize explicitness, precision, and coherence. Although the Frog stories were delivered orally with help from pictures, the task itself is relatively decontextualized: it presents new content and requires speakers to construct a narrative from scratch. In such tasks, speakers may prefer more explicit reference strategies, such as repeating full noun phrases, instead of relying on pronouns that assume shared contextual knowledge.
This pattern is reflected in our data: both groups used pronouns at comparable rates, consistent with their role in spoken, contextually embedded discourse. However, literate participants employed a significantly greater variety of nouns, suggesting that they were more likely to vary and elaborate referents using lexical means. This aligns with Norrby and Håkansson’s (2007) observation that a higher frequency and diversity of nouns marks a shift toward a more explicit, context-independent style, a tendency associated with written language. Thus, even in oral narrative, literacy appears to shape how speakers manage reference, favoring lexical precision over reliance on discourse deixis. This aligns well with the nature of written language. Written discourse must explicitly mark relationships between events and ideas. As a result, literate individuals receive prolonged practice in how to structure descriptions, introduce referents, and connect clauses in ways that are self-contained and context-independent. Although there is currently little research on this issue, one study under review provides suggestive evidence that more training in being explicit in written language may carry over to explicitness in oral language production, as well as to other cognitive skills such as Theory of Mind (De la Garza et al., under review). In this vein, literate speakers may have used more varied vocabulary to be more explicit and to give their listeners more information about the story.
Fourth, literacy fosters more efficient lexical retrieval and enhances semantic organization. Research has shown that literate individuals outperform their non-literate peers on verbal fluency and lexical decision tasks (da Silva et al., 2004; author author under review; Kosmidis et al., 2006).
This suggests that literacy does not just increase vocabulary size, but improves access to that vocabulary by strengthening semantic networks and reducing reliance on default, high-frequency forms. In the present study, literate participants’ ability to draw on more varied lexical items without increasing wordiness may reflect a more efficiently organized lexicon, one that enables them to select precise and varied words for the task at hand.
Finally, and perhaps most critically, it is plausible that these mechanisms are not isolated or mutually exclusive; rather, they are deeply interconnected and mutually reinforcing. The cognitive restructuring brought about by literacy acquisition, such as enhanced working memory and phonological awareness, forms a foundation that enables individuals to better process, retain, and manipulate the rich linguistic input available through written language. At the same time, more efficient semantic organization and lexical retrieval, supported by both the cognitive and the linguistic enhancements that result from literacy acquisition, can allow literate speakers to select more precise, context-sensitive forms during real-time language use. These efficiencies feed back into discourse planning and narrative production: speakers are better able to anticipate listener needs, maintain coherence, and modulate information flow. In short, the effects of literacy are best understood not as parallel but as cascading, where each domain (i.e., cognition, lexicon, input exposure, and discourse structure) amplifies the development of the others (cf. the Matthew Effect; Cunningham & Stanovich, 1998). This systemic interaction likely accounts for why literate individuals perform more consistently and with greater flexibility across a wide range of linguistic and communicative tasks, from morphosyntactic processing to narrative elaboration and referential communication.
Understanding these mechanisms as mutually reinforcing also underscores the central hypothesis of this research program: that literacy serves not merely as a tool for decoding print, but as a developmental pathway that restructures and refines fundamental aspects of language and thought (Nielsen & Waldemar, 2013). To our knowledge, this is the first study to examine the effects of literacy on lexical diversity in Turkish adult speakers using a structured narrative production task. While previous research has established that literacy affects receptive vocabulary size and syntactic comprehension (e.g., Kim et al., 2013; Kosmidis et al., 2006 ; Dąbrowska, 2009 , 2018 ), the current study shows that these effects extend to productive language use, specifically, to the selection and variation of words in oral discourse. By conducting a part-of-speech-specific RootTTR analysis, we provide new, fine-grained evidence that literacy experience influences not just how many words are known, but how lexical resources are organized and deployed during real-time oral production. These findings also contribute to efforts to redress the WEIRD bias in psycholinguistics and cognitive science (Blasi et al., 2022 ; Henrich et al., 2010 ). Much of what we know about language processing and representation comes from literate, highly educated speakers in industrialized societies. By focusing on semi-literate Turkish adults, this study challenges the assumption that native speakers form a uniform baseline of linguistic competence. Instead, our results support a view of language as shaped by developmental experiences such as schooling and literacy, with measurable consequences for everyday communication. Differences in lexical diversity may have subtle but important implications for real-world communication, particularly for semi-literate speakers. 
The reduced variety in lexical choice, especially in elaborative categories such as adjectives, adverbs, and conjunctions, may pose challenges in more complex communicative situations. In contexts where clarity, precision, and explicitness are crucial, such as healthcare encounters, legal settings, or bureaucratic procedures, limited lexical flexibility could hinder speakers’ ability to describe events accurately, formulate questions, or follow instructions. Prior research highlights how speakers with limited linguistic resources may struggle in institutional interactions, particularly when additional barriers such as social stigma, minority status, or low education are present (Eades, 2008 ; Filipović, 2022 ). Although our semi-literate participants are native speakers of Turkish, the reduced lexical variety may go unnoticed by their interlocutors, potentially leading to misjudgments of communicative intent or competence. This is especially concerning in environments where clarification requests may be discouraged or met with stigma, such as hospitals or administrative offices (Gökçe & Yıldız, 2018 ). While our study does not claim that lower lexical diversity necessarily results in poorer communication, these patterns underscore the need for greater awareness of how language experience, including literacy, shapes expressive capabilities in ways that may affect individuals’ ability to navigate high-stakes interactions. While this study was not designed to evaluate educational or communicative interventions directly, the findings may offer preliminary insights for applied domains. The observed differences in lexical diversity between semi-literate and literate speakers suggest that sustained literacy experience may support not only vocabulary growth, but also the flexible use of language in communicative contexts. 
This has potential relevance for adult education programs, particularly those focused on enhancing oral communication skills alongside reading instruction. In contexts such as healthcare, legal settings, or bureaucratic interactions, where individuals are often required to describe situations clearly and understand complex spoken or written information, greater lexical flexibility and precision could plausibly support more effective communication. Although further research is needed to establish causal relationships and test applicability in specific domains, the current findings contribute to a growing body of evidence suggesting that literacy acquisition may enhance communicative competence in ways that extend beyond decoding written text. These findings may also inform policy discussions about adult literacy by highlighting the broader cognitive and linguistic benefits associated with access to reading and writing instruction. In this context, policy-level interventions that improve communicative accessibility may also be valuable. For example, German public institutions increasingly provide information in both standard and simplified German (Leichte Sprache), allowing for more inclusive access to essential services. Currently, there are no comparable guidelines or standardized practices for simplified Turkish. Developing accessible Turkish communication strategies, especially in institutional settings, may help reduce the burden on speakers whose struggles often go unnoticed and improve equitable access to services and information.
Several limitations should be noted. First, the semi-literate group had very limited formal education (0.04 years on average). Future research should examine populations with intermediate literacy levels to better understand the gradient relationship between literacy, education, and lexical diversity. Second, the type-token ratio (and its derivatives) is useful but sensitive to text length.
Researchers (e.g., Fergadiotis et al., 2013) argue that other measures of lexical diversity, such as the Measure of Textual Lexical Diversity, the Moving-Average Type-Token Ratio, D, and the Hypergeometric Distribution, are much more sensitive in documenting differences in lexical diversity. Because these measures are calculated using specific software tools which are not yet available for Turkish, it was not possible to compute them in this study.
Considering this was an initial attempt at documenting lexical diversity differences between semi-literate and literate speakers, future research could examine several other aspects. First, future research should use these more sensitive methods of measuring lexical diversity when software tools for Turkish are available. Second, because Theory of Mind skills support reading others’ mental states, including what interlocutors do not know, future work could test whether stronger Theory of Mind skills are associated with higher lexical diversity. Third, investigating whether literacy-related differences in lexical diversity hold across other genres, such as conversation, explanation, or argumentation, would test the generality of these effects, since different spoken registers may call for different vocabulary items. Finally, longitudinal studies of adults acquiring literacy later in life could provide valuable insight into how lexical diversity evolves with increased print exposure and formal language use.
5. Conclusion
This study provides the first suggestive evidence that literacy acquisition fundamentally reshapes the lexical dimension of adult language use in narration. By analyzing the oral narratives of literate and semi-literate Turkish speakers, we demonstrated that literacy experience significantly enhances lexical diversity across nearly all parts of speech, not by increasing output quantity, but by enabling greater variation and precision in vocabulary use.
These findings extend prior research on vocabulary size and syntactic comprehension by showing that literacy also affects the organization and deployment of lexical resources in spontaneous, discourse-level production. Our part-of-speech-specific RootTTR analysis revealed that literacy and its concomitant effects particularly enhance the use of elaborative and connective word categories, those most crucial for structuring explicit and coherent narratives. These patterns support usage-based accounts of language that emphasize the role of input frequency, cognitive restructuring, and discourse conventions in shaping linguistic competence. They also challenge the assumption that native speaker grammars are uniform across populations, highlighting literacy as a key factor in explaining individual variation. By centering a semi-literate population in a non-WEIRD context, this study contributes to a more inclusive and ecologically valid understanding of adult language. It emphasizes that literacy is not merely a tool for reading and writing, but a developmental experience that leaves deep and lasting imprints on how language is processed, structured, and used, even in oral domains. As such, literacy must be recognized as a critical variable in models of language acquisition, representation, and use.
Declarations
Author Contribution
The author was solely responsible for the conception and design of the study, data collection, analysis and interpretation, and the drafting and revision of the manuscript. No other individuals contributed to the research or writing process.
Acknowledgement
I would like to sincerely thank Gülçin İrem Yıldırım and Ece Gökçe for all of their help during data collection. I would also like to thank all the semi-literate participants for their courage to participate in the study, and their teachers at the literacy centers. Without your help, this research would not have been possible.
Data Availability
https://osf.io/n725u/?view_only=fdde3783851244c3ac2cfe9d301c650f
Funding declaration
There was no funding in this study.
References
Author year
Author year
Author year
Author year
Arslan, S., Bamyacı, E., & Bastiaanse, R. (2016). A characterization of verb use in Turkish agrammatic narrative speech. Clinical Linguistics & Phonetics, 30(6), 449–469. https://doi.org/10.3109/02699206.2016.1144224
Blasi, D. E., Henrich, J., Adamou, E., Kemmerer, D., & Majid, A. (2022). Over-reliance on English hinders cognitive science. Trends in Cognitive Sciences, 26(12), 1153–1170.
Cunningham, A. E., & Stanovich, K. E. (1998). What reading does for the mind. American Educator, 22, 8–17.
da Silva, C. G., Petersson, K. M., Faísca, L., Ingvar, M., & Reis, A. (2004). The effects of literacy and education on the quantitative and qualitative aspects of semantic verbal fluency. Journal of Clinical and Experimental Neuropsychology, 26(2), 266–277.
Dąbrowska, E. (2009). Words as constructions. In V. Evans & S. Pourcel (Eds.), Human Cognitive Processing (Vol. 24, pp. 201–223). John Benjamins Publishing Company. https://doi.org/10.1075/hcp.24.16dab
Dąbrowska, E. (2012). Different speakers, different grammars: Individual differences in native language attainment. Linguistic Approaches to Bilingualism, 2(3), 219–253.
Dąbrowska, E. (2018). Experience, aptitude and individual differences in native language ultimate attainment. Cognition, 178, 222–235. https://doi.org/10.1016/j.cognition.2018.05.018
Dąbrowska, E., Pascual, E., Macías-Gómez-Estern, B., & Llompart, M. (2023). Literacy-related differences in morphological knowledge: A nonce-word study. Frontiers in Psychology, 14, 1136337. https://doi.org/10.3389/fpsyg.2023.1136337
Dehaene, S., Pegado, F., Braga, L. W., Ventura, P., Filho, G. N., Jobert, A., Dehaene-Lambertz, G., Kolinsky, R., Morais, J., & Cohen, L. (2010). How learning to read changes the cortical networks for vision and language. Science, 330(6009), 1359–1364.
Eades, D. (2008). Language and disadvantage before the law. Dimensions of Forensic Linguistics, 179–195.
Fergadiotis, G., Wright, H. H., & West, T. M. (2013). Measuring Lexical Diversity in Narrative Discourse of People With Aphasia. American Journal of Speech-Language Pathology, 22(2). https://doi.org/10.1044/1058-0360(2013/12-0083)
Filipović, L. (2022). Language and culture as sources of inequality in US police interrogations. Applied Linguistics, 43(6), 1073–1093.
Friend, M., & Bates, R. P. (2014). The union of narrative and executive function: Different but complementary. Frontiers in Psychology, 5, 469.
Gökçe, N., & Yıldız, A. (2018). Türkiye’de okuma-yazma bilmeyen kadınlar ve okuma-yazma kurslarına katılmama nedenleri: “Ne edeyim okumayı, hayatım mı değişecek?” Kastamonu Education Journal, 26(6), 2151–2161.
Hayes, D. P., & Ahrens, M. G. (1988). Vocabulary simplification for children: A special case of ‘motherese’? Journal of Child Language, 15(2), 395–410.
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2–3), 61–83. https://doi.org/10.1017/S0140525X0999152X
Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., Rychlý, P., & Suchomel, V. (2014). The sketch engine. Lexicography, 1(1), 7–36.
Kim, J., Yoon, J. H., Kim, S. R., & Kim, H. (2014). Effect of literacy level on cognitive and language tests in Korean illiterate older adults. Geriatrics & Gerontology International, 14(4), 911–917.
Kosmidis, M. H., Tsapkini, K., & Folia, V. (2006). Lexical processing in illiteracy: Effect of literacy or education? Cortex, 42(7), 1021–1027.
Küntay, A. C., & Nakamura, K. (2004). Linguistic Strategies Serving Evaluative Functions: A Comparison between Japanese and Turkish Narratives. In Relating Events in Narrative, Volume 2 (pp. 329–358). Psychology Press.
Maviş, İ., Tunçer, M., Üre, I., & Öztürk, S. (2014). The lexical quantification of aphasic spontaneous speech in Turkish: A comparison across discourse types. Proceedings of the 16th International Aphasia Rehabilitation Conference.
Mayer, M. (1969). Frog, Where Are You? Dial Press.
Mol, S. E., & Bus, A. G. (2011). To read or not to read: A meta-analysis of print exposure from infancy to early adulthood. Psychological Bulletin, 137(2), 267.
Morais, J., Cary, L., Alegria, J., & Bertelson, P. (1979). Does awareness of speech as a sequence of phones arise spontaneously? Cognition, 7(4), 323–331.
Navarro, D. (2015). Learning statistics with R: A tutorial for psychology students and other beginners (Version 0.6). University of New South Wales. https://learningstatisticswithr.com
Nielsen, T. R., & Jørgensen, K. (2013). Visuoconstructional Abilities in Cognitively Healthy Illiterate Turkish Immigrants: A Quantitative and Qualitative Investigation. The Clinical Neuropsychologist, 27(4), 681–692. https://doi.org/10.1080/13854046.2013.767379
UNESCO. (2024). Literacy. https://www.unesco.org/en/literacy
Footnotes
[1] It was not simple TTR but a different TTR “calculated by dividing the total number of different words in an entire sample by the square root of twice the total number of words in that sample” (Hess et al., 1984, pp. 52–53).
Additional Declarations
No competing interests reported.
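The ratio quoted in footnote [1] and the text-length sensitivity noted among the limitations can be illustrated with a short sketch. It assumes the footnote's formula (distinct words divided by the square root of twice the total words) and a simple moving-average type-token ratio taken over fixed-size windows. The function names, the window size, and the toy sample are illustrative assumptions rather than part of the study, and the sketch is not a substitute for validated lexical diversity tools.

```python
import math


def ttr(tokens):
    """Plain type-token ratio: distinct types / total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def corrected_ttr(tokens):
    """Footnote [1] formula: distinct types / sqrt(2 * total tokens)."""
    return len(set(tokens)) / math.sqrt(2 * len(tokens)) if tokens else 0.0


def mattr(tokens, window=4):
    """Moving-average TTR: mean plain TTR over every run of `window` tokens.

    Falls back to plain TTR when the sample is shorter than the window.
    """
    if len(tokens) < window:
        return ttr(tokens)
    spans = [tokens[i:i + window] for i in range(len(tokens) - window + 1)]
    return sum(ttr(span) for span in spans) / len(spans)


# Hypothetical sample: repeating the same four lemmas doubles the length
# without changing how varied the vocabulary is.
short = ["git", "gel", "bak", "bul"]
longer = short * 2

print(ttr(short), ttr(longer))      # 1.0 0.5  (plain TTR drops with length)
print(mattr(short), mattr(longer))  # 1.0 1.0  (windowed average is stable)
```

The plain ratio halves when the sample is doubled, while the windowed average is unchanged, which is the kind of length effect that motivates the more robust measures mentioned in the limitations.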
Editorial history: Journal publication published 07 Nov, 2025 (Journal of Cultural Cognitive Science); Editorial decision: Revision requested 08 Oct, 2025; Reviews received at journal 06 Oct, 2025; Reviewers agreed at journal 12 Aug, 2025; Reviewers invited by journal 11 Aug, 2025; Editor assigned by journal 11 Aug, 2025; Submission checks completed at journal 18 Jul, 2025; First submitted to journal 15 Jul, 2025.
The choice of Turkish speakers was based on availability, and the task was designed to be accessible across literacy levels, enabling a controlled yet naturalistic comparison of productive lexical diversity.\u003c/p\u003e"},{"header":"2. Methodology","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003e2.1 Participants\u003c/h2\u003e\u003cp\u003eWe gathered data from 24 semi-literate adult native Turkish speakers (all female, mean age\u0026thinsp;=\u0026thinsp;51.54, SD\u0026thinsp;=\u0026thinsp;12.95) and 24 age-matched literate adults (all female, mean age\u0026thinsp;=\u0026thinsp;45.33, SD\u0026thinsp;=\u0026thinsp;12.44). The literate group included participants with at least a secondary school education, distributed as follows: 7 associate degree holders, 7 bachelor\u0026rsquo;s degree holders, 7 master\u0026rsquo;s degree holders, and 3 PhD holders. The semi-literate participants were attending literacy classes at an adult education center in Ankara, Turkey, where they had been enrolled for an average of 4 months (SD\u0026thinsp;=\u0026thinsp;0.20). Some were repeating the course for a second or third time. The curriculum, covering around 80 teaching units of 40 minutes each, included basic literacy alongside mathematics, Turkish language, and introductory history. Successful completion of the program grants a certificate equivalent to a primary school diploma in Turkey.\u003c/p\u003e\u003cp\u003eBefore enrollment, participants were screened by the literacy center in collaboration with health professionals from local hospitals for speech, communication, or cognitive impairments such as dyslexia. Individuals with such conditions were not admitted to the literacy program and were consequently excluded from this study. 
Ethical approval for the study was obtained from the Bilkent University Ethics Committee (approval number: 2022_12_21_01, dated December 22, 2022).\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\u003ch2\u003e2.2 Materials\u003c/h2\u003e\u003cdiv id=\"Sec5\" class=\"Section3\"\u003e\u003ch2\u003e2.2.1 Narrations\u003c/h2\u003e\u003cp\u003eNarratives were elicited using a modified edition of \u003cem\u003eFrog, Where Are You?\u003c/em\u003e (Mayer, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e1969\u003c/span\u003e), a 24-page wordless picture book. The Frog Story has been widely used in linguistic research because it offers several practical advantages for eliciting spoken narratives. Visual prompts allow researchers to avoid potential interference from a participant\u0026rsquo;s second language or literacy level, since no reading is required. The storybook format also encourages speakers to produce extended, connected speech rather than isolated sentences, making it suitable for examining discourse-level features. Its accessibility and ability to generate rich linguistic data make it especially useful in contexts where literacy cannot be assumed.\u003c/p\u003e\u003cp\u003eTo accommodate semi-literate participants, who may find black-and-white illustrations challenging to interpret, we presented a colored version of the story; aside from the addition of color, no other changes were made. 
This particular book was chosen because its clear narrative arc naturally encourages storytellers to reflect on the characters\u0026rsquo; feelings and thoughts, creating a structured yet open-ended context to observe lexical choices, and because it has been used in several previous studies, especially with children (e.g., Friend \u0026amp; Bates, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2014\u003c/span\u003e; K\u0026uuml;ntay \u0026amp; Nakamura, \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2004\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eParticipants were first told that the book contains no words and follows the story of a boy, a dog, and a frog. They were then asked to examine each page in sequence and narrate the events to the experimenter in their own words while viewing the illustrations. The experimenter reminded them to describe not just what happened but also what the characters might be experiencing or thinking. Except when answering clarifying questions about unfamiliar characters (e.g., identifying animals like the deer or gopher), the experimenter refrained from guiding or interrupting the storytelling. Once the narratives were complete, participants were thanked for their time.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec6\" class=\"Section3\"\u003e\u003ch2\u003e2.2.2 Transcriptions\u003c/h2\u003e\u003cp\u003eAll recordings of the Frog Story narrations were transcribed verbatim by two native Turkish speakers with degrees in linguistics. Each transcript was first prepared by one transcriber and then reviewed independently by the second. Inter-rater agreement between the two transcribers before reconciliation was assessed using Cohen\u0026rsquo;s kappa (κ\u0026thinsp;=\u0026thinsp;0.96). Any disagreements about individual words were discussed and resolved collaboratively. Words that were initially unintelligible were replayed until they could be identified. 
The transcriptions captured all elements of spontaneous speech, including repetitions, hesitations, and fillers, in order to preserve the natural structure of the narratives. To ensure the transcripts focused solely on the participants\u0026rsquo; storytelling, any interactions with the experimenter (such as clarifying questions or brief comments) were removed. While the transcripts reflected what speakers actually said, spelling and word forms followed the conventions of standard written Turkish, rather than phonetic transcription, to allow for consistent lexical analysis. Transcription decisions followed an agreed-upon set of conventions for marking pauses, self-corrections, and unintelligible segments. All transcripts were anonymized by removing personally identifying information and saved in UTF-8 encoded plain text format for further analysis.\u003c/p\u003e\u003c/div\u003e\u003c/div\u003e\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\u003ch2\u003e2.3 Extraction of linguistic indices and Type-Token Ratio\u003c/h2\u003e\u003cp\u003eTo extract lexical diversity measures from the narrative transcripts, we used Sketch Engine (Kilgarriff et al., \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2014\u003c/span\u003e), a corpus analysis tool that provides automated part-of-speech (POS) tagging and frequency analysis capabilities for many languages. 
Sketch Engine was selected for several methodological advantages: (1) it offers robust morphological analysis specifically designed for Turkish, a morphologically rich language where accurate POS tagging requires sophisticated handling of complex inflectional and derivational processes; (2) it provides standardized tokenization procedures that ensure consistent identification of word boundaries across all transcripts; (3) its automated POS tagging reduces potential human coding errors and eliminates inter-rater reliability concerns that would arise from manual classification; and (4) it enables systematic extraction of both type (unique lexical items) and token (total word instances) counts across six major word classes: adjectives, adverbs, conjunctions, nouns, pronouns, and verbs. Finally, Sketch Engine is the only publicly available tool that can do lemmatization and POS tagging for Turkish for personal corpora.\u003c/p\u003e\u003cp\u003eAll transcripts were uploaded to the Sketch Engine platform, where they underwent automatic morphological parsing and POS tagging using the Turkish language model. The resulting frequency counts were extracted for each participant and word class, providing the raw data for subsequent RootTTR calculations and statistical analyses.\u003c/p\u003e\u003cp\u003eTo measure lexical diversity, we employed RootTTR (Root Type-Token Ratio), calculated as the number of unique words (types) divided by the square root of the total number of words (tokens). While more sophisticated measures such as MATTR (Moving Average Type-Token Ratio) or MTLD (Measure of Textual Lexical Diversity) theoretically provide more robust control for text length effects, these measures require specialized software implementations that are not currently available for Turkish morphological analysis. 
We also considered logTTR (logarithmic Type-Token Ratio), which applies a logarithmic transformation to reduce length dependency. We opted for RootTTR, which normalizes for text length better than simple TTR, while acknowledging that measures such as MTLD may offer greater length independence (Koizumi \u0026amp; In'nami, 2012; Fergadiotis, Wright, \u0026amp; West, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2013\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eRootTTR was selected as the most appropriate available measure for several reasons: (1) it provides better length normalization than simple TTR by reducing the mathematical dependency between type-token ratios and text length; (2) it has been successfully employed in cross-linguistic research and is widely recognized in the lexical diversity literature; (3) it can be reliably calculated from the frequency data extracted through Sketch Engine without requiring additional software dependencies; and (4) most importantly, the potential limitations of RootTTR's length sensitivity are minimal in our dataset since narrative lengths did not differ significantly between groups (Mean\u0026thinsp;=\u0026thinsp;286.16 words for semi-literate vs. Mean\u0026thinsp;=\u0026thinsp;338.66 words for literate participants, p\u0026thinsp;=\u0026thinsp;0.28). Because narrative length did not differ reliably between groups, observed RootTTR differences are unlikely to be artifacts of text length, which makes RootTTR an appropriate measure for our comparative analysis.\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\u003ch2\u003e2.4 Statistical analyses\u003c/h2\u003e\u003cp\u003eAfter preprocessing, we imported the data into RStudio (2024). We first obtained descriptive statistics. Then, to investigate the first research question, we coded group as literate vs. 
semi-literate and ran two-tailed t-tests using the t.test() function in R to test the reliability of the differences between the literate and semi-literate groups. We used the lsr package (Navarro, 2021) to calculate effect sizes. A separate t-test was carried out for each variable, with the aim of minimizing interference from simultaneous multiple comparisons. The data and R code are available at the following link: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://osf.io/n725u/?view_only=fdde3783851244c3ac2cfe9d301c650f\u003c/span\u003e\u003cspan address=\"https://osf.io/n725u/?view_only=fdde3783851244c3ac2cfe9d301c650f\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\u003ch2\u003e2.5 Procedure\u003c/h2\u003e\u003cp\u003eSemi-literate participants were tested individually in a quiet, familiar room within the adult education center, and each session was audio recorded. Before starting, the experimenter explained the study to participants and ensured they felt at ease. For these participants, the experimenter read the consent form aloud, obtained oral consent, and marked the form with a plus sign in the appropriate space before beginning the recording. Literate participants were tested individually in a quiet office at the university; they read and signed the consent form themselves before recording began.\u003c/p\u003e\u003cp\u003eGiven that adults with limited literacy are considered a vulnerable population, special care was taken to support their comfort and well-being during the study. The experimenter provided frequent positive feedback (for example, by saying \"Yes, amazing\" or \"You're doing very well\") and regularly checked whether participants were comfortable. 
Most participants were highly engaged and interacted actively with the experimenter, with nearly all reporting that they enjoyed the experience. A few even asked to take part again.\u003c/p\u003e\u003cp\u003eAt the beginning of each session, demographic information was collected, including age, length of time spent learning to read (for literate participants, this referred to how long they had been literate), years of formal education, highest educational qualification, and any previous literacy instruction before joining the adult education program. Following this, participants completed the Frog Story task. Each testing session took around 10\u0026ndash;15 minutes. During the task, the experimenter repeated prompts neutrally if participants hesitated or asked to hear them again and also entered participants\u0026rsquo; responses into a separate computer for backup.\u003c/p\u003e\u003c/div\u003e"},{"header":"3. Results","content":"\u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e provides data on age, number of years spent in formal schooling, and the length of narrations. As can be seen, the two groups differ significantly in years of formal education but not in age or narrative length. The semi-literate speakers in this study averaged 0.04 years of formal education (range 0\u0026ndash;1), while literate speakers averaged about 18 years of schooling. The difference in narrative length between the two groups was not statistically significant, with each group producing narrations of similar length (literate speakers: 338.66 words per narration on average; semi-literate speakers: 286.16 words per narration on average). 
The literate corpus consisted of 9,613 tokens while the semi-literate corpus had 8,935 tokens, totaling 18,548 tokens.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003e\u003cem\u003eDescriptive statistics for the variables per group, standard deviation in parentheses, t-tests and effect sizes (Cohen\u0026rsquo;s d)\u003c/em\u003e\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"7\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSemi-literate\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eLiterate\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colname=\"c7\"\u003e\u0026nbsp;\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd 
align=\"left\" colname=\"c2\"\u003e\u003cp\u003eMean (SD)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eRange\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eMean (SD)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003eRange\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003eGroup comparison\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003eCohen\u0026rsquo;s d\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAge\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e51.54 (12.95)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e28\u0026ndash;80\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e45.33\u003c/p\u003e\u003cp\u003e(12.44)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e21\u0026ndash;68\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003et(45.92)\u0026thinsp;=\u0026thinsp;1.69, p\u0026thinsp;=\u0026thinsp;0.09\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.48\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eYears spent in formal education\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.04\u003c/p\u003e\u003cp\u003e(0.20)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0\u0026ndash;1\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e17.95\u003c/p\u003e\u003cp\u003e(4.50)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e12\u0026ndash;27\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c6\"\u003e\u003cp\u003et(23.09)\u0026thinsp;=\u0026thinsp;\u0026minus;19.45, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e5.61\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNarrative length (in words)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e286.16\u003c/p\u003e\u003cp\u003e(145.89)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e128\u0026ndash;812\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e338.66\u003c/p\u003e\u003cp\u003e(189.14)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e87\u0026ndash;845\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003et(43.21)\u0026thinsp;=\u0026thinsp;\u0026minus;1.07, p\u0026thinsp;=\u0026thinsp;0.28\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.31\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eFig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e shows the effect of literacy on lexical production, comparing semi-literate and literate participants in terms of both the number of types (unique lexical items) and tokens (total instances of word use) across six parts of speech: adjectives, adverbs, conjunctions, nouns, pronouns, and verbs. For each category, we report mean differences, statistical significance, and effect sizes using Cohen\u0026rsquo;s d.\u003c/p\u003e\u003cp\u003eIn terms of types, literate speakers consistently produced a greater variety of words than semi-literate speakers. 
This difference was statistically significant for adjectives (p\u0026thinsp;\u0026lt;\u0026thinsp;.001, d\u0026thinsp;=\u0026thinsp;1.07), adverbs (p\u0026thinsp;\u0026lt;\u0026thinsp;.001, d\u0026thinsp;=\u0026thinsp;1.34), conjunctions (p\u0026thinsp;\u0026lt;\u0026thinsp;.001, d\u0026thinsp;=\u0026thinsp;1.16), and nouns (p\u0026thinsp;\u0026lt;\u0026thinsp;.001, d\u0026thinsp;=\u0026thinsp;1.15), with effect sizes indicating large differences in lexical diversity. The difference in verb types approached significance (p\u0026thinsp;=\u0026thinsp;.087) and showed a moderate effect size (d\u0026thinsp;=\u0026thinsp;0.50). For pronouns, no difference was found between groups (p\u0026thinsp;=\u0026thinsp;1, d\u0026thinsp;=\u0026thinsp;0).\u003c/p\u003e\u003cp\u003eAs for the token data, literate speakers produced significantly more adjective tokens (p\u0026thinsp;=\u0026thinsp;.0086, d\u0026thinsp;=\u0026thinsp;0.79), and adverb tokens (p\u0026thinsp;=\u0026thinsp;.0098, d\u0026thinsp;=\u0026thinsp;0.77), all with medium to large effect sizes. 
However, the groups did not differ reliably in the number of noun tokens (p\u0026thinsp;=\u0026thinsp;.17, d\u0026thinsp;=\u0026thinsp;0.40), conjunction tokens (p\u0026thinsp;=\u0026thinsp;.47, d = \u0026minus;\u0026thinsp;0.21), pronoun tokens (p\u0026thinsp;=\u0026thinsp;.19, d = \u0026minus;\u0026thinsp;0.38), or verb tokens (p\u0026thinsp;=\u0026thinsp;.94, d = \u0026minus;\u0026thinsp;0.02).\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003e\u003cem\u003eDescriptive statistics for RootTTR values of parts-of-speech and overall narration per group, standard deviation in parentheses, t-tests and effect sizes (Cohen\u0026rsquo;s d)\u003c/em\u003e\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"7\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSemi-literate\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" 
colname=\"c4\"\u003e\u003cp\u003eLiterate\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colname=\"c7\"\u003e\u0026nbsp;\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eRootTTR values\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eMean (SD)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eRange\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003eMean (SD)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003eRange\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003eGroup comparison\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003eCohen\u0026rsquo;s d\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAdjectives\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e1.82\u003c/p\u003e\u003cp\u003e(0.54)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.57\u0026ndash;3.05\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e3.26\u003c/p\u003e\u003cp\u003e(1.32)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e1.15\u0026ndash;6.12\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003et(30.65)\u0026thinsp;=\u0026thinsp;\u0026minus;4.93, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e1.42\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAdverbs\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c2\"\u003e\u003cp\u003e1.62\u003c/p\u003e\u003cp\u003e(0.61)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0\u0026ndash;2.75\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e3.19\u003c/p\u003e\u003cp\u003e(0.89)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e1.73\u0026ndash;5.16\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003et(40.62)\u0026thinsp;=\u0026thinsp;\u0026minus;7.07, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e2.04\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eVerbs\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e3.51\u003c/p\u003e\u003cp\u003e(0.57)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e2.60\u0026ndash;4.83\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e4.24\u003c/p\u003e\u003cp\u003e(0.65)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e2.93\u0026ndash;5.67\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003et(45.31)\u0026thinsp;=\u0026thinsp;\u0026minus;4.11, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e1.18\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003ePronouns\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e1.09\u003c/p\u003e\u003cp\u003e(0.36)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.26\u0026ndash;1.75\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e1.27\u003c/p\u003e\u003cp\u003e(0.31)\u003c/p\u003e\u003c/td\u003e\u003ctd 
align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.70\u0026ndash;1.78\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003et(45.10)\u0026thinsp;=\u0026thinsp;\u0026minus;1.75, p\u0026thinsp;=\u0026thinsp;0.08\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.50\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eConjunctions\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.82\u003c/p\u003e\u003cp\u003e(0.29)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.30\u0026ndash;1.45\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e1.50\u003c/p\u003e\u003cp\u003e(0.35)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e1\u0026ndash;2.26\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003et(44.11)\u0026thinsp;=\u0026thinsp;\u0026minus;7.25, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e2.08\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNouns\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e3.49\u003c/p\u003e\u003cp\u003e(0.78)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e2.37\u0026ndash;5.29\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e5.35\u003c/p\u003e\u003cp\u003e(1.02)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e3.79\u0026ndash;8.15\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003et(43.18)\u0026thinsp;=\u0026thinsp;\u0026minus;7.05, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c7\"\u003e\u003cp\u003e2.03\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eOverall\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e7.43\u003c/p\u003e\u003cp\u003e(1.33)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e5.38\u0026ndash;9.68\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e7.45\u003c/p\u003e\u003cp\u003e(2.96)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e2.69\u0026ndash;16.03\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003et(31.92)\u0026thinsp;=\u0026thinsp;\u0026minus;0.03, p\u0026thinsp;=\u0026thinsp;0.97\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.00\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eTo evaluate the effect of literacy on lexical diversity, we compared RootTTR scores for six parts of speech (adjectives, adverbs, verbs, pronouns, conjunctions, and nouns) using independent-samples t-tests (Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). In every category except pronouns, literate speakers demonstrated significantly higher RootTTR values than semi-literate speakers, indicating more diverse lexical production.\u003c/p\u003e\u003cp\u003eThe RootTTR for adjectives was significantly higher in the literate group (Mean\u0026thinsp;=\u0026thinsp;3.26) than in the semi-literate group (Mean\u0026thinsp;=\u0026thinsp;1.82), t(30.65) = \u0026minus;\u0026thinsp;4.93, p\u0026thinsp;\u0026lt;\u0026thinsp;.001, with a large effect size (d\u0026thinsp;=\u0026thinsp;1.42). A similar pattern emerged for adverbs, where literate speakers again outperformed semi-literate speakers (Means\u0026thinsp;=\u0026thinsp;3.19 vs. 
1.62), t(40.62) = \u0026minus;\u0026thinsp;7.07, p\u0026thinsp;\u0026lt;\u0026thinsp;.001, with a very large effect (d\u0026thinsp;=\u0026thinsp;2.04). For verbs, the literate group also showed higher RootTTR values (Mean\u0026thinsp;=\u0026thinsp;4.24) compared to the semi-literate group (Mean\u0026thinsp;=\u0026thinsp;3.51), t(45.31) = \u0026minus;\u0026thinsp;4.11, p\u0026thinsp;\u0026lt;\u0026thinsp;.001, d\u0026thinsp;=\u0026thinsp;1.18.\u003c/p\u003e\u003cp\u003eThe contrast in pronouns was less pronounced. While the literate group had slightly higher RootTTR values (Mean\u0026thinsp;=\u0026thinsp;1.27) than the semi-literate group (Mean\u0026thinsp;=\u0026thinsp;1.09), the difference did not reach conventional levels of statistical significance, t(45.10) = \u0026minus;\u0026thinsp;1.75, p\u0026thinsp;=\u0026thinsp;.08, with a moderate effect size (d\u0026thinsp;=\u0026thinsp;0.50).\u003c/p\u003e\u003cp\u003eFor conjunctions, RootTTR was significantly greater in the literate group (Mean\u0026thinsp;=\u0026thinsp;1.50) than in the semi-literate group (Mean\u0026thinsp;=\u0026thinsp;0.82), t(44.11) = \u0026minus;\u0026thinsp;7.25, p\u0026thinsp;\u0026lt;\u0026thinsp;.001, yielding a very large effect (d\u0026thinsp;=\u0026thinsp;2.08). A similarly strong effect was observed for nouns, where literate speakers (Mean\u0026thinsp;=\u0026thinsp;5.35) showed significantly higher RootTTR values than semi-literate speakers (Mean\u0026thinsp;=\u0026thinsp;3.49), t(43.18) = \u0026minus;\u0026thinsp;7.05, p\u0026thinsp;\u0026lt;\u0026thinsp;.001, with Cohen\u0026rsquo;s d\u0026thinsp;=\u0026thinsp;2.03.\u003c/p\u003e"},{"header":"4. Discussion","content":"\u003cp\u003eThe present study investigated how literacy acquisition affects lexical diversity in adult L1 Turkish speakers by comparing semi-literate and literate participants\u0026rsquo; output in a narrative production task. 
To this end, we elicited narrations from semi-literate and literate L1 Turkish speakers on a wordless picture book (Frog Story) and conducted a lexical diversity analysis, comparing types and tokens as well as RootTTRs of various parts of speech (POS) using SketchEngine. Our findings reveal systematic differences in lexical diversity that illuminate the impact of years of literacy experience on fine-grained differences in lexical usage in oral production. The absence of a group difference in total word count indicates that both literate and semi-literate participants were equally willing or able to participate in the narrative task. Given that the Frog Story has been used with young children to elicit narrations (e.g., \u0026Ouml;gel Balaban \u0026amp; Hohenberger, 2020), we contend that the task did not pose undue cognitive demands. Moreover, participants had ample time to familiarize themselves with the story before beginning their narration, and were allowed to look at the pictures while speaking, further supporting the accessibility of the task. Nevertheless, the internal linguistic architecture of their stories diverged sharply between groups.\u003c/p\u003e\u003cp\u003eThe type-token comparison analysis revealed a consistent pattern across word classes: while both groups produced comparable quantities of words (tokens) per POS (except for adverbs and adjectives), literate participants demonstrated significantly greater lexical variety (types) in their vocabulary choices. This pattern was most pronounced for nouns, where despite using similar numbers of noun tokens (~\u0026thinsp;140 each), literate participants employed significantly more distinct nouns (d\u0026thinsp;=\u0026thinsp;1.15, p\u0026thinsp;\u0026lt;\u0026thinsp;.001), suggesting they varied their referential expressions rather than repeatedly using the same terms. 
A similar trend emerged for the rest of the POS categories except for pronouns, indicating that literate speakers consistently drew from a broader lexical repertoire across word classes. The fact that there was no difference in pronouns is expected, since there is only a small number of pronouns, which need to be used in both written and spoken language. Thus, there is no reason to expect literacy-related advantages in pronoun usage in terms of lexical diversity.\u003c/p\u003e\u003cp\u003eGiven the robust and widespread differences in lexical types across parts of speech, it was especially compelling to examine type\u0026ndash;token ratios as an index of productive vocabulary. Our findings revealed a systematic advantage for literate speakers in RootTTR scores across nearly all categories. The most pronounced disparities were observed for conjunctions and adverbs\u0026mdash;categories crucial for linking, elaborating, and structuring discourse. These are precisely the lexical classes that spoken interaction in oral cultures tends to underuse or convey through prosody, repetition, or parataxis (cf. Yıldız, 2006; G\u0026ouml;k\u0026ccedil;e, 2016). Substantial differences in adjectives (d\u0026thinsp;=\u0026thinsp;1.42) and verbs (d\u0026thinsp;=\u0026thinsp;1.18)\u0026mdash;the descriptive and predicative core of narrative\u0026mdash;further suggest that literacy contributes to deeper lexical precision and semantic range.\u003c/p\u003e\u003cp\u003eOf course, it is important to acknowledge that literacy is not an isolated variable. As will be discussed below, it is closely intertwined with education, cognitive development, and broader life experience\u0026mdash;including opportunities for social interaction, autonomy in communication, and exposure to diverse linguistic registers (Fingeret, 1983; G\u0026ouml;k\u0026ccedil;e \u0026amp; Yıldız, \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). 
The differences observed in RootTTR may therefore reflect the cumulative impact of these intersecting factors. However, we interpret literacy as a central explanatory variable because it encapsulates many of these life experiences and reflects the availability of written language as a persistent linguistic resource.\u003c/p\u003e\u003cp\u003eIn this light, literacy appears to enhance productive vocabulary by reducing lexical repetition and facilitating access to more diverse and precise vocabulary. The consistent pattern across multiple POS categories supports a domain-general effect, in line with theories that posit richer, more abstract lexical representations among speakers with regular exposure to written language (e.g., Bybee, 2010). This interpretation also resonates with cognitive and neuropsychological work showing that lexical diversity is sensitive to constraints in lexical access and selection, rather than just storage (Arslan et al., \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; Maviş et al., \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2014\u003c/span\u003e). Overall, these findings reinforce the view that literacy\u0026mdash;understood as both a linguistic and socio-cognitive resource\u0026mdash;broadens the lexicon and shapes the way speakers access and deploy vocabulary in narrative discourse.\u003c/p\u003e\u003cp\u003eBefore turning to the discussion on why literacy may be an important predictor of lexical diversity, it is important to acknowledge a key methodological factor: the literate and semi-literate groups in this study differ not only in their literacy experience but also in broader life experience. 
These life experience factors include the amount and quality of social interaction (e.g., with family, peers, or strangers), the level and duration of formal education, habits of language use such as reading, writing, viewing, and speaking, work-related activities, hobbies and interests, and knowledge of other languages. In Turkey, semi-literate individuals often come from low socioeconomic backgrounds (Aktaş, 2007) and are unable to attend formal schooling due to structural and patriarchal barriers (G\u0026ouml;k\u0026ccedil;e \u0026amp; Yıldız, \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). Semi-literacy in this context functions as a cumulative disadvantage, affecting cognitive domains such as nonverbal reasoning and grammatical comprehension (author under review), and also undermining non-cognitive factors such as self-confidence, autonomy, and communicative independence (Fingeret, 1983; G\u0026ouml;k\u0026ccedil;e, 2016). Women in these communities frequently lead restricted lives; they are perceived as incompetent by others, rely on close relatives to manage bureaucratic or medical appointments, and are often discouraged from engaging in public life without accompaniment (G\u0026ouml;k\u0026ccedil;e \u0026amp; Yıldız, \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). As a result, their interactions tend to occur within close-knit, esoteric communities where oral culture dominates and social exchanges are confined to familiar interlocutors (Yıldız, 2006; Wray \u0026amp; Grace, 2007; G\u0026ouml;k\u0026ccedil;e, 2016).\u003c/p\u003e\u003cp\u003eThis limits not only opportunities for language development, but also exposure to varied communicative settings. In practice, literacy and education are deeply intertwined\u0026mdash;those who spend more years in school are more exposed to written language and are more likely to engage in sustained reading. 
This relationship is reflected in our sample: group membership (literate vs. semi-literate) is strongly correlated with years of education (r\u0026thinsp;=\u0026thinsp;.94, p\u0026thinsp;\u0026lt;\u0026thinsp;.001), confirming that the two variables are tightly coupled. Prior research has also documented this association, showing that time spent in formal education predicts reading habits and literacy outcomes (Dąbrowska, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; author year). While we refer to literacy as the primary explanatory factor in the discussion below, the effects may also reflect, or be amplified by, broader differences in educational and experiential backgrounds. These interconnected influences\u0026mdash;such as socioeconomic status, gendered restrictions on mobility, and the richness of communicative environments\u0026mdash;deserve closer investigation in future work. However, because these factors are correlated with semi-literacy, and because literacy experience has theoretical implications for the structure and accessibility of linguistic knowledge, the present study focuses on literacy as a central variable in understanding individual differences in vocabulary use and representation; the semi-literate/literate distinction itself also draws on the other factors described above.\u003c/p\u003e\u003cp\u003eWhy should literacy be an important predictor of lexical diversity in oral narrations? First, literacy acquisition has been shown to fundamentally restructure cognitive architecture. Dehaene and colleagues (\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2010\u003c/span\u003e) demonstrated that learning to read leads to measurable changes in brain function, including enhanced working memory, phonological processing, and abstraction. 
These general cognitive effects of literacy can make lexical access more efficient and flexible, especially under the demands of spontaneous speech while looking at pictures. In the present study, literate participants used a wider range of lexical types despite producing a comparable number of tokens\u0026mdash;suggesting that literacy supports not just language learning, but the dynamic retrieval and deployment of vocabulary during real-time language use.\u003c/p\u003e\u003cp\u003eSecond, literacy changes the nature of linguistic input by exposing speakers to a lexically richer variety of language. Written texts, including children\u0026rsquo;s books, consistently contain more rare and diverse vocabulary than spoken conversation (Hayes \u0026amp; Ahrens, \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e1988\u003c/span\u003e; Cunningham \u0026amp; Stanovich, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e1998\u003c/span\u003e). This enriched exposure enables the acquisition and entrenchment of low-frequency words that rarely occur in speech. Dąbrowska (\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2009\u003c/span\u003e) argues that most adult vocabulary is acquired through reading, and studies have repeatedly shown that print exposure is a strong predictor of receptive vocabulary size (e.g., Dąbrowska, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; author year; Mol \u0026amp; Bus, 2011). However, the effects observed in this study go beyond mere exposure to more types. If frequency alone explained the pattern, we would expect increases in both types and tokens across all categories. 
Instead, the selective enhancement of elaborative categories (e.g., adjectives, adverbs, conjunctions) suggests that written input not only expands the lexicon, but shapes its structure and usage in nuanced ways.\u003c/p\u003e\u003cp\u003eThird, sustained engagement with written language may promote discourse-level strategies that prioritize explicitness, precision, and coherence. Although the Frog stories were delivered orally with help from pictures, the task itself is relatively decontextualized: it presents new content and requires speakers to construct a narrative from scratch. In such tasks, speakers may prefer more explicit reference strategies\u0026mdash;such as repeating full noun phrases\u0026mdash;instead of relying on pronouns that assume shared contextual knowledge. This pattern is reflected in our data: both groups used pronouns at comparable rates, consistent with their role in spoken, contextually embedded discourse. However, literate participants employed a significantly greater variety of nouns, suggesting that they were more likely to vary and elaborate referents using lexical means. This aligns with Norrby and H\u0026aring;kansson\u0026rsquo;s (2007) observation that a higher frequency and diversity of nouns marks a shift toward a more explicit, context-independent style\u0026mdash;a tendency associated with written language. Thus, even in oral narrative, literacy appears to shape how speakers manage reference, favoring lexical precision over reliance on discourse deixis.\u003c/p\u003e\u003cp\u003eThis aligns well with the nature of written language. Written discourse must explicitly mark relationships between events and ideas. As a result, literate individuals receive prolonged practice in how to structure descriptions, introduce referents, and connect clauses in ways that are self-contained and context-independent. 
Although there is currently little research on this issue, one emerging study provides suggestive evidence that training in written explicitness may carry over to explicitness in oral language production, as well as to other cognitive skills such as Theory of Mind (De la Garza et al., under review). In this vein, literate speakers may have used more varied vocabulary to be more explicit and to give more information about the story to their listeners.\u003c/p\u003e\u003cp\u003eFourth, literacy fosters more efficient lexical retrieval and enhances semantic organization. Research has shown that literate individuals outperform their non-literate peers on verbal fluency and lexical decision tasks (da Silva et al., \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2004\u003c/span\u003e; author author under review; Kosmidis et al., \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2006\u003c/span\u003e). This suggests that literacy does not just increase vocabulary size, but improves access to that vocabulary by strengthening semantic networks and reducing reliance on default, high-frequency forms. In the present study, literate participants\u0026rsquo; ability to draw on more varied lexical items without increasing wordiness may reflect a more efficiently organized lexicon\u0026mdash;one that enables them to select precise and varied words for the task at hand.\u003c/p\u003e\u003cp\u003eFinally, and perhaps most critically, it is plausible that these mechanisms are not isolated or mutually exclusive; rather, they are deeply interconnected and mutually reinforcing. The cognitive restructuring brought about by literacy acquisition, such as enhanced working memory and phonological awareness, forms a foundation that enables individuals to better process, retain, and manipulate the rich linguistic input available through written language. 
At the same time, more efficient semantic organization and lexical retrieval, supported by both cognitive and linguistic enhancements as a result of literacy acquisition, can allow literate speakers to select more precise, context-sensitive forms during real-time language use. These efficiencies feed back into discourse planning and narrative production: speakers are better able to anticipate listener needs, maintain coherence, and modulate information flow. In short, the effects of literacy are best understood not as parallel but as cascading, where each domain (i.e., cognition, lexicon, input exposure, and discourse structure) amplifies the development of the others (i.e., the Matthew Effect; Cunningham \u0026amp; Stanovich, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e1998\u003c/span\u003e). This systemic interaction likely accounts for why literate individuals perform more consistently and with greater flexibility across a wide range of linguistic and communicative tasks, from morphosyntactic processing to narrative elaboration and referential communication. Understanding these mechanisms as mutually reinforcing also underscores the central hypothesis of this research program: that literacy serves not merely as a tool for decoding print, but as a developmental pathway that restructures and refines fundamental aspects of language and thought (Nielsen \u0026amp; Waldemar, 2013).\u003c/p\u003e\u003cp\u003eTo our knowledge, this is the first study to examine the effects of literacy on lexical diversity in Turkish adult speakers using a structured narrative production task. 
While previous research has established that literacy affects receptive vocabulary size and syntactic comprehension (e.g., Kim et al., 2014; Kosmidis et al., \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2006\u003c/span\u003e; Dąbrowska, \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2009\u003c/span\u003e, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2018\u003c/span\u003e), the current study shows that these effects extend to productive language use, specifically, to the selection and variation of words in oral discourse. By conducting a part-of-speech-specific RootTTR analysis, we provide new, fine-grained evidence that literacy experience influences not just how many words are known, but how lexical resources are organized and deployed during real-time oral production.\u003c/p\u003e\u003cp\u003eThese findings also contribute to efforts to redress the WEIRD bias in psycholinguistics and cognitive science (Blasi et al., \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Henrich et al., \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2010\u003c/span\u003e). Much of what we know about language processing and representation comes from literate, highly educated speakers in industrialized societies. By focusing on semi-literate Turkish adults, this study challenges the assumption that native speakers form a uniform baseline of linguistic competence. Instead, our results support a view of language as shaped by developmental experiences such as schooling and literacy, with measurable consequences for everyday communication.\u003c/p\u003e\u003cp\u003eDifferences in lexical diversity may have subtle but important implications for real-world communication, particularly for semi-literate speakers. The reduced variety in lexical choice, especially in elaborative categories such as adjectives, adverbs, and conjunctions, may pose challenges in more complex communicative situations. 
In contexts where clarity, precision, and explicitness are crucial, such as healthcare encounters, legal settings, or bureaucratic procedures, limited lexical flexibility could hinder speakers\u0026rsquo; ability to describe events accurately, formulate questions, or follow instructions. Prior research highlights how speakers with limited linguistic resources may struggle in institutional interactions, particularly when additional barriers such as social stigma, minority status, or low education are present (Eades, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2008\u003c/span\u003e; Filipović, \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Although our semi-literate participants are native speakers of Turkish, the reduced lexical variety may go unnoticed by their interlocutors, potentially leading to misjudgments of communicative intent or competence. This is especially concerning in environments where clarification requests may be discouraged or met with stigma, such as hospitals or administrative offices (G\u0026ouml;k\u0026ccedil;e \u0026amp; Yıldız, \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). While our study does not claim that lower lexical diversity necessarily results in poorer communication, these patterns underscore the need for greater awareness of how language experience, including literacy, shapes expressive capabilities in ways that may affect individuals\u0026rsquo; ability to navigate high-stakes interactions.\u003c/p\u003e\u003cp\u003eWhile this study was not designed to evaluate educational or communicative interventions directly, the findings may offer preliminary insights for applied domains. The observed differences in lexical diversity between semi-literate and literate speakers suggest that sustained literacy experience may support not only vocabulary growth, but also the flexible use of language in communicative contexts. 
This has potential relevance for adult education programs, particularly those focused on enhancing oral communication skills alongside reading instruction. In contexts such as healthcare, legal settings, or bureaucratic interactions\u0026mdash;where individuals are often required to describe situations clearly and understand complex spoken or written information\u0026mdash;greater lexical flexibility and precision could plausibly support more effective communication. Although further research is needed to establish causal relationships and test applicability in specific domains, the current findings contribute to a growing body of evidence suggesting that literacy acquisition may enhance communicative competence in ways that extend beyond decoding written text. These findings may also inform policy discussions about adult literacy by highlighting the broader cognitive and linguistic benefits associated with access to reading and writing instruction.\u003c/p\u003e\u003cp\u003eIn this context, policy-level interventions that improve communicative accessibility may be valuable. For example, German public institutions increasingly provide information in both standard and simplified German (Leichte Sprache), allowing for more inclusive access to essential services. Currently, there are no comparable guidelines or standardized practices for simplified Turkish. Developing accessible Turkish communication strategies\u0026mdash;especially in institutional settings\u0026mdash;may help reduce the burden on speakers whose struggles often go unnoticed and improve equitable access to services and information.\u003c/p\u003e\u003cp\u003eSeveral limitations should be noted. First, the semi-literate group had very limited formal education (0.04 years on average). Future research should examine populations with intermediate literacy levels to better understand the gradient relationship between literacy, education and lexical diversity. 
Second, the type-token ratio (and its derivatives) is useful but sensitive to text length. Researchers (e.g., Fergadiotis et al., \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2013\u003c/span\u003e) argue that other measures of lexical diversity, such as the Measure of Textual Lexical Diversity, the Moving-Average Type-Token Ratio, D, and the Hypergeometric Distribution, are much more sensitive in documenting differences in lexical diversity. Because these measures are calculated using specific software tools which are not yet available for Turkish, it was not possible to compute them in this study.\u003c/p\u003e\u003cp\u003eConsidering this was an initial attempt at documenting lexical diversity differences between semi-literate and literate speakers, future research could examine several other aspects. First, future research should use these more sensitive methods of measuring lexical diversity when software tools for Turkish are available. Second, because strong Theory of Mind skills help speakers infer others\u0026rsquo; mental states, including what interlocutors do not know, future work could test whether higher Theory of Mind skills are associated with greater lexical diversity. Third, investigating whether literacy-related differences in lexical diversity hold across other genres\u0026mdash;such as conversation, explanation, or argumentation\u0026mdash;would test the generality of these effects, since different spoken registers may necessitate the use of various vocabulary items. Finally, longitudinal studies of adults acquiring literacy later in life could provide valuable insight into how lexical diversity evolves with increased print exposure and formal language use.\u003c/p\u003e"},{"header":"5. Conclusion","content":"\u003cp\u003eThis study provides the first suggestive evidence that literacy acquisition fundamentally reshapes how adults deploy their lexical knowledge in oral narration. 
By analyzing the oral narratives of literate and semi-literate Turkish speakers, we demonstrated that literacy experience significantly enhances lexical diversity across nearly all parts of speech, not by increasing output quantity, but by enabling greater variation and precision in vocabulary use. These findings extend prior research on vocabulary size and syntactic comprehension by showing that literacy also affects the organization and deployment of lexical resources in spontaneous, discourse-level production.\u003c/p\u003e\u003cp\u003eOur part-of-speech-specific RootTTR analysis revealed that literacy and its concomitant effects particularly enhance the use of elaborative and connective word categories\u0026mdash;those most crucial for structuring explicit and coherent narratives. These patterns support usage-based accounts of language that emphasize the role of input frequency, cognitive restructuring, and discourse conventions in shaping linguistic competence. They also challenge the assumption that native speaker grammars are uniform across populations, highlighting literacy as a key factor in explaining individual variation.\u003c/p\u003e\u003cp\u003eBy centering a semi-literate population in a non-WEIRD context, this study contributes to a more inclusive and ecologically valid understanding of adult language. It emphasizes that literacy is not merely a tool for reading and writing, but a developmental experience that leaves deep and lasting imprints on how language is processed, structured, and used\u0026mdash;even in oral domains. 
As such, literacy must be recognized as a critical variable in models of language acquisition, representation, and use.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eThe author was solely responsible for the conception and design of the study, data collection, analysis and interpretation, and the drafting and revision of the manuscript. 
No other individuals contributed to the research or writing process.\u003c/p\u003e\u003ch2\u003eAcknowledgement\u003c/h2\u003e\u003cp\u003eI would like to sincerely thank G\u0026uuml;l\u0026ccedil;in İrem Yıldırım and Ece G\u0026ouml;k\u0026ccedil;e for all of their help during data collection. I would also like to thank all the semi-literate participants for their courage to participate in the study, and their teachers at the literacy centers. Without your help, this research would not have been possible.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003ehttps://osf.io/n725u/?view_only=fdde3783851244c3ac2cfe9d301c650f\u003c/p\u003e\u003cp\u003eFunding declaration\u003c/p\u003e\n\u003cp\u003eThis study received no funding.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAuthor year\u003c/li\u003e\n\u003cli\u003eAuthor year\u003c/li\u003e\n\u003cli\u003eAuthor year\u003c/li\u003e\n\u003cli\u003eAuthor year\u003c/li\u003e\n\u003cli\u003eArslan, S., Bamyacı, E., \u0026amp; Bastiaanse, R. (2016). A characterization of verb use in Turkish agrammatic narrative speech. \u003cem\u003eClinical Linguistics \u0026amp; Phonetics\u003c/em\u003e, \u003cem\u003e30\u003c/em\u003e(6), 449\u0026ndash;469. https://doi.org/10.3109/02699206.2016.1144224\u003c/li\u003e\n\u003cli\u003eBlasi, D. E., Henrich, J., Adamou, E., Kemmerer, D., \u0026amp; Majid, A. (2022). Over-reliance on English hinders cognitive science. \u003cem\u003eTrends in Cognitive Sciences\u003c/em\u003e, \u003cem\u003e26\u003c/em\u003e(12), 1153\u0026ndash;1170.\u003c/li\u003e\n\u003cli\u003eCunningham, A. E., \u0026amp; Stanovich, K. E. (1998). What reading does for the mind. \u003cem\u003eAmerican Educator\u003c/em\u003e, \u003cem\u003e22\u003c/em\u003e, 8\u0026ndash;17.\u003c/li\u003e\n\u003cli\u003eda Silva, C. G., Petersson, K. M., Fa\u0026iacute;sca, L., Ingvar, M., \u0026amp; Reis, A. (2004). 
The effects of literacy and education on the quantitative and qualitative aspects of semantic verbal fluency. \u003cem\u003eJournal of Clinical and Experimental Neuropsychology\u003c/em\u003e, \u003cem\u003e26\u003c/em\u003e(2), 266\u0026ndash;277.\u003c/li\u003e\n\u003cli\u003eDąbrowska, E. (2009). Words as constructions. In V. Evans \u0026amp; S. Pourcel (Eds.), \u003cem\u003eHuman Cognitive Processing\u003c/em\u003e (Vol. 24, pp. 201\u0026ndash;223). John Benjamins Publishing Company. https://doi.org/10.1075/hcp.24.16dab\u003c/li\u003e\n\u003cli\u003eDąbrowska, E. (2012). Different speakers, different grammars: Individual differences in native language attainment. \u003cem\u003eLinguistic Approaches to Bilingualism\u003c/em\u003e, \u003cem\u003e2\u003c/em\u003e(3), 219\u0026ndash;253.\u003c/li\u003e\n\u003cli\u003eDąbrowska, E. (2018). Experience, aptitude and individual differences in native language ultimate attainment. \u003cem\u003eCognition\u003c/em\u003e, \u003cem\u003e178\u003c/em\u003e, 222\u0026ndash;235. https://doi.org/10.1016/j.cognition.2018.05.018\u003c/li\u003e\n\u003cli\u003eDąbrowska, E., Pascual, E., Mac\u0026iacute;as-G\u0026oacute;mez-Estern, B., \u0026amp; Llompart, M. (2023). Literacy-related differences in morphological knowledge: A nonce-word study. \u003cem\u003eFrontiers in Psychology\u003c/em\u003e, \u003cem\u003e14\u003c/em\u003e, 1136337. https://doi.org/10.3389/fpsyg.2023.1136337\u003c/li\u003e\n\u003cli\u003eDehaene, S., Pegado, F., Braga, L. W., Ventura, P., Filho, G. N., Jobert, A., Dehaene-Lambertz, G., Kolinsky, R., Morais, J., \u0026amp; Cohen, L. (2010). How learning to read changes the cortical networks for vision and language. \u003cem\u003eScience\u003c/em\u003e, \u003cem\u003e330\u003c/em\u003e(6009), 1359\u0026ndash;1364.\u003c/li\u003e\n\u003cli\u003eEades, D. (2008). Language and disadvantage before the law. 
\u003cem\u003eDimensions of Forensic Linguistics\u003c/em\u003e, 179\u0026ndash;195.\u003c/li\u003e\n\u003cli\u003eFergadiotis, G., Wright, H. H., \u0026amp; West, T. M. (2013). Measuring Lexical Diversity in Narrative Discourse of People With Aphasia. \u003cem\u003eAmerican Journal of Speech-Language Pathology\u003c/em\u003e, \u003cem\u003e22\u003c/em\u003e(2). https://doi.org/10.1044/1058-0360(2013/12-0083)\u003c/li\u003e\n\u003cli\u003eFilipović, L. (2022). Language and culture as sources of inequality in US police interrogations. \u003cem\u003eApplied Linguistics\u003c/em\u003e, \u003cem\u003e43\u003c/em\u003e(6), 1073\u0026ndash;1093.\u003c/li\u003e\n\u003cli\u003eFriend, M., \u0026amp; Bates, R. P. (2014). The union of narrative and executive function: Different but complementary. \u003cem\u003eFrontiers in Psychology\u003c/em\u003e, \u003cem\u003e5\u003c/em\u003e, 469.\u003c/li\u003e\n\u003cli\u003eG\u0026ouml;k\u0026ccedil;e, N., \u0026amp; Yıldız, A. (2018). T\u0026uuml;rkiye\u0026rsquo;de okuma-yazma bilmeyen kadınlar ve okuma-yazma kurslarına katılmama nedenleri:\u0026ldquo;Ne edeyim okumayı, hayatım mı değişecek?\u0026rdquo; \u003cem\u003eKastamonu Education Journal\u003c/em\u003e, \u003cem\u003e26\u003c/em\u003e(6), 2151\u0026ndash;2161.\u003c/li\u003e\n\u003cli\u003eHayes, D. P., \u0026amp; Ahrens, M. G. (1988). Vocabulary simplification for children: A special case of \u0026lsquo;motherese\u0026rsquo;? \u003cem\u003eJournal of Child Language\u003c/em\u003e, \u003cem\u003e15\u003c/em\u003e(2), 395\u0026ndash;410.\u003c/li\u003e\n\u003cli\u003eHenrich, J., Heine, S. J., \u0026amp; Norenzayan, A. (2010). The weirdest people in the world? \u003cem\u003eBehavioral and Brain Sciences\u003c/em\u003e, \u003cem\u003e33\u003c/em\u003e(2\u0026ndash;3), 61\u0026ndash;83. 
https://doi.org/10.1017/S0140525X0999152X\u003c/li\u003e\n\u003cli\u003eKilgarriff, A., Baisa, V., Bu\u0026scaron;ta, J., Jakub\u0026iacute;ček, M., Kov\u0026aacute;ř, V., Michelfeit, J., Rychlý, P., \u0026amp; Suchomel, V. (2014). The Sketch Engine. \u003cem\u003eLexicography\u003c/em\u003e, \u003cem\u003e1\u003c/em\u003e(1), 7\u0026ndash;36.\u003c/li\u003e\n\u003cli\u003eKim, J., Yoon, J. H., Kim, S. R., \u0026amp; Kim, H. (2014). Effect of literacy level on cognitive and language tests in Korean illiterate older adults. \u003cem\u003eGeriatrics \u0026amp; Gerontology International\u003c/em\u003e, \u003cem\u003e14\u003c/em\u003e(4), 911\u0026ndash;917.\u003c/li\u003e\n\u003cli\u003eKosmidis, M. H., Tsapkini, K., \u0026amp; Folia, V. (2006). Lexical processing in illiteracy: Effect of literacy or education? \u003cem\u003eCortex\u003c/em\u003e, \u003cem\u003e42\u003c/em\u003e(7), 1021\u0026ndash;1027.\u003c/li\u003e\n\u003cli\u003eK\u0026uuml;ntay, A. C., \u0026amp; Nakamura, K. (2004). Linguistic Strategies Serving Evaluative Functions: A Comparison between Japanese and Turkish Narratives. In \u003cem\u003eRelating Events in Narrative, Volume 2\u003c/em\u003e (pp. 329\u0026ndash;358). Psychology Press.\u003c/li\u003e\n\u003cli\u003eMaviş, İ., Tun\u0026ccedil;er, M., \u0026Uuml;re, I., \u0026amp; \u0026Ouml;zt\u0026uuml;rk, S. (2014). The lexical quantification of aphasic spontaneous speech in Turkish: A comparison across discourse types. \u003cem\u003eProceedings of the 16th International Aphasia Rehabilitation Conference\u003c/em\u003e.\u003c/li\u003e\n\u003cli\u003eMayer, M. (1969). \u003cem\u003eFrog, Where Are You?\u003c/em\u003e Dial Press.\u003c/li\u003e\n\u003cli\u003eMol, S. E., \u0026amp; Bus, A. G. (2011). To read or not to read: A meta-analysis of print exposure from infancy to early adulthood. 
\u003cem\u003ePsychological Bulletin\u003c/em\u003e, \u003cem\u003e137\u003c/em\u003e(2), 267.\u003c/li\u003e\n\u003cli\u003eMorais, J., Cary, L., Alegria, J., \u0026amp; Bertelson, P. (1979). Does awareness of speech as a sequence of phones arise spontaneously? \u003cem\u003eCognition\u003c/em\u003e, \u003cem\u003e7\u003c/em\u003e(4), 323\u0026ndash;331.\u003c/li\u003e\n\u003cli\u003eNavarro, D. (2015). \u003cem\u003eLearning statistics with R: A tutorial for psychology students and other beginners. (Version 0.6)\u003c/em\u003e. University of New South Wales. https://learningstatisticswithr.com\u003c/li\u003e\n\u003cli\u003eNielsen, T. R., \u0026amp; J\u0026oslash;rgensen, K. (2013). Visuoconstructional Abilities in Cognitively Healthy Illiterate Turkish Immigrants: A Quantitative and Qualitative Investigation. \u003cem\u003eThe Clinical Neuropsychologist\u003c/em\u003e, \u003cem\u003e27\u003c/em\u003e(4), 681\u0026ndash;692. https://doi.org/10.1080/13854046.2013.767379\u003c/li\u003e\n\u003cli\u003eUNESCO. (2024). \u003cem\u003eLiteracy\u003c/em\u003e. https://www.unesco.org/en/literacy\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Footnotes","content":"\u003cp\u003e\u003csup\u003e\u003csup\u003e[1]\u003c/sup\u003e\u003c/sup\u003e It was not simple TTR but a different TTR \u0026ldquo;calculated by dividing the total number of different words in an entire sample by the square root of twice the total number of words in that sample\u0026rdquo; (Hess et al., 1984, pp. 52-53).\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"journal-of-cultural-cognitive-science","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"cucs","sideBox":"Learn more about [Journal of Cultural Cognitive Science](http://link.springer.com/journal/41809)","snPcode":"41809","submissionUrl":"https://submission.nature.com/new-submission/41809/3","title":"Journal of Cultural Cognitive Science","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"literacy, frog story, narration, lexical diversity, Turkish, oral production","lastPublishedDoi":"10.21203/rs.3.rs-7134036/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7134036/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eAdult language use varies substantially across speakers, with literacy experience emerging as a crucial but understudied factor in creating this variation. While written language exposes speakers to broader, more diverse vocabulary than speech alone, most psycholinguistic research focuses on highly literate populations, leaving gaps in our understanding of how literacy shapes oral production. This study addresses a critical question: Does literacy acquisition affect lexical diversity in spontaneous oral narrative production in Turkish? We compared lexical diversity patterns between semi-literate and fully literate adult Turkish speakers during a structured storytelling task. Using Root Type-Token Ratio analyses across six parts of speech, we found that literate speakers consistently demonstrated significantly higher lexical diversity than semi-literate speakers (d\u0026thinsp;=\u0026thinsp;1.18\u0026ndash;2.08 for most categories). Crucially, this occurred without increased word production, indicating that literacy enhances vocabulary variation rather than quantity. 
The largest effects emerged for elaborative categories\u0026mdash;conjunctions, adverbs, and adjectives. These findings reveal that literacy fundamentally affects lexical organization and deployment in oral productions.\u003c/p\u003e","manuscriptTitle":"Literacy Enhances Lexical Variation, Not Quantity, in Adult Oral Production","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-08-19 07:24:53","doi":"10.21203/rs.3.rs-7134036/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2025-10-08T08:39:01+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-10-07T00:19:54+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"206925906137714625468369034268988452829","date":"2025-08-12T14:40:57+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-08-11T15:01:11+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-08-11T14:57:05+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-07-18T07:16:46+00:00","index":"","fulltext":""},{"type":"submitted","content":"Journal of Cultural Cognitive Science","date":"2025-07-15T21:10:54+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"journal-of-cultural-cognitive-science","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"cucs","sideBox":"Learn more about [Journal of Cultural Cognitive Science](http://link.springer.com/journal/41809)","snPcode":"41809","submissionUrl":"https://submission.nature.com/new-submission/41809/3","title":"Journal of Cultural Cognitive Science","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"a73dd438-b09e-4763-a9e3-64d1eb45ca53","owner":[],"postedDate":"August 19th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2025-11-10T16:03:19+00:00","versionOfRecord":{"articleIdentity":"rs-7134036","link":"https://doi.org/10.1007/s41809-025-00189-3","journal":{"identity":"journal-of-cultural-cognitive-science","isVorOnly":false,"title":"Journal of Cultural Cognitive Science"},"publishedOn":"2025-11-07 15:58:02","publishedOnDateReadable":"November 7th, 2025"},"versionCreatedAt":"2025-08-19 07:24:53","video":"","vorDoi":"10.1007/s41809-025-00189-3","vorDoiUrl":"https://doi.org/10.1007/s41809-025-00189-3","workflowStages":[]},"version":"v1","identity":"rs-7134036","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7134036","identity":"rs-7134036","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}