Assessing Vocabulary Skills of School Children Aged 9 to 15 in Finland: Tracking the Gender and Home Language Gap

Raymond Bertram, Tomi Rautaoja, Santeri Holopainen, Tuomo Häikiö, and 8 more

Preprint (Research Square). This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-6448049/v1. This work is licensed under a CC BY 4.0 License. Status: Published. Journal publication published 29 Dec, 2025 in Scientific Reports. Version 1.

Abstract

Vocabulary proficiency is a key predictor of reading development. However, vocabulary proficiency in school-age children is rarely assessed, especially in languages other than English. Moreover, because reading development differs depending on home language and gender, it is likely that these factors also influence the development of vocabulary proficiency.
Here we report Finnish vocabulary proficiency of school-age children, examining its relationship with grade, gender, and home language. We utilize d-Lexize, a vocabulary test based on visual lexical decision, which we adapted from a previous test for adult L2 speakers. The test assesses vocabulary knowledge by accuracy and lexical retrieval speed through reaction time. Approximately 27,000 school children were tested in three experiments using different versions of d-Lexize. All experiments consistently show that vocabulary proficiency improves progressively from 3rd to 9th grade. The results also reveal an emerging gender gap: whereas girls perform equal to boys in the early stages, they exhibit a more extensive vocabulary and faster lexical retrieval in the later grades. Furthermore, the tests show that pupils from Finnish-only homes consistently outperform those from non-Finnish or mixed-language homes, with this gap widening over time. These results highlight the significance of language exposure and sociocultural factors during vocabulary development.

Subjects: Earth and environmental sciences/Environmental social sciences/Psychology and behaviour; Biological sciences/Psychology; Health sciences/Risk factors

Keywords: Vocabulary proficiency, Reading development, Gender Gap, Home Language, Language background, Finnish

INTRODUCTION

Childhood is a time of rapid vocabulary growth. During their initial 6 years - that is, before they start to read - children have already accumulated a substantial vocabulary amounting to thousands of words. This progression continues steadily throughout the years of primary education. For German pupils, vocabulary grows from roughly 6,000 to 38,000 words between 1st and 8th grade (age 6–14) 1 . The number of dictionary entries known by English-speaking Canadian children grows from about 10,000 to 40,000 from 1st to 5th grade 2 .
There are, however, significant disparities in vocabulary size and growth rate among school children. Biemiller showed that by the end of 2nd grade, English-speaking children in the lowest 25th percentile know about 4,000 root words, whereas those in the highest 25th percentile knew around 8,000 root words 3 . Song et al. identified substantial individual differences in vocabulary size for Chinese children from 4 years onwards already 4 . Noticeable individual differences in early vocabulary skills have also been reported for Finnish 5 – 6 , although these studies had small sample sizes and limited grade coverage. Vocabulary proficiency is at the heart of language proficiency. An extensive vocabulary is associated with good phonological skills 7 , syntactic advancement 8 , and well-developed listening and writing ability 9 . Importantly, vocabulary growth plays a crucial role in reading development, as children rely on their word knowledge to make sense of texts. Torppa et al. found that smaller early vocabulary size translates into slower progress in reading 10 . Conversely, Colenbrander et al. showed that children with poor reading comprehension typically exhibit relatively low vocabulary skills 11 . The relationship between vocabulary and reading comprehension is thus reciprocal. Children with smaller vocabularies tend to read less than their peers with larger vocabularies, which in turn reduces their exposure to new words. Differences in vocabulary size may mediate the impact of certain sociolinguistic factors on the development of reading comprehension. For example, in the latest PISA tests, assessing pupils across 81 countries 12 , gender and home language emerged as prominent predictors of reading comprehension. Specifically, boys demonstrated lower reading skills than girls (see also 13 – 14 ), and first- and second-generation immigrant children scored lower than their peers from families of native language speakers 12 . 
These effects of gender and home language on reading comprehension are found in all languages tested within PISA including Finnish. Given the strong relation between vocabulary knowledge and reading comprehension, similar effects may be expected for vocabulary development. Since PISA identifies these gaps at age 15, it is crucial to track them earlier in development to understand both how they emerge and evolve over time, as well as the potential factors that contribute to their formation. The current study aims to track the gender and home language gap in vocabulary skills among children in Finland from the 3rd to the 9th grade, aged 9 to 15. In Finland, the 1st to 6th grade belong to primary education, while the 7th to 9th grade fall under lower secondary education. Both levels are compulsory for all children. Manu et al. found that while the gender gap in Finnish reading ability is negligible in the early stages of primary school education, it increases over time 15 . However, despite the early similarity in reading skills across genders, vocabulary differences may emerge in the early primary school years, potentially predicting later reading disparities. This study investigates that possibility. It also examines the emergence and development of the home language gap in Finnish vocabulary. Specifically, we compare children from fully Finnish-speaking homes with those from non-Finnish-speaking homes, as well as those from mixed homes where one caregiver is Finnish and the other is non-native. There is a clear need for this type of research, as there is limited understanding of how these three distinct home language environments shape vocabulary development until the end of lower secondary education when the PISA test is administered. To accommodate this need, we developed d-Lexize (developmental Lexize), a comprehensive and reliable tool for assessing Finnish vocabulary skills in children from the 3rd to the 9th grade, aged 9 to 15 years. 
The d-Lexize test was derived from the Lexize vocabulary test for Finnish L2 speakers 16 . In turn, that test was modeled after the Lexical Test for Advanced Learners (LexTale), a validated test to assess vocabulary proficiency for adult English L2 learners 17 . Like LexTale, Lexize is based on a visual lexical decision task, wherein participants judge whether a visually presented letter string is a word (e.g., savory) or a pseudoword (e.g., plaudate). In other words, our focus was on print vocabulary knowledge rather than vocabulary knowledge in general; however, throughout the text, we refer to it simply as vocabulary knowledge. Another point is that the test taps into vocabulary breadth, meaning the number of words that are known or can be recognized, rather than vocabulary depth, which refers to how well these words are understood 18 . However, both dimensions of vocabulary knowledge not only correlate strongly with reading comprehension but also with each other 19 . In other words, when a person recognizes a large number of words, they typically also possess ample syntactic, semantic, and practical knowledge about them. The Lexize test of Salmela et al. includes words that range from low to medium frequency, ensuring that it encompasses words of varying difficulty level 16 . Since Lexize detected differences among adult L1 and L2 Finnish speakers, we inferred that the test could serve as a starting point for designing a comprehensive Finnish vocabulary test for L1 and L2 school-aged children. This assertion is also supported by the Rapid Online Assessment of Reading ability test (ROAR) developed by Yeatman et al. for English 20 , which showed that visual lexical decision not only allows the assessment of adult vocabulary skills, but can be used equally well to tap into children’s vocabulary skills. 
Specifically, they created a simple web-based visual lexical decision task and showed that this can serve as an accurate and reliable measure of English reading ability (as assessed by the Woodcock-Johnson Word Identification test) from early childhood (6 years) onwards. An interesting finding of that study was that accuracy rate was a much better predictor of reading ability than reaction time. Yeatman et al. speculated that in languages with an opaque orthography, reaction time is not a reliable measure of individual differences, at least for young readers, but hypothesized that in transparent orthographies individual differences may be more distinctly reflected in reaction time 20 . They also noted that reaction time might work better with larger samples than the 100–200 participants used in their studies. The current study employs a similar visual lexical decision in a transparent orthography, namely Finnish, in which each letter corresponds to a single phoneme and nearly every phoneme to a single letter (only the phoneme /ŋ/ as in hanko does not correspond to a unique letter but to ‘nk’ or ‘ng’). Moreover, by exploiting the nationwide educational platform ViLLE, to which 70% of all Finnish elementary and lower secondary schools are subscribed, we collected data from more than 27,000 pupils from the 3rd to the 9th grade. The transparent orthography is likely to ensure that both accuracy rate and reaction time can serve as reliable dependent measures, while the large sample size helps strengthen the generalizability of the findings. It is particularly valuable to use reaction time alongside accuracy rate, as this measure captures a different facet of vocabulary skills: accuracy rate reflects the number of words a participant knows, while reaction time more effectively captures the speed of lexical retrieval. Both aspects of vocabulary are essential skills that play distinct roles in reading fluency and reading comprehension. 
This is for instance shown in the ENRO (ENglish Reading Online) metastudy of Siegelman et al. 21 , who found through exploratory factor analyses that vocabulary accuracy and reaction time load on separate factors. Their results also showed that accuracy is closely linked to reading comprehension, while reaction time is more strongly associated with reading fluency. The current study allows for several key questions to be explored. First, does vocabulary proficiency increase steadily throughout the school years? Second, is there a gender gap in the early stages of education already or does it emerge later? Third, how large is the vocabulary gap between children from Finnish and children from non-Finnish homes, and does this gap narrow over time? Fourth, do children from mixed homes have lower vocabulary proficiency than children from Finnish homes and if so, to what extent? While this study focuses on vocabulary development in Finnish, similar issues are relevant for other languages, as highlighted by the recent PISA results on reading comprehension, which show consistent effects of gender and home language across languages 12 . The current study was conducted over three separate experiments, utilizing different versions of d-Lexize. Between experiments, the number of items was reduced based on Item Response Theory (IRT) analyses, driven by the goal of creating a more concise vocabulary test with the best possible items, while also considering the time constraints when testing school children. Each version of the d-Lexize task was tested for reliability and validated against a sentence reading proficiency test and a phonological word reading test (for specifications, see Methods section). The validity of the various d-Lexize versions was also assessed by analyzing the impact of word frequency and length on accuracy and response latencies, as these factors are known to influence reading and lexical decision 22 , 23 . 
Experiment 1 tested approximately 7,000 children from the 3rd, 4th, and 7th grade; the other experiments included approximately 5,000 (Experiment 2) and 15,000 (Experiment 3) children from the 3rd to the 9th grade. Gender effects were assessed in all experiments, as the distribution of boys and girls was even across experiments. Home-language effects were assessed in Experiments 1 and 3 and involved three levels: children from exclusively Finnish-speaking homes (Finnish), children from mixed homes (Finnish/other), and children from non-Finnish-speaking homes (other). Information on the age of acquisition of the Finnish language was not available for the latter two categories, so the impact of home language was only assessed at a general level. Experiment 2 had too small a percentage of children from mixed and non-Finnish-speaking households to consider home language as a variable.

Data on accuracy rate and response latencies were analyzed using (generalized) linear mixed-effects models with the lme4 package 24 in R statistical software (Version 4.3.0 25 ). Random intercepts were included for both participants and items 26 . We used generalized linear mixed models (GLMM) to analyze accuracy rate and linear mixed models (LMM) for response latencies (RTs). The RT analyses were based on correct responses only. Due to skewness, RTs were log-transformed prior to analysis. We examined models where grade interacted with 3 or 4 variables: gender, home language (not in Experiment 2), word frequency, and word length, with list as a control variable in Experiments 2 and 3 (as here we used two versions of d-Lexize). The analyses only included word data and not pseudoword data because words were our primary target of interest; moreover, the models include Log Lemma Frequency as a predictor, a variable that is not available for pseudowords.
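The response-time preparation described above (word trials only, correct responses only, log transformation) can be illustrated with a minimal sketch. The field names (`is_word`, `correct`, `rt_ms`), the example trials, and the use of the natural logarithm are assumptions made for illustration; the text does not state the base of the logarithm, and the authors' actual R code is the one available on their OSF page.

```python
import math

def prepare_rt_records(trials):
    """Return correct word trials with an added natural-log RT field."""
    prepared = []
    for trial in trials:
        # Pseudowords and incorrect responses are excluded from the RT analysis.
        if trial["is_word"] and trial["correct"]:
            prepared.append({**trial, "log_rt": math.log(trial["rt_ms"])})
    return prepared

# Illustrative trials only; these are not data from the study.
example_trials = [
    {"item": "word_a", "is_word": True, "correct": True, "rt_ms": 900.0},
    {"item": "word_b", "is_word": True, "correct": False, "rt_ms": 1500.0},
    {"item": "pseudo_a", "is_word": False, "correct": True, "rt_ms": 1100.0},
    {"item": "word_c", "is_word": True, "correct": True, "rt_ms": 1200.0},
]

prepared = prepare_rt_records(example_trials)
for row in prepared:
    print(row["item"], round(row["log_rt"], 3))
```

Only `word_a` and `word_c` survive the filter here, which mirrors the restriction of the RT models to correctly answered words.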
In Experiment 1, including the 3rd, 4th, and 7th grade, grade was treated as a categorical variable due to discontinuity between the grades, while in Experiments 2 and 3, with all grades from 3rd to 9th included, it was treated as a numeric variable. Gender and home language were also entered as categorical variables. The categorical variables were dummy coded with 3rd grade, boys and Finnish as the reference categories, respectively. Significance of the full terms was assessed by Wald tests (χ²) for accuracy and F-tests using the Satterthwaite approximation for the effective degrees of freedom for response latencies. For Experiment 1, post-hoc analyses were performed assessing the effect of gender and home language at each grade level with p-values being adjusted for False Discovery Rate 27 . The full models and the post-hoc analyses can be found in the statistical reports at https://osf.io/exb9s/?view_only=04dda2e11463493a8cc4bc76261d5e9c .

RESULTS

Experiment 1

The Lexize task for adult L2 speakers included 68 words and 34 pseudowords, all of which were presented to the participants of Experiment 1. Due to the concern that some of these items might not be suitable for school children, we conducted an IRT analysis and identified 9 words and 4 pseudowords with poor psychometric properties (see Methods for further details). These items were excluded from subsequent analyses, resulting in a final set of 89 items, which we named d-Lexize89; the analyses in Experiment 1 are based on this set. In the analysis of accuracy rates, the main effects of grade (χ²(2) = 31.53, p < .001) and home language (χ²(2) = 686.5, p < .001), and the interactions between grade and gender (χ²(2) = 16.04, p < .001) as well as between grade and home language (χ²(4) = 25.17, p < .001), were significant (see Fig. 1 ).
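The False Discovery Rate adjustment applied to the post-hoc p-values can be sketched as follows. This assumes the Benjamini-Hochberg step-up procedure, which is the most common FDR method but is not named explicitly in the text; the function name and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values from a family of four comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
print([round(p, 4) for p in adjusted])
```

Each adjusted value is the raw p-value scaled by the number of tests divided by its rank, capped so that the adjusted values never decrease as the raw values increase.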
Post-hoc comparisons revealed a growing gender difference in vocabulary proficiency from 3rd to 7th grade: While no significant difference was found in grade 3 (OR = 0.942, SE = 0.038, Z = -1.502, p = .142), boys scored significantly lower than girls in grade 4 (OR = 0.866, SE = 0.034, Z = -3.639, p = .001) with an even larger disadvantage by grade 7 (OR = 0.732, SE = 0.036, Z = -6.387, p < .001). For home language, post-hoc analyses showed that already in grade 3, children from Finnish-speaking homes outperformed those from mixed-language homes (OR = 1.773, SE = 0.13, Z = 7.83, p < .001) and non-Finnish homes (OR = 4.238, SE = 0.237, Z = 25.81, p < .001), with children from mixed-language homes also scoring higher than those from non-Finnish homes (OR = 2.391, SE = 0.206, Z = 10.12, p < .001). Similar patterns were observed in grade 4 (Finnish vs. Mixed: OR = 1.789, SE = 0.118, Z = 8.809, p < .001; Finnish vs. Other: OR = 4.237, SE = 0.244, Z = 25.08, p < .001; Mixed vs. Other: OR = 2.368, SE = 0.193, Z = 10.57, p < .001). By grade 7, the gap had widened, with children from Finnish-speaking homes maintaining the highest scores (Finnish vs. Mixed: OR = 2.758, SE = 0.239, Z = 11.719, p < .001; Finnish vs. Other: OR = 5.386, SE = 0.377, Z = 24.05, p < .001) and a clear difference between children from mixed-language and non-Finnish homes still (Mixed vs. Other: OR = 1.953, SE = 0.203, Z = 6.44, p < .001). For RT, there were again significant effects for grade (F(2, 25035) = 52.31, p < .001) and home language (F(2, 7041) = 106.99, p < .001) as well as interactions between grade and gender (F(2, 6811) = 14.77, p < .001), and grade and home language, F(4, 7042) = 8.40, p < .001. 
Post-hoc comparisons revealed a shift in the gender effect across grades: in grade 3, boys responded significantly faster than girls (β = -0.047, SE = 0.01, Z = -4.541, p < .001), but by grade 4, there was no difference (β = -0.018, SE = 0.01, Z = -1.746, p = .081), and by grade 7, the pattern had reversed, with boys now responding significantly slower than girls (β = 0.041, SE = 0.012, Z = 3.269, p = .001). For home language, post-hoc comparisons showed that in grade 3, children from Finnish-speaking or mixed homes responded significantly faster than children from non-Finnish-speaking homes (Finnish vs. Other: β = -0.093, SE = 0.015, Z = -6.379, p < .001; Mixed vs. Other: β = -0.095, SE = 0.022, Z = -4.246, p < .001), whereas the difference between fully Finnish and mixed language homes was not significant (Finnish vs Mixed: β = 0.002, SE = 0.019, Z = 0.112, p = .91). The same pattern was found for the 4th grade: (Finnish vs Other: β = -0.111, SE = 0.015, Z = -7.409, p < .001; Mixed vs Other: β = -0.084, SE = 0.021, Z = -3.951, p < .001; Finnish vs. Mixed: β = -0.027, SE = 0.017, Z = -1.596, p = .121). In grade 7, all home language contrasts reached significance (Finnish vs Mixed: β = -0.115, SE = 0.022, Z = -5.168, p < .001, Finnish vs Other: β = -0.196, SE = 0.018, Z = -10.788, p < .001; Mixed vs Other: β = -0.081, SE = 0.027, Z = -2.992, p = .003). Similar to the findings for accuracy, the gap between children from Finnish and non-Finnish homes had also widened for RTs, reflecting an increasing disparity in lexical retrieval speed. Moreover, children from Finnish homes now responded faster than those from mixed-language homes, suggesting that a gap in lexical retrieval speed is also emerging between these groups across grades. Figure 1 shows the interactions of grade with gender and home language for accuracy and RT in Experiment 1. 
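As a reading aid for the accuracy contrasts reported for Experiment 1, the sketch below relates an odds ratio and its standard error to the corresponding Z statistic. It assumes that the reported SE is expressed on the odds-ratio scale (a delta-method back-transformation of the log-odds SE); the text does not state this, so it is an assumption, and the function name is hypothetical. The three gender contrasts (boys vs. girls) are copied from the results above.

```python
import math

def wald_z_from_odds_ratio(odds_ratio, se_odds_ratio):
    """Approximate the Wald Z from an odds ratio and its OR-scale SE."""
    log_or = math.log(odds_ratio)
    # Delta method: SE(OR) is roughly OR * SE(log OR), so invert that here.
    se_log_or = se_odds_ratio / odds_ratio
    return log_or / se_log_or

# (label, OR, SE, reported Z) for the gender contrasts in Experiment 1.
gender_contrasts = [
    ("grade 3", 0.942, 0.038, -1.502),
    ("grade 4", 0.866, 0.034, -3.639),
    ("grade 7", 0.732, 0.036, -6.387),
]

recomputed = {}
for label, odds_ratio, se, reported_z in gender_contrasts:
    z = wald_z_from_odds_ratio(odds_ratio, se)
    recomputed[label] = z
    print(f"{label}: recomputed Z = {z:.2f}, reported Z = {reported_z}")
```

Under this assumption the recomputed values agree with the reported Z statistics to within roughly 0.05, a difference that is plausibly due to rounding of the published OR and SE values. An odds ratio below 1 for boys relative to girls corresponds to a negative Z, that is, lower odds of a correct response for boys.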
Experiment 2

In Experiment 2 we sought to reduce the length of the d-Lexize test and to create an additional version of the test, allowing for multiple assessments and the potential development of a parallel assessment of oral vocabulary. Therefore, from the 89 items of d-Lexize89, we created two item lists of 36 words and 19 pseudowords (d-Lexize55a & d-Lexize55b), each containing 34 unique items and 21 shared items (for more detailed information, see Methods). The lists were matched on average item discriminability and accuracy rate in Experiment 1 as well as on average word and bigram frequency, word length, and orthographic neighborhood; list was entered as a control variable in the analyses. For accuracy, the analysis of Experiment 2 showed a marginal effect for grade (χ²(1) = 3.57, p = .059) and no effect for gender (χ²(1) = 1.05, p = .31), but the grade by gender interaction was significant (χ²(1) = 14.07, p < .001). As in Experiment 1, this indicates that the initially similar performance in early grades shifts to girls outperforming boys in later grades (see Fig. 2 , left panel). For RT, significant effects were observed for grade (F(1, 55421) = 40.97, p < .001), gender (F(1, 5185) = 10.06, p = .002), and the grade by gender interaction (F(1, 5161) = 16.55, p < .001). This interaction is in line with the RT results in Experiment 1 and indicates a shift from boys being faster in grade 3 to girls being faster in the later grades (see Fig. 2 , right panel). The effect of list was not significant (Accuracy, p = .98; RT, p = .08).

Figure 2 shows the interaction of grade with gender for accuracy and RT in Experiment 2.

Experiment 3

IRT analyses of Experiment 2 identified 10 words with poor psychometric properties (see Methods for more details). These items were excluded from Experiment 3. From the remaining 79 items, we constructed two lists with 25 words and 16 pseudowords (d-Lexize41a & d-Lexize41b), each of which had 38 unique items and 3 shared items.
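The composition of the item lists in Experiments 2 and 3 can be checked with simple arithmetic: two lists that each hold some unique items plus a common set of shared items together cover twice the unique items plus the shared items. The short sketch below verifies the counts given in the text; the function names are hypothetical.

```python
def list_length(unique_per_list, shared):
    """Number of items a single participant sees in one list."""
    return unique_per_list + shared

def distinct_items(unique_per_list, shared, n_lists=2):
    """Number of distinct items covered by all lists together."""
    return unique_per_list * n_lists + shared

# Experiment 2: d-Lexize55a and d-Lexize55b (36 words + 19 pseudowords each).
exp2_length = list_length(unique_per_list=34, shared=21)
exp2_pool = distinct_items(unique_per_list=34, shared=21)

# Experiment 3: d-Lexize41a and d-Lexize41b (25 words + 16 pseudowords each).
exp3_length = list_length(unique_per_list=38, shared=3)
exp3_pool = distinct_items(unique_per_list=38, shared=3)

print(exp2_length, exp2_pool)
print(exp3_length, exp3_pool)
```

The Experiment 2 lists use all 89 items of d-Lexize89, and the Experiment 3 lists use all 79 items that remain after removing the 10 words flagged by the IRT analysis.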
The lists were matched on average item discriminability and item accuracy from Experiment 2 as well as on average word and bigram frequency, word length, and orthographic neighborhood. For accuracy, there was a significant main effect for grade (χ²(1) = 57.31, p < .001), but not for gender (χ²(1) = 2.14, p = .14). The interaction between grade and gender was again significant (χ²(1) = 25.55, p < .001), indicating that similar performance in earlier grades turned into an advantage for girls in later grades. There was also a main effect for home language (χ²(2) = 296.02, p < .001), indicating that children from exclusively Finnish homes outperformed children from mixed homes (OR = 0.63, CI: 0.52–0.77, p < .001) and children from non-Finnish homes (OR = 0.63, 95% CI: 0.25–0.33, p < .001). The significant interaction between grade and home language (χ²(2) = 6.05, p = .049) reflected that the gap between children from exclusively Finnish homes and children from mixed homes (OR = 0.97, CI: 0.93–1.00, p = .076) and non-Finnish homes (OR = 0.97, CI: 0.95–1.00, p = .061) is growing over the years. The effect of list was not significant (χ²(1) = 3.82, p = .051). The interactions between grade and gender and grade and home language are depicted in Fig. 3 (left panels). For RT, there was a significant main effect for grade (F(1, 77857) = 387.78, p < .001) and gender (F(1, 14562) = 98.32, p < .001) and the interaction between grade and gender was also significant (F(1, 14375) = 90.89, p < .001). The interaction indicated that the initially faster responses of boys in the early grades shifted to faster responses of girls in the later grades. There was no main effect for home language (F(2, 15623) = 2.09, p = .12), but the interaction between grade and home language was again significant (F(2, 15033) = 44.30, p < .001).
This indicated that the gap between children from Finnish-speaking homes and those from mixed-language homes widens over the school years (β = 0.01, 95% CI: 0.00–0.02, p = 0.01) and grows even more in relation to children from non-Finnish-speaking homes (β = 0.04, 95% CI: 0.03–0.04, p < 0.001). The effect of list was not significant (F(2, 33497) = 0.62, p = .43).

Figure 3 shows the interactions of grade with gender and home language for accuracy and RT in Experiment 3.

GENERAL DISCUSSION

The results from our large-scale study provide clear and consistent answers to the questions posed in the Introduction, observed across three different versions of the d-Lexize vocabulary test and three different samples. First, as expected, it became clear that vocabulary proficiency increases steadily throughout the school years. This is reflected in increasing accuracy demonstrating vocabulary growth from 3rd to 9th grade, in line with earlier studies showing a steady increase of vocabulary during the school years in German 1 and English 2 . Increasing vocabulary proficiency is also reflected by the progressively faster response latencies, indicating increasing speed of lexical retrieval, in line with studies that show a solid increase in reading speed throughout the school years 28 . Second, a gender gap was observed in vocabulary skills, similar to what has been reported for reading comprehension, where girls tend to outperform boys 13 , 14 . Importantly, however, this gap was not evident in the early school years. Specifically, across all Experiments, 3rd grade boys appeared to be on par with 3rd grade girls in vocabulary knowledge and even demonstrated faster lexical retrieval. By 7th grade, however, around the transition from primary to lower secondary school, girls had surpassed boys in both vocabulary size and retrieval speed. This suggests that boys’ weaker vocabulary skills do not clearly precede their weaker reading comprehension, as speculated in the Introduction.
Instead, vocabulary difficulties appear to align with the developmental trajectory of reading comprehension difficulties 15 . The gender gap in reading comprehension is often attributed to differences in reading motivation and habits, with girls typically displaying greater intrinsic reading motivation and reading more frequently than boys 29 , 30 . This difference becomes even more evident by the end of elementary school, around 6th grade 31 . It is likely that the observed disparity and development in vocabulary skills stems from similar underlying factors. Third, not speaking Finnish at home affects vocabulary proficiency greatly. Children from non-Finnish homes have a smaller vocabulary and are slower in lexical retrieval than children from exclusively Finnish or mixed homes, which is in line with the most recent PISA results 12 . These differences seem to be present throughout the school years, and are even more pronounced in the later grades than in the earlier grades. Since we did not collect information on the age at which children acquired Finnish, we cannot determine whether this widening gap is linked to students in higher grades having spent less time in Finland than those in lower grades. This is something we plan to explore further in future testing rounds. However, it is likely that our results reflect that the school environment and education are not effectively bridging the vocabulary and reading comprehension gap, which, in an ideal scenario, they would. This may especially be the case if schools have a high concentration of children from immigrant backgrounds, which may lead to increased use of languages other than Finnish in informal settings, such as the schoolyard, and may require more classroom time to be dedicated to foundational vocabulary and reading skills. 
At least part of the schools we tested may have such a situation, as our samples include the Helsinki area, where the proportion of pupils from immigrant background can be quite substantial, even exceeding 50% in some schools. Parental support for developing Finnish proficiency in non-Finnish homes may also be limited, partly because families may prioritize maintaining their heritage language at home, partly because they are not proficient in Finnish themselves. Fourth, having another home language in addition to Finnish turns out to affect vocabulary proficiency as well, even though the gap is smaller compared to children from non-Finnish homes. However, the results are evident in both vocabulary knowledge and lexical retrieval. These results align with other studies indicating that simultaneous bilingualism can limit proficiency in the dominant language due to less frequent exposure and use of words. For instance, Bialystok et al. reported the results of an analysis of 1,738 children between 3 and 10 years old and demonstrated a consistent difference in receptive vocabulary between monolinguals and bilinguals 32 . This aligns with the notion that bilingual children in early childhood receive less exposure to the majority language at home than monolingual children. As a result, they encounter fewer words, leading to lower accuracy scores, and have less repeated exposure to the words they do learn, contributing to slower lexical retrieval. Perhaps surprisingly though, our results show that the vocabulary gap widens as children progress through school. One might expect children from mixed-language homes to catch up over time, as their daily exposure to Finnish increases in elementary school - not only through spoken interactions but also through written language. 
To gain a deeper understanding of the widening gap between monolingual and bilingual pupils, future studies should include more detailed questionnaires on language use of bilingual children at home and elsewhere and explore their relationship to vocabulary development.

Conclusions and future directions

In the current study the observed differences across grade, gender, and home language exposure highlight how maturation, education, and sociocultural factors shape vocabulary proficiency, including both vocabulary size and lexical retrieval speed. Since vocabulary proficiency develops across the school years and disparities emerge gradually, monitoring vocabulary growth from early education would be essential. However, this kind of monitoring is rarely implemented in school assessment practices, partly due to the limited availability of standardized vocabulary assessments in many languages, including Finnish. The present study addresses this need by developing d-Lexize, a valid and reliable instrument for assessing vocabulary proficiency. The next step in this process involves creating normative data for 1st to 9th grade, a task we are currently undertaking. Expanding access to reliable vocabulary measures would support early identification of difficulties and enable targeted vocabulary interventions. The current study is part of a larger initiative entitled the Multilingual Reading Assessment (MUREA) project. Future research within the MUREA framework aims to comprehensively investigate the development and remediation of components of reading comprehension in individuals from diverse linguistic backgrounds. In addition to vocabulary, this includes phonology, morphology, syntax, and self-regulation skills. As such, the project aims to contribute to both reading research and practical applications in educational settings, ultimately promoting more equitable language development across diverse learner populations.
METHODS

For each experiment, the R code and the R-generated HTML reports are available at the project's OSF page, https://osf.io/exb9s/?view_only=04dda2e11463493a8cc4bc76261d5e9c. The HTML reports include the participant selection procedure, IRT analyses, validity and reliability tests, Monte Carlo simulations, (generalized) linear mixed-effects models, and post-hoc analyses, as well as outlier removal, including participant exclusions and filtering of extreme response times.

Participants

The children were sampled from schools across the regions of Uusimaa and Southwest Finland. Across experiments, around 27,000 participants were included in the analyses, with some excluded beforehand (see HTML reports for details). Informed parental consent was obtained for all children included in the analyses. A total of 6,988 children (50% boys, 50% girls; home language: 77% Finnish; 14% Other; 9% Finnish/Other) from the 3rd (n = 2,607), 4th (n = 2,691), and 7th grade (n = 1,784) were included in the analyses of Experiment 1. In the analyses of Experiment 2, 5,205 children (52% girls, 48% boys) with home language Finnish from the 3rd to the 9th grade (3rd: 862; 4th: 880; 5th: 877; 6th: 957; 7th: 726; 8th: 540; and 9th: 363) were included. Here, children from mixed or non-native homes were excluded, as their percentage was very low in this sample (about 3%); hence, home language was not used as a variable in the analyses. For Experiment 3, the analyses included 14,634 children (50% girls, 50% boys) from the 3rd to the 9th grade (3rd: 5,108; 4th: 1,577; 5th: 1,789; 6th: 1,341; 7th: 3,301; 8th: 949; and 9th: 776). Here, home language was used as a variable again, as there was a sufficient number of children from mixed (n = 1,110) and non-Finnish homes (n = 1,803).

Procedure

Before the experimental tests, participants were administered a questionnaire including questions about gender and home languages.
After this, participants completed the d-Lexize task and two tasks that were used for validation: the sentence reading fluency task (SFR; Experiments 1–3) and the phonological word reading task (PWR; Experiments 2 and 3). Prior to starting d-Lexize, participants were presented with instructions explaining that letter strings would be displayed one by one and that they were to indicate whether each letter string was a word or not by pressing the “yes” or “no” button. After the instructions, participants were presented with 4 or 5 practice items, depending on the experiment. Each letter string was preceded by a fixation point that remained on the screen for 1000 ms. The time limit for individual items was 4000 ms in Experiments 1 and 2 and 5000 ms in Experiment 3. The d-Lexize test lasted 5 to 10 minutes, the whole battery between 25 and 40 minutes.

The sentence reading fluency test (SFR)

In the SFR, participants read a series of sentences and are asked to indicate whether each sentence is true or not. The task is based on the widely used Woodcock-Johnson IV (WJ IV) sentence reading fluency test in English [33]. The current Finnish test includes four practice stimuli, followed by feedback after each response. After completing the practice items, each participant progresses through the task with as many stimuli as can be processed within the specified time limit (1 minute and 30 seconds). A maximum of 41 sentences are presented, in fixed order. Statements are relatively easy to respond to (e.g., “stones can be eaten”, “cows fly”, “fishes swim”), so they elicit a high accuracy rate. However, how quickly the sentences are read varies considerably across grades and across children. The internal consistency of the test is excellent for reaction times and moderate for accuracy rate (Guttman's lambdas were .94 and .74, respectively).

The phonological word reading test (PWR)

In the PWR, participants are tasked with choosing a word that matches a presented picture.
The task includes four practice stimuli, followed by feedback after each response. After completing the practice items, each participant progresses through the task with as many stimuli as can be processed within the specified time limit (2 minutes). A maximum of 140 trials are presented in fixed order. In addition to the correct word, participants are presented with three foils, including words and nonwords, that vary in phonological (and, by extension, orthographic) similarity to the correct choice. In this task, word-level reading is evaluated by assessing the participant's ability to read and accurately determine the word corresponding to the picture. The internal consistency of the test is excellent for reaction times (Guttman’s lambda = .92) and good for accuracy rate (Guttman’s lambda = .83).

The d-Lexize vocabulary test

In Experiment 1, we utilized the Lexize version developed by Salmela et al. [16], comprising 102 items, of which 68 are Finnish words and 34 are phonotactically valid Finnish pseudowords. Lexical-statistical characteristics were extracted from a Finnish newspaper corpus containing 22.7 million word forms, utilizing the lexical search program WordMill [34]. The chosen words were drawn from five distinct frequency ranges: 17 words with frequency < 1 per million (pm); 20 words with frequency 1–5 pm; 19 words with frequency 5–10 pm; 11 words with frequency 10–20 pm; 1 word with frequency 20–100 pm. The word set predominantly consisted of nouns (n = 52), with a smaller representation of verbs (n = 7) and adjectives (n = 9), reflecting the proportion of these word classes in natural language. To avoid morphological structure aiding word recognition, all selected words were monomorphemic [35]. Word length varied from 4 to 9 letters (M = 5.8). The set of 34 pseudowords was derived from words that were matched in part of speech, length, and frequency with the 68 word items.
In these chosen words, 1 to 3 letters were altered such that phonotactically valid pseudowords were created. The mean bigram frequency of the pseudowords (M = 5.9, SD = 2.5) aligned with that of the selected words (M = 6.1, SD = 2.6), as confirmed by an independent samples t-test (t(100) = 0.53, p = 0.60). This ensured that the letter patterns of the pseudowords mimic those in words. Subsequent versions of d-Lexize, i.e. d-Lexize89, d-Lexize55a, d-Lexize55b, d-Lexize41a, and d-Lexize41b, preserved the same properties as the original version. The characteristics of each d-Lexize version are listed in Table 1. All items from all versions are listed in Supplementary Table 1.

Table 1. Properties of the different Lexize versions.

| Version | N | W (a) | PW (b) | Nom (c) | Adj (d) | V (e) | Freq (f) | LenW (g) | LenPW (h) | BiW (i) | BiPW (j) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Lexize | 102 | 68 | 34 | 49 | 12 | 7 | 5.6 (0.1–26.2) | 5.8 (4–9) | 6.1 (5–9) | 6.2 | 5.9 |
| d-Lexize89 | 89 | 59 | 30 | 43 | 9 | 7 | 6.0 (0.1–26.2) | 5.8 (4–9) | 6.2 (5–9) | 6.2 | 6.2 |
| d-Lexize55a | 55 | 36 | 19 | 26 | 6 | 4 | 5.9 (0.1–18.8) | 5.9 (4–9) | 6.3 (5–9) | 6.0 | 6.2 |
| d-Lexize55b | 55 | 36 | 19 | 28 | 5 | 3 | 6.1 (0.1–26.2) | 5.9 (4–9) | 6.2 (5–9) | 6.3 | 6.2 |
| d-Lexize41a | 41 | 25 | 16 | 18 | 5 | 3 | 5.9 (0.1–18.8) | 6.0 (4–9) | 6.3 (5–9) | 6.4 | 6.2 |
| d-Lexize41b | 41 | 25 | 16 | 19 | 5 | 2 | 6.1 (0.1–26.2) | 5.6 (4–7) | 6.2 (5–9) | 5.7 | 6.2 |

a. No. of Words; b. No. of Pseudowords; c. No. of Nouns; d. No. of Adjectives; e. No. of Verbs; f. Frequency per million (range); g. Length of words (range); h. Length of pseudowords (range); i. Bigram Frequency of Words per 1000; j. Bigram Frequency of Pseudowords per 1000.

From Lexize to d-Lexize89 to d-Lexize55 to d-Lexize41 via IRT and MC simulations

Experiment 1 started with the 68 words and 34 pseudowords from the Lexize test originally designed for L2 adults [16].
Given the possibility that some of these items may not be appropriate for a vocabulary test for school children, we conducted Item Response Theory (IRT) analyses on the combined data from the 3rd, 4th, and 7th grades using the one-parameter (1PL), two-parameter (2PL), and three-parameter (3PL) logistic models, which account for item difficulty, discrimination, and guessing, respectively. The 3PL model provided the best fit statistics (see section 2.4.4 of the HTML report on Experiment 1) and identified 9 words (itara ‘stingy’, kovera ‘concave’, houre ‘phantom’, uuhi ‘ewe’, kieppi ‘coil’, vouti ‘magistrate’, aihio ‘work in progress’, purje ‘sail’ and suppea ‘narrow’) and 4 pseudowords with poor psychometric properties, meaning relatively low discriminability values, relatively low difficulty values, or close-to-zero guessing values. The exclusion of these items resulted in a final set of 89 items, which we named d-Lexize89; the analyses in Experiment 1 are based on this set. Next, we conducted a Monte Carlo simulation on d-Lexize89 with 17 sub-sample sizes (5–85 items in steps of 5). Table 3.3 in the HTML report shows mean correlations across 1000 repetitions. A 55-item sub-sample correlated almost perfectly (.96 for accuracy, .99 for response latencies) with the full scale, so we used this number of items for the lists we created in Experiment 2. Experiment 2 split d-Lexize89 into two lists (d-Lexize55a and d-Lexize55b), each with 36 words and 19 pseudowords (34 unique and 21 shared). Lists were matched on item discriminability and accuracy derived from the IRT analyses in Experiment 1, as well as on average frequency, word length, bigram frequency, and orthographic neighborhood (see Table 1). The division into two lists of 55 items with equal lexical-statistical properties was made with future experimentation in mind, such as multiple testing and comparing vocabulary skills through parallel auditory and visual lexical decision tasks.
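The Monte Carlo step described above can be illustrated with a small sketch. The Python code below is not the authors' R code (which is available on the OSF page). It assumes that each repetition draws a random subset of k items without replacement, scores each participant by the number of correct responses on that subset, and correlates these scores with the full-scale scores. The response matrix is synthetic, generated from a 3PL item response function with made-up parameters, and the function names (`p_correct_3pl`, `simulate_responses`, `subsample_correlation`) are hypothetical. The number of repetitions is reduced from the 1000 reported in the text to keep the run short.

```python
import math
import random


def p_correct_3pl(theta, a, b, c):
    """3PL item response function: discrimination a, difficulty b, guessing floor c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_responses(n_persons, n_items, rng):
    """Synthetic 0/1 accuracy matrix (persons x items); all parameters are made up."""
    items = [(rng.uniform(0.8, 2.0), rng.gauss(0.0, 1.0), 0.2) for _ in range(n_items)]
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)  # hypothetical ability of one participant
        data.append([1 if rng.random() < p_correct_3pl(theta, a, b, c) else 0
                     for a, b, c in items])
    return data


def subsample_correlation(data, k, n_reps, rng):
    """Mean correlation between k-item subset scores and full-scale scores."""
    n_items = len(data[0])
    full = [sum(row) for row in data]
    total = 0.0
    for _ in range(n_reps):
        subset = rng.sample(range(n_items), k)  # random items, no replacement
        sub = [sum(row[j] for j in subset) for row in data]
        total += pearson(sub, full)
    return total / n_reps


rng = random.Random(1)
responses = simulate_responses(n_persons=200, n_items=89, rng=rng)

# Sub-sample sizes 5, 10, ..., 85, as in the simulation on the 89-item set.
results = {k: subsample_correlation(responses, k, n_reps=50, rng=rng)
           for k in range(5, 90, 5)}
for k, r in results.items():
    print(f"{k:2d} items: mean r with full scale = {r:.3f}")
```

With real data, `responses` would be replaced by the observed participants-by-items accuracy matrix. The same loop could in principle be run on response latencies by averaging item RTs instead of summing correct responses.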
Next, we again conducted a Monte Carlo simulation on d-Lexize55a and d-Lexize55b with 10 sub-sample sizes (5–50 items in steps of 5). The table in section 4 of the HTML report on Experiment 2 shows mean correlations across 1000 repetitions. A 40-item sub-sample correlated almost perfectly (.96 for accuracy, .99 for response latencies, for both lists) with the full scale, so we used this number of items for the lists we created for Experiment 3. IRT analyses of the Experiment 2 data, combining the 3rd to the 9th grades, identified the 10 words with the least favorable psychometric properties, which were excluded for Experiment 3: juhta ‘beast of burden’, kolttu ‘old-fashioned dress’, rahvas ‘common people’, pisara ‘drop’, kohtu ‘uterus’, tyrkyttää ‘intrude’, parvi ‘loft’, nuotio ‘campfire’, hauki ‘pike’ and sukeltaa ‘dive’. The remaining 79 items of d-Lexize89 were split into two lists (d-Lexize41a and d-Lexize41b), each containing 25 words and 16 pseudowords (38 unique and 3 shared), again matched for key linguistic properties (see Table 1).

Reliability Estimates for d-Lexize

We calculated Cronbach's alpha and Guttman’s lambda 4 reliability coefficients for each grade separately and for the total sample in each of the d-Lexize versions. Here, we report the values for the total sample; however, the results for each grade were similar (for more details, see the internal consistency sections in each experiment’s HTML report). For d-Lexize89, Cronbach's alpha was .90 for accuracy and .98 for latencies, with Guttman’s lambda at .92 and .98, respectively. For d-Lexize55a, Cronbach's alpha was .78 for accuracy and .98 for latencies, and Guttman’s lambda was .82 and .98, respectively. For d-Lexize55b, Cronbach's alpha was .79 for accuracy and .98 for latencies, with Guttman’s lambda at .83 and .98, respectively.
For d-Lexize41a, Cronbach's alpha was .80 for accuracy and .97 for latencies, and Guttman’s lambda was .84 and .98, respectively. For d-Lexize41b, Cronbach's alpha was .83 for accuracy and .97 for latencies, with Guttman’s lambda at .86 and .98, respectively. These values indicate that all d-Lexize versions are highly reliable.

Validity Estimates for d-Lexize

To further assess the validity of each d-Lexize version, we compared pupils' performance in d-Lexize with their performance on two other tests, the SFR and the PWR. Specifically, we calculated the correlations between the accuracy rates and response latencies of each d-Lexize test and those of the SFR (Experiments 1–3) and the PWR (Experiments 2 and 3). It is important to note that both the SFR and PWR test items are relatively easy, resulting in high accuracy rates (90% or more) and minimal variation (even in the lower grades), which deflates the correlations involving accuracy. Therefore, the response latencies in both tests are more reliable indicators of the processing effort required by the children. This is also reflected in the correlations, with the strongest associations observed between d-Lexize RTs and those of the SFR and PWR (ranging from 0.69 to 0.82), indicating excellent validity. Full correlation details are presented in Table 2.

Table 2. Correlations between Accuracy Rates and Reaction Times of d-Lexize tests with those of the SFR and the PWR tests. All bolded values are significant at the p < .001 level; the non-bolded values are not significant at the p < 0.05 level.

| Variable | SFR_Acc | SFR_RT | PWR_Acc | PWR_RT |
|---|---|---|---|---|
| d-Lexize89_Acc | 0.33 | -0.45 | | |
| d-Lexize89_RT | 0.02 | 0.69 | | |
| d-Lexize55a_Acc | 0.30 | -0.29 | 0.33 | -0.13 |
| d-Lexize55a_RT | 0.03 | 0.82 | -0.24 | 0.71 |
| d-Lexize55b_Acc | 0.25 | -0.35 | 0.32 | -0.23 |
| d-Lexize55b_RT | 0.03 | 0.82 | -0.30 | 0.72 |
| d-Lexize41a_Acc | 0.43 | -0.46 | 0.45 | -0.30 |
| d-Lexize41a_RT | -0.10 | 0.79 | -0.27 | 0.72 |
| d-Lexize41b_Acc | 0.39 | -0.42 | 0.40 | -0.24 |
| d-Lexize41b_RT | -0.09 | 0.80 | -0.27 | 0.73 |

The validity of the different d-Lexize versions was also evaluated by examining the effects of word frequency and length on accuracy and response latencies, as these factors are known to influence reading and lexical decision tasks [22, 23]. Both factors were analyzed in interaction with grade, as we expected their effects to vary by grade, a pattern that emerged in most analyses. The corresponding statistics and figures illustrating these interactions are presented in the mixed-effects models section of the HTML reports. Most importantly, consistent with previous research and across all grades and experiments, lower word frequency was associated with lower accuracy and slower response latencies, while longer words led to slower response latencies, supporting the validity of all d-Lexize versions.

Ethics declarations

The datasets utilized in this study were obtained through a collaborative effort between the University of Turku and several municipalities. Each municipality’s educational office independently decided whether to participate in the assessments organized by the University. Furthermore, the municipalities determined the specific grade levels that would be included in the evaluation. The primary objective of the assessments was to provide teachers and educational authorities within each municipality with information regarding pupils’ strengths and challenges in reading and mathematical skills. To facilitate the provision of feedback to teachers about their own pupils, each child was identified using a school-based login system.
The research activities were conducted independently from the municipal assessments and their analyses. The research team requested permission from each municipality to utilize the assessment data for research purposes, thereby repurposing pre-existing registry data. Each municipality independently managed the process of obtaining informed parental consent, permitting the release of data to a predefined group of researchers in a pseudonymized format. This format excluded all direct identifiers, such as personal login IDs, names, school affiliations, and municipal identifiers. All necessary ethical approvals were obtained in advance of the study. The research was conducted in full compliance with Finnish legislation and adhered to the guidelines set forth by the Finnish National Board on Research Integrity (TENK).

Declarations

Supplementary material

A supplementary table with all the items included in all the versions of d-Lexize is added as a supplementary file.

Author contributions

All authors contributed to the design of the study. T.R., S.H., P.E., and P.R. collected the data. S.H., N.S., and R.B. analyzed the data. All authors contributed to the writing of the manuscript.

Competing interests

The authors declare no competing interests.

Data availability

Extensive data analysis reports and R scripts are available at https://osf.io/exb9s/?view_only=04dda2e11463493a8cc4bc76261d5e9c. The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

1. Segbers, J. & Schroeder, S. How many words do children know? A corpus-based estimation of children’s total vocabulary size. Lang. Test. 34, 297–320 (2017).
2. Anglin, J. M. Vocabulary development: A morphological analysis. Monogr. Soc. Res. Child. Dev. Serial No. 238, 58 (1993).
3. Biemiller, A. Size and sequence in vocabulary development: Implications for choosing words for primary grade vocabulary instruction. In Teaching and Learning Vocabulary: Bringing Research to Practice (eds Hiebert, A. & Kamil, M.) 223–242 (Erlbaum, 2005).
4. Song, S. et al. Tracing children's vocabulary development from preschool through the school‐age years: An 8‐year longitudinal study. Dev. Sci. 18, 119–131 (2015).
5. Honko, M. Alakouluikaisten leksikaalinen tieto ja taito. Toisen sukupolven suomi ja S1-verrokit [Lexical knowledge and skills in primary school children. Second generation L2 Finnish speakers and L1 peers]. Doctoral Thesis (University of Tampere, 2013). Retrieved July 28, 2021, from https://trepo.tuni.fi/handle/10024/94544
6. Saarela, L. Peruskoululaisten kirjoitelmien kehittyminen sanastotutkimuksen valossa [The developmental trajectory of elementary school children’s essays in the light of vocabulary research]. Unpublished doctoral dissertation (University of Oulu, 1997).
7. Lonigan, C. J. Vocabulary development and the development of phonological awareness skills in preschool children. In Vocabulary Acquisition: Implications for Reading Comprehension 15–31 (2007).
8. Bates, E. & Goodman, J. On the emergence of grammar from the lexicon. In The Emergence of Language 29–79 (Erlbaum, 1999).
9. Stæhr, L. S. Vocabulary size and the skills of listening, reading and writing. Lang. Learn. J. 36, 139–152 (2008).
10. Torppa, M., Vasalampi, K., Eklund, K. & Niemi, P. Long-term effects of the home literacy environment on reading development: Familial risk for dyslexia as a moderator. J. Exp. Child. Psychol. 215, 105314 (2022).
11. Colenbrander, D., Kohnen, S., Smith-Lock, K. & Nickels, L. Individual differences in the vocabulary skills of children with poor reading comprehension. Learn. Individ. Differ. 50 (2016).
12. OECD. PISA 2022 Assessment and Analytical Framework, PISA (OECD Publishing, 2023).
13. Torppa, M., Eklund, K., Sulkunen, S., Niemi, P. & Ahonen, T. Why do boys and girls perform differently on PISA Reading in Finland? The effects of reading fluency, achievement behaviour, leisure reading and homework activity. J. Res. Read. 41, 122–139 (2018).
14. Smith, E. & Reimer, D. Understanding gender inequality in children's reading behavior: New insights from digital behavioral data. Child. Dev. 95, 625–635 (2024).
15. Manu, M. et al. Reading development from kindergarten to age 18: The role of gender and parental education. Read. Res. Q. 58, 505–538 (2023).
16. Salmela, R., Lehtonen, M., Garusi, S. & Bertram, R. Lexize: A test to quickly assess vocabulary knowledge in Finnish. Scand. J. Psychol. 62, 806–819. https://doi.org/10.1111/sjop.12768 (2021).
17. Lemhöfer, K. & Broersma, M. Introducing LexTALE: A quick and valid lexical test for advanced learners of English. Behav. Res. Methods 44, 325–343 (2012).
18. Schmitt, N. Size and depth of vocabulary knowledge: What the research shows. Lang. Learn. 64, 913–951 (2014).
19. Li, M. & Kirby, J. R. The effects of vocabulary breadth and depth on English reading. Appl. Linguist. 36, 611–634 (2015).
20. Yeatman, J. D. et al. Rapid online assessment of reading ability. Sci. Rep. 11, 1–11 (2021).
21. Siegelman, N. et al. Rethinking first language–second language similarities and differences in English proficiency: Insights from the ENglish Reading Online (ENRO) project. Lang. Learn. 74, 249–294 (2024).
22. Rayner, K. Eye movements in reading and information processing: 20 years of research. Psychol. Bull. 124, 372–422 (1998).
23. Schröter, P. & Schroeder, S. The Developmental Lexicon Project: A behavioral database to investigate visual word recognition across the lifespan. Behav. Res. Methods 49, 2183–2203. https://doi.org/10.3758/s13428-016-0851-9 (2017).
24. Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67, 1–48. https://doi.org/10.18637/jss.v067.i01 (2015).
25. R Core Team. R: A language and environment for statistical computing (Version 4.3.0) [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org/
26. Baayen, R. H., Davidson, D. J. & Bates, D. M. Mixed-effects modeling with crossed random effects for subjects and items. J. Mem. Lang. 59, 390–412 (2008).
27. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodol.) 57, 289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x (1995).
28. Häikiö, T., Bertram, R., Hyönä, J. & Niemi, P. Development of the letter identity span in reading: Evidence from the eye movement moving window paradigm. J. Exp. Child. Psychol. 102, 167–181 (2009).
29. McGeown, S., Goodwin, H., Henderson, N. & Wright, P. Gender differences in reading motivation: Does sex or gender identity provide a better account? J. Res. Read. 35, 328–336. https://doi.org/10.1111/j.1467-9817.2010.01481.x (2012).
30. Wigfield, A. & Guthrie, J. T. Relations of children's motivation for reading to the amount and breadth of their reading. J. Educ. Psychol. 89, 420–432. https://doi.org/10.1037/0022-0663.89.3.420 (1997).
31. Becker, M. & McElvany, N. The interplay of gender and social background. A longitudinal study of interaction effects in reading attitudes and behaviour. Br. J. Educ. Psychol. 88, 529–549. https://doi.org/10.1111/bjep.12199 (2018).
32. Bialystok, E., Luk, G., Peets, K. F. & Yang, S. Receptive vocabulary differences in monolingual and bilingual children. Bilingualism: Lang. Cogn. 13, 525–531 (2010).
33. Schrank, F. A. & Wendling, B. J. The Woodcock–Johnson IV. Contemp. Intell. Assess. 383 (2018).
34. Laine, M. & Virtanen, P. WordMill lexical search program. Cent. Cogn. Neurosci. Univ. Turku (1999).
35. Brysbaert, M. LEXTALE_FR: A fast, free, and efficient test to measure language proficiency in French. Psychol. Belg. 53, 23–37 (2013).
Supplementary Files: SupplementaryFileBertram.docx

Editorial history: First submitted to journal 14 Apr, 2025; submission checks completed at journal 28 Apr, 2025; editor invited by journal 29 Apr, 2025; editor assigned by journal 06 May, 2025; reviewers invited by journal 06 May, 2025; reviewers agreed at journal 07 May, 2025 and 08 May, 2025; reviews received at journal 22 May, 2025; editorial decision (revision requested) 15 Jul, 2025.
However, despite the early similarity in reading skills across genders, vocabulary differences may emerge in the early primary school years, potentially predicting later reading disparities. This study investigates that possibility. It also examines the emergence and development of the home language gap in Finnish vocabulary. Specifically, we compare children from fully Finnish-speaking homes with those from non-Finnish-speaking homes, as well as those from mixed homes where one caregiver is Finnish and the other is non-native. There is a clear need for this type of research, as there is limited understanding of how these three distinct home language environments shape vocabulary development until the end of lower secondary education when the PISA test is administered.\u003c/p\u003e \u003cp\u003eTo accommodate this need, we developed d-Lexize (developmental Lexize), a comprehensive and reliable tool for assessing Finnish vocabulary skills in children from the 3rd to the 9th grade, aged 9 to 15 years. The d-Lexize test was derived from the Lexize vocabulary test for Finnish L2 speakers\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e. In turn, that test was modeled after the Lexical Test for Advanced Learners (LexTale), a validated test to assess vocabulary proficiency for adult English L2 learners\u003csup\u003e\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u003c/sup\u003e. Like LexTale, Lexize is based on a visual lexical decision task, wherein participants judge whether a visually presented letter string is a word (e.g., savory) or a pseudoword (e.g., plaudate). In other words, our focus was on print vocabulary knowledge rather than vocabulary knowledge in general; however, throughout the text, we refer to it simply as vocabulary knowledge. 
Another point is that the test taps into vocabulary breadth, meaning the number of words that are known or can be recognized, rather than vocabulary depth, which refers to how well these words are understood\u003csup\u003e\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e. However, both dimensions of vocabulary knowledge not only correlate strongly with reading comprehension but also with each other\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e. In other words, when a person recognizes a large number of words, they typically also possess ample syntactic, semantic, and practical knowledge about them.\u003c/p\u003e \u003cp\u003eThe Lexize test of Salmela et al. includes words that range from low to medium frequency, ensuring that it encompasses words of varying difficulty level\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e. Since Lexize detected differences among adult L1 and L2 Finnish speakers, we inferred that the test could serve as a starting point for designing a comprehensive Finnish vocabulary test for L1 and L2 school-aged children. This assertion is also supported by the Rapid Online Assessment of Reading ability test (ROAR) developed by Yeatman et al. for English\u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e, which showed that visual lexical decision not only allows the assessment of adult vocabulary skills, but can be used equally well to tap into children\u0026rsquo;s vocabulary skills. Specifically, they created a simple web-based visual lexical decision task and showed that this can serve as an accurate and reliable measure of English reading ability (as assessed by the Woodcock-Johnson Word Identification test) from early childhood (6 years) onwards. 
An interesting finding of that study was that accuracy rate was a much better predictor of reading ability than reaction time. Yeatman et al. speculated that in languages with an opaque orthography, reaction time is not a reliable measure of individual differences, at least for young readers, but hypothesized that in transparent orthographies individual differences may be more distinctly reflected in reaction time\u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e. They also noted that reaction time might work better with larger samples than the 100\u0026ndash;200 participants used in their studies.\u003c/p\u003e \u003cp\u003eThe current study employs a similar visual lexical decision task in a transparent orthography, namely Finnish, in which each letter corresponds to a single phoneme and nearly every phoneme to a single letter (only the phoneme /ŋ/ as in \u003cem\u003ehanko\u003c/em\u003e does not correspond to a unique letter but to \u0026lsquo;nk\u0026rsquo; or \u0026lsquo;ng\u0026rsquo;). Moreover, by exploiting the nationwide educational platform ViLLE, to which 70% of all Finnish elementary and lower secondary schools are subscribed, we collected data from more than 27,000 pupils from the 3rd to the 9th grade. The transparent orthography is likely to ensure that both accuracy rate and reaction time can serve as reliable dependent measures, while the large sample size helps strengthen the generalizability of the findings. It is particularly valuable to use reaction time alongside accuracy rate, as this measure captures a different facet of vocabulary skills: accuracy rate reflects the number of words a participant knows, while reaction time more effectively captures the speed of lexical retrieval. Both aspects of vocabulary are essential skills that play distinct roles in reading fluency and reading comprehension. 
This is for instance shown in the ENRO (ENglish Reading Online) metastudy of Siegelman et al.\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e, who found through exploratory factor analyses that vocabulary accuracy and reaction time load on separate factors. Their results also showed that accuracy is closely linked to reading comprehension, while reaction time is more strongly associated with reading fluency.\u003c/p\u003e \u003cp\u003eThe current study allows for several key questions to be explored. First, does vocabulary proficiency increase steadily throughout the school years? Second, is there a gender gap in the early stages of education already or does it emerge later? Third, how large is the vocabulary gap between children from Finnish and children from non-Finnish homes, and does this gap narrow over time? Fourth, do children from mixed homes have lower vocabulary proficiency than children from Finnish homes and if so, to what extent? While this study focuses on vocabulary development in Finnish, similar issues are relevant for other languages, as highlighted by the recent PISA results on reading comprehension, which show consistent effects of gender and home language across languages\u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eThe current study was conducted over three separate experiments, utilizing different versions of d-Lexize. Between experiments, the number of items was reduced based on Item Response Theory (IRT) analyses, driven by the goal of creating a more concise vocabulary test with the best possible items, while also considering the time constraints when testing school children. Each version of the d-Lexize task was tested for reliability and validated against a sentence reading proficiency test and a phonological word reading test (for specifications, see Methods section). 
The validity of the various d-Lexize versions was also assessed by analyzing the impact of word frequency and length on accuracy and response latencies, as these factors are known to influence reading and lexical decision\u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e,\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eExperiment 1 tested approximately 7,000 children from the 3rd, 4th, and 7th grade; the other experiments included approximately 5,000 (Experiment 2) and 15,000 (Experiment 3) children from the 3rd to the 9th grade. Gender effects were assessed in all experiments, as the distribution of boys and girls was even across experiments. Home-language effects were assessed in Experiment 1 and 3 and involved 3 levels: children from exclusively Finnish-speaking homes (Finnish), children from mixed homes (Finnish/other), and children from non-Finnish-speaking homes (other). Information on the age of acquisition of the Finnish language was not available for the latter two categories, so the impact of home language was only assessed at a general level. Experiment 2 had too small a percentage of children from mixed and non-Finnish speaking households to consider home language as a variable.\u003c/p\u003e \u003cp\u003eData on accuracy rate and response latencies were analyzed using (generalized) linear mixed-effects models with the lme4 package\u003csup\u003e\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u003c/sup\u003e in R statistical software (Version 4.3.0\u003csup\u003e25\u003c/sup\u003e). Random intercepts were included for both participants and items\u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e\u003c/sup\u003e. We used GLMM to analyze accuracy rate and LMM for response latencies (RTs). The RTs that were computed were based on RTs for correct responses only. 
Due to skewness, RTs were log-transformed prior to analysis. We examined models where grade interacted with 3 or 4 variables: gender, home language (not in Experiment 2), word frequency, and word length, with list as a control variable in Experiments 2 and 3 (where we used two versions of d-Lexize). The analyses only included word data and not pseudoword data because words were our primary target of interest; moreover, the model includes Log Lemma Frequency as a predictor, a variable that is not available for pseudowords.\u003c/p\u003e \u003cp\u003eIn Experiment 1, which included the 3rd, 4th, and 7th grades, grade was treated as a categorical variable due to discontinuity between the grades, while in Experiments 2 and 3, with all grades from 3rd to 9th included, it was treated as a numeric variable. Gender and home language were also entered as categorical variables. The categorical variables were dummy coded with 3rd grade, boys and Finnish as the reference categories, respectively. Significance of the full terms was assessed by Wald tests (χ\u0026sup2;) for accuracy and F-tests using the Satterthwaite approximation for the effective degrees of freedom for response latencies. For Experiment 1, post-hoc analyses were performed assessing the effect of gender and home language at each grade level with p-values being adjusted for False Discovery Rate\u003csup\u003e\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e. 
The full models and the post-hoc analyses can be found in the statistical reports at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://osf.io/exb9s/?view_only=04dda2e11463493a8cc4bc76261d5e9c\u003c/span\u003e\u003cspan address=\"https://osf.io/exb9s/?view_only=04dda2e11463493a8cc4bc76261d5e9c\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e"},{"header":"RESULTS","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eExperiment 1\u003c/h2\u003e \u003cp\u003e The Lexize task for adult L2 speakers included 68 words and 34 pseudowords, all of which were presented to the participants of Experiment 1. Due to the concern that some of these items might not be suitable for school children, we conducted an IRT analysis and identified 9 words and 4 pseudowords with poor psychometric properties (see Methods for further details). These items were excluded from subsequent analyses, resulting in a final set of 89 items, which we named d-Lexize89; the analyses in Experiment 1 are based on this set.\u003c/p\u003e \u003cp\u003eIn the analysis of accuracy rates, the main effects of grade (χ\u0026sup2;(2)\u0026thinsp;=\u0026thinsp;31.53, p\u0026thinsp;\u0026lt;\u0026thinsp;.001) and home language (χ\u0026sup2;(2)\u0026thinsp;=\u0026thinsp;686.5, p\u0026thinsp;\u0026lt;\u0026thinsp;.001), and the interactions between grade and gender (χ\u0026sup2;(2)\u0026thinsp;=\u0026thinsp;16.04, p\u0026thinsp;\u0026lt;\u0026thinsp;.001) as well as between grade and home language (χ\u0026sup2;(4)\u0026thinsp;=\u0026thinsp;25.17, p\u0026thinsp;\u0026lt;\u0026thinsp;.001) were significant (see Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). 
Post-hoc comparisons revealed a growing gender difference in vocabulary proficiency from 3rd to 7th grade: While no significant difference was found in grade 3 (OR\u0026thinsp;=\u0026thinsp;0.942, SE\u0026thinsp;=\u0026thinsp;0.038, Z = -1.502, p\u0026thinsp;=\u0026thinsp;.142), boys scored significantly lower than girls in grade 4 (OR\u0026thinsp;=\u0026thinsp;0.866, SE\u0026thinsp;=\u0026thinsp;0.034, Z = -3.639, p\u0026thinsp;=\u0026thinsp;.001) with an even larger disadvantage by grade 7 (OR\u0026thinsp;=\u0026thinsp;0.732, SE\u0026thinsp;=\u0026thinsp;0.036, Z = -6.387, p\u0026thinsp;\u0026lt;\u0026thinsp;.001). For home language, post-hoc analyses showed that already in grade 3, children from Finnish-speaking homes outperformed those from mixed-language homes (OR\u0026thinsp;=\u0026thinsp;1.773, SE\u0026thinsp;=\u0026thinsp;0.13, Z\u0026thinsp;=\u0026thinsp;7.83, p\u0026thinsp;\u0026lt;\u0026thinsp;.001) and non-Finnish homes (OR\u0026thinsp;=\u0026thinsp;4.238, SE\u0026thinsp;=\u0026thinsp;0.237, Z\u0026thinsp;=\u0026thinsp;25.81, p\u0026thinsp;\u0026lt;\u0026thinsp;.001), with children from mixed-language homes also scoring higher than those from non-Finnish homes (OR\u0026thinsp;=\u0026thinsp;2.391, SE\u0026thinsp;=\u0026thinsp;0.206, Z\u0026thinsp;=\u0026thinsp;10.12, p\u0026thinsp;\u0026lt;\u0026thinsp;.001). Similar patterns were observed in grade 4 (Finnish vs. Mixed: OR\u0026thinsp;=\u0026thinsp;1.789, SE\u0026thinsp;=\u0026thinsp;0.118, Z\u0026thinsp;=\u0026thinsp;8.809, p\u0026thinsp;\u0026lt;\u0026thinsp;.001; Finnish vs. Other: OR\u0026thinsp;=\u0026thinsp;4.237, SE\u0026thinsp;=\u0026thinsp;0.244, Z\u0026thinsp;=\u0026thinsp;25.08, p\u0026thinsp;\u0026lt;\u0026thinsp;.001; Mixed vs. Other: OR\u0026thinsp;=\u0026thinsp;2.368, SE\u0026thinsp;=\u0026thinsp;0.193, Z\u0026thinsp;=\u0026thinsp;10.57, p\u0026thinsp;\u0026lt;\u0026thinsp;.001). 
By grade 7, the gap had widened, with children from Finnish-speaking homes maintaining the highest scores (Finnish vs. Mixed: OR\u0026thinsp;=\u0026thinsp;2.758, SE\u0026thinsp;=\u0026thinsp;0.239, Z\u0026thinsp;=\u0026thinsp;11.719, p\u0026thinsp;\u0026lt;\u0026thinsp;.001; Finnish vs. Other: OR\u0026thinsp;=\u0026thinsp;5.386, SE\u0026thinsp;=\u0026thinsp;0.377, Z\u0026thinsp;=\u0026thinsp;24.05, p\u0026thinsp;\u0026lt;\u0026thinsp;.001) and a clear difference still present between children from mixed-language and non-Finnish homes (Mixed vs. Other: OR\u0026thinsp;=\u0026thinsp;1.953, SE\u0026thinsp;=\u0026thinsp;0.203, Z\u0026thinsp;=\u0026thinsp;6.44, p\u0026thinsp;\u0026lt;\u0026thinsp;.001).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFor RT, there were again significant effects for grade (F(2, 25035)\u0026thinsp;=\u0026thinsp;52.31, p\u0026thinsp;\u0026lt;\u0026thinsp;.001) and home language (F(2, 7041)\u0026thinsp;=\u0026thinsp;106.99, p\u0026thinsp;\u0026lt;\u0026thinsp;.001) as well as interactions between grade and gender (F(2, 6811)\u0026thinsp;=\u0026thinsp;14.77, p\u0026thinsp;\u0026lt;\u0026thinsp;.001) and between grade and home language (F(4, 7042)\u0026thinsp;=\u0026thinsp;8.40, p\u0026thinsp;\u0026lt;\u0026thinsp;.001). Post-hoc comparisons revealed a shift in the gender effect across grades: in grade 3, boys responded significantly faster than girls (β = -0.047, SE\u0026thinsp;=\u0026thinsp;0.01, Z = -4.541, p\u0026thinsp;\u0026lt;\u0026thinsp;.001), but by grade 4, there was no significant difference (β = -0.018, SE\u0026thinsp;=\u0026thinsp;0.01, Z = -1.746, p\u0026thinsp;=\u0026thinsp;.081), and by grade 7, the pattern had reversed, with boys now responding significantly slower than girls (β\u0026thinsp;=\u0026thinsp;0.041, SE\u0026thinsp;=\u0026thinsp;0.012, Z\u0026thinsp;=\u0026thinsp;3.269, p\u0026thinsp;=\u0026thinsp;.001). 
For home language, post-hoc comparisons showed that in grade 3, children from Finnish-speaking or mixed homes responded significantly faster than children from non-Finnish-speaking homes (Finnish vs. Other: β = -0.093, SE\u0026thinsp;=\u0026thinsp;0.015, Z = -6.379, p\u0026thinsp;\u0026lt;\u0026thinsp;.001; Mixed vs. Other: β = -0.095, SE\u0026thinsp;=\u0026thinsp;0.022, Z = -4.246, p\u0026thinsp;\u0026lt;\u0026thinsp;.001), whereas the difference between fully Finnish and mixed language homes was not significant (Finnish vs Mixed: β\u0026thinsp;=\u0026thinsp;0.002, SE\u0026thinsp;=\u0026thinsp;0.019, Z\u0026thinsp;=\u0026thinsp;0.112, p\u0026thinsp;=\u0026thinsp;.91). The same pattern was found for the 4th grade: (Finnish vs Other: β = -0.111, SE\u0026thinsp;=\u0026thinsp;0.015, Z = -7.409, p\u0026thinsp;\u0026lt;\u0026thinsp;.001; Mixed vs Other: β = -0.084, SE\u0026thinsp;=\u0026thinsp;0.021, Z = -3.951, p\u0026thinsp;\u0026lt;\u0026thinsp;.001; Finnish vs. Mixed: β = -0.027, SE\u0026thinsp;=\u0026thinsp;0.017, Z = -1.596, p\u0026thinsp;=\u0026thinsp;.121). In grade 7, all home language contrasts reached significance (Finnish vs Mixed: β = -0.115, SE\u0026thinsp;=\u0026thinsp;0.022, Z = -5.168, p\u0026thinsp;\u0026lt;\u0026thinsp;.001, Finnish vs Other: β = -0.196, SE\u0026thinsp;=\u0026thinsp;0.018, Z = -10.788, p\u0026thinsp;\u0026lt;\u0026thinsp;.001; Mixed vs Other: β = -0.081, SE\u0026thinsp;=\u0026thinsp;0.027, Z = -2.992, p\u0026thinsp;=\u0026thinsp;.003). Similar to the findings for accuracy, the gap between children from Finnish and non-Finnish homes had also widened for RTs, reflecting an increasing disparity in lexical retrieval speed. Moreover, children from Finnish homes now responded faster than those from mixed-language homes, suggesting that a gap in lexical retrieval speed is also emerging between these groups across grades. 
Figure\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e shows the interactions of grade with gender and home language for accuracy and RT in Experiment 1.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eExperiment 2\u003c/h3\u003e\n\u003cp\u003e In Experiment 2 we sought to reduce the length of the d-Lexize test and to create an additional version of the test, allowing for multiple assessments and the potential development of a parallel assessment of oral vocabulary. Therefore, from the 89 items of d-Lexize89, we created two item lists of 36 words and 19 pseudowords (d-Lexize55a \u0026amp; d-Lexize55b), each containing 34 unique items and 21 shared items (for more detailed information, see Methods). The lists were matched on average item discriminability and accuracy rate in Experiment 1 as well as on average word and bigram frequency, word length, and orthographic neighborhood; list was entered as a control variable in the analyses.\u003c/p\u003e \u003cp\u003eFor accuracy, the analysis of Experiment 2 showed a marginal effect for grade (χ\u0026sup2;(1)\u0026thinsp;=\u0026thinsp;3.57, p\u0026thinsp;=\u0026thinsp;.059) and no effect for gender (χ\u0026sup2;(1)\u0026thinsp;=\u0026thinsp;1.05, p\u0026thinsp;=\u0026thinsp;.31), but the grade by gender interaction was significant (χ\u0026sup2;(1)\u0026thinsp;=\u0026thinsp;14.07, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001). As in Experiment 1, this indicates that the initially similar performance in early grades shifts to girls outperforming boys in later grades (see Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, left panel). 
For RT, significant effects were observed for grade (F(1, 55421)\u0026thinsp;=\u0026thinsp;40.97, p\u0026thinsp;\u0026lt;\u0026thinsp;.001), gender (F(1, 5185)\u0026thinsp;=\u0026thinsp;10.06, p\u0026thinsp;=\u0026thinsp;.002), and the grade by gender interaction (F(1, 5161)\u0026thinsp;=\u0026thinsp;16.55, p\u0026thinsp;\u0026lt;\u0026thinsp;.001). This interaction is in line with the RT results in Experiment 1 and indicates a shift from boys being faster in grade 3 to girls being faster in the later grades (see Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, right panel). The effect of list was not significant (Accuracy, p\u0026thinsp;=\u0026thinsp;.98; RT, p\u0026thinsp;=\u0026thinsp;.08). Figure\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e shows the interaction of grade with gender for accuracy and RT in Experiment 2.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\n\u003ch3\u003eExperiment 3\u003c/h3\u003e\n\u003cp\u003eIRT analyses of Experiment 2 identified 10 words with poor psychometric properties (see Methods for more details). These items were excluded from Experiment 3. From the remaining 79 items, we constructed two lists with 25 words and 16 pseudowords (d-Lexize41a \u0026amp; d-Lexize41b), each of which had 38 unique items and 3 shared items. The lists were matched on average item discriminability and item accuracy from Experiment 2 as well as on average word and bigram frequency, word length, and orthographic neighborhood.\u003c/p\u003e \u003cp\u003eFor accuracy, there was a significant main effect for grade (χ\u0026sup2;(1)\u0026thinsp;=\u0026thinsp;57.31, p\u0026thinsp;\u0026lt;\u0026thinsp;.001), but not for gender (χ\u0026sup2;(1)\u0026thinsp;=\u0026thinsp;2.14, p\u0026thinsp;=\u0026thinsp;.14). 
The interaction between grade and gender was again significant (χ\u0026sup2;(1)\u0026thinsp;=\u0026thinsp;25.55, p\u0026thinsp;\u0026lt;\u0026thinsp;.001), indicating that similar performance in earlier grades turned into an advantage for girls in later grades. There was also a main effect for home language (χ\u0026sup2;(2)\u0026thinsp;=\u0026thinsp;296.02, p\u0026thinsp;\u0026lt;\u0026thinsp;.001), indicating that children from exclusively Finnish homes outperformed children from mixed homes (OR\u0026thinsp;=\u0026thinsp;0.63, CI: 0.52\u0026ndash;0.77, p\u0026thinsp;\u0026lt;\u0026thinsp;.001) and children from non-Finnish homes (OR\u0026thinsp;=\u0026thinsp;0.63, 95% CI: 0.25\u0026ndash;0.33, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001). The significant interaction between grade and home language (χ\u0026sup2;(2)\u0026thinsp;=\u0026thinsp;6.05, p\u0026thinsp;=\u0026thinsp;.049) reflected that the gap separating children from exclusively Finnish homes from children from mixed homes (OR\u0026thinsp;=\u0026thinsp;0.97, CI: 0.93\u0026ndash;1.00, \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.076) and from non-Finnish homes (OR\u0026thinsp;=\u0026thinsp;0.97, CI: 0.95\u0026ndash;1.00, p\u0026thinsp;=\u0026thinsp;0.061) is growing over the years. The effect of list was not significant (χ\u0026sup2;(1)\u0026thinsp;=\u0026thinsp;3.82, p\u0026thinsp;=\u0026thinsp;.051). 
The interactions between grade and gender and grade and home language are depicted in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e (left panels).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFor RT, there was a significant main effect for grade (F(1, 77857)\u0026thinsp;=\u0026thinsp;387.78, p\u0026thinsp;\u0026lt;\u0026thinsp;.001) and gender (F(1, 14562)\u0026thinsp;=\u0026thinsp;98.32, p\u0026thinsp;\u0026lt;\u0026thinsp;.001) and the interaction between grade and gender was also significant (F(1, 14375)\u0026thinsp;=\u0026thinsp;90.89, p\u0026thinsp;\u0026lt;\u0026thinsp;.001). The interaction indicated that the initial faster responses for boys in the early grades swapped towards faster responses for the girls in the later grades. There was no main effect for home language (F(2, 15623)\u0026thinsp;=\u0026thinsp;2.09, p\u0026thinsp;=\u0026thinsp;.12), but the interaction between grade and home language was again significant (F(2, 15033)\u0026thinsp;=\u0026thinsp;44.30, p\u0026thinsp;\u0026lt;\u0026thinsp;.001). This indicated that the gap between children from Finnish-speaking homes and those from mixed-language homes widens over the school years (β\u0026thinsp;=\u0026thinsp;0.01, 95% CI: 0.00\u0026ndash;0.02, p\u0026thinsp;=\u0026thinsp;0.01) and grows even more in relation to children from non-Finnish-speaking homes (β\u0026thinsp;=\u0026thinsp;0.04, 95% CI: 0.03\u0026ndash;0.04, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001). The effect of list was not significant (F(2, 33497)\u0026thinsp;=\u0026thinsp;0.62, p\u0026thinsp;=\u0026thinsp;.43). 
Figure\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e shows the interactions of grade with gender and home language for accuracy and RT in Experiment 3.\u003c/p\u003e"},{"header":"GENERAL DISCUSSION","content":"\u003cp\u003eThe results from our large-scale study provide clear and consistent answers to the questions posed in the Introduction, observed across three different versions of the d-Lexize vocabulary test and three different samples. First, as expected, it became clear that vocabulary proficiency increases steadily throughout the school years. This is reflected in increasing accuracy demonstrating vocabulary growth from 3rd to 9th grade, in line with earlier studies showing a steady increase of vocabulary during the school years in German\u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u003c/sup\u003e and English\u003csup\u003e\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/sup\u003e. Increasing vocabulary proficiency is also reflected by the progressively faster response latencies, indicating increasing speed of lexical retrieval, in line with studies that show a solid increase in reading speed throughout the school years\u003csup\u003e\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eSecond, a gender gap was observed in vocabulary skills, similar to what has been reported for reading comprehension, where girls tend to outperform boys\u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e. Importantly, however, this gap was not evident in the early school years. Specifically, across all Experiments, 3rd grade boys appeared to be on par with 3rd grade girls in vocabulary knowledge and even demonstrated faster lexical retrieval. 
By 7th grade, however, around the transition from primary to lower secondary school, girls had surpassed boys in both vocabulary size and retrieval speed. This suggests that boys\u0026rsquo; weaker vocabulary skills do not clearly precede their weaker reading comprehension, as speculated in the Introduction. Instead, vocabulary difficulties appear to align with the developmental trajectory of reading comprehension difficulties\u003csup\u003e\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e. The gender gap in reading comprehension is often attributed to differences in reading motivation and habits, with girls typically displaying greater intrinsic reading motivation and reading more frequently than boys\u003csup\u003e\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e, \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u003c/sup\u003e. This difference becomes even more evident by the end of elementary school, around 6th grade\u003csup\u003e\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e\u003c/sup\u003e. It is likely that the observed disparity and development in vocabulary skills stems from similar underlying factors.\u003c/p\u003e \u003cp\u003eThird, not speaking Finnish at home affects vocabulary proficiency greatly. Children from non-Finnish homes have a smaller vocabulary and are slower in lexical retrieval than children from exclusively Finnish or mixed homes, which is in line with the most recent PISA results\u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e. These differences seem to be present throughout the school years, and are even more pronounced in the later grades than in the earlier grades. 
Since we did not collect information on the age at which children acquired Finnish, we cannot determine whether this widening gap is linked to students in higher grades having spent less time in Finland than those in lower grades. This is something we plan to explore further in future testing rounds. However, our results likely reflect that the school environment and education are not effectively bridging the vocabulary and reading comprehension gap, as they ideally would. This may especially be the case if schools have a high concentration of children from immigrant backgrounds, which may lead to increased use of languages other than Finnish in informal settings, such as the schoolyard, and may require more classroom time to be dedicated to foundational vocabulary and reading skills. At least some of the schools we tested may be in this situation, as our samples include the Helsinki area, where the proportion of pupils with an immigrant background can be quite substantial, even exceeding 50% in some schools. Parental support for developing Finnish proficiency in non-Finnish homes may also be limited, partly because families may prioritize maintaining their heritage language at home, and partly because they are not proficient in Finnish themselves.\u003c/p\u003e \u003cp\u003eFourth, having another home language in addition to Finnish turns out to affect vocabulary proficiency as well, although the gap is smaller than for children from non-Finnish homes. Nevertheless, the effect is evident in both vocabulary knowledge and lexical retrieval. These results align with other studies indicating that simultaneous bilingualism can limit proficiency in the dominant language due to less frequent exposure and use of words. For instance, Bialystok et al. 
reported the results of an analysis of 1,738 children between 3 and 10 years old and demonstrated a consistent difference in receptive vocabulary between monolinguals and bilinguals\u003csup\u003e\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e\u003c/sup\u003e. This aligns with the notion that bilingual children in early childhood receive less exposure to the majority language at home than monolingual children. As a result, they encounter fewer words, leading to lower accuracy scores, and have less repeated exposure to the words they do learn, contributing to slower lexical retrieval. Perhaps surprisingly though, our results show that the vocabulary gap widens as children progress through school. One might expect children from mixed-language homes to catch up over time, as their daily exposure to Finnish increases in elementary school - not only through spoken interactions but also through written language. To gain a deeper understanding of the widening gap between monolingual and bilingual pupils, future studies should include more detailed questionnaires on language use of bilingual children at home and elsewhere and explore their relationship to vocabulary development.\u003c/p\u003e"},{"header":"Conclusions and future directions","content":"\u003cp\u003eIn the current study the observed differences across grade, gender, and home language exposure highlight how maturation, education, and sociocultural factors shape vocabulary proficiency, including both vocabulary size and lexical retrieval speed. Since vocabulary proficiency develops across the school years and disparities emerge gradually, monitoring vocabulary growth from early education would be essential. However, this kind of monitoring is rarely implemented in school assessment practices, partly due to the limited availability of standardized vocabulary assessments in many languages, including Finnish. 
The present study addresses this need by developing d-Lexize, a valid and reliable instrument for assessing vocabulary proficiency. The next step in this process involves creating normative data for 1st to 9th grade, a task we are currently undertaking. Expanding access to reliable vocabulary measures would support early identification of difficulties and enable targeted vocabulary interventions.\u003c/p\u003e \u003cp\u003eThe current study is part of a larger initiative entitled the Multilingual Reading Assessment (MUREA) project. Future research within the MUREA framework aims to comprehensively investigate the development and remediation of components of reading comprehension in individuals from diverse linguistic backgrounds. In addition to vocabulary, this includes phonology, morphology, syntax, and self-regulation skills. As such, the project aims to contribute to both reading research and practical applications in educational settings, ultimately promoting more equitable language development across diverse learner populations.\u003c/p\u003e "},{"header":"METHODS","content":"\u003cp\u003eFor each Experiment, the R code as well as R-generated HTML reports are available at the project's OSF page, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://osf.io/exb9s/?view_only=04dda2e11463493a8cc4bc76261d5e9c\u003c/span\u003e\u003cspan address=\"https://osf.io/exb9s/?view_only=04dda2e11463493a8cc4bc76261d5e9c\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. The HTML reports include the participant selection procedure, IRT analyses, validity and reliability tests, Monte Carlo simulations, (generalized) linear mixed models, and post-hoc analyses as well as outlier removal, including participant exclusions and filtering of extreme response times.\u003c/p\u003e\u003ch3\u003eParticipants\u003c/h3\u003e\u003cp\u003eThe children were sampled from schools across the regions of Uusimaa and Southwest Finland. 
Across experiments, around 27,000 participants were included in the analyses, with some excluded beforehand (see HTML reports for details). Informed parental consent was obtained for all children included in the analyses. A total of 6,988 children (50% boys, 50% girls; home language: 77% Finnish; 14% Other; 9% Finnish/Other) from the 3rd (n = 2,607), 4th (n = 2,691), and 7th grade (n = 1,784) were included in the analyses of Experiment 1. In the analyses of Experiment 2, 5,205 children (52% girls, 48% boys) with Finnish as their home language from the 3rd to the 9th grade (3rd: 862; 4th: 880; 5th: 877; 6th: 957; 7th: 726; 8th: 540; and 9th: 363) were included. Here children from mixed or non-Finnish homes were excluded, as their percentage was very low in this sample (about 3%); hence, home language was not used as a variable in the analyses. For Experiment 3, the analyses included 14,634 children (50% girls, 50% boys) from the 3rd to the 9th grade (3rd: 5,108; 4th: 1,577; 5th: 1,789; 6th: 1,341; 7th: 3,301; 8th: 949; and 9th: 776). Here home language was used as a variable again, as there was a sufficient number of children from mixed (n = 1,110) and non-Finnish homes (n = 1,803).\u003c/p\u003e\u003ch3\u003eProcedure\u003c/h3\u003e\u003cp\u003eBefore the experimental tests, participants completed a questionnaire with questions about gender and home languages. After this, participants completed the d-Lexize task and two tasks that were used for validation: the sentence reading fluency task (SFR; Experiments 1–3) and the phonological word reading task (PWR; Experiments 2 and 3). Prior to starting d-Lexize, participants were presented with instructions explaining that letter strings would be displayed one by one and that they were to indicate whether each letter string was a word or not by pressing the “yes” or “no” button. After the instructions, participants were presented with 4 or 5 practice items, depending on the experiment. 
Each letter string was preceded by a fixation point that remained on the screen for 1000 ms. Individual items had a time limit of 4000 ms in Experiments 1 and 2 and 5000 ms in Experiment 3. The d-Lexize test lasted 5 to 10 minutes, and the whole battery between 25 and 40 minutes.\u003c/p\u003e\u003ch2\u003eThe sentence reading fluency test (SFR)\u003c/h2\u003e\u003cp\u003eIn the SFR, participants read a series of sentences and are asked to indicate whether each sentence is true or not. The task is based on the widely used Woodcock-Johnson IV (WJ IV) sentence reading fluency test in English\u003csup\u003e\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u003c/sup\u003e. The current Finnish test includes four practice stimuli, each followed by feedback. After completing the practice items, each participant progresses through the task with as many stimuli as can be processed within the specified time limit (1 minute and 30 seconds). A maximum of 41 sentences are presented, in fixed order. Statements are relatively easy to respond to (e.g., “stones can be eaten”, “cows fly”, “fish swim”), so they elicit a high accuracy rate. However, how quickly the sentences are read varies considerably across grades and across children. The internal consistency of the test is excellent for reaction times and moderate for accuracy rate (Guttman's lambdas were .94 and .74, respectively).\u003c/p\u003e\u003ch2\u003eThe phonological word reading test (PWR)\u003c/h2\u003e\u003cp\u003eIn the PWR, participants are tasked with choosing a word that matches a presented picture. The task includes four practice stimuli, each followed by feedback. After completing the practice items, each participant progresses through the task with as many stimuli as can be processed within the specified time limit (2 minutes). A maximum of 140 trials are presented in fixed order. 
In addition to the correct word, participants are presented with three foils, including words and nonwords, that vary in phonological - and, by extension, orthographic - similarity to the correct choice. In this task, word-level reading is evaluated by assessing the participant's ability to read and accurately determine the word corresponding to the picture. The internal consistency of the test is excellent for reaction times (Guttman’s lambda = 0.92) and good for accuracy rate (Guttman’s lambda = 0.83).\u003c/p\u003e\u003ch2\u003eThe d-Lexize vocabulary test\u003c/h2\u003e\u003cp\u003eIn Experiment 1, we used the Lexize version developed by Salmela et al.\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e, comprising 102 items: 68 Finnish words and 34 phonotactically valid Finnish pseudowords. Lexical-statistical characteristics were extracted from a Finnish newspaper corpus containing 22.7\u0026nbsp;million word forms, using the lexical search program WordMill\u003csup\u003e\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e\u003c/sup\u003e. The chosen words were drawn from five distinct frequency ranges: 17 words with frequency \u0026lt; 1 per million (pm); 20 words with frequency 1–5 pm; 19 words with frequency 5–10 pm; 11 words with frequency 10–20 pm; and 1 word with frequency 20–100 pm. The word set predominantly consisted of nouns (n = 52), with a smaller representation of verbs (n = 7) and adjectives (n = 9), reflecting the proportion of these word classes in natural language. To avoid morphological structure aiding word recognition, all selected words were monomorphemic\u003csup\u003e\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e\u003c/sup\u003e. Word length varied from 4 to 9 letters (M = 5.8). The set of 34 pseudowords was derived from words that were matched in part of speech, length, and frequency with the 68 word items. 
In these chosen words, 1 to 3 letters were altered such that phonotactically valid pseudowords were created. The mean bigram frequency of the pseudowords (M = 5.9, SD = 2.5) aligned with that of the selected words (M = 6.1, SD = 2.6), as confirmed by an independent samples t-test (t(100) = 0.53, p = 0.60). This ensured that the letter patterns of the pseudowords mimic those in words. Subsequent versions of d-Lexize, i.e. d-Lexize89, d-Lexize55a, d-Lexize55b, d-Lexize41a, and d-Lexize41b, preserved the same properties as the original version. The characteristics of each d-Lexize version are listed in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. All items from all versions are listed in Supplementary Table\u0026nbsp;1.\u003c/p\u003e\u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e\u003c/p\u003e\u003cdiv class=\"gridtable\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" 
class=\"colspec\" colname=\"c11\" colnum=\"11\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c12\" colnum=\"12\"\u003e\u003c/div\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eProperties of the different Lexize versions.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e\u003ccolgroup cols=\"12\"\u003e\u003c/colgroup\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVersion\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eN\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eW\u003csup\u003ea\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePW\u003csup\u003eb\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNom\u003csup\u003ec\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eAdj\u003csup\u003ed\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eV\u003csup\u003ee\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eFreq\u003csup\u003ef\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003eLenW\u003csup\u003eg\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c10\"\u003e \u003cp\u003eLenPW\u003csup\u003eh\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c11\"\u003e \u003cp\u003eBiW\u003csup\u003ei\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c12\"\u003e \u003cp\u003eBiPW\u003csup\u003ej\u003c/sup\u003e\u003c/p\u003e 
\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLexize\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e102\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e68\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e34\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e49\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e12\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e7\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e5.6 (0.1–26.2)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e5.8 (4–9)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e6.1 (5–9)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e6.2\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e5.9\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ed-Lexize89\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e89\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e59\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e30\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e43\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e9\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c7\"\u003e \u003cp\u003e7\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e6.0 (0.1–26.2)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e5.8 (4–9)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e6.2 (5–9)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e6.2\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e6.2\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ed-Lexize55a\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e55\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e36\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e19\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e26\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e6\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e5.9 (0.1–18.8)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e5.9 (4–9)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e6.3 (5–9)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e6.0\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e6.2\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ed-Lexize55b\u003c/p\u003e 
\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e55\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e36\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e19\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e28\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e6.1 (0.1–26.2)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e5.9 (4–9)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e6.2 (5–9)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e6.3\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e6.2\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ed-Lexize41a\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e41\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e25\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e16\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e18\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e5.9 (0.1–18.8)\u003c/p\u003e 
\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e6.0 (4–9)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e6.3 (5–9)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e6.4\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e6.2\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ed-Lexize41b\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e41\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e25\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e16\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e19\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e6.1 (0.1–26.2)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e5.6 (4–7)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c10\"\u003e \u003cp\u003e6.2 (5–9)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c11\"\u003e \u003cp\u003e5.7\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c12\"\u003e \u003cp\u003e6.2\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/table\u003e\u003c/div\u003e\u003cp\u003ea. No. of Words; b. No. of Pseudowords; c. No. of Nouns; d. No. of Adjectives; e. No. of Verbs; f. Frequency per million (range);\u003c/p\u003e\u003cp\u003eg. Length of words (range); h. 
Length of pseudowords (range); i. Bigram Frequency of Words per 1000; j. Bigram Frequency of Pseudowords per 1000.\u003c/p\u003e\u003ch2\u003eFrom Lexize to d-Lexize89 to d-Lexize55 to d-Lexize41 via IRT and MC simulations\u003c/h2\u003e\u003cp\u003eExperiment 1 started with the 68 words and 34 pseudowords from the Lexize test originally designed for L2 adults\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e. Because some of these items might not be appropriate for a vocabulary test for school children, we conducted Item Response Theory (IRT) analyses on the combined data from the 3rd, 4th, and 7th grades using the one-parameter (1PL), two-parameter (2PL), and three-parameter (3PL) logistic models, which successively add parameters for item difficulty, discrimination, and guessing. The 3PL model provided the best fit statistics (see section 2.4.4 of the HTML report on Experiment 1) and identified 9 words (itara ‘stingy’, kovera ‘concave’, houre ‘phantom’, uuhi ‘ewe’, kieppi ‘coil’, vouti ‘magistrate’, aihio ‘work in progress’, purje ‘sail’ and suppea ‘narrow’) and 4 pseudowords with poor psychometric properties, meaning relatively low discriminability values, relatively low difficulty values, or close-to-zero guessing values. The exclusion of these items resulted in a final set of 89 items, which we named d-Lexize89; the analyses in Experiment 1 are based on this set. Next, we conducted a Monte Carlo simulation on \u003cb\u003ed-Lexize89\u003c/b\u003e with 17 sub-samples (sizes 5–85 in steps of 5). Table\u0026nbsp;3.3 in the HTML report shows mean correlations across 1000 repetitions. 
A 55-item sub-sample correlated almost perfectly (.96 for accuracy, .99 for response latencies) with the full scale, so we used this number of items for the lists we created in Experiment 2.\u003c/p\u003e\u003cp\u003e \u003cb\u003eExperiment 2\u003c/b\u003e split d-Lexize89 into two lists (\u003cb\u003ed-Lexize55a \u0026amp; d-Lexize55b\u003c/b\u003e), each with 36 words and 19 pseudowords (34 unique and 21 shared). Lists were matched on item discriminability and accuracy derived from the IRT analyses in Experiment 1, as well as on average frequency, word length, bigram frequency, and orthographic neighborhood (see Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). The division into two lists of 55 items with equal lexical-statistical properties was made with future experimentation in mind, such as repeated testing and comparing vocabulary skills through parallel auditory and visual lexical decision tasks. Next, we again conducted a Monte Carlo simulation on \u003cb\u003ed-Lexize55a and d-Lexize55b\u003c/b\u003e with 10 sub-samples (sizes 5–50 in steps of 5). The table in section 4 of the HTML report on Experiment 2 shows mean correlations across 1000 repetitions. A 40-item sub-sample correlated almost perfectly (.96 for accuracy, .99 for response latencies, for both lists) with the full scale, so we used a similar number of items (41 per list) for the lists we created for Experiment 3. IRT analyses combining data from the 3rd to the 9th grades identified the 10 words with the least favorable psychometric properties, which were excluded for Experiment 3. The 10 excluded items were: juhta ‘beast of burden’, kolttu ‘old-fashioned dress’, rahvas ‘common people’, pisara ‘drop’, kohtu ‘uterus’, tyrkyttää ‘intrude’, parvi ‘loft’, nuotio ‘campfire’, hauki ‘pike’ and sukeltaa ‘dive’.\u003c/p\u003e\u003cp\u003e \u003cb\u003eExperiment 3\u003c/b\u003e further refined d-Lexize89 by excluding 10 words based on Experiment 2’s IRT analyses. 
The remaining 79 items were split into two lists (\u003cb\u003ed-Lexize41a \u0026amp; d-Lexize41b\u003c/b\u003e), each containing 25 words and 16 pseudowords (38 unique and 3 shared), again matched for key linguistic properties (see Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e\u003ch2\u003eReliability Estimates for d-Lexize\u003c/h2\u003e\u003cp\u003eWe calculated Cronbach's alpha and Guttman’s lambda 4 reliability coefficients for each grade separately and for the total sample in each of the d-Lexize versions. Here, we report the values for the total sample; however, the results for each grade were similar (for more details, see the internal consistency sections in each experiment’s HTML report). For d-Lexize89, Cronbach's alpha was .90 for accuracy and .98 for latencies, with Guttman’s lambda at .92 and .98, respectively. For d-Lexize55a, Cronbach's alpha was .78 for accuracy and .98 for latencies, and Guttman’s lambda was .82 and .98, respectively. For d-Lexize55b, Cronbach's alpha was .79 for accuracy and .98 for latencies, with Guttman’s lambda at .83 and .98, respectively. For d-Lexize41a, Cronbach's alpha was .80 for accuracy and .97 for latencies, and Guttman’s lambda was .84 and .98, respectively. For d-Lexize41b, Cronbach's alpha was .83 for accuracy and .97 for latencies, with Guttman’s lambda at .86 and .98, respectively. These values indicate that all d-Lexize versions are reliable, with acceptable to excellent internal consistency for accuracy (.78 to .92) and excellent internal consistency for response latencies (.97 to .98).\u003c/p\u003e\u003ch2\u003eValidity Estimates for d-Lexize\u003c/h2\u003e\u003cp\u003eTo further assess the validity of each d-Lexize version, we compared pupils' performance on d-Lexize with their performance on two other tests, the SFR and PWR. Specifically, we calculated the correlations between the accuracy rates and response latencies of each d-Lexize test and those of the SFR (Experiments 1–3) and the PWR test (Experiments 2 and 3). 
It is important to note that both the SFR and PWR test items are relatively easy, resulting in high accuracy rates (90% or more) and minimal variation (even in the lower grades), deflating the correlations. Therefore, the response latencies in both tests are more reliable indicators of the processing effort required by the children. This is also reflected in the correlations, with the strongest associations observed between d-Lexize RTs and those of the SFR and PWR (ranging from 0.69 to 0.82), indicating excellent validity. Full correlation details are presented in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e.\u003c/p\u003e\u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e\u003c/p\u003e\u003cdiv class=\"gridtable\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eCorrelations between Accuracy Rates and Reaction Times of d-Lexize tests with those of the SFR and the PWR tests. 
All bolded values are significant at the p \u0026lt; .001-level, the non-bolded values are not significant at the p \u0026lt; 0.05 level.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e\u003ccolgroup cols=\"5\"\u003e\u003c/colgroup\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVariable\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSFR_Acc\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSFR_RT\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003ePWR_Acc\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003ePWR_RT\u003c/p\u003e \u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ed-Lexize89_Acc\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e0.33\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e-0.45\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ed-Lexize89_RT\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.02\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.69\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ed-Lexize55a_Acc\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e0.30\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e-0.29\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.33\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e-0.13\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ed-Lexize55a_RT\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.03\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.82\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e-0.24\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.71\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ed-Lexize55b_Acc\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e0.25\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e-0.35\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.32\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e-0.23\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ed-Lexize55b_RT\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.03\u003c/p\u003e 
\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.82\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e-0.30\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.72\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ed-Lexize41a_Acc\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e0.43\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e-0.46\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.45\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e-0.30\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ed-Lexize41a_RT\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e-0.10\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.79\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e-0.27\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.72\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ed-Lexize41b_Acc\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e0.39\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" 
char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e-0.42\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.40\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e-0.24\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ed-Lexize41b_RT\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e-0.09\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.80\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e-0.27\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e0.73\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/table\u003e\u003c/div\u003e\u003cp\u003eThe validity of the different d-Lexize versions was also evaluated by examining the effects of word frequency and length on accuracy and response latencies, as these factors are known to influence reading and lexical decision tasks\u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u003c/sup\u003e. Both factors were analyzed in interaction with grade, as we expected their effects to vary by grade – a pattern that emerged in most analyses. The corresponding statistics and figures illustrating these interactions are presented in the mixed-effects models section of the HTML reports. 
Most importantly, consistent with previous research and across all grades and experiments, lower word frequency was associated with lower accuracy and slower response latencies, while longer words led to slower response latencies, supporting the validity of all d-Lexize versions.\u003c/p\u003e\u003ch2\u003eEthics declarations\u003c/h2\u003e\u003cp\u003eThe datasets used in this study were obtained through a collaborative effort between the University of Turku and several municipalities. Each municipality’s educational office independently decided whether to participate in the assessments organized by the University. Furthermore, the municipalities determined the specific grade levels that would be included in the evaluation. The primary objective of the assessments was to provide teachers and educational authorities within each municipality with information regarding pupils’ strengths and challenges in reading and mathematical skills. So that teachers could receive feedback about their own pupils, each child was identified using a school-based login system.\u003c/p\u003e\u003cp\u003eThe research activities were conducted independently from the municipal assessments and their analyses. The research team requested permission from each municipality to use the assessment data for research purposes, thereby repurposing pre-existing registry data. Each municipality independently managed the process of obtaining informed parental consent, permitting the release of data to a predefined group of researchers in a pseudonymized format. This format excluded all direct identifiers, such as personal login IDs, names, school affiliations, and municipal identifiers. All necessary ethical approvals were obtained in advance of the study. 
The research was conducted in full compliance with Finnish legislation and adhered to the guidelines set forth by the Finnish National Board on Research Integrity (TENK).\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eSupplementary material\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eA supplementary table with all the items included in all the versions of d-Lexize is provided as a supplementary file.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll authors contributed to the design of the study. T.R., S.H., P.E., and P.R. collected the data. S.H., N.S., and R.B. analyzed the data. All authors contributed to the writing of the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability\u003c/strong\u003e\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eExtensive data analysis reports and the R script are available at https://osf.io/exb9s/?view_only=04dda2e11463493a8cc4bc76261d5e9c. The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eSegbers, J. \u0026amp; Schroeder, S. How many words do children know? A corpus-based estimation of children\u0026rsquo;s total vocabulary size. \u003cem\u003eLang. Test.\u003c/em\u003e \u003cb\u003e34\u003c/b\u003e, 297\u0026ndash;320 (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAnglin, J. M. Vocabulary development: A morphological analysis. \u003cem\u003eMonogr. Soc. Res. Child. Dev. Serial No\u003c/em\u003e. \u003cb\u003e238\u003c/b\u003e, 58 (1993).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBiemiller, A. 
Size and sequence in vocabulary development: Implications for choosing words for primary grade vocabulary instruction. In Teaching and Learning Vocabulary: Bringing Research to Practice (eds Hiebert, A. \u0026amp; Kamil, M.) 223\u0026ndash;242 (Erlbaum, 2005).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSong, S. et al. Tracing children's vocabulary development from preschool through the school‐age years: An 8‐year longitudinal study. \u003cem\u003eDev. Sci.\u003c/em\u003e \u003cb\u003e18\u003c/b\u003e, 119\u0026ndash;131 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHonko, M. Alakouluikaisten leksikaalinen tieto ja taito. Toisen sukupolven suomi ja S1-verrokit [Lexical knowledge and skills in primary school children. Second generation L2 Finnish speakers and L1 peers]. \u003cem\u003eDoctoral Thesis\u003c/em\u003e (University of Tampere, 2013). Retrieved July 28, 2021, from \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://trepo.tuni.fi/handle/10024/94544\u003c/span\u003e\u003cspan address=\"https://trepo.tuni.fi/handle/10024/94544\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSaarela, L. Peruskoululaisten kirjoitelmien kehittyminen sanastotutkimuksen valossa [The developmental trajectory of elementary school children\u0026rsquo;s essays in the light of vocabulary research]. Unpublished doctoral dissertation. Oulu: University of Oulu (1997).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLonigan, C. J. Vocabulary development and the development of phonological awareness skills in preschool children. In \u003cem\u003eVocabulary Acquisition: Implications Read. Comprehension\u003c/em\u003e 15\u0026ndash;31 (2007).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBates, E. \u0026amp; Goodman, J. On the emergence of grammar from the lexicon. 
In The Emergence of Language (ed. 29\u0026ndash;79 (Erlbaum, 1999).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSt\u0026aelig;hr, L. S. Vocabulary size and the skills of listening, reading and writing. \u003cem\u003eLang. Learn. J.\u003c/em\u003e \u003cb\u003e36\u003c/b\u003e, 139\u0026ndash;152 (2008).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTorppa, M., Vasalampi, K., Eklund, K. \u0026amp; Niemi, P. Long-term effects of the home literacy environment on reading development: Familial risk for dyslexia as a moderator. \u003cem\u003eJ. Exp. Child. Psychol.\u003c/em\u003e \u003cb\u003e215\u003c/b\u003e, 105314 (2022).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eColenbrander, D., Kohnen, S., Smith-Lock, K. \u0026amp; Nickels, L. Individual differences in the vocabulary skills of children with poor reading comprehension. \u003cem\u003eLearn. Individ Differ.\u003c/em\u003e \u003cb\u003e50\u003c/b\u003e (2016).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOECD. PISA 2022 Assessment and Analytical Framework, PISA. OECD Publishing (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTorppa, M., Eklund, K., Sulkunen, S., Niemi, P. \u0026amp; Ahonen, T. Why do boys and girls perform differently on PISA Reading in Finland? The effects of reading fluency, achievement behaviour, leisure reading and homework activity. \u003cem\u003eJ. Res. Read.\u003c/em\u003e \u003cb\u003e41\u003c/b\u003e, 122\u0026ndash;139 (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSmith, E. \u0026amp; Reimer, D. Understanding gender inequality in children's reading behavior: New insights from digital behavioral data. \u003cem\u003eChild. Dev.\u003c/em\u003e \u003cb\u003e95\u003c/b\u003e, 625\u0026ndash;635 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eManu, M. et al. Reading development from kindergarten to age 18: The role of gender and parental education. 
\u003cem\u003eRead. Res. Q.\u003c/em\u003e \u003cb\u003e58\u003c/b\u003e, 505\u0026ndash;538 (2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSalmela, R., Lehtonen, M., Garusi, S. \u0026amp; Bertram, R. Lexize: A test to quickly assess vocabulary knowledge in Finnish. \u003cem\u003eScand. J. Psychol.\u003c/em\u003e \u003cb\u003e62\u003c/b\u003e, 806\u0026ndash;819. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/sjop.12768\u003c/span\u003e\u003cspan address=\"10.1111/sjop.12768\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLemh\u0026ouml;fer, K. \u0026amp; Broersma, M. Introducing LexTALE: A quick and valid lexical test for advanced learners of English. \u003cem\u003eBehav. Res. Methods\u003c/em\u003e \u003cb\u003e44\u003c/b\u003e, 325\u0026ndash;343 (2012).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSchmitt, N. Size and depth of vocabulary knowledge: What the research shows. \u003cem\u003eLang. Learn.\u003c/em\u003e \u003cb\u003e64\u003c/b\u003e, 913\u0026ndash;951 (2014).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, M. \u0026amp; Kirby, J. R. The effects of vocabulary breadth and depth on English reading. \u003cem\u003eAppl. Linguist.\u003c/em\u003e \u003cb\u003e36\u003c/b\u003e, 611\u0026ndash;634 (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYeatman, J. D. et al. Rapid online assessment of reading ability. \u003cem\u003eSci. Rep.\u003c/em\u003e \u003cb\u003e11\u003c/b\u003e, 1\u0026ndash;11 (2021).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSiegelman, N. et al. Rethinking first language\u0026ndash;second language similarities and differences in English proficiency: Insights from the ENglish Reading Online (ENRO) project. \u003cem\u003eLang. 
Learn.\u003c/em\u003e \u003cb\u003e74\u003c/b\u003e, 249\u0026ndash;294 (2024).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRayner, K. Eye movements in reading and information processing: 20 years of research. \u003cem\u003ePsychol. Bull.\u003c/em\u003e \u003cb\u003e124\u003c/b\u003e, 372\u0026ndash;422 (1998).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSchr\u0026ouml;ter, P. \u0026amp; Schroeder, S. The Developmental Lexicon Project: A behavioral database to investigate visual word recognition across the lifespan. \u003cem\u003eBehav. Res. Methods\u003c/em\u003e \u003cb\u003e49\u003c/b\u003e, 2183\u0026ndash;2203. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3758/s13428-016-0851-9\u003c/span\u003e\u003cspan address=\"10.3758/s13428-016-0851-9\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2017).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBates, D., M\u0026auml;chler, M., Bolker, B. \u0026amp; Walker, S. Fitting linear mixed-effects models using lme4. \u003cem\u003eJ. Stat. Softw.\u003c/em\u003e \u003cb\u003e67\u003c/b\u003e, 1\u0026ndash;48. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.18637/jss.v067.i01\u003c/span\u003e\u003cspan address=\"10.18637/jss.v067.i01\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2015).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eR Core Team. R: A language and environment for statistical computing (Version 4.3.0) [Computer software]. R Foundation for Statistical Computing. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.R-project.org/\u003c/span\u003e\u003cspan address=\"https://www.R-project.org/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBaayen, R. H., Davidson, D. J. 
\u0026amp; Bates, D. M. Mixed-effects modeling with crossed random effects for subjects and items. \u003cem\u003eJ. Mem. Lang.\u003c/em\u003e \u003cb\u003e59\u003c/b\u003e, 390\u0026ndash;412 (2008).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBenjamini, Y. \u0026amp; Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. \u003cem\u003eJ. R Stat. Soc. Ser. B (Methodol)\u003c/em\u003e. \u003cb\u003e57\u003c/b\u003e, 289\u0026ndash;300. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/j.2517-6161.1995.tb02031.x\u003c/span\u003e\u003cspan address=\"10.1111/j.2517-6161.1995.tb02031.x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (1995).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eH\u0026auml;iki\u0026ouml;, T., Bertram, R., Hy\u0026ouml;n\u0026auml;, J. \u0026amp; Niemi, P. Development of the letter identity span in reading: Evidence from the eye movement moving window paradigm. \u003cem\u003eJ. Exp. Child. Psychol.\u003c/em\u003e \u003cb\u003e102\u003c/b\u003e, 167\u0026ndash;181 (2009).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMcGeown, S., Goodwin, H., Henderson, N. \u0026amp; Wright, P. Gender differences in reading motivation: Does sex or gender identity provide a better account? \u003cem\u003eJ. Res. Read.\u003c/em\u003e \u003cb\u003e35\u003c/b\u003e, 328\u0026ndash;336. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/j.1467-9817.2010.01481.x\u003c/span\u003e\u003cspan address=\"10.1111/j.1467-9817.2010.01481.x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2012).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWigfield, A. \u0026amp; Guthrie, J. T. Relations of children's motivation for reading to the amount and breadth of their reading. \u003cem\u003eJ. Educ. 
Psychol.\u003c/em\u003e \u003cb\u003e89\u003c/b\u003e, 420\u0026ndash;432. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1037/0022-0663.89.3.420\u003c/span\u003e\u003cspan address=\"10.1037/0022-0663.89.3.420\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (1997).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBecker, M. \u0026amp; McElvany, N. The interplay of gender and social background. A longitudinal study of interaction effects in reading attitudes and behaviour. \u003cem\u003eBr. J. Educ. Psychol.\u003c/em\u003e \u003cb\u003e88\u003c/b\u003e, 529\u0026ndash;549. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/bjep.12199\u003c/span\u003e\u003cspan address=\"10.1111/bjep.12199\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBialystok, E., Luk, G., Peets, K. F. \u0026amp; Yang, S. Receptive vocabulary differences in monolingual and bilingual children. \u003cem\u003eBilingualism: Lang. Cogn.\u003c/em\u003e \u003cb\u003e13\u003c/b\u003e, 525\u0026ndash;531 (2010).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSchrank, F. A. \u0026amp; Wendling, B. J. \u003cem\u003eThe Woodcock\u0026ndash;Johnson IV\u003c/em\u003e. \u003cem\u003eContemp. Intell. Assess.\u003c/em\u003e 383 (2018).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLaine, M. \u0026amp; Virtanen, P. WordMill lexical search program. \u003cem\u003eCent. Cogn. Neurosci. Univ. Turku\u003c/em\u003e (1999).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBrysbaert, M. LEXTALE_FR: A fast, free, and efficient test to measure language proficiency in French. \u003cem\u003ePsychol. 
Belg.\u003c/em\u003e \u003cb\u003e53\u003c/b\u003e, 23\u0026ndash;37 (2013).\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Vocabulary proficiency, Reading development, Gender Gap, Home Language, Language background, Finnish","lastPublishedDoi":"10.21203/rs.3.rs-6448049/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6448049/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eVocabulary proficiency is a key predictor of reading development. However, vocabulary proficiency in school-age children is rarely assessed, especially in languages other than English. Moreover, because reading development differs depending on home language and gender, it is likely that these factors also influence the development of vocabulary proficiency. Here we report Finnish vocabulary proficiency of school-age children, examining its relationship with grade, gender, and home language. 
We utilize d-Lexize, a vocabulary test based on visual lexical decision, which we adapted from a previous test for adult L2 speakers. The test assesses vocabulary knowledge by accuracy and lexical retrieval speed through reaction time. Approximately 27,000 school children were tested in three experiments using different versions of d-Lexize. All experiments consistently show that vocabulary proficiency improves progressively from 3rd to 9th grade. The results also reveal an emerging gender gap: whereas girls perform equal to boys in the early stages, they exhibit a more extensive vocabulary and faster lexical retrieval in the later grades. Furthermore, the tests show that pupils from Finnish-only homes consistently outperform those from non-Finnish or mixed-language homes, with this gap widening over time. These results highlight the significance of language exposure and sociocultural factors during vocabulary development.\u003c/p\u003e","manuscriptTitle":"Assessing Vocabulary Skills of School Children Aged 9 to 15 in Finland: Tracking the Gender and Home Language Gap","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-05-13 01:40:42","doi":"10.21203/rs.3.rs-6448049/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision 
requested","date":"2025-07-15T07:36:40+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-05-22T14:32:28+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"52798945549264128214951003147740718969","date":"2025-05-08T09:23:07+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"103057404623104758178064142263759437142","date":"2025-05-07T14:24:28+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-05-06T08:44:39+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-05-06T08:43:53+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2025-04-29T15:36:07+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-04-28T11:45:39+00:00","index":"","fulltext":""},{"type":"submitted","content":"Scientific Reports","date":"2025-04-14T16:47:57+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"scientific-reports","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"scirep","sideBox":"Learn more about [Scientific Reports](http://www.nature.com/srep/)","snPcode":"","submissionUrl":"","title":"Scientific Reports","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Scientific Reports","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"c9ed9c3a-f4fd-460d-a6c0-ce6139bc1f95","owner":[],"postedDate":"May 13th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[{"id":48246396,"name":"Earth and environmental sciences/Environmental social sciences/Psychology and behaviour"},{"id":48246397,"name":"Biological sciences/Psychology"},{"id":48246398,"name":"Health sciences/Risk 
factors"}],"tags":[],"updatedAt":"2026-01-05T16:00:40+00:00","versionOfRecord":{"articleIdentity":"rs-6448049","link":"https://doi.org/10.1038/s41598-025-28902-w","journal":{"identity":"scientific-reports","isVorOnly":false,"title":"Scientific Reports"},"publishedOn":"2025-12-29 15:56:59","publishedOnDateReadable":"December 29th, 2025"},"versionCreatedAt":"2025-05-13 01:40:42","video":"","vorDoi":"10.1038/s41598-025-28902-w","vorDoiUrl":"https://doi.org/10.1038/s41598-025-28902-w","workflowStages":[]},"version":"v1","identity":"rs-6448049","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6448049","identity":"rs-6448049","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
