Where Things Belong: The Development of Scene Knowledge in Childhood

doi:10.21203/rs.3.rs-6861119/v1

2025 · doi:10.21203/rs.3.rs-6861119/v1

preprint OA: closed

Dilara Deniz Türk, Daniela Bahn, Christina Kauschke, Melissa Le-Hoa Võ

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-6861119/v1. This work is licensed under a CC BY 4.0 License. Status: Posted (Version 1).

Abstract

Humans develop semantic and syntactic expectations about what objects typically appear where in everyday scenes. This study examined how children aged 6 to 10 process such scene-grammatical rules and how this ability develops. We assessed scene knowledge implicitly using two eye-tracking tasks: a free viewing and a visual search task featuring scenes with either consistent object placements or semantic/syntactic violations. We also measured explicit knowledge by asking children to furnish a dollhouse. Results showed that children looked less at consistent objects in the free viewing task.
Our visual search task further revealed earlier fixations and faster reaction times to consistent objects. These results replicate previous findings in adults indicating more efficient processing and stronger expectations for objects placed consistently with scene grammar. Additionally, children who were more sensitive to syntactic violations in images showed greater accuracy in the dollhouse task. Scene knowledge grew more robust with age, as evidenced by shorter dwell times in the free viewing task and earlier fixations and faster responses to consistent objects in the visual search task. Together, these findings highlight the ongoing development of scene grammar in children and offer new insights into how implicit and explicit measures can tap into children’s visual cognition.

Subject areas: Biological sciences/Psychology; Biological sciences/Psychology/Human behaviour

Keywords: eye movements, scene grammar, semantics, syntax, cognitive development

Introduction

The organization of visual scenes allows us to predict where objects within these scenes will typically be located [1]. Even if we enter a house we have never been in before, we can easily predict that the soap will be located above the sink and the toilet paper next to the toilet. As with other skills acquired during development, we learn these rules over time as we interact with our environment. These rules guide us in our daily lives, facilitating object recognition, search processes, goal-directed behaviors, and our overall understanding of scenes [1, 2, 3, 4]. The main focus of this study was to examine the developmental trajectory of scene knowledge as reflected in attentional deployment and behavior. The scene grammar framework addresses the rules governing object placement within visual scenes (for a review, see [5]).
Inspired by the distinction between semantics and syntax in language, we define scene semantics as the expectation of what objects should be present in a particular scene and scene syntax as the expectation of where those objects should be located [1, 3, 6, 7, 8]. The scene grammar framework is hierarchical in nature, where certain objects are associated with specific categories of scenes and within a scene tend to cluster in a meaningful and functional way, forming what we call “phrases.” Finally, within each phrase, large, static “anchor” objects provide cues about the location and identity of surrounding “local” objects that are positioned around these anchors [2, 9, 10]. Understanding how scene rules guide our interaction with everyday environments, as well as how this ability develops, is crucial for gaining insights into how we effortlessly orient ourselves using this knowledge. Studies with adults consistently show that they spend less time fixating on objects that conform to scene rules compared to those that violate them, for both semantic [7, 11, 12, 13, 14] and syntactic violations [7, 10, 11, 13, 14]. This "consistency effect" is well established in adults (for a review see [15]), but its developmental emergence in children is less studied. Recent studies have started to bridge this gap by investigating how infants and young children process scene rules. For example, research with 24-month-old toddlers has shown a similar consistency effect, with shorter dwell times on consistent targets compared to semantically inconsistent ones, though this effect was only observed when targets were highly salient [16]. Neural studies from our research group further demonstrated that scene rule violations are processed as early as two years of age [17], while by the age of four, children begin to show both implicit and explicit behavioral responses to semantic and syntactic violations [14]. 
Extending this trajectory, it has been shown that typically developing children aged 5 and 6 also exhibit this consistency effect, as seen in lower looking time proportions, shorter first-pass dwell times, and shorter first fixation durations to consistent target objects [18]. In our recent work [6], we examined implicit and explicit scene grammar processing in children aged 6 to 10 years with and without developmental language disorder (DLD). While both groups showed inconsistency effects, age-related changes were more pronounced in typically developing children, who became increasingly sensitive to semantic and syntactic violations with age. In contrast, this developmental trend was less evident in children with DLD. All prior developmental studies have employed free viewing paradigms to assess the consistency effect. While informative, these paradigms capture only one aspect of visual behavior. The current study addresses these gaps by implementing both a free viewing task and a visual search task, and by covering a broader developmental window (ages 6 to 10). This allows us to examine the consistency effect across different viewing conditions and attentional demands, offering a more comprehensive picture of how scene grammar knowledge develops and manifests in childhood. To explore how children’s implicit and explicit behaviors reflect their understanding of scene organization and how this understanding improves with age, we build on previous work by our research group [14] with 2- to 4-year-olds and adults. In this previous work, a free viewing task featuring scenes with semantic or syntactic violations was implemented while tracking eye movements, as well as a dollhouse placement task where participants freely furnished a dollhouse. In that study, eye movements during the free viewing task served as an implicit measure of scene knowledge, allowing the processing of violations to be inferred without directly asking children if they noticed them.
The time spent on these inconsistent objects indicates the processing of expectation violations, even if children do not explicitly point them out. For an explicit measure of scene knowledge, the dollhouse placement task provided a straightforward way to assess children’s understanding by asking them to furnish a dollhouse with predefined rooms, serving as a miniature version of a real-world house. The accuracy of the scene categorization of objects and the distance between correctly placed objects and their anchors were measured. This explicit measure indicates the children’s ability to access and apply their scene knowledge and reflects their understanding of how objects are organized in real-world scenes. We employed a similar methodology, including the free viewing and dollhouse placement tasks, and added a visual search task to measure implicit behavior through eye tracking and reaction time collection. We also refined our analysis by incorporating developmental tests into our models and enhancing the precision of our methods for extracting measures from the dollhouse tasks (see Methods for details). Our primary goal was to examine scene grammar knowledge in children by employing both implicit and explicit task-based assessments, while also exploring the relationship between these measures. Specifically, we used eye-tracking metrics (such as dwell time and first fixation) and placement accuracy in a dollhouse task as empirical measures of children’s ability to process objects based on scene-based semantic and syntactic rules. Additionally, we aimed to investigate whether and how the strength of scene-related expectations changes as children grow older.
Developmental theories suggest that children gradually shift from object-based learning to more flexible, relational, and rule-based forms of reasoning as they grow older [19, 20]. In line with this, research on category learning with rules revealed that the ability to assign items into rule-based categories continues to develop throughout the preschool and school years [21, 22]. Only around age 10 do children typically reach adult-like proficiency in rule-based categorization and relational integration [22, 23]. These findings are relevant to our framework: in essence, semantic scene knowledge involves the ability to categorize objects according to the type of scene they belong to (e.g., recognizing that a sink is typically found in a bathroom, not a bedroom). This mirrors classic category learning tasks in which children must learn to assign items to categories based on shared or rule-defined features. Syntactic scene knowledge, by contrast, requires children to go beyond simple categorization. It involves understanding the spatial and functional relationships between objects within a scene (e.g., that a mirror is usually placed above a sink). Acquiring such knowledge relies on relational reasoning and rule learning, that is, the ability to extract underlying patterns from repeated exposure to structured environments. Based on these considerations, we anticipated a consistency effect in our age group, a further increase in the strength of children's scene knowledge with age, and a positive relationship between our implicit and explicit measures.

Methods

Participants

We recruited a total of 38 children, aged between 5.58 and 10.66 years (M = 8.27). Of these, 17 were female and 21 were male. All participants had normal or corrected-to-normal vision with no neurological conditions, as confirmed by parent questionnaires.
We only included children whose native language was German; thus, we excluded 2 bilingual children who did not fall within the range of typical language development in German. Additionally, 3 more children were excluded for not completing all tasks. This resulted in a final sample of 33 children, aged between 5.58 and 10.66 years (M = 8.22, 16 female, 17 male). The children were recruited through collaborations with schools and public outreach initiatives at the universities of Frankfurt and Marburg (Hesse, Germany). Informed written consent was obtained from the parents before the study commenced. Children received small gifts as a token of appreciation, and parents were reimbursed for travel expenses. The study adhered to the Declaration of Helsinki and was approved by the local ethics committee.

We designed tasks that incorporated both explicit and implicit measures of scene knowledge to identify indicators of scene grammar and to explore the developmental trajectory of scene comprehension.

Implicit measure of scene knowledge: Free viewing task

Apparatus

We monitored children’s eye movements using the EyeLink 1000 Portable Duo system (SR Research, Kanata, Ontario, Canada), focusing on the left eye with a sampling rate of 500 Hz in remote mode. The stimuli were displayed on either a 17-inch laptop or a 24-inch monitor, both with a resolution of 1920 × 1080 pixels and a refresh rate of 144 Hz. Children were positioned about 70 cm from the screen.

Stimuli

From the SCEGRAM database [13], we selected 45 images including 15 semantically inconsistent, 15 syntactically inconsistent, and 15 consistent scenes (see Figure 1 for an example). We defined areas of interest (AOIs) that maintained consistent size across violations, and the AOI positioning of the objects was identical across semantically inconsistent and consistent scenes. We incorporated a 75-pixel buffer around each AOI to ensure precise eye-tracking, especially given the more lenient thresholds for children.
We counterbalanced the images across participants, ensuring that each object was seen in only one condition per participant. To address any potential saliency effects, we assessed the mean saliency rank using DeepGaze IIE [24]. The results showed no significant differences in saliency between conditions [consistent: β = .96, SE = .078, z = -.556, p = .578; semantically inconsistent: β = .94, SE = .07, z = -.87, p = .384].

Procedure

At the start of the experiment, we performed a 5-point calibration using an audio-visual target and conducted a drift check after every 10 trials. Children then completed 2 practice trials before proceeding to the main phase, which included 45 trials. Each trial commenced with an audio-visual fixation spiral positioned either on the left or right side of the screen. To trigger the presentation of the scene images, children needed to fixate on the spiral for a minimum of 500 milliseconds. The images were displayed for 7 seconds, and a reward video was shown randomly approximately every 2 images, lasting around 10 seconds each time (Figure 2). The children were instructed to view the scenes freely.

Analysis

We used total dwell time (DT) as our primary eye movement measure to explore children's responses to violations in scenes. Dwell time represents the cumulative duration of all fixations within a specific area of interest from the initial to the final visit, thus indicating the level of interest in that area. We excluded trials with a dwell time shorter than 100 ms from our analysis (1.59%), as such brief durations are generally deemed inadequate for meaningful information processing [25]. We further analyzed the mean dwell time difference for each participant by subtracting their average dwell time on consistent scenes from their dwell time on inconsistent scenes.
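The dwell time measure and the per-participant difference score described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the `Fixation` and `AOI` types, the function names, the rectangular AOI shape, and all example values are hypothetical. Only the 75-pixel buffer, the 100 ms exclusion threshold, and the inconsistent-minus-consistent subtraction are taken from the text.

```python
from dataclasses import dataclass
from statistics import mean

AOI_BUFFER_PX = 75   # buffer around each AOI (from the text)
MIN_DWELL_MS = 100   # trials with shorter dwell times are excluded (from the text)


@dataclass(frozen=True)
class Fixation:
    x: float            # horizontal gaze position in pixels
    y: float            # vertical gaze position in pixels
    duration_ms: float


@dataclass(frozen=True)
class AOI:
    # Hypothetical rectangular area of interest, in screen pixels.
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float, buffer: float = AOI_BUFFER_PX) -> bool:
        """True if (x, y) falls inside the AOI enlarged by `buffer` on every side."""
        return (self.left - buffer <= x <= self.right + buffer
                and self.top - buffer <= y <= self.bottom + buffer)


def dwell_time(fixations: list[Fixation], aoi: AOI) -> float:
    """Total dwell time: summed duration of all fixations inside the buffered AOI."""
    return sum(f.duration_ms for f in fixations if aoi.contains(f.x, f.y))


def dwell_difference(trials: list[tuple[str, float]], violation: str) -> float:
    """Mean dwell time on `violation` trials minus mean dwell time on consistent trials.

    `trials` holds (condition, dwell_ms) pairs for one participant.
    """
    kept = [(cond, dt) for cond, dt in trials if dt >= MIN_DWELL_MS]
    inconsistent = [dt for cond, dt in kept if cond == violation]
    consistent = [dt for cond, dt in kept if cond == "consistent"]
    if not inconsistent or not consistent:
        raise ValueError("need at least one valid trial per condition")
    return mean(inconsistent) - mean(consistent)


# Made-up example data for one trial and one participant.
aoi = AOI(left=80, top=80, right=120, bottom=120)
fixations = [Fixation(100, 100, 200), Fixation(500, 500, 300), Fixation(170, 100, 150)]
trial_dwell = dwell_time(fixations, aoi)

trials = [("consistent", 400), ("consistent", 600),
          ("semantic", 900), ("semantic", 50), ("semantic", 700)]
semantic_difference = dwell_difference(trials, "semantic")
```

In this sketch, a positive difference score means the participant looked longer at violations than at consistent targets.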
This dwell time difference reflected the actual sensitivity towards violations compared to consistent targets and allowed us to examine the effect of these differences on our other measures (see Results). Exploratory analyses of refixation counts, which reflect how often participants returned their gaze to the same area of interest (target objects), are reported in Supplementary Information 1.

Implicit measure of scene knowledge: Visual search task

The visual search task was conducted either immediately before or after the free viewing task, with exceptions occurring only if technical or timing issues interrupted the planned session. In such cases, the task was rescheduled and conducted in a separate session. The sequence of these tasks was randomly counterbalanced.

Apparatus

We tracked children’s eye movements using the same eye-tracking equipment as in the free viewing study.

Stimuli

Forty images with objects placed in both expected and unexpected locations were selected from the BOiS (Berlin Object in Scene) Database [26], with 20 images per condition (see Figure 3 for an example). This database includes a range of indoor and outdoor scenes, featuring target objects photographed from multiple angles and positioned either in expected or unexpected locations, maintaining a similar distance from the center of the scene in both cases. We chose everyday indoor scenes with target objects familiar to children as the stimuli. Similar to the procedure in the free viewing task, we maintained the same size for areas of interest (AOIs) across both unexpected and expected scenes. We also included a 75-pixel buffer around each AOI to ensure precise eye-tracking. The order of conditions was randomized and counterbalanced across participants to ensure that every child only saw each object once, either in an expected or an unexpected location. Scene images were presented at a resolution of 1024 × 768 pixels, while target objects were displayed at 288 × 192 pixels.
We again evaluated the mean saliency rank using DeepGaze IIE [24] to account for potential saliency effects. The analysis revealed no significant saliency differences between expected and unexpected objects (β = -.052, SE = .034, z = -1.538, p = .128).

Procedure

The experimental setup was similar to the free viewing task. We began with a 5-point calibration using an audio-visual target and conducted a drift check every 10 trials. Participants completed 2 practice trials before proceeding to the main phase, which included 40 trials. Each trial started with an audio-visual fixation spiral on either the left or right side of the screen, which children had to fixate for at least 500 milliseconds, followed by a 1-second presentation of the target object. Children were instructed to find the target object as quickly as possible within the subsequent scene. After another gaze-contingent fixation spiral, a 10-ms blank screen was shown before the scene image appeared and remained on the screen until the participant pressed the spacebar to indicate they had found the target object (Figure 4). To ensure participants were aware that target objects were always present, we included gaze-contingent feedback: if participants fixated on the predefined area of the target object during the 1-second period immediately prior to their key press, they received a reward in the form of a cheer/applause video. If their gaze did not meet this criterion, a scene image with a red frame around the target object's location was shown as feedback.

Analysis

We selected reaction time and first fixation time as our primary measures in the visual search task.
Reaction time indicated how long it took participants to find and identify the target objects (from scene onset until button press), whereas first fixation time reflected how effectively their scene knowledge guided their search (from scene onset until first target fixation) and whether it sped up the process for expected targets, with earlier fixations on expected target locations suggesting a better understanding of where objects are found in the scene. We excluded trials that deviated by more than ±2 seconds from the mean reaction time (M = 7.19 s; 4.7%). Additional measures, including decision time and accuracy, are reported in Supplementary Information 1 to provide a comprehensive overview of all recorded task metrics.

Explicit measures of scene knowledge: Dollhouse task

Procedure

The dollhouse task functioned as an explicit measure of children’s scene knowledge. Children were asked to place objects in a wooden dollhouse (Nic Spiel + Art GmbH, Laupheim, Germany) in the way they found most fitting. The dollhouse, consisting of two floors and four rooms, each measuring 31 cm by 40 cm, was empty except for one defining (i.e., diagnostic) object per room: a bed in the bedroom, a shower in the bathroom, an oven/stove in the kitchen, and a sofa in the living room. Each child was then provided with the remaining 57 objects to be placed within the dollhouse. Children were not given a time limit and were asked to place all the objects to the best of their knowledge. Barring exceptions, the dollhouse task was always administered as the final task.

Dollhouse semantics

When selecting diagnostic objects to help establish clear scene categories at the start of the dollhouse task, we followed the approach described in [14], where one highly diagnostic (anchor) object was placed in each room.
These diagnostic objects were chosen based on their strong scene associations, as quantified by real-world image statistics [27], characterizing the degree to which an item reliably signaled a specific scene category. For instance, while a fridge provides information about the scene being a kitchen, the kitchen scene also strengthens the expectation that a fridge should be present—creating a reciprocal relationship of strong contextual cues. On the other hand, a book can be found across various scene categories, e.g. the bedroom, the living room or the kitchen, making it less straightforward to assign it to a specific scene. Based on these criteria, we selected a bed, a stove, a shower, and a sofa as diagnostic objects for the bedroom, kitchen, bathroom, and living room, respectively. We assessed semantic knowledge by measuring how accurately children placed the remaining objects into their designated room categories. Object-room-assignment was again primarily based on image statistics reported in [27], which quantify how frequently objects appear in specific scenes. In addition to these data-driven assignments, we also relied on informed researcher intuition to finalize the categorization of certain objects — particularly those not clearly diagnostic or those that could reasonably appear in more than one room. Objects that were relevant to multiple rooms (e.g., a pillow) were considered correct if placed in any of those rooms (e.g., both bedroom and living room). Additionally, if children misidentified objects (e.g., misidentifying a nightstand as a stool) and placed them in the room corresponding to the object they thought it resembled (e.g., a stool in the living room), we still counted these placements as correct, provided that the placement was appropriate for the object they mistook it for. A full list of objects and their assigned scenes is provided in Supplementary Information 2. 
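The scoring rule for dollhouse semantics can be illustrated with a short sketch. It assumes each object has a set of accepted rooms, so that multi-room objects such as the pillow count as correct in any of them. The function name, the dictionary layout, and every object-room pair other than the pillow example are hypothetical; the actual assignments are those listed in Supplementary Information 2. The credit given for appropriately placed misidentified objects is not modeled here.

```python
def placement_accuracy(placements: dict[str, str],
                       accepted_rooms: dict[str, set[str]]) -> float:
    """Proportion of placed objects that ended up in one of their accepted rooms."""
    scored = [obj for obj in placements if obj in accepted_rooms]
    if not scored:
        raise ValueError("no scorable objects were placed")
    correct = sum(placements[obj] in accepted_rooms[obj] for obj in scored)
    return correct / len(scored)


# Hypothetical object-to-room assignments; only the pillow example is from the text.
accepted_rooms = {
    "pillow": {"bedroom", "living room"},
    "toothbrush": {"bathroom"},
    "pan": {"kitchen"},
    "towel": {"bathroom"},
}

# Made-up placements by one child.
placements = {
    "pillow": "living room",
    "toothbrush": "bathroom",
    "pan": "bedroom",
    "towel": "bathroom",
}

accuracy = placement_accuracy(placements, accepted_rooms)
```

Representing accepted rooms as sets keeps the multi-room rule explicit: an object is scored as correct whenever its chosen room is a member of its set.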
Dollhouse syntax

For the syntactic analysis, we drew on the concept of scene phrases (structures within scenes composed of anchor and local objects) based on previously established spatial-functional groupings [2, 10]. While some of our anchor–local object selections were directly inspired by the phrases proposed in previous research by our group [9], others were guided by our own scene-based intuitions. This combined approach allowed us to define meaningful spatial and functional relationships within each room of the dollhouse. We assessed syntactic knowledge in the dollhouse task by measuring the distances between predefined anchor objects and local objects within phrases of the scenes. To this end, we first performed a 3D scan of each child's completed dollhouse. These scans were then imported into Unity (Unity Technologies, 2023, Version 2021.3.18f1), where we positioned pre-scanned models of the dollhouse objects inside the scanned 3D dollhouse model exactly as the participants had placed the real objects. This approach ensured maximal precision in our 3D measurements. Once all objects were positioned, we used a custom Unity script to create a matrix of the distances between the centers of all objects. Our analysis focused exclusively on objects placed in the correct room categories, with particular attention to the predefined anchor and local objects (12 anchor objects [19.67%] and 44 local objects [72.13%]). A full list of anchor and local objects is provided in Supplementary Information 2.

Developmental Tests

To account for the influence of children’s developmental cognitive skills on task performance, we administered both visual development and non-verbal intelligence tests.
For visual development, we used the FEW-3 (Frostigs Entwicklungstest der visuellen Wahrnehmung, German adaptation of the Developmental Test of Visual Perception, DTVP-3 [28]), which includes tasks such as identifying shapes within complex visual backgrounds, mentally completing incomplete forms, and detecting similarities between different shapes. For non-verbal intelligence, we administered the CPM (Colored Progressive Matrices [29]), where children completed patterns by selecting the correct piece to fit into a colored sequence. We entered raw scores from both tests into our models, since age was accounted for separately as a covariate. However, to enhance interpretability for the reader, we report here the mean percentile ranks: the mean percentile for the FEW-3 was 56.6 (range = 2–98), and for the CPM was 61.4 (range = 6–100).

Statistical Analysis

We made sure that objects were not repeated across trials in the free viewing task; however, a subset of scenes was presented more than once. Since scene repetition can produce potential priming effects in children’s eye-tracking behavior, such as reduced exploration, we included only the first presentation of each scene per participant in our final analyses. This led to the exclusion of 14% of trials from the free viewing eye-tracking dataset. For completeness, we report the results of the full model, which also includes the repeated trials with repetition order modeled as a random slope, in Supplementary Information 3. We conducted our statistical analysis using generalized linear mixed models (GLMMs). GLMMs enabled us to incorporate random effects, accounting for variations in both the saliency ranks of the scenes and individual differences among participants. They also allowed us to model our data at the trial level while appropriately handling non-normal distributions. GLMMs were particularly suited for our binomial outcome measures, such as correct vs. incorrect object placement in the dollhouse task, as well as for continuous outcomes that violated the assumptions of linear mixed models (LMMs), including skewed or log-normal response variables. The analyses were carried out in the R environment (version 2023.09.1) using the lmer function from the lme4 package [30]. We calculated p-values using the lmerTest package [31], applying Satterthwaite’s approximation for determining degrees of freedom. For the free viewing task models, we included dwell time as the dependent variable, and violation type (with levels of consistent, semantically inconsistent, and syntactically inconsistent) as the fixed factor (predictor). In the visual search task, our dependent variables were first fixation time and reaction time, with violation type (with the levels of expected vs. unexpected target locations) as the key fixed factor (predictor). For the dollhouse task, the dependent variables were object placement accuracy (%) for dollhouse semantics, and the mean distance between anchor and local objects for dollhouse syntax. Further, we sought to understand the relationship between our implicit measures in the free viewing and search tasks (such as dwell time, first fixation time, and reaction time) and explicit measures of performance in the dollhouse task (object placement accuracy and distance between related objects). Specifically, we analyzed how dwell time on semantically inconsistent scenes in the free viewing task related to object placement in the dollhouse task (dollhouse semantics), and how dwell time on syntactically inconsistent scenes was associated with mean distance between related objects (dollhouse syntax).
Given the syntactic aspect of the inconsistently placed targets in the visual search task, we also examined the relationship between visual search task metrics (first fixation time, reaction time) and dollhouse syntax, enabling us to examine relationships between both explicit-explicit and explicit-implicit measures. Finally, we ran models with our main measures (dwell time on violations, reaction time on violations, first fixation time on violations, object placement accuracy, and distance between related objects) and included age as a continuous predictor to examine the developmental trajectory of scene knowledge. To account for individual variability, participants were included as a random factor in all models. We also included random intercepts and slopes for the saliency ranks of each scene, allowing both baseline responses and the effects of saliency to differ across scenes. When age was not a predictor, it was included as a covariate in our models, alongside developmental test scores from the CPM and FEW tests, which were incorporated based on their potential contribution to the model fit and impact on the response variables. Although both CPM and FEW test scores were collected, only scores from the FEW were included in the final models, as the CPM either led to convergence issues or did not improve model fit. Continuous predictors, such as age, FEW scores, and the mean distance between related objects, were scaled (z-transformed) prior to analysis to improve model convergence and interpretability of fixed effects.

Results

Implicit measures

Results from our model for the free viewing task revealed a significant effect of object consistency on children’s dwell time (β = -0.136, SE = 0.06, z = -2.293, p = .022) (Figure 5a).
Specifically, children spent less time looking at consistent objects compared to semantically inconsistent objects (ratio = 0.747, SE = 0.077, z ratio = -2.814, p = .014), indicating a sensitivity to semantic violations. However, this difference was not observed for syntactically inconsistent objects (ratio = .888, SE = 0.09, z ratio = -1.152, p = .482). There was also no significant difference between the dwell time on semantic violations and syntactic violations (ratio = 1.188, SE = 0.123, z ratio = -1.662, p = .22). In the visual search task, children directed their first fixation on unexpectedly placed target objects significantly later compared to expectedly placed targets (β = 0.288, SE = 0.082, z = 3.515, p < .001). Additionally, children reacted significantly slower to unexpectedly placed objects than to expectedly placed ones (β = 0.146, SE = 0.043, z = 3.387, p < .001) (Figure 5b). To summarize the results from the implicit measures, children in this age range looked longer at semantically inconsistent objects than at consistent ones in the free viewing task and located expectedly placed objects more quickly in the visual search task.

Relationship between implicit and explicit measures

Explicit measures obtained from the dollhouse task revealed that children had a mean correct placement performance of 84% (SD = 6.9%, range = 60%–93%). On average, they placed local objects at a distance of 0.29 meters from their anchors (SD = 0.075 m, range = 0.03 m–0.54 m). Further analyses were conducted to examine the relationship between these explicit measures and our implicit measures. Dwell time on semantic violations was not significantly related to dollhouse semantics (β = -0.874, SE = 0.849, z = -1.080, p = .280). Also, dwell time on syntactic violations did not significantly predict the mean distance between related objects in the dollhouse task (β = -4.153, SE = 3.423, z = -1.213, p = .225).
Further analysis revealed a significant relationship showing that as the mean difference in dwell time between syntactically inconsistent and consistent targets increased, the mean distance between related objects decreased (β = -197.57, SE = 96.44, z = -2.049, p = .049). In the visual search task, neither first fixation time on unexpected targets nor on expected targets predicted dollhouse syntax (unexpected: β = -.023, SE = 0.044, z = -.052, p = .603; expected: β = 0.037, SE = 0.034, z = 1.108, p = .268) (Figure 6a). The same non-significant pattern emerged for reaction time (unexpected: β = -.039, SE = 0.035, z = -1.13, p = .258; expected: β = 0.022, SE = 0.036, z = 0.606, p = .545) (Figure 6b). Together, these results suggest that syntactic dwell time in the free viewing task was significantly related to explicit syntactic scene knowledge, but only when the dwell time difference between syntactically inconsistent and consistent targets was considered. No other implicit measures, including those from the visual search task, showed significant associations with the explicit measures derived from the dollhouse task.

Developmental trajectory of scene knowledge

In the free viewing task, we observed that children showed reduced dwell time for consistent targets as age increased (β = -0.66, SE = 0.03, z = -2.188, p = .028, r = -0.089). For semantic and syntactic violations, dwell time was not significantly related to age (SEM: β = 0.043, SE = .035, z = 1.222, p = .222, r = 0.108; SYN: β = 0.023, SE = 0.04, z = 0.564, p = .573, r = 0.037) (Figure 7a). A significant positive relationship emerged when examining the dwell time differences between inconsistent and consistent targets. Simple linear models indicated that the difference in mean dwell time between semantically inconsistent and consistent targets significantly increased with age (β = 171.98, SE = 78.85, t = 2.181, p = .037) (Figure 7b).
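As an illustration of the difference-score regressions reported here, the following is a minimal Python sketch, not the authors' analysis code. It assumes one mean dwell time per child and condition, computes the inconsistent-minus-consistent difference, and fits a simple linear regression on age. The function name `ols_slope` and all data values are hypothetical, and age is left in raw years for readability, whereas the study z-transformed continuous predictors. The last lines only check that the reported t values equal β divided by SE.

```python
import math
from statistics import mean


def ols_slope(x, y):
    """Return (slope, standard error, t) for the simple regression y ~ x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares with n - 2 degrees of freedom.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se, slope / se


# Hypothetical per-child values (age in years, mean dwell times in ms).
ages = [6.0, 7.0, 8.0, 9.0, 10.0]
dwell_inconsistent = [900.0, 950.0, 1000.0, 1100.0, 1150.0]
dwell_consistent = [850.0, 800.0, 820.0, 760.0, 700.0]

# Difference score: inconsistent minus consistent, one value per child.
diff = [a - b for a, b in zip(dwell_inconsistent, dwell_consistent)]
slope, se, t = ols_slope(ages, diff)

# Consistency check on the reported statistics: t = beta / SE.
t_semantic = round(171.98 / 78.85, 3)
t_syntactic = round(98.10 / 62.08, 3)
```

With a difference score as the outcome, a positive slope means that the gap between inconsistent and consistent targets widens with age, which is the pattern described for semantic violations.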
The difference in mean dwell time between syntactically inconsistent and consistent targets did not show a significant relationship with age (β = 98.10, SE = 62.08, t = 1.580, p = .124) (Figure 7c). In the visual search task, we found a significant decrease in first fixation time to unexpected targets with age (β = -0.09, SE = .042, z = -2.124, p = .034) and even more so for expected targets (β = -.125, SE = .034, z = -3.646, p < .001; see Figure 8a). Age also predicted reaction time, which decreased with age for both target types (unexpected: β = -0.154, SE = .034, z = -4.514, p < .001; expected: β = -.162, SE = .035, z = -4.634, p < .001) (Figure 8b). In the dollhouse task, we observed a trend suggesting that object placement performance improves with age (β = .096, SE = .052, z = 1.854, p = .064) (Figure 8c). However, no significant relationship was found between age and the distance between related objects (β = -.002, SE = .002, t = -.837, p = .414) (Figure 8d). Finally, FEW test scores, which were included in the models as covariates to account for individual differences in visual perception, did not show significant effects on the dependent variables. Full model outputs are provided in Supplementary Information 4. In summary, these results suggest that the implicit use of scene grammar knowledge continues to strengthen with age, as evidenced by faster attentional disengagement from consistent targets and more efficient search for expected targets in older children. However, the results from the dollhouse task did not reveal a significant improvement in children's active use of explicit scene knowledge between the ages of 6 and 10.

Discussion

In the present study, we aimed to investigate how scene knowledge manifests itself and develops both implicitly and explicitly in children aged 6 to 10 years, while also examining the relationship between these two types of measures.
Our eye-tracking results indicated that children in this age range exhibited a consistency effect in response to semantic scene violations and effectively utilized their scene knowledge during the visual search task. Furthermore, children who displayed strong syntactic scene knowledge in the free viewing task also tended to perform better in placing objects according to syntactic rules during the dollhouse task. Our findings also illustrated the ongoing developmental process of scene knowledge in children, as evidenced by a growing consistency effect in both free viewing and visual search tasks. The results of this study contribute to our understanding of the developmental trajectories of scene grammar knowledge by extending the age range in which the consistency effect in scenes has been investigated. Prior research using brain recordings has shown that children can process scene violations as early as two years old [ 17 ], while the implicit manifestation of this effect can be observed in their eye movements by age four [ 14 ]. Our study complements these findings by demonstrating that, as expected, the consistency effect remains robust in children aged 6–10, while providing further evidence of continued development across this age range. Moreover, by incorporating a visual search task into our methodology, we provide a broader toolbox of ecologically valid tasks that capture different facets of children’s scene knowledge. In the free viewing task, shorter dwell times on consistent objects suggested that children required less processing time for scene-congruent elements, indicating fluent use of their scene-based expectations. In contrast, the visual search task allowed us to assess how quickly children could locate expected objects, with faster fixations reflecting more efficient guidance of attention based on scene grammar.
Together, these measures demonstrate that children not only process expected objects more fluently but also actively use their scene knowledge to navigate structured environments more efficiently. While previous studies have demonstrated a consistency effect also for syntactic violations during a free viewing task as early as age 4 [ 14 ] and between 5–6 years [ 18 ], we only found a consistency effect for semantic violations. The study in [ 14 ] employed a between-subjects design, in which each child was presented with either semantically or syntactically inconsistent scenes. In contrast, our approach of presenting both types of violations within the same experiment may have influenced the processing of violations thereby altering the children's gaze behavior toward syntactic violations. Moreover, the consistency effect identified in [ 18 ] intriguingly interacted with saliency, revealing that children exhibited longer dwell times on syntactically inconsistent objects with low saliency. As it is typically expected that visual salience drives gaze more strongly than contextual cues [ 32 , 33 ], the authors interpreted these findings from 5- to 6-year-old children as suggesting that their syntactic knowledge may not be as robustly developed as their semantic knowledge, potentially undermining the reliability of these results. Consistent with this interpretation, in our previous study [ 6 ], we found a syntactic inconsistency effect only when dividing the typically developing sample into two age groups: only older children (ages 8.92–10.83, n = 11) demonstrated such an effect, and even then, it was less prominent than the effect for semantic inconsistencies. Given these findings, it is not surprising that we could not fully replicate previous studies. 
Nonetheless, despite the differences in task instructions and motivations between the visual search task and traditional free viewing tasks, the presence of syntactic violations in the visual search task, together with children’s later first fixation times on unexpected objects, suggests that such violations disrupt their search behavior, indicating sensitivity to syntactic scene knowledge. However, further research is needed to clarify the developmental trajectory of processing syntactic rules in scenes. A distinctive aspect of this study was our examination of the relationship between implicit and explicit measures of scene knowledge, building on and extending previous work presented in [ 14 ]. Our results demonstrated that the dwell time difference between syntactically inconsistent objects and consistent objects during the free viewing task was a significant predictor of the dollhouse syntax measure, replicating previous findings. Our results are also in line with those reported in [ 14 ], where no significant relationship was observed between dollhouse semantics and dwell time on semantically inconsistent objects in children aged 2–4 years; similarly, we did not find a significant relationship in our sample of children aged 6–10 years. The authors of [ 14 ] suggested that this absence of an implicit-explicit correlation in semantic knowledge might stem from differences in task demands (whether or not explicit instructions were given) and from the absence of the scene gist advantage in the dollhouse task. We concur with this interpretation and bolster it with our own findings, which showed a significant implicit-explicit link for syntax but none for semantics. Additionally, an important factor to keep in mind is that our dollhouse syntax measure is only defined when children place objects in the correct room, which requires both semantic and syntactic knowledge. In contrast, dollhouse semantics might be more affected by task-specific factors.
Research shows that instructions can significantly influence children’s categorization strategies [ 34 , 35 , 36 , 37 ]. We also noticed during testing that children sometimes struggled to recognize objects due to the absence of explicit labeling—a deliberate choice to avoid interfering with their object/scene knowledge. Research shows that labels enhance object categorization in infants and children [ 38 , 39 ]. Additionally, understanding an object’s function also plays a role [ 40 ]. Future studies might refine the dollhouse task with more explicit instructions to better capture scene knowledge and control for object recognition. Moreover, examining the effects of naming—specifically, whether labeling objects during scene construction facilitates placement performance—could provide important insights into how linguistic processes interact with the application of scene grammar. Lastly, the positive relationship between dwell time differences for syntactic violations in the free viewing task and children's dollhouse syntax performance suggests a link between children's implicit and explicit syntactic scene knowledge. This finding shows that children apply their syntactic scene knowledge both when processing objects in structured scenes and when constructing scenes themselves. Thus, we extend previous research by demonstrating that children’s sensitivity to syntactic rules is consistent across tasks requiring either passive exploration of structured scenes or active scene assembly. While our previous study [ 6 ] established initial evidence of scene grammar processing in children with and without DLD, the present study expands on this work by including a larger sample of typically developing participants, introducing a visual search task, systematically analyzing age-related effects, and directly examining the relationship between implicit and explicit scene grammar measures. 
Building on those findings, the current results provide a more detailed picture of how scene knowledge continues to be refined between the ages of 6 and 10. Specifically, our analyses of children’s eye movements revealed that, as children age, they spend less time on consistent objects in particular in the free viewing task and direct their gaze to these objects earlier in the visual search task. This shift indicates a growing robustness in children's scene knowledge: they increasingly draw on their existing knowledge, allocating less attention to expected or typical objects and adjusting their search strategies to their scene-rule-based expectations to make search more efficient. Our results on the developmental trajectory of scene knowledge align with existing research on rule learning, which suggests that by age four, children begin developing rule-based categorization skills (such as for spatial relations, perceptual similarities, or information-integration categories), ultimately progressing to more complex, higher-order rules as their executive functions mature [ 21 , 22 , 41 ]. Additionally, studies indicate that children's capacity for complex cognitive processing, such as spatial clustering and relational integration, gradually increases starting around age five [ 42 , 43 ]. Our results affirm that the development of rule-based categorization and relational integration within the context of scene knowledge is also ongoing between the ages of 6 and 10, and they highlight the strengthening of the consistency effect, which was found to emerge at age four in [ 14 ]. Our study had its limitations despite methodological refinements. With only 33 children in our dataset and a within-subject design including many different tasks, the sample size may not have been sufficient to establish clear relationships across all measures.
Additionally, the inconclusive results regarding the implicit-explicit measures suggest that further refinements to the dollhouse task are warranted. Future studies could consider creating 3D apartments with more realistic, real-world objects to address issues related to the recognition of objects. However, careful attention should also be paid to task instructions so that the task captures children's actual scene knowledge. In conclusion, the present study contributes to the literature by examining both implicit and explicit measures of scene knowledge in children aged 6–10 years. Our findings demonstrate that children show increased sensitivity to both semantic and syntactic scene violations as they grow older. Importantly, we found a link between children's eye movement behavior during implicit tasks and their ability to apply syntactic scene rules explicitly in the dollhouse task. These results highlight a developmental progression in children's ability to extract and apply scene grammar rules, suggesting an emerging integration of visual attention and conceptual understanding. We hope this work stimulates further research on the development of scene grammar, utilizing larger sample sizes and more refined methodologies.

Declarations

Acknowledgements

We are grateful to all the children and parents who took part in this study. We also sincerely thank Theresa Henke, Judith Hollnagel, Ronja Schnellen, and Jonathan Mader for their assistance with data collection.

Funding

This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation; project number 222641018; SFB/TRR 135, Projects C3 and C7) and the Hessisches Ministerium für Wissenschaft und Kunst (HMWK; project ‘The Adaptive Mind’).

Author contributions

Conceptualization, D.D.T., D.B., C.K., and M.L.-H.V.; methodology, D.D.T., D.B., C.K., and M.L.-H.V.; software, D.D.T., D.B., C.K., and M.L.-H.V.; validation, D.D.T., D.B., C.K., and M.L.-H.V.; formal analysis, D.D.T.
and D.B.; investigation, D.D.T. and D.B.; resources, C.K. and M.L.-H.V.; data curation, D.D.T. and D.B.; writing – original draft preparation, D.D.T. and D.B.; writing – review and editing, D.D.T., D.B., C.K., and M.L.-H.V.; visualization, D.D.T., D.B., C.K., and M.L.-H.V.; supervision, C.K. and M.L.-H.V.; project administration, C.K. and M.L.-H.V.; funding acquisition, C.K. and M.L.-H.V. All authors have read and agreed to the published version of the manuscript.

Competing interests

The authors declare no competing interests.

Data availability

All data used in this study are publicly available and can be accessed via https://osf.io/a39gr/

References

1. Biederman, I., Mezzanotte, R. J., & Rabinowitz, J. C. (1982). Scene perception: Detecting and judging objects undergoing relational violations. Cognitive Psychology, 14, 143–177.
2. Draschkow, D., & Võ, M. L.-H. (2017). Scene grammar shapes the way we interact with objects, strengthens memories, and speeds search. Scientific Reports, 7(1), 16471. https://doi.org/10.1038/s41598-017-16739-x
3. Võ, M. L.-H., & Wolfe, J. M. (2013a). The interplay of episodic and semantic memory in guiding repeated search in scenes. Cognition, 126(2), 198–212. https://doi.org/10.1016/j.cognition.2012.09.017
4. Võ, M. L.-H., & Wolfe, J. M. (2013b). Differential electrophysiological signatures of semantic and syntactic scene processing. Psychological Science, 24(9), 1816–1823. https://doi.org/10.1177/0956797613476955
5. Võ, M. L.-H. (2021). The meaning and structure of scenes. Vision Research, 181, 10–20. https://doi.org/10.1016/j.visres.2020.11.003
6. Bahn, D., Türk, D. D., Tsenkova, N., Schwarzer, G., Võ, M. L.-H., & Kauschke, C. (2025). Processing of scene grammar inconsistencies in children with developmental language disorder: Insights from implicit and explicit measures. Brain Sciences, 15(2). https://doi.org/10.3390/brainsci15020139
7. Võ, M. L.-H., & Henderson, J. M. (2009). Does gravity matter? Effects of semantic and syntactic inconsistencies on the allocation of attention during scene perception. Journal of Vision, 9(3), 24. https://doi.org/10.1167/9.3.24
8. Võ, M. L.-H., & Henderson, J. M. (2011). Object–scene inconsistencies do not capture gaze: Evidence from the flash-preview moving-window paradigm. Attention, Perception, & Psychophysics, 73(6), 1742–1753. https://doi.org/10.3758/s13414-011-0150-6
9. Turini, J., & Võ, M. L.-H. (2022). Hierarchical organization of objects in scenes is reflected in mental representations of objects. Scientific Reports, 12(1), 20068. https://doi.org/10.1038/s41598-022-24505-x
10. Võ, M. L.-H., Boettcher, S. E., & Draschkow, D. (2019). Reading scenes: How scene grammar guides attention and aids perception in real-world environments. Current Opinion in Psychology, 29, 205–210. https://doi.org/10.1016/j.copsyc.2019.03.009
11. De Graef, P., Christiaens, D., & d’Ydewalle, G. R. (1990). Perceptual effects of scene context on object identification. Psychological Research Psychologische Forschung, 52, 317–329.
12. Henderson, J. M., Weeks, P. A., Jr., & Hollingworth, A. (1999). The effects of semantic consistency on eye movements during complex scene viewing. Journal of Experimental Psychology: Human Perception and Performance, 25, 210–228.
13. Öhlschläger, S., & Võ, M. L.-H. (2017). SCEGRAM: An image database for semantic and syntactic inconsistencies in scenes. Behavior Research Methods, 49(5), 1780–1791. https://doi.org/10.3758/s13428-016-0820-3
14. Öhlschläger, S., & Võ, M. L.-H. (2020). Development of scene knowledge: Evidence from explicit and implicit scene knowledge measures. Journal of Experimental Child Psychology, 194, 104782. https://doi.org/10.1016/j.jecp.2019.104782
15. Lauer, T., & Võ, M. L.-H. (2022). The ingredients of scenes that affect object search and perception. In B. Ionescu, W. A. Bainbridge, & N. Murray (Eds.), Human perception of visual information (pp. 1–32). Springer. https://doi.org/10.1007/978-3-030-81465-6_1
16. Helo, A., van Ommen, S., Pannasch, S., Danteny-Dordoigne, L., & Rämä, P. (2017). Influence of semantic consistency and perceptual features on visual attention during scene viewing in toddlers. Infant Behavior & Development, 49, 248–266. https://doi.org/10.1016/j.infbeh.2017.09.008
17. Maffongelli, L., Öhlschläger, S., & Võ, M. L.-H. (2020). The development of scene semantics: First ERP indications for the processing of semantic object-scene inconsistencies in 24-month olds. Collabra: Psychology, 6(1), 17707, 1–8. https://doi.org/10.1525/collabra.17707
18. Helo, A., Guerra, E., Coloma, C. J., Aravena-Bravo, P., & Rämä, P. (2022). Do children with developmental language disorder activate scene knowledge to guide visual attention? Effect of object-scene inconsistencies on gaze allocation. Frontiers in Psychology, 12, 796459. https://doi.org/10.3389/fpsyg.2021.796459
19. Holyoak, K. J., Junn, E. N., & Billman, D. (1984). Development of analogical problem-solving skill. Child Development, 55, 2042–2055.
20. Gentner, D., & Medina, J. (1998). Similarity and the development of rules. Cognition, 65(2-3), 263–297.
21. Huang-Pollock, C. L., Maddox, W. T., & Karalunas, S. L. (2011). Development of implicit and explicit category learning. Journal of Experimental Child Psychology, 109(3), 321–335.
22. Rabi, R., & Minda, J. P. (2014). Perceptual category learning: Similarity and differences between children and adults. In Proceedings of the Annual Meeting of the Cognitive Science Society (Vol. 36, No. 36).
23. Hammer, R., Diesendruck, G., Weinshall, D., & Hochstein, S. (2009). The development of category learning strategies: What makes the difference? Cognition, 112(1), 105–119.
24. Linardos, A., Kümmerer, M., Press, O., & Bethge, M. (2021, October). DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). IEEE. https://doi.org/10.1109/ICCV48922.2021.01268
25. Tullis, T., & Albert, B. (2008). Performance metrics. In T. Tullis & B. Albert (Eds.), Measuring the user experience (pp. 63–97). Morgan Kaufmann. https://doi.org/10.1016/B978-0-12-373558-4.00004-2
26. Mohr, J., Seyfarth, J., Lueschow, A., Weber, J. E., Wichmann, F. A., & Obermayer, K. (2016). BOiS—Berlin object in scene database: Controlled photographic images for visual search experiments with quantified contextual priors. Frontiers in Psychology, 7, 749.
27. Greene, M. R. (2013). Statistics of high-level scene context. Frontiers in Psychology, 4. https://doi.org/10.3389/fpsyg.2013.00777
28. Büttner, G., Dacheneder, W., Schneider, W., & Hasselhorn, M. (2021). Frostigs Entwicklungstest der visuellen Wahrnehmung–3 (1st ed.). Pearson.
29. Bulheller, S., & Hacker, H. (2001). CPM – Colored Progressive Matrices (3rd ed.). Pearson.
30. Bates, D., Mächler, M., Bolker, B. M., & Walker, S. C. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.
31. Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13). https://doi.org/10.18637/jss.v082.i13
32. Tatler, B. W., & Vincent, B. T. (2008). Systematic tendencies in scene viewing. Journal of Eye Movement Research, 2(2), 1–18.
33. Parkhurst, D., Law, K., & Niebur, E. (2002). Modelling the role of salience in the allocation of visual selective attention. Vision Research, 42(1), 107–123.
34. Bauer, P. J., & Mandler, J. M. (1989). Taxonomies and triads: Conceptual organization in one-to two-year-olds. Cognitive Psychology, 21(2), 156–184.
35. Davidson, D., Rainey, V. R., Vanegas, S. B., & Hilvert, E. (2018). The effects of type of instruction, animacy cues, and dimensionality of objects on the shape bias in 3‐to 6‐year‐old children. Infant and Child Development, 27(1), e2044.
36. Freund, L. S., Baker, L., & Sonnenschein, S. (1990). Developmental changes in strategic approaches to classification. Journal of Experimental Child Psychology, 49, 343–362.
37. Ratner, H. H., & Myers, N. A. (1981). Long-term memory and retrieval at ages 2, 3, 4. Journal of Experimental Child Psychology, 31, 365–386.
38. Pomiechowska, B., & Gliga, T. (2019). Lexical acquisition through category matching: 12-month-old infants associate words to visual categories. Psychological Science, 30(2), 288–299.
39. Waxman, S. R., & Braun, I. (2005). Consistent (but not variable) names as invitations to form object categories: New evidence from 12-month-old infants. Cognition, 95(3), B59–B68.
40. Booth, A. E., & Waxman, S. R. (2002). Object names and object functions serve as cues to categories for infants. Developmental Psychology, 38(6), 948–957.
41. Quinn, P. C. (2004). Spatial representation by young infants: Categorization of spatial relations or sensitivity to a crossing primitive? Memory & Cognition, 32, 852–861.
42. Plumert, J. M., & Strahan, D. (1997). Relations between task structure and developmental changes in children's use of spatial clustering strategies. British Journal of Developmental Psychology, 15(4), 495–514.
43. Richland, L. E., Morrison, R. G., & Holyoak, K. J. (2006). Children’s development of analogical reasoning: Insights from scene analogy problems. Journal of Experimental Child Psychology, 94(3), 249–273.

Supplementary Files

SupplementaryInformationTurkBahnKauschkeVo.docx
{"props":{"pageProps":{"initialData":{"identity":"rs-6861119","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":473124326,"identity":"cd6c3e3b-daf6-4484-8b1f-74440a6f3ebb","order_by":0,"name":"Dilara Deniz Türk","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABNElEQVRIie2QMUvDQBSAXziIDtdkvRC0f+FKQBH/TEIgWa4idMkgEpebdI9Q8C/opAWHk4O6HHYtuLSLU0Wli4KKubTFSkJdBfMND+69++7eewA1NX8RUUR/U0ekgwnGSGfsooCrlOKe7y0piOqMk+rzCiVIFwr8pljC7pOX6yi+tFVrug9yzwJkPr9dRYTeDwRMkpLiCGQ62QNr9zLmuRnITt4YOj1RjNC7EIyuKik0V1wskvb5EG8hDDLgzScJDZ4cUoUANXi18i6SmA7UXMl/MT54QmbKZ7UCgvlUsG8lf5zNlbQ8i0TezrGIWr0s6riYxjNlg0fEUYjedPvljd0ejYevImxu2/JiipPd4CzNG3vkIbGUMR5NDspbRosOYZ3q+BNRFpaGgrXRqnpNTU3NP+YL8bNiGsLT6+kAAAAASUVORK5CYII=","orcid":"","institution":"Goethe University Frankfurt","correspondingAuthor":true,"prefix":"","firstName":"Dilara","middleName":"Deniz","lastName":"Türk","suffix":""},{"id":473124327,"identity":"e420492e-6766-4848-9683-6c1321970366","order_by":1,"name":"Daniela Bahn","email":"","orcid":"","institution":"Philipps University of Marburg","correspondingAuthor":false,"prefix":"","firstName":"Daniela","middleName":"","lastName":"Bahn","suffix":""},{"id":473124328,"identity":"463a67af-50f4-4114-82f5-92e1a4c31802","order_by":2,"name":"Christina Kauschke","email":"","orcid":"","institution":"Philipps University of
Marburg","correspondingAuthor":false,"prefix":"","firstName":"Christina","middleName":"","lastName":"Kauschke","suffix":""},{"id":473124329,"identity":"26e00278-47a1-4a48-a46c-53623d152736","order_by":3,"name":"Melissa Le-Hoa Võ","email":"","orcid":"","institution":"Ludwig-Maximilians-Universität München","correspondingAuthor":false,"prefix":"","firstName":"Melissa","middleName":"Le-Hoa","lastName":"Võ","suffix":""}],"badges":[],"createdAt":"2025-06-10 08:53:43","currentVersionCode":1,"declarations":{"humanSubjects":false,"vertebrateSubjects":false,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":false,"humanSubjectConsent":false,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-6861119/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6861119/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":85362763,"identity":"762ebc5f-08f8-45b7-968d-3533a9d2b287","added_by":"auto","created_at":"2025-06-25 06:23:06","extension":"jpeg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":158601,"visible":true,"origin":"","legend":"\u003cp\u003eExamples from SCEGRAM database\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eNote.\u003c/em\u003e (\u003cstrong\u003ea\u003c/strong\u003e) Ketchup in the door shelf of the fridge = consistent; (\u003cstrong\u003eb\u003c/strong\u003e) Shower gel in the door shelf of the fridge = semantically inconsistent; (\u003cstrong\u003ec\u003c/strong\u003e) Ketchup by the vegetable drawers in the fridge = syntactically inconsistent.\u003c/p\u003e","description":"","filename":"Figure1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6861119/v1/0af7c5a87cfcd61133a180b1.jpeg"},{"id":85362762,"identity":"cbf666aa-9d7e-4104-8e1a-ebf1dd9046f7","added_by":"auto","created_at":"2025-06-25 06:23:06","extension":"jpeg","order_by":2,"title":"Figure 
2","display":"","copyAsset":false,"role":"figure","size":62591,"visible":true,"origin":"","legend":"\u003cp\u003eTrial sequence of free viewing task\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eNote.\u003c/em\u003e Size of the visual stimulus is adjusted in the figure for better visibility.\u003c/p\u003e","description":"","filename":"Figure2.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6861119/v1/45733eb58ea795d62e4727d0.jpeg"},{"id":85364291,"identity":"eddb6812-cc4b-48fa-acf0-f6f0ccda5cac","added_by":"auto","created_at":"2025-06-25 06:31:06","extension":"jpeg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":139385,"visible":true,"origin":"","legend":"\u003cp\u003eExamples from BOiS database\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eNote. \u003c/em\u003e(\u003cstrong\u003ea\u003c/strong\u003e) Water heater on top of the kitchen counter = expected (CON); (\u003cstrong\u003eb\u003c/strong\u003e) Water heater on the floor = unexpected (SYN)\u003c/p\u003e","description":"","filename":"Figure3.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6861119/v1/6ba73facc8efbf98738e3587.jpeg"},{"id":85362760,"identity":"932652e4-ac59-4808-a81e-0abc45fc405e","added_by":"auto","created_at":"2025-06-25 06:23:06","extension":"jpeg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":97697,"visible":true,"origin":"","legend":"\u003cp\u003eTrial sequence of the visual search task\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eNote.\u003c/em\u003e The size of the visual stimulus is adjusted in the figure for better visibility.\u003c/p\u003e","description":"","filename":"Figure4.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6861119/v1/a6f8963c9772b60c476c8add.jpeg"},{"id":85362775,"identity":"f2ebf60e-8fcd-4fcf-99d2-ef426712d572","added_by":"auto","created_at":"2025-06-25 06:23:06","extension":"jpeg","order_by":5,"title":"Figure 
5","display":"","copyAsset":false,"role":"figure","size":36592,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eNote.\u003c/em\u003e (a) Bar plot showing the dwell time in milliseconds by violation type of objects during free viewing task. Points show participant means for each condition. Error bars show ± 1 standard error. (b) Boxplot showing the first fixation time and reaction time in milliseconds to target objects by violation time during visual search task. White diamond shape represents mean values. * = \u003cem\u003ep\u003c/em\u003e \u0026lt; .05; ** = \u003cem\u003ep \u003c/em\u003e\u0026lt; .01; *** = \u003cem\u003ep\u003c/em\u003e \u0026lt; .001.\u003c/p\u003e","description":"","filename":"Figure5.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6861119/v1/34dc2ead2449eaa61a53dfce.jpeg"},{"id":85362782,"identity":"fd88240c-5084-4b50-989b-8a90b6aa2fdc","added_by":"auto","created_at":"2025-06-25 06:23:07","extension":"jpeg","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":139500,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eNote.\u003c/em\u003e (a) Regression of first fixation time (FFT) on target objects during the visual search task against the dollhouse syntax measure. (b) Regression of reaction time (RT) for finding target objects during the visual search task against the dollhouse syntax measure. Points represent participants' FFT and RT during the task. Gray areas represent the 95% confidence intervals. 
* = \u003cem\u003ep\u003c/em\u003e \u0026lt; .05; ** = \u003cem\u003ep \u003c/em\u003e\u0026lt; .01; *** = \u003cem\u003ep\u003c/em\u003e \u0026lt; .001.\u003c/p\u003e","description":"","filename":"Figure6.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6861119/v1/4e64a169c8a2bc2bca183da7.jpeg"},{"id":85364288,"identity":"efb1a7a2-eb0d-4f5a-8a4a-e0e18c9633d4","added_by":"auto","created_at":"2025-06-25 06:31:06","extension":"jpeg","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":109585,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eNote.\u003c/em\u003e (a) Regression of dwell time on target objects during the free viewing task against participant age. (b) Regression of dwell time difference between semantically inconsistent objects and consistent objects against participant age. (c) Regression of dwell time difference between syntactically inconsistent objects and consistent objects against participant age. Points represent participants’ dwell time during the task in (a) and the mean dwell time difference in (b) and (c). Gray areas represent the 95% confidence intervals. * = \u003cem\u003ep\u003c/em\u003e \u0026lt; .05; ** = \u003cem\u003ep \u003c/em\u003e\u0026lt; .01; *** = \u003cem\u003ep\u003c/em\u003e \u0026lt; .001.\u003c/p\u003e","description":"","filename":"Figure7.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6861119/v1/0bed25dbaac117d8526b5300.jpeg"},{"id":85362772,"identity":"e52206a1-988b-4754-b835-d3a93f9983b8","added_by":"auto","created_at":"2025-06-25 06:23:06","extension":"jpeg","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":177506,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eNote.\u003c/em\u003e (a) Regression of first fixation time (FFT) on target objects during visual search task against participant age. 
(b) Regression of reaction time (RT) for finding target objects during visual search task against participant age. Points represent participants’ FFT and RT during the task. (c) Regression of dollhouse object placement performance against participant age. (d) Regression of dollhouse related object distance against participant age. Gray areas represent the 95% confidence intervals. * = \u003cem\u003ep\u003c/em\u003e \u0026lt; .05; ** = \u003cem\u003ep \u003c/em\u003e\u0026lt; .01; *** = \u003cem\u003ep\u003c/em\u003e \u0026lt; .001.\u003c/p\u003e","description":"","filename":"Figure8.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6861119/v1/4aa4131688e8b695e7a29ef6.jpeg"},{"id":88776135,"identity":"e1c6f792-14d2-42f2-9b41-5156374219a9","added_by":"auto","created_at":"2025-08-11 10:08:23","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1645815,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6861119/v1/6f1c3db7-08cb-40f6-960f-91c84cd52b97.pdf"},{"id":85365378,"identity":"256e3b93-e969-4e9b-8a0c-4e76876798b3","added_by":"auto","created_at":"2025-06-25 06:39:06","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":43524,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryInformationTurkBahnKauschkeVo.docx","url":"https://assets-eu.researchsquare.com/files/rs-6861119/v1/f9722ff6622b838b3eb006a6.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Where Things Belong: The Development of Scene Knowledge in Childhood","fulltext":[{"header":"Introduction","content":"\u003cp\u003eThe organization of visual scenes allows us to predict where objects within these scenes will typically be located [1]. 
Even if we enter a house we have never been in before, we can easily predict that the soap will be located above the sink and the toilet paper next to the toilet. Like other processes we learn throughout our development, we learn these rules over time, as we interact with our environment. These rules guide us in our daily lives, facilitating object recognition, search processes, goal-directed behaviors, and our overall understanding of scenes [1, 2, 3, 4]. The main focus of this study was to examine the developmental trajectory of scene knowledge as reflected in attentional deployment and behavior.\u003c/p\u003e\n\u003cp\u003eThe scene grammar framework addresses the rules governing object placement within visual scenes (for a review, see [5]). Inspired by the distinction between semantics and syntax in language, we define scene semantics as the expectation of \u003cem\u003ewhat\u003c/em\u003e objects should be present in a particular scene and scene syntax as the expectation of \u003cem\u003ewhere\u003c/em\u003e those objects should be located [1, 3, 6, 7, 8]. The scene grammar framework is hierarchical in nature, where certain objects are associated with specific categories of scenes and within a scene tend to cluster in a meaningful and functional way, forming what we call \u0026ldquo;phrases.\u0026rdquo; Finally, within each phrase, large, static \u0026ldquo;anchor\u0026rdquo; objects provide cues about the location and identity of surrounding \u0026ldquo;local\u0026rdquo; objects that are positioned around these anchors [2, 9, 10].\u003c/p\u003e\n\u003cp\u003eUnderstanding how scene rules guide our interaction with everyday environments, as well as how this ability develops, is crucial for gaining insights into how we effortlessly orient ourselves using this knowledge. 
Studies with adults consistently show that they spend less time fixating on objects that conform to scene rules compared to those that violate them, for both semantic [7, 11, 12, 13, 14] and syntactic violations [7, 10, 11, 13, 14]. This \u0026quot;consistency effect\u0026quot; is well established in adults (for a review, see [15]), but its developmental emergence in children is less studied. Recent studies have started to bridge this gap by investigating how infants and young children process scene rules. For example, research with 24-month-old toddlers has shown a similar consistency effect, with shorter dwell times on consistent targets compared to semantically inconsistent ones, though this effect was only observed when targets were highly salient\u0026nbsp;[16].\u0026nbsp;Neural studies from our research group further demonstrated that scene rule violations are processed as early as two years of age [17], while by the age of four, children begin to show both implicit and explicit behavioral responses to semantic and syntactic violations [14]. Extending this trajectory, it has been shown that typically developing children aged 5 and 6 also exhibit this consistency effect, as seen in\u0026nbsp;lower looking time proportions, shorter first-pass dwell times, and shorter first fixation durations to consistent target objects [18].\u0026nbsp;In our recent work [6], we examined implicit and explicit scene grammar processing in children aged 6 to 10 years with and without developmental language disorder (DLD). While both groups showed inconsistency effects, age-related changes were more pronounced in typically developing children, who became increasingly sensitive to semantic and syntactic violations with age. In contrast, this developmental trend was less evident in children with DLD.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAll prior developmental studies have employed free viewing paradigms to assess the consistency effect. 
While informative, these paradigms capture only one aspect of visual behavior. The current study addresses these gaps by implementing both a free viewing task and a visual search task, and by covering a broader developmental window (ages 6 to 10). This allows us to examine the consistency effect across different viewing conditions and attentional demands, offering a more comprehensive picture of how scene grammar knowledge develops and manifests in childhood.\u003c/p\u003e\n\u003cp\u003eTo explore how children\u0026rsquo;s implicit and explicit behaviors reflect their understanding of scene organization and how this understanding improves with age, we build on previous work by our research group [14] with 2- to 4-year-olds and adults. In this previous work, a free viewing task featuring scenes with semantic or syntactic violations was implemented while tracking eye movements, as well as a dollhouse placement task where participants freely furnished a dollhouse. In their study, eye movements during the free viewing task served as an implicit measure of scene knowledge, allowing us to infer processing of violations without directly asking children if they noticed them. The time spent on these inconsistent objects indicates the processing of expectation violations, even if children do not explicitly point them out. For an explicit measure of scene knowledge, the dollhouse placement task provided a straightforward way to assess children\u0026rsquo;s understanding by asking them to furnish a dollhouse with predefined rooms, serving as a miniature version of a real-world house. The accuracy of the scene categorization of objects and the distance between correctly placed objects and their anchors were measured. This explicit measure indicates the children\u0026rsquo;s ability to access and apply their scene knowledge and reflects their understanding of how objects are organized in real-world scenes. 
We employed a similar methodology, including the free viewing and dollhouse placement tasks, but also added a visual search task to measure implicit behavior through eye tracking and reaction time collection. We also refined our analysis by incorporating developmental tests into our models and enhancing the precision of our methods for extracting measures from the dollhouse tasks (see Methods for details).\u003c/p\u003e\n\u003cp\u003eOur primary goal was to examine scene grammar knowledge in children by employing both implicit and explicit task-based assessments, while also exploring the relationship between these measures. Specifically, we used eye-tracking metrics (such as dwell time and first fixation) and placement accuracy in a dollhouse task as empirical measures of children\u0026rsquo;s ability to process objects based on scene-based semantic and syntactic rules. Additionally, we aimed to investigate whether and how the strength of scene-related expectations changes as children grow older. Developmental theories suggest that children gradually shift from object-based learning to more flexible, relational, and rule-based forms of reasoning as they grow older [19, 20]. In line with this, research on category learning with rules revealed that the ability to assign items to rule-based categories continues to develop throughout the preschool and school years [21, 22]. Only around age 10 do children typically reach adult-like proficiency in rule-based categorization and relational integration [22, 23]. These findings are relevant to our framework: in essence, semantic scene knowledge involves the ability to categorize objects according to the type of scene they belong to (e.g., recognizing that a sink is typically found in a bathroom, not a bedroom). This mirrors classic category learning tasks in which children must learn to assign items to categories based on shared or rule-defined features. 
Syntactic scene knowledge, by contrast, requires children to go beyond simple categorization. It involves understanding the spatial and functional relationships between objects within a scene (e.g., that a mirror is usually placed above a sink). Acquiring such knowledge relies on relational reasoning and rule learning \u0026mdash; the ability to extract underlying patterns from repeated exposure to structured environments. Based on these considerations, we anticipated observing a consistency effect in our age group, with a further increase in the strength of children\u0026rsquo;s scene knowledge with age, and a positive relationship between our implicit and explicit measures.\u0026nbsp;\u003c/p\u003e"},{"header":"Methods","content":"\u003cp\u003e\u003cstrong\u003eParticipants\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe recruited a total of 38 children, aged between 5.58 and 10.66 years (\u003cem\u003eM\u003c/em\u003e = 8.27). Of these, 17 were female and 21 were male. All participants had normal or corrected-to-normal vision with no neurological conditions, as confirmed by parent questionnaires. We only included children whose native language was German; thus, we excluded 2 bilingual children who did not fall within the range of typical language development in German. Additionally, 3 more children were excluded for not completing all tasks. This resulted in a final sample of 33 children, aged between 5.58 and 10.66 years (\u003cem\u003eM\u003c/em\u003e = 8.22, 16 female, 17 male).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe children were recruited through collaborations with schools and public outreach initiatives at the universities of Frankfurt and Marburg (Hesse, Germany). Informed written consent was obtained from the parents before the study commenced. Children received small gifts as a token of appreciation, and parents were reimbursed for travel expenses. 
The study adhered to the Declaration of Helsinki and was approved by the local ethics committee.\u003c/p\u003e\n\u003cp\u003eWe designed tasks that incorporated both explicit and implicit measures of scene knowledge to identify indicators of scene grammar and to explore the developmental trajectory of scene comprehension.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eImplicit measure of scene knowledge: Free viewing task\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eApparatus\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe monitored children\u0026rsquo;s eye movements using the EyeLink 1000 Portable Duo system (SR Research, Kanata, Ontario, Canada), focusing on the left eye with a sampling rate of 500 Hz in remote mode. The stimuli were displayed on either a 17-inch laptop or a 24-inch monitor, both with a resolution of 1920 x 1080 pixels and a refresh rate of 144 Hz. Children were positioned about 70 cm from the screen.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eStimuli\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFrom the SCEGRAM database [13], we selected 45 images including 15 semantically inconsistent, 15 syntactically inconsistent, and 15 consistent scenes (see Figure 1 for an example). We defined areas of interest (AOIs) that maintained consistent size across violations and AOI positioning of the objects was identical across semantically inconsistent and consistent scenes. We incorporated a 75-pixel buffer around each AOI to ensure precise eye-tracking, especially given the more lenient thresholds for children. We counterbalanced the images across participants, ensuring that each object was seen in only one condition per participant. To address any potential saliency effects, we assessed the mean saliency rank using DeepGaze IIE [24]. 
The results showed no significant differences in saliency between conditions [consistent: \u003cem\u003e\u0026beta;\u003c/em\u003e= .96, \u003cem\u003eSE\u003c/em\u003e= .078, \u003cem\u003ez\u003c/em\u003e= -.556, \u003cem\u003ep\u003c/em\u003e= .578; semantically inconsistent: \u003cem\u003e\u0026beta;\u003c/em\u003e= .94, \u003cem\u003eSE\u003c/em\u003e= .07, \u003cem\u003ez\u003c/em\u003e= -.87, \u003cem\u003ep\u003c/em\u003e= .384].\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eProcedure\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAt the start of the experiment, we performed a 5-point calibration using an audio-visual target and conducted a drift check after every 10 trials. Children then completed 2 practice trials before proceeding to the main phase, which included 45 trials. Each trial commenced with an audio-visual fixation spiral positioned either on the left or right side of the screen. To trigger the presentation of the scene images, children needed to fixate on the spiral for a minimum of 500 milliseconds. The images were displayed for 7 seconds, and a reward video was shown randomly approximately every 2 images, lasting around 10 seconds each time (Figure 2). The children were instructed to view the scenes freely.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eAnalysis\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe used total dwell time (DT) as our primary eye movement measure to explore children\u0026apos;s responses to violations in scenes. Dwell time represents the cumulative duration of all fixations within a specific area of interest from the initial to the final visit, thus indicating the level of interest in that area. We excluded trials with a dwell time shorter than 100 ms from our analysis (1.59 %), as such brief durations are generally deemed inadequate for meaningful information processing [25]. 
We further analyzed the mean dwell time difference for each participant by subtracting their average dwell time on consistent scenes from their dwell time on inconsistent scenes. This difference reflected the actual sensitivity towards violations compared to consistent targets and allowed us to examine the effect of these differences on our other measures (see Results). Exploratory analyses of refixation counts\u0026mdash;reflecting how often participants returned their gaze to the same area of interest (target objects)\u0026mdash;are reported in Supplementary Information 1.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eImplicit measure of scene knowledge: Visual search task\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe visual search task was conducted either immediately before or after the free viewing task, with exceptions occurring only if technical or timing issues interrupted the planned session. In such cases, the task was rescheduled and conducted in a separate session. The sequence of these tasks was randomly counterbalanced.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eApparatus\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe tracked children\u0026rsquo;s eye movements using the same eye-tracking equipment as in the free viewing study.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eStimuli\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eForty images with objects placed in both expected and unexpected locations were selected from the BOiS - Berlin Object in Scene Database [26], with 20 images per condition (see Figure 3 for an example). This database includes a range of indoor and outdoor scenes, featuring target objects photographed from multiple angles and positioned either in expected or unexpected locations, maintaining a similar distance from the center of the scene in both cases. We chose everyday indoor scenes with familiar target objects for children as the stimuli. 
Similar to the procedure in the free viewing task, we maintained the same size for areas of interest (AOIs) across both unexpected and expected scenes. We also included a 75-pixel buffer around each AOI to ensure precise eye-tracking. The order of conditions was randomized and counterbalanced across participants to ensure that every child only saw each object once, either in an expected or an unexpected location. Scene images were presented at a resolution of 1024 x 768 pixels, while target objects were displayed at 288 x 192 pixels. We again evaluated the mean saliency rank using DeepGaze IIE [24] to account for potential saliency effects. The analysis revealed no significant saliency differences between expected and unexpected objects (\u003cem\u003e\u0026beta;\u003c/em\u003e = -.052, \u003cem\u003eSE\u003c/em\u003e = .034,\u003cem\u003e\u0026nbsp;z\u003c/em\u003e = -1.538, \u003cem\u003ep\u003c/em\u003e = .128).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eProcedure\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe experimental setup was similar to the free viewing task. We began with a 5-point calibration using an audio-visual target and conducted a drift check every 10 trials. Participants completed 2 practice trials before proceeding to the main phase, which included 40 trials. Each trial started with an audio-visual fixation spiral, requiring at least 500 milliseconds of gaze to either the left or right side of the screen, followed by a 1-second presentation of the target object. Children were instructed to find the target object as quickly as possible within the subsequent scene. After another gaze-contingent fixation spiral, a 10-ms blank screen was shown before the scene image appeared, which remained on the screen until the participant pressed the spacebar to indicate they had found the target object (Figure 4). 
To ensure participants were aware that target objects were always present, we included gaze-contingent feedback: if participants fixated on the predefined area of the target object during the 1-second period immediately prior to their key press, they received a reward in the form of a cheer/applause video. If their gaze did not meet this criterion, a scene image with a red frame around the target object\u0026apos;s location was shown as feedback.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eAnalysis\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe selected reaction time and first fixation time as our primary measures in the visual search task. Reaction time indicated how long it took participants to find and identify the target objects (from scene onset until button press), whereas first fixation time reflected how effectively their scene knowledge guided their search (from scene onset until first target fixation) and whether it sped up the process for expected targets, with earlier fixations on expected target locations suggesting a better understanding of where objects are found in the scene. We excluded trials that deviated by \u0026plusmn;2 seconds from the mean reaction time (\u003cem\u003eM\u003c/em\u003e = 7.19 sec, 4.7 %). Additional measures, including decision time and accuracy, are reported in Supplementary Information 1 to provide a comprehensive overview of all recorded task metrics.\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eExplicit measures of scene knowledge: Dollhouse task\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eProcedure\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe dollhouse task functioned as an explicit measure of children\u0026rsquo;s scene knowledge. Children were asked to place objects in a wooden dollhouse (Nic Spiel + Art GmbH, Laupheim, Germany) in the way they found most fitting. 
The dollhouse, consisting of two floors and four rooms, each measuring 31 cm by 40 cm, was empty except for one defining, i.e. diagnostic, object per room: a bed in the bedroom, a shower in the bathroom, an oven/stove in the kitchen, and a sofa in the living room. Each child was then provided with the remaining 57 objects that were to be placed within the dollhouse. Children were not given a time limit and were asked to place all the objects to the best of their knowledge. The dollhouse task was always given as the final task if no exception occurred.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eDollhouse semantics\u0026nbsp;\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWhen selecting diagnostic objects to help establish clear scene categories when beginning the dollhouse task, we followed the approach described in [14], where one highly diagnostic (anchor) object was placed in each room. These diagnostic objects were chosen based on their strong scene associations, as quantified by real-world image statistics [27], characterizing the degree to which an item reliably signaled a specific scene category. For instance, while a fridge provides information about the scene being a kitchen, the kitchen scene also strengthens the expectation that a fridge should be present\u0026mdash;creating a reciprocal relationship of strong contextual cues. On the other hand, a book can be found across various scene categories, e.g. the bedroom, the living room or the kitchen, making it less straightforward to assign it to a specific scene. Based on these criteria, we selected a bed, a stove, a shower, and a sofa as diagnostic objects for the bedroom, kitchen, bathroom, and living room, respectively.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eWe assessed semantic knowledge by measuring how accurately children placed the remaining objects into their designated room categories. 
Object-room assignment was again primarily based on image statistics reported in [27], which quantify how frequently objects appear in specific scenes. In addition to these data-driven assignments, we also relied on informed researcher intuition to finalize the categorization of certain objects \u0026mdash; particularly those not clearly diagnostic or those that could reasonably appear in more than one room. Objects that were relevant to multiple rooms (e.g., a pillow) were considered correct if placed in any of those rooms (e.g., both bedroom and living room). Additionally, if children misidentified objects (e.g., misidentifying a nightstand as a stool) and placed them in the room corresponding to the object they thought it resembled (e.g., a stool in the living room), we still counted these placements as correct, provided that the placement was appropriate for the object they mistook it for. A full list of objects and their assigned scenes is provided in Supplementary Information 2.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eDollhouse syntax\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFor the syntactic analysis, we drew on the concept of scene phrases\u0026mdash;structures within scenes composed of anchor and local objects\u0026mdash;based on previously established spatial-functional groupings [2, 10]. While some of our anchor\u0026ndash;local object selections were directly inspired by the phrases proposed by the previous research of our group [9], others were guided by our own scene-based intuitions. This combined approach allowed us to define meaningful spatial and functional relationships within each room of the dollhouse. We assessed syntactic knowledge in the dollhouse task by measuring the distances between predefined anchor objects and local objects within phrases of the scenes. To this end, we first performed a 3D scan of each child\u0026apos;s completed dollhouse. 
These scans were then imported into Unity (Unity Technologies, 2023, Version 2021.3.18f1), where we positioned pre-scanned models of the dollhouse objects inside the scanned 3D dollhouse model exactly as the participants had placed the real objects. This approach ensured maximal precision in our 3D measurements. Once all objects were positioned, we used a custom Unity script to create a matrix showing the distances between the centers of each object. Our analysis focused exclusively on objects placed in the correct room categories, with particular attention to the predefined anchor and local objects (12 anchor (19.67%) and 44 local (72.13%) objects).\u0026nbsp;A full list of anchor and local objects is provided in Supplementary Information 2.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDevelopmental Tests\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo account for the influence of children\u0026rsquo;s developmental cognitive skills on task performance, we administered both visual development and non-verbal intelligence tests. For visual development, we used the FEW-3 (Frostigs Entwicklungstest der visuellen Wahrnehmung, German adaptation of the Developmental Test of Visual Perception \u0026ndash; DTVP-3, [28]), which includes tasks such as identifying shapes within complex visual backgrounds, mentally completing incomplete forms, and detecting similarities between different shapes. For non-verbal intelligence, we administered the CPM (Colored Progressive Matrices, [29]), where children completed patterns by selecting the correct piece to fit into a colored sequence. We included raw scores from both tests in our models so that we could also account for age as a covariate. 
However, to enhance interpretability for the reader, we report here the mean percentile ranks: the mean percentile for the FEW-3 was 56.6 (range = 2\u0026ndash;98), and for the CPM was 61.4 (range = 6\u0026ndash;100).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eStatistical Analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe made sure that objects were not repeated across trials in the free-viewing task; however, a subset of scenes was presented more than once. Since scene repetition can produce potential priming effects in children\u0026rsquo;s eye-tracking behavior, such as reduced exploration, we included only the first presentation of each scene per participant in our final analyses. This led to the exclusion of 14% of trials from the free viewing eye-tracking dataset. For completeness, we report the results of the full model\u0026mdash;including also the repeated trials, with repetition order modeled as a random slope\u0026mdash;in the\u0026nbsp;Supplementary Information 3.\u003c/p\u003e\n\u003cp\u003eWe conducted our statistical analysis using generalized linear mixed models (GLMMs). GLMMs enabled us to incorporate random effects, accounting for variations in both the saliency ranks of the scenes and individual differences among participants. This approach enabled us to model our data at the trial level while appropriately handling non-normal distributions. GLMMs were particularly suited for our binomial outcome measures\u0026mdash;such as correct vs. incorrect object placement in the dollhouse task\u0026mdash;as well as for continuous outcomes that violated the assumptions of linear mixed models (LMMs), including skewed or log-normal response variables.\u003c/p\u003e\n\u003cp\u003eThe analyses were carried out in the R environment (version 2023.09.1) using the lmer function from the lme4 package [30]. 
We calculated p-values using the lmerTest package [31], applying Satterthwaite\u0026rsquo;s approximation for determining degrees of freedom.\u003c/p\u003e\n\u003cp\u003eFor the free viewing task models, we included dwell time as the dependent variable, and violation type (with levels of consistent, semantically inconsistent, and syntactically inconsistent) as the fixed factor (predictor). In the visual search task, our dependent variables were first fixation time and reaction time, with violation type (with the levels of expected vs. unexpected target locations) as the key fixed factor (predictor). For the dollhouse task, the dependent variables were object placement accuracy (%) for dollhouse semantics, and the mean distance between anchor and local objects for dollhouse syntax. Further, we sought to understand the relationship between our implicit measures in the free viewing and search tasks (such as dwell time, first fixation time, and reaction time) and explicit measures of performance in the dollhouse task (object placement accuracy and distance between related objects). Specifically, we analyzed how dwell time on semantically inconsistent scenes in the free viewing task related to object placement in the dollhouse task (dollhouse semantics), and how dwell time on syntactically inconsistent scenes was associated with mean distance between related objects (dollhouse syntax). Given the syntactic aspect of the inconsistently placed targets in the visual search task, we also examined the relationship between visual search task metrics (first fixation time, reaction time) and dollhouse syntax, enabling us to examine relationships between both explicit-explicit and explicit-implicit measures. 
Finally, we ran models with our main measures (dwell time on violations, reaction time on violations, first fixation time on violations, object placement accuracy, and distance between related objects) and included age as a continuous predictor to examine the developmental trajectory of scene knowledge.\u003c/p\u003e\n\u003cp\u003eTo account for individual variability, participants were consistently included as a random factor in all models. We also included random intercepts and slopes for the saliency ranks of each scene, allowing both baseline responses and the effects of saliency to differ across scenes. This approach provided a more nuanced model, accommodating variations in both the baseline responses and the predictor effects across different scenes. When age was not a predictor, it was included as a covariate in our models, alongside developmental test scores from the CPM and FEW tests, which were incorporated based on their potential contribution to the model fit and impact on the response variables. Although both CPM and FEW test scores were collected, only scores from the FEW were included in the final models, as the CPM either led to convergence issues or did not improve model fit. Continuous predictors, such as age, FEW scores and the mean distance between related objects, were scaled (z-transformed) prior to analysis to improve model convergence and interpretability of fixed effects.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003e\u003cstrong\u003eImplicit measures\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eResults from our model for the free viewing task revealed a significant effect of object consistency on children\u0026rsquo;s dwell time (\u003cem\u003e\u0026beta;\u0026nbsp;\u003c/em\u003e= -0.136, \u003cem\u003eSE\u003c/em\u003e = 0.06, \u003cem\u003ez\u003c/em\u003e = -2.293, \u003cem\u003ep\u003c/em\u003e = .022) (Figure 5a). 
Specifically, children spent less time looking at consistent objects compared to semantically inconsistent objects (\u003cem\u003eratio\u003c/em\u003e = 0.747, \u003cem\u003eSE\u003c/em\u003e = 0.077, \u003cem\u003ez ratio\u003c/em\u003e = -2.814, \u003cem\u003ep\u003c/em\u003e = .014), indicating a sensitivity to semantic violations. However, this difference was not observed for syntactically inconsistent objects (\u003cem\u003eratio\u003c/em\u003e = .888, \u003cem\u003eSE\u003c/em\u003e = 0.09, \u003cem\u003ez ratio\u003c/em\u003e = -1.152, \u003cem\u003ep\u003c/em\u003e = .482). There was also no significant difference between the dwell time on semantic violations and syntactic violations (\u003cem\u003eratio\u003c/em\u003e = 1.188, \u003cem\u003eSE\u003c/em\u003e = 0.123, \u003cem\u003ez ratio\u003c/em\u003e = -1.662, \u003cem\u003ep\u003c/em\u003e = .22).\u003c/p\u003e\n\u003cp\u003eIn the visual search task, children directed their first fixation on unexpectedly placed target objects significantly later compared to expectedly placed targets (\u003cem\u003e\u0026beta;\u003c/em\u003e = 0.288, \u003cem\u003eSE\u003c/em\u003e = 0.082, \u003cem\u003ez\u003c/em\u003e =\u0026nbsp;3.515, \u003cem\u003ep\u0026nbsp;\u003c/em\u003e\u0026lt; .001). 
Additionally, children responded significantly more slowly to unexpectedly placed objects than to expectedly placed ones (\u003cem\u003e\u0026beta;\u003c/em\u003e = 0.146, \u003cem\u003eSE\u003c/em\u003e = 0.043, \u003cem\u003ez\u003c/em\u003e = 3.387, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001) (Figure 5b).\u003c/p\u003e\n\u003cp\u003eTo summarize the results from the implicit measures, children in this age range looked longer at semantically inconsistent objects than at consistent ones in the free viewing task and located expectedly placed objects more quickly in the visual search task.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eRelationship between implicit and explicit measures\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eExplicit measures obtained from the dollhouse task revealed that children had a mean correct placement performance of 84% (\u003cem\u003eSD\u003c/em\u003e = 6.9%, \u003cem\u003erange\u003c/em\u003e = 60%\u0026ndash;93%). On average, they placed local objects at a distance of 0.29 meters from their anchors (\u003cem\u003eSD\u003c/em\u003e = 0.075 m, \u003cem\u003erange\u003c/em\u003e = 0.03 m\u0026ndash;0.54 m). Further analyses were conducted to examine the relationship between these explicit measures and our implicit measures.\u003c/p\u003e\n\u003cp\u003eDwell time on semantic violations was not significantly related to dollhouse semantics (\u003cem\u003e\u0026beta; =\u0026nbsp;\u003c/em\u003e-0.874, \u003cem\u003eSE =\u0026nbsp;\u003c/em\u003e0.849, \u003cem\u003ez =\u0026nbsp;\u003c/em\u003e-1.080, \u003cem\u003ep =\u0026nbsp;\u003c/em\u003e.280). Also, dwell time on syntactic violations did not significantly predict the mean distance between related objects in the dollhouse task (\u003cem\u003e\u0026beta; =\u0026nbsp;\u003c/em\u003e-4.153, \u003cem\u003eSE =\u0026nbsp;\u003c/em\u003e3.423, \u003cem\u003ez =\u0026nbsp;\u003c/em\u003e-1.213, \u003cem\u003ep =\u0026nbsp;\u003c/em\u003e.225). 
Further analysis revealed a significant relationship: as the mean difference in dwell time between syntactically inconsistent and consistent targets increased, the mean distance between related objects decreased (\u003cem\u003e\u0026beta; =\u0026nbsp;\u003c/em\u003e-197.57, \u003cem\u003eSE =\u0026nbsp;\u003c/em\u003e96.44, \u003cem\u003ez =\u0026nbsp;\u003c/em\u003e-2.049, \u003cem\u003ep =\u0026nbsp;\u003c/em\u003e.049).\u003c/p\u003e\n\u003cp\u003eIn the visual search task, neither first fixation time on unexpected targets nor on expected targets predicted dollhouse syntax (unexpected: \u003cem\u003e\u0026beta; =\u0026nbsp;\u003c/em\u003e-.023, \u003cem\u003eSE =\u0026nbsp;\u003c/em\u003e0.044, \u003cem\u003ez =\u0026nbsp;\u003c/em\u003e-0.52, \u003cem\u003ep =\u0026nbsp;\u003c/em\u003e.603; expected: \u003cem\u003e\u0026beta; =\u0026nbsp;\u003c/em\u003e0.037, \u003cem\u003eSE =\u0026nbsp;\u003c/em\u003e0.034, \u003cem\u003ez =\u0026nbsp;\u003c/em\u003e1.108, \u003cem\u003ep =\u0026nbsp;\u003c/em\u003e.268) (Figure 6a). The same non-significant pattern emerged for reaction time (unexpected: \u003cem\u003e\u0026beta; =\u0026nbsp;\u003c/em\u003e-.039, \u003cem\u003eSE =\u0026nbsp;\u003c/em\u003e0.035, \u003cem\u003ez =\u0026nbsp;\u003c/em\u003e-1.13, \u003cem\u003ep =\u0026nbsp;\u003c/em\u003e.258; expected: \u003cem\u003e\u0026beta; =\u0026nbsp;\u003c/em\u003e0.022, \u003cem\u003eSE =\u0026nbsp;\u003c/em\u003e0.036, \u003cem\u003ez =\u0026nbsp;\u003c/em\u003e0.606, \u003cem\u003ep =\u0026nbsp;\u003c/em\u003e.545) (Figure 6b).\u003c/p\u003e\n\u003cp\u003eTogether, these results suggest that syntactic dwell time in the free viewing task was significantly related to explicit syntactic scene knowledge\u0026mdash;but only when the dwell time difference between syntactically inconsistent and consistent targets was considered. 
No other implicit measures\u0026mdash;including those from the visual search task\u0026mdash;showed significant associations with the explicit measures derived from the dollhouse task.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDevelopmental trajectory of scene knowledge\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIn the free viewing task, we observed that children showed reduced dwell time for consistent targets as age increased (\u003cem\u003e\u0026beta; =\u0026nbsp;\u003c/em\u003e-0.66, \u003cem\u003eSE =\u0026nbsp;\u003c/em\u003e0.03, \u003cem\u003ez =\u0026nbsp;\u003c/em\u003e-2.188, \u003cem\u003ep =\u0026nbsp;\u003c/em\u003e.028, \u003cem\u003er\u003c/em\u003e = -0.089). For semantic and syntactic violations, dwell time was not significantly related to age (SEM: \u003cem\u003e\u0026beta; =\u0026nbsp;\u003c/em\u003e0.043, \u003cem\u003eSE =\u0026nbsp;\u003c/em\u003e.035, \u003cem\u003ez =\u0026nbsp;\u003c/em\u003e1.222, \u003cem\u003ep =\u0026nbsp;\u003c/em\u003e.222, \u003cem\u003er\u003c/em\u003e = 0.108; SYN: \u003cem\u003e\u0026beta; =\u0026nbsp;\u003c/em\u003e0.023, \u003cem\u003eSE\u0026nbsp;\u003c/em\u003e= 0.04, \u003cem\u003ez =\u0026nbsp;\u003c/em\u003e0.564, \u003cem\u003ep =\u0026nbsp;\u003c/em\u003e.573, \u003cem\u003er\u003c/em\u003e = 0.037) (Figure 7a).\u003c/p\u003e\n\u003cp\u003eA significant positive relationship emerged when examining the differences in dwell time between violations. Simple linear models indicated that the difference in mean dwell time between semantically inconsistent and consistent targets significantly increased with age (\u003cem\u003e\u0026beta; =\u0026nbsp;\u003c/em\u003e171.98, \u003cem\u003eSE =\u0026nbsp;\u003c/em\u003e78.85, \u003cem\u003et =\u0026nbsp;\u003c/em\u003e2.181, \u003cem\u003ep =\u0026nbsp;\u003c/em\u003e.037) (Figure 7b). 
The difference in mean dwell time between syntactically inconsistent and consistent targets did not show a significant relationship with age (\u003cem\u003e\u0026beta; =\u0026nbsp;\u003c/em\u003e98.10, \u003cem\u003eSE =\u0026nbsp;\u003c/em\u003e62.08, \u003cem\u003et =\u0026nbsp;\u003c/em\u003e1.580, \u003cem\u003ep =\u0026nbsp;\u003c/em\u003e.124) (Figure 7c).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIn the visual search task, we found a significant decrease in first fixation time to unexpected targets with age (\u003cem\u003e\u0026beta; =\u0026nbsp;\u003c/em\u003e-0.09, \u003cem\u003eSE =\u0026nbsp;\u003c/em\u003e.042, \u003cem\u003ez =\u0026nbsp;\u003c/em\u003e-2.124, \u003cem\u003ep =\u0026nbsp;\u003c/em\u003e.034) and even more so for expected targets (\u003cem\u003e\u0026beta; =\u0026nbsp;\u003c/em\u003e-.125, \u003cem\u003eSE =\u0026nbsp;\u003c/em\u003e.034, \u003cem\u003ez =\u0026nbsp;\u003c/em\u003e-3.646, \u003cem\u003ep \u0026lt;\u0026nbsp;\u003c/em\u003e.001; see Figure 8a). Age also predicted reaction time to unexpected and expected targets, showing that reaction time to both target types decreased with age (unexpected: \u003cem\u003e\u0026beta; =\u0026nbsp;\u003c/em\u003e-0.154, \u003cem\u003eSE =\u0026nbsp;\u003c/em\u003e.034, \u003cem\u003ez =\u0026nbsp;\u003c/em\u003e-4.514, \u003cem\u003ep\u0026nbsp;\u003c/em\u003e\u0026lt; .001; expected: \u003cem\u003e\u0026beta; =\u0026nbsp;\u003c/em\u003e-.162, \u003cem\u003eSE =\u0026nbsp;\u003c/em\u003e.035, \u003cem\u003ez =\u0026nbsp;\u003c/em\u003e-4.634, \u003cem\u003ep \u0026lt;\u0026nbsp;\u003c/em\u003e.001) (Figure 8b).\u003c/p\u003e\n\u003cp\u003eIn the dollhouse task, we observed a trend suggesting that object placement performance improves with age (\u003cem\u003e\u0026beta; =\u0026nbsp;\u003c/em\u003e.096, \u003cem\u003eSE =\u0026nbsp;\u003c/em\u003e.052, \u003cem\u003ez =\u0026nbsp;\u003c/em\u003e1.854, \u003cem\u003ep =\u0026nbsp;\u003c/em\u003e.064) (Figure 8c). 
However, no significant relationship was found between age and the distance between related objects (\u003cem\u003e\u0026beta; =\u0026nbsp;\u003c/em\u003e-.002, \u003cem\u003eSE =\u0026nbsp;\u003c/em\u003e.002, \u003cem\u003et =\u0026nbsp;\u003c/em\u003e-.837, \u003cem\u003ep =\u0026nbsp;\u003c/em\u003e.414) (Figure 8d).\u003c/p\u003e\n\u003cp\u003eFinally, the FEW test scores, which were included in the models as covariates to account for individual differences in visual perception, did not show significant effects on the dependent variables. Full model outputs are provided in Supplementary Information 4.\u003c/p\u003e\n\u003cp\u003eIn summary, these results suggest that the implicit use of scene grammar knowledge continues to strengthen with age, as evidenced by faster attentional disengagement from consistent targets and more efficient search strategies for expected targets in older children. However, the results from the dollhouse task did not reveal a significant improvement in children\u0026apos;s active use of explicit scene knowledge between the ages of 6 and 10.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eIn the present study, we aimed to investigate how scene knowledge manifests itself and develops both implicitly and explicitly in children aged 6 to 10 years, while also examining the relationship between these two types of measures. Our eye-tracking results indicated that children in this age range exhibited a consistency effect in response to semantic scene violations, and effectively utilized their scene knowledge during the visual search task. Furthermore, children who displayed strong syntactic scene knowledge in the free viewing task also tended to perform better in placing objects according to syntactic rules during the dollhouse task. 
Our findings also illustrated the ongoing developmental process of scene knowledge in children, as evidenced by a growing consistency effect in both free viewing and visual search tasks.\u003c/p\u003e \u003cp\u003eThe results of this study contribute to our understanding of the developmental trajectories of scene grammar knowledge by extending the age range in which the consistency effect in scenes has been investigated. Prior research using brain recordings has shown that children can process scene violations as early as two years old [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e], while the implicit manifestation of this effect can be observed in their eye movements by age four [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. Our study complements these findings by demonstrating that, as expected, the consistency effect remains robust in children aged 6\u0026ndash;10 while providing further evidence that suggests continued development across this age range. Moreover, by incorporating a visual search task into our methodology, we provide a broader toolbox including a diverse set of ecologically valid tasks that capture different facets of children\u0026rsquo;s scene knowledge. In the free viewing task, shorter dwell times on consistent objects suggested that children required less processing time for scene-congruent elements, indicating fluent use of their scene-based expectations. In contrast, the visual search task allowed us to assess how quickly children could locate expected objects, with faster fixations reflecting more efficient guidance of attention based on scene grammar. 
Together, these measures demonstrate that children not only process expected objects more fluently but also actively use their scene knowledge to navigate structured environments more efficiently.\u003c/p\u003e \u003cp\u003eWhile previous studies have demonstrated a consistency effect also for syntactic violations during a free viewing task as early as age 4 [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e] and between 5\u0026ndash;6 years [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e], we only found a consistency effect for semantic violations. The study in [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e] employed a between-subjects design, in which each child was presented with either semantically or syntactically inconsistent scenes. In contrast, our approach of presenting both types of violations within the same experiment may have influenced the processing of violations thereby altering the children's gaze behavior toward syntactic violations. Moreover, the consistency effect identified in [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e] intriguingly interacted with saliency, revealing that children exhibited longer dwell times on syntactically inconsistent objects with low saliency. As it is typically expected that visual salience drives gaze more strongly than contextual cues [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e, \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e], the authors interpreted these findings from 5- to 6-year-old children as suggesting that their syntactic knowledge may not be as robustly developed as their semantic knowledge, potentially undermining the reliability of these results. 
Consistent with this interpretation, in our previous study [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e], we found a syntactic inconsistency effect only when dividing the typically developing sample into two age groups: only older children (ages 8.92\u0026ndash;10.83, \u003cem\u003en\u003c/em\u003e\u0026thinsp;=\u0026thinsp;11) demonstrated such an effect, and even then, it was less prominent than the effect for semantic inconsistencies. Given these findings, it is not surprising that we could not fully replicate previous studies. Nonetheless, despite the differences in task instructions and motivations between the visual search task and traditional free viewing tasks, the presence of syntactic violations in the visual search task\u0026mdash;and children\u0026rsquo;s later first fixation times on unexpected objects\u0026mdash;suggest that such violations disrupt their search behavior, indicating sensitivity to syntactic scene knowledge. However, further research is needed to clarify specifically the developmental trajectory of processing syntactic rules in scenes.\u003c/p\u003e \u003cp\u003eA distinctive aspect of this study was our examination of the relationship between implicit and explicit measures of scene knowledge, building on and extending previous work presented in [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. Our results demonstrated that dwell time difference between syntactically inconsistent objects and consistent objects during the free viewing task was a significant predictor of the dollhouse syntax measure, replicating previous findings. 
Our results are also in line with those reported in [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e], where no significant relationship was observed between dollhouse semantics and dwell time on semantically inconsistent objects in children aged 2\u0026ndash;4 years; similarly, we did not find a significant relationship in our sample of children aged 6\u0026ndash;10 years. The authors of [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e] suggested that this absence of an implicit-explicit correlation in semantic knowledge might stem from differences in task demands\u0026mdash;whether or not explicit instructions were given\u0026mdash;and the absence of the scene gist advantage in the dollhouse task. We concur with this interpretation and bolster it with our own findings which showed a significant syntax-performance link\u0026mdash;but an absent semantic link. Additionally, an important factor to keep in mind is that our dollhouse syntax measure only holds when children place objects in the correct room, requiring both semantic and syntactic knowledge. In contrast, dollhouse semantics might be more affected by task-specific factors. Research shows that instructions can significantly influence children\u0026rsquo;s categorization strategies [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e, \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e, \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e, \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e]. We also noticed during testing that children sometimes struggled to recognize objects due to the absence of explicit labeling\u0026mdash;a deliberate choice to avoid interfering with their object/scene knowledge. 
Research shows that labels enhance object categorization in infants and children [\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e, \u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e]. Additionally, understanding an object\u0026rsquo;s function also plays a role [\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e]. Future studies might refine the dollhouse task with more explicit instructions to better capture scene knowledge and control for object recognition. Moreover, examining the effects of naming\u0026mdash;specifically, whether labeling objects during scene construction facilitates placement performance\u0026mdash;could provide important insights into how linguistic processes interact with the application of scene grammar. Lastly, the positive relationship between dwell time differences for syntactic violations in the free viewing task and children's dollhouse syntax performance suggests a link between children's implicit and explicit syntactic scene knowledge. This finding shows that children apply their syntactic scene knowledge both when processing objects in structured scenes and when constructing scenes themselves. Thus, we extend previous research by demonstrating that children\u0026rsquo;s sensitivity to syntactic rules is consistent across tasks requiring either passive exploration of structured scenes or active scene assembly.\u003c/p\u003e \u003cp\u003eWhile our previous study [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e] established initial evidence of scene grammar processing in children with and without DLD, the present study expands on this work by including a larger sample of typically developing participants, introducing a visual search task, systematically analyzing age-related effects, and directly examining the relationship between implicit and explicit scene grammar measures. 
Building on those findings, the current results provide a more detailed picture of how scene knowledge continues to be refined between the ages of 6 and 10. Specifically, our analyses of children\u0026rsquo;s eye movements revealed that as children age, they spend progressively less time on consistent objects in particular in the free viewing task, and they direct their gaze to these objects earlier in the visual search task. This shift indicates a growing robustness in children's scene knowledge, as they increasingly utilize their existing knowledge by allocating less attention to expected or typical objects and adjusting their search strategies according to their scene-rule-based expectations to make search more efficient. Our results on the developmental trajectory of scene knowledge align with existing research on rule learning which suggests that by age four, children begin developing rule-based categorization skills\u0026mdash;such as for spatial relations, perceptual similarities, or information-integration categories\u0026mdash;ultimately progressing to more complex, higher-order rules as their executive functions mature [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e, \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e, \u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e]. Additionally, studies indicate that children's capacity for complex cognitive processing, such as spatial clustering and relational integration, gradually increases starting around age five [\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e, \u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e]. 
Our results affirm that the development of rule-based categorization and relational integration within the context of scene knowledge is also ongoing between the ages of 6 and 10, and they also highlight the strengthening of the consistency effect, which was found to emerge at age four in [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eOur study had its limitations despite methodological refinements. With only 33 children in our dataset and a within-subject design including many different tasks, this sample size may not have been sufficient to establish clear relationships across all measures. Additionally, the inconclusive results regarding the implicit-explicit measures suggest that further refinements to the dollhouse task are warranted. Future studies could consider creating 3D apartments with more realistic, real-world objects to address issues related to the recognition of objects. However, careful attention should also be paid to task instructions to capture the real understanding of scene knowledge.\u003c/p\u003e \u003cp\u003eIn conclusion, the present study contributes to the literature by examining both implicit and explicit measures of scene knowledge in children aged 6\u0026ndash;10 years. Our findings demonstrate that children show increased sensitivity to both semantic and syntactic scene violations as they grow older. Importantly, we found a link between children's eye movement behavior during implicit tasks and their ability to apply syntactic scene rules explicitly in the dollhouse task. These results highlight a developmental progression in children's ability to extract and apply scene grammar rules, suggesting an emerging integration of visual attention and conceptual understanding. 
We hope this work stimulates further research on the development of scene grammar, utilizing larger sample sizes and more refined methodologies.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe are grateful to all the children and parents who took part in this study. We also sincerely thank Theresa Henke, Judith Hollnagel, Ronja Schnellen, and Jonathan Mader for their assistance with data collection.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)\u0026mdash;project number 222641018\u0026mdash;SFB/TRR 135 (Projects C3 and C7) and the Hessisches Ministerium f\u0026uuml;r Wissenschaft und Kunst (HMWK; project \u0026lsquo;The Adaptive Mind\u0026rsquo;).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eConceptualization, D.D.T., D.B., C.K., and M.L.-H.V.; methodology, D.D.T., D.B., C.K., and M.L.-H.V.; software, D.D.T., D.B., C.K., and M.L.-H.V.; validation, D.D.T., D.B., C.K., and M.L.-H.V.; formal analysis, D.D.T. and D.B.; investigation, D.D.T. and D.B.; resources, C.K. and M.L.-H.V.; data curation, D.D.T. and D.B.; writing\u0026mdash;original draft preparation, D.D.T. and D.B.; writing\u0026mdash;review and editing, D.D.T., D.B., C.K., and M.L.-H.V.; visualization, D.D.T., D.B., C.K., and M.L.-H.V.; supervision, C.K. and M.L.-H.V.; project administration, C.K. and M.L.-H.V.; funding acquisition, C.K. and M.L.-H.V. 
All authors have read and agreed to the published version of the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll data used in this study are publicly available and can be accessed via https://osf.io/a39gr/\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eBiederman, I., Mezzanotte, R. J., \u0026amp; Rabinowitz, J. C. (1982). Scene perception: Detecting and judging objects undergoing relational violations. \u003cem\u003eCognitive Psychology, 14\u003c/em\u003e, 143\u0026ndash;177.\u003c/li\u003e\n\u003cli\u003eDraschkow, D., \u0026amp; V\u0026otilde;, M. L.-H. (2017). Scene grammar shapes the way we interact with objects, strengthens memories, and speeds search. \u003cem\u003eScientific Reports, 7\u003c/em\u003e(1), 16471. https://doi.org/10.1038/s41598-017-16739-x\u003c/li\u003e\n\u003cli\u003eV\u0026otilde;, M. L.-H., \u0026amp; Wolfe, J. M. (2013a). The interplay of episodic and semantic memory in guiding repeated search in scenes. \u003cem\u003eCognition, 126\u003c/em\u003e(2), 198-212. https://doi.org/10.1016/j.cognition.2012.09.017\u003c/li\u003e\n\u003cli\u003eV\u0026otilde;, M. L.-H., \u0026amp; Wolfe, J. M. (2013b). Differential electrophysiological signatures of semantic and syntactic scene processing. \u003cem\u003ePsychological Science, 24\u003c/em\u003e(9), 1816-1823. https://doi.org/10.1177/0956797613476955\u003c/li\u003e\n\u003cli\u003eV\u0026otilde;, M. L.-H. (2021). The Meaning and Structure of Scenes. \u003cem\u003eVision Research, 181\u003c/em\u003e, 10-20. https://doi.org/10.1016/j.visres.2020.11.003\u003c/li\u003e\n\u003cli\u003eBahn, D., T\u0026uuml;rk, D. 
D., Tsenkova, N., Schwarzer, G., V\u0026otilde;, M. L.-H., \u0026amp; Kauschke, C. (2025). Processing of Scene Grammar Inconsistencies in Children with Developmental Language Disorder Insights from Implicit and Explicit Measures. \u003cem\u003eBrain Sciences, 15\u003c/em\u003e(2). https://doi.org/10.3390/brainsci15020139\u003c/li\u003e\n\u003cli\u003eV\u0026otilde;, M. L.-H., \u0026amp; Henderson, J. M. (2009). Does gravity matter? Effects of semantic and syntactic inconsistencies on the allocation of attention during scene perception. \u003cem\u003eJournal of Vision, 9\u003c/em\u003e(3), 24. https://doi.org/10.1167/9.3.24\u003c/li\u003e\n\u003cli\u003eV\u0026otilde;, M. L.-H., \u0026amp; Henderson, J. M. (2011). Object\u0026ndash;scene inconsistencies do not capture gaze: Evidence from the flash-preview moving-window paradigm. \u003cem\u003eAttention, Perception, \u0026amp; Psychophysics, 73\u003c/em\u003e(6), 1742\u0026ndash;1753. https://doi.org/10.3758/s13414-011-0150-6\u003c/li\u003e\n\u003cli\u003eTurini, J., \u0026amp; V\u0026otilde;, M. L.-H. (2022). Hierarchical organization of objects in scenes is reflected in mental representations of objects. \u003cem\u003eScientific Reports, 12\u003c/em\u003e(1), 20068. https://doi.org/10.1038/s41598-022-24505-x\u003c/li\u003e\n\u003cli\u003eV\u0026otilde;, M. L.-H., Boettcher, S. E., \u0026amp; Draschkow, D. (2019). Reading scenes: How scene grammar guides attention and aids perception in real-world environments. \u003cem\u003eCurrent Opinion in Psychology\u003c/em\u003e, \u003cem\u003e29\u003c/em\u003e, 205-210. https://doi.org/10.1016/j.copsyc.2019.03.009\u003c/li\u003e\n\u003cli\u003eDe Graef, P., Christiaens, D., \u0026amp; d\u0026rsquo;Ydewalle, G. R. (1990). Perceptual effects of scene context on object identification. \u003cem\u003ePsychological Research Psychologische Forschung, 52,\u003c/em\u003e 317\u0026ndash;329.\u003c/li\u003e\n\u003cli\u003eHenderson, J. M., Weeks, P. 
A., Jr., \u0026amp; Hollingworth, A. (1999). The effects of semantic consistency on eye movements during complex scene viewing. \u003cem\u003eJournal of Experimental Psychology: Human Perception and Performance, 25\u003c/em\u003e, 210\u0026ndash;228.\u003c/li\u003e\n\u003cli\u003e\u0026Ouml;hlschl\u0026auml;ger, S., \u0026amp; V\u0026otilde;, M. L.-H. (2017). SCEGRAM: An image database for semantic and syntactic inconsistencies in scenes. \u003cem\u003eBehavior Research Methods, 49\u003c/em\u003e(5), 1780-1791. https://doi.org/10.3758/s13428-016-0820-3\u003c/li\u003e\n\u003cli\u003e\u0026Ouml;hlschl\u0026auml;ger, S., \u0026amp; V\u0026otilde;, M. L.-H. (2020). Development of scene knowledge: Evidence from explicit and implicit scene knowledge measures. \u003cem\u003eJournal of Experimental Child Psychology\u003c/em\u003e, \u003cem\u003e194\u003c/em\u003e, 104782. https://doi.org/10.1016/j.jecp.2019.104782\u003c/li\u003e\n\u003cli\u003eLauer, T., \u0026amp; V\u0026otilde;, M. L.-H. (2022). The ingredients of scenes that affect object search and perception. In B. Ionescu, W. A. Bainbridge, \u0026amp; N. Murray (Eds.), \u003cem\u003eHuman perception of visual information\u003c/em\u003e (pp. 1\u0026ndash;32). Springer. https://doi.org/10.1007/978-3-030-81465-6_1\u003c/li\u003e\n\u003cli\u003eHelo, A., van Ommen, S., Pannasch, S., Danteny-Dordoigne, L., \u0026amp; R\u0026auml;m\u0026auml;, P. (2017). Influence of semantic consistency and perceptual features on visual attention during scene viewing in toddlers. \u003cem\u003eInfant Behavior \u0026amp; Development\u003c/em\u003e, \u003cem\u003e49\u003c/em\u003e, 248\u0026ndash;266. https://doi.org/10.1016/j.infbeh.2017.09.008\u003c/li\u003e\n\u003cli\u003eMaffongelli, L., \u0026Ouml;hlschl\u0026auml;ger, S., \u0026amp; V\u0026otilde;, M. L.-H. (2020). The development of scene semantics: First ERP indications for the processing of semantic object-scene inconsistencies in 24-month olds. 
\u003cem\u003eCollabra: Psychology\u003c/em\u003e, 6(1), 17707, 1-8. https://doi.org/10.1525/collabra.17707 pdf\u003c/li\u003e\n\u003cli\u003eHelo, A., Guerra, E., Coloma, C. J., Aravena-Bravo, P., \u0026amp; R\u0026auml;m\u0026auml;, P. (2022). Do Children With Developmental Language Disorder Activate Scene Knowledge to Guide Visual Attention? Effect of Object-Scene Inconsistencies on Gaze Allocation. \u003cem\u003eFrontiers in psychology\u003c/em\u003e, \u003cem\u003e12\u003c/em\u003e, 796459. https://doi.org/10.3389/fpsyg.2021.796459\u003c/li\u003e\n\u003cli\u003eHolyoak, K. J., Junn, E. N., \u0026amp; Billman, D. (1984). Development of analogical problem-solving skill. \u003cem\u003eChild Development, 55\u003c/em\u003e, 2042\u0026ndash;2055.\u003c/li\u003e\n\u003cli\u003eGentner, D., \u0026amp; Medina, J. (1998). Similarity and the development of rules. \u003cem\u003eCognition\u003c/em\u003e, \u003cem\u003e65\u003c/em\u003e(2-3), 263-297.\u003c/li\u003e\n\u003cli\u003eHuang-Pollock, C. L., Maddox, W. T., \u0026amp; Karalunas, S. L. (2011). Development of implicit and explicit category learning. \u003cem\u003eJournal of experimental child psychology\u003c/em\u003e, \u003cem\u003e109\u003c/em\u003e(3), 321-335.\u003c/li\u003e\n\u003cli\u003eRabi, R., \u0026amp; Minda, J. P. (2014). Perceptual category learning: Similarity and differences between children and adults. In \u003cem\u003eProceedings of the Annual Meeting of the Cognitive Science Society\u003c/em\u003e (Vol. 36, No. 36).\u003c/li\u003e\n\u003cli\u003eHammer, R., Diesendruck, G., Weinshall, D., \u0026amp; Hochstein, S. (2009). The development of category learning strategies: What makes the difference?. \u003cem\u003eCognition\u003c/em\u003e, \u003cem\u003e112\u003c/em\u003e(1), 105-119.\u003c/li\u003e\n\u003cli\u003eLinardos, A., K\u0026uuml;mmerer, M., Press, O., \u0026amp; Bethge, M. (2021, October). 
\u003cem\u003eDeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling\u003c/em\u003e. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). IEEE. https://doi.org/10.1109/ICCV48922.2021.01268\u003c/li\u003e\n\u003cli\u003eTullis, T., \u0026amp; Albert, B. (2008). Performance metrics. In T. Tullis \u0026amp; B. Albert (Eds.), \u003cem\u003eMeasuring the user experience\u003c/em\u003e (pp. 63\u0026ndash;97). Morgan Kaufmann. https://doi.org/10.1016/B978-0-12-373558-4.00004-2\u003c/li\u003e\n\u003cli\u003eMohr, J., Seyfarth, J., Lueschow, A., Weber, J. E., Wichmann, F. A., \u0026amp; Obermayer, K. (2016). BOiS\u0026mdash;Berlin object in scene database: Controlled photographic images for visual search experiments with quantified contextual priors. \u003cem\u003eFrontiers in Psychology\u003c/em\u003e, \u003cem\u003e7\u003c/em\u003e, 749.\u003c/li\u003e\n\u003cli\u003eGreene, M. R. (2013). Statistics of high-level scene context. \u003cem\u003eFrontiers in Psychology, 4.\u003c/em\u003e https://doi.org/10.3389/fpsyg.2013.00777\u003c/li\u003e\n\u003cli\u003eB\u0026uuml;ttner, G., Dacheneder, W., Schneider, W., \u0026amp; Hasselhorn, M. (2021). \u003cem\u003eFrostigs Entwicklungstest der visuellen Wahrnehmung\u0026ndash;3\u003c/em\u003e (1st ed.). Pearson.\u003c/li\u003e\n\u003cli\u003eBulheller, S., \u0026amp; Hacker, H. (2001). \u003cem\u003eCPM \u0026ndash; Colored Progressive Matrices\u003c/em\u003e (3rd ed.). Pearson.\u003c/li\u003e\n\u003cli\u003eBates, D., M\u0026auml;chler, M., Bolker, B. M., \u0026amp; Walker, S. C. (2015). Fitting linear mixed-effects models using lme4. \u003cem\u003eJournal of Statistical Software, 67\u003c/em\u003e(1), 1\u0026ndash;48. \u003c/li\u003e\n\u003cli\u003eKuznetsova, A., Brockhoff, P. B., \u0026amp; Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. \u003cem\u003eJournal of Statistical Software, 82\u003c/em\u003e(13). 
https://doi.org/10.18637/jss.v082.i13.\u003c/li\u003e\n\u003cli\u003eTatler, B. W., \u0026amp; Vincent, B. T. (2008). Systematic tendencies in scene viewing. \u003cem\u003eJournal of Eye Movement Research, 2\u003c/em\u003e(2), 1\u0026ndash;18.\u003c/li\u003e\n\u003cli\u003eParkhurst, D., Law, K., \u0026amp; Niebur, E. (2002). Modelling the role of salience in the allocation of visual selective attention. \u003cem\u003eVision Research, 42\u003c/em\u003e(1), 107\u0026ndash;123.\u003c/li\u003e\n\u003cli\u003eBauer, P. J., \u0026amp; Mandler, J. M. (1989). Taxonomies and triads: Conceptual organization in one-to two-year-olds. \u003cem\u003eCognitive Psychology\u003c/em\u003e, \u003cem\u003e21\u003c/em\u003e(2), 156-184.\u003c/li\u003e\n\u003cli\u003eDavidson, D., Rainey, V. R., Vanegas, S. B., \u0026amp; Hilvert, E. (2018). The effects of type of instruction, animacy cues, and dimensionality of objects on the shape bias in 3‐to 6‐year‐old children. \u003cem\u003eInfant and Child Development\u003c/em\u003e, \u003cem\u003e27\u003c/em\u003e(1), e2044.\u003c/li\u003e\n\u003cli\u003eFreund, L. S., Baker, L., \u0026amp; Sonnenschein, S. (1990). Developmental changes in strategic approaches to classification. \u003cem\u003eJournal of Experimental Child Psychology, 49\u003c/em\u003e, 343\u0026ndash;362.\u003c/li\u003e\n\u003cli\u003eRatner, H. H., \u0026amp; Myers, N. A. (1981). Long-term memory and retrieval at ages 2, 3, 4. \u003cem\u003eJournal of Experimental Child Psychology, 31\u003c/em\u003e, 365\u0026ndash;386.\u003c/li\u003e\n\u003cli\u003ePomiechowska, B., \u0026amp; Gliga, T. (2019). Lexical acquisition through category matching: 12-month-old infants associate words to visual categories. \u003cem\u003ePsychological Science\u003c/em\u003e, \u003cem\u003e30\u003c/em\u003e(2), 288-299.\u003c/li\u003e\n\u003cli\u003eWaxman, S. R., \u0026amp; Braun, I. (2005). 
Consistent (but not variable) names as invitations to form object categories: New evidence from 12-month-old infants. \u003cem\u003eCognition\u003c/em\u003e, \u003cem\u003e95\u003c/em\u003e(3), B59-B68.\u003c/li\u003e\n\u003cli\u003eBooth, A. E., \u0026amp; Waxman, S. R. (2002). Object names and object functions serve as cues to categories for infants. \u003cem\u003eDevelopmental Psychology, 38\u003c/em\u003e(6), 948\u0026ndash;957.\u003c/li\u003e\n\u003cli\u003eQuinn, P.C. (2004). Spatial representation by young infants: Categorization of spatial relations or sensitivity to a crossing primitive? \u003cem\u003eMemory \u0026amp; Cognition, 32\u003c/em\u003e, 852-861.\u003c/li\u003e\n\u003cli\u003ePlumert, J. M., \u0026amp; Strahan, D. (1997). Relations between task structure and developmental changes in children\u0026apos;s use of spatial clustering strategies. \u003cem\u003eBritish Journal of Developmental Psychology\u003c/em\u003e, \u003cem\u003e15\u003c/em\u003e(4), 495-514.\u003c/li\u003e\n\u003cli\u003eRichland, L. E., Morrison, R. G., \u0026amp; Holyoak, K. J. (2006). Children\u0026rsquo;s development of analogical reasoning: Insights from scene analogy problems. 
\u003cem\u003eJournal of experimental child psychology\u003c/em\u003e, \u003cem\u003e94\u003c/em\u003e(3), 249-273.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":true,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"eye movements, scene grammar, semantics, syntax, cognitive development","lastPublishedDoi":"10.21203/rs.3.rs-6861119/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6861119/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eHumans develop \u003cem\u003esemantic\u003c/em\u003e and \u003cem\u003esyntactic\u003c/em\u003e expectations about \u003cem\u003ewhat\u003c/em\u003e objects typically appear \u003cem\u003ewhere\u003c/em\u003e in everyday scenes. This study examined how children aged 6 to 10 process such scene-grammatical rules and how this ability develops. We assessed scene knowledge implicitly using two eye-tracking tasks: a free viewing and a visual search task featuring scenes with either consistent object placements or semantic/syntactic violations. We also measured explicit knowledge by asking children to furnish a dollhouse. 
Results showed that children looked less at consistent objects in the free viewing task. Our visual search task further revealed earlier fixations and faster reaction times to consistent objects. These results replicate previous findings in adults indicating more efficient processing and stronger expectations for objects placed consistent to scene grammar. Additionally, children who were more sensitive to syntactic violations in images showed greater accuracy in the dollhouse task. Scene knowledge grew more robust with age, as evidenced by shorter dwell times in the free viewing task and earlier fixations and faster responses to consistent objects in the visual search task. Together, these findings highlight the ongoing development of scene grammar in children and offer new insights into how implicit and explicit measures can tap into children\u0026rsquo;s visual cognition.\u003c/p\u003e","manuscriptTitle":"Where Things Belong: The Development of Scene Knowledge in Childhood","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-06-25 06:23:01","doi":"10.21203/rs.3.rs-6861119/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"0916496c-73d4-4fbd-a668-aa8957869dff","owner":[],"postedDate":"June 25th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":50245573,"name":"Biological sciences/Psychology"},{"id":50245574,"name":"Biological sciences/Psychology/Human 
behaviour"}],"tags":[],"updatedAt":"2025-08-11T10:08:13+00:00","versionOfRecord":[],"versionCreatedAt":"2025-06-25 06:23:01","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6861119","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6861119","identity":"rs-6861119","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

