A Meta-Analysis of the Effects of Teacher Professional Development in STEM Education: What Have We Done, and Where Are We Going?

doi:10.21203/rs.3.rs-6602739/v1

2025 · doi:10.21203/rs.3.rs-6602739/v1

preprint OA: closed

Research Article

Hyesun You, Sunyoung Park, Minju Hong

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6602739/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

This study includes meta-analytic evidence from 118 studies published 2010–2022 demonstrating that professional development (PD) programs for science, engineering, technology, and math teachers effectively support teacher content knowledge and pedagogical quality and improve student academic performance. We explored the overarching impact of PD programs and analyzed how various characteristics contribute to the observed effects of these programs.
To identify relevant studies, we searched four databases and focused on peer-reviewed English-language journal articles available in full text. We selected articles that used quantitative research designs and provided adequate data to estimate effect sizes. Subsequently, we assessed the potential risk of bias to evaluate the quality of the selected studies. The significant effect size (0.739, 95% CI [0.637, 0.842]) of PD programs found in our meta-analysis aligns with that of previous meta-analyses and systematic reviews that have synthesized findings on the impact of PD at the teacher and student levels. Substantial heterogeneity of the effect sizes was moderated by PD dosage hours and grade levels. The results indicated that the pooled effect size was much larger when the program length was greater than 80 hours. Additionally, if a study focused on a specific grade level, the magnitude of the effectiveness of PD was larger than that in a combination of different grade-level groups. By aggregating research findings across studies, the present overview will aid educators and education policy communities in better understanding PD research and how PD can be designed and implemented most effectively.

Keywords: professional development, STEM teachers, K–12, meta-analysis

Introduction

Professional development (PD) functions as a catalyst for driving change in education and as a strategic approach to empowering teachers to enhance their professional knowledge and teaching practices (Guskey, 2002). School districts, funding agencies, and policymakers continue to invest considerable resources in high-quality PD programs, aiming to improve teacher effectiveness and student achievement; however, mixed findings exist regarding the effectiveness of these programs (Darling-Hammond et al., 2017).
PD programs do not consistently lead to successful learning outcomes in the classroom, despite their intent to do so, because of challenges in their design and implementation (Van Driel et al., 2012; Webster-Wright, 2009).

The science, technology, engineering, and mathematics (STEM) education fields are making significant efforts to establish high-quality PD programs based on educational policy and practice. These PD programs aim to enhance STEM educators’ teaching skills, address challenges, foster innovation, promote collaboration, and ultimately improve STEM instruction and student learning. In conjunction with these efforts, it is essential to investigate the elements of effective PD and the extent to which PD programs in STEM education are effective.

The literature on teacher effectiveness has provided a logic model in which the link between PD intervention and student outcomes (e.g., academic achievement, motivation) is mediated by teacher outcomes (e.g., teacher knowledge, instructional practices). Darling-Hammond et al. (2017) defined effective PD as structured career learning that results in changes in teacher practices and improvements in student learning outcomes. This definition implies that a program’s effectiveness can be evaluated either by teachers’ outcomes alone or by a combination of teacher and student outcomes. In particular, the STEM education literature on high-quality PD programs emphasizes improvement in teaching practices and uses varied measurable factors to reveal PD effectiveness, including teachers’ content knowledge (CK; Kelcey et al., 2017; Kleickmann et al., 2016); dispositions toward teaching (i.e., affective factors such as confidence, beliefs, and self-efficacy; Hayes et al., 2019; Nadelson et al., 2015); instructional practices (Knight et al., 2014; Maeng et al., 2020); and student outcomes (Lynch et al., 2019; Robinson et al., 2014; Yoon et al., 2017).
Researchers have examined the effects of teacher development initiatives for STEM teachers and identified the core features that should be considered in PD. However, few recent meta-analyses have addressed the comprehensive effectiveness of PD programs for K–12 STEM teachers. Lynch et al. (2019) conducted a meta-analysis to identify the STEM PD program content, activities, and formats related to specific student outcomes. They presented results from 95 experimental and quasi-experimental preK–12 STEM PD programs published between January 2004 and March 2016 and found a pooled effect size of +0.21 standard deviations (95% CI [0.12, 0.28]). This is smaller than the effect sizes found in other meta-analyses. For example, Yoon et al. (2007) showed 0.51 SD in science and 0.57 SD in math through studies published between 1986 and 2006. Gonzalez et al. (2022) synthesized 37 published experimental studies of preK–12 STEM education spanning three decades to examine the impacts of PD interventions on teacher knowledge and classroom instruction and how these outcomes affect student achievement. Teachers who participated in STEM classroom interventions showed improvements in CK, pedagogical content knowledge (PCK), and classroom instruction, with a pooled average impact estimate of +0.56 SD. Programs with larger impacts on teacher practice yielded larger effects on student achievement, on average. Kuehnert et al. (2019) focused on the effects of reform-based PD on the PCK of K–12 math and science teachers. Their findings revealed a robust connection between PD and PCK; the effect size was significant (d = 0.51, p < .001). Notably, the authors did not observe significant variation in study characteristics.

Recent meta-analyses have evaluated the efficacy of PD by exclusively focusing on either the outcomes for teachers or those for students. These outcomes have been highly specific, and the studies considered for inclusion have been limited in number.
Moreover, these meta-analyses have typically focused on mathematics and science teachers. Our study encompasses a distinct scope and poses research questions that diverge from those addressed in the previous STEM PD meta-analyses. In the current study, we explored the effects of diverse PD outcomes, including cognitive and affective outcomes of teachers and students, thereby broadening the general understanding of the effects of PD. Furthermore, we examined the influence of potential moderators on PD outcomes: structural features of PD (e.g., duration, dosage), contextual factors of PD (e.g., funded vs. nonfunded PD, country), and process features (e.g., active learning, reflection opportunity). We treated the core characteristics of PD programs identified in the literature as moderators to test our hypothesis that the variation in effect size among studies is associated with differences in PD characteristics. Our study not only adds novel findings to the existing literature on the quality and significance of teacher PD programs but also provides evidence to inform the design and implementation of teacher PD, specifically by exploring the mechanisms by which PD programs in STEM education influence teacher knowledge, teacher practices, and student achievement.

Literature Review

Teacher Change Model

The theoretical framework of our study was guided by the principle that the positive effects of PD interventions can strengthen teacher instruction and, in turn, students’ academic performance. Desimone (2009) proposed a comprehensive framework of how PD influences both teachers’ instructional practice and student learning.
The core theory of action includes four main steps: (a) teachers participate in a PD program; (b) teachers’ participation increases their knowledge and skills, changes their attitudes and beliefs, or both; (c) given their new knowledge and skills (or attitudes and beliefs), teachers improve their classroom instruction through changes in content, pedagogy, or both; and (d) the changes in teachers’ instructional practices promote student learning. In this framework, personal factors for both students and teachers, along with contextual factors such as school characteristics, curriculum, and policies, can mediate the influence of PD on teacher and student development. Figure 1 displays the simplified linear path model of the four steps, in which the conceptual model links the critical features of PD to increases in student achievement via intervening variables such as teacher knowledge and instructional practice. We constructed this model based on Desimone’s (2009) conceptual PD model and existing literature reviews of studies on PD effects (e.g., Egert et al., 2018; Yoon et al., 2007). This adapted model allowed us to identify measurable PD outcomes, determine which PD features exert significant influence, and explore PD effectiveness.

INSERT FIGURE 1 HERE

Core Characteristics of High-Quality PD

Researchers have placed significant emphasis on identifying the key components of effective PD and assessing the impact of these components on teacher knowledge, skills, and practices. We conducted a targeted search of PD studies, specifically focusing on key features that have a significant impact on PD outcomes. We systematically coded these features to categorize and label them. After reaching a point of saturation, we identified a total of nine essential features: content focus, active learning, coherence with goals, collective participation, using models of effective practice, coaching and expert support, feedback and reflection, duration and dosage, and technology integration.
Content Focus

Content comprises the knowledge and skills that teachers should possess to enhance their professional practices (Guskey & Sparks, 1996). Scholars have argued that PD programs that provide knowledge of content, pedagogical knowledge, and alternative teaching practices are more effective than those that do not. Furthermore, Garet et al. (2016), Desimone (2011), and Darling-Hammond and Richardson (2009) highlighted that effective PD programs help teachers improve their knowledge of the subjects they teach and develop adequate skills to communicate this knowledge to students. For example, Garet et al. (2016) examined the impact of content-focused PD on teachers’ practices and CK and provided evidence that PD for improving teachers’ content knowledge and content-specific pedagogy significantly improved teachers’ knowledge for 1 year. This study indicated a potential approach for concentrating PD efforts on enhancing knowledge and practice outcomes.

Active Learning

Many scholars have agreed that providing teachers with opportunities to participate in activities aligned with active learning is a key component of effective PD (e.g., Birman et al., 2000; Van Veen et al., 2012). Teachers’ active engagement in PD significantly affects their knowledge, teaching practices, and student learning (Garet et al., 2001; Loucks-Horsley et al., 2009; Marra et al., 2011; Whitworth et al., 2018). Active learning within the PD context could include diverse activities such as doing simulated practices, developing lesson plans and rubrics, and leading argumentation (Birman et al., 2000). Engaging teachers as active participants in PD can help them understand the importance of students’ active learning.

Coherence With Goals

A systematic review by Lindvall and Ryve (2019) identified three distinct ways coherence has been conceptualized in PD.
External coherence indicates coherence with elements outside the PD program itself, such as the curriculum, standards, assessments, or teachers’ needs and knowledge. Research has shown a positive correlation between teacher development and PD programs that are coherent with local curriculum standards and benchmarks (Birman et al., 2000; Desimone, 2009). When PD aligns with the curriculum, it helps teachers better understand how to implement instructional practices. Additionally, when PD activities mirror national, state, or district-wide reforms, it is more likely that what teachers learn in PD directly translates into improved student achievement. Internal coherence emphasizes that different aspects of the PD program, including activities, content, and resources, should be self-consistent. Created coherence highlights a function of PD that can draw connections between different aspects of teachers’ work rather than merely reinforcing existing coherence. Desimone (2009) also defined coherence as a unified concept, describing it as "the extent to which teacher learning aligns with teachers' knowledge and beliefs" (p. 184). She argued that such coherence promotes a deeper integration of new ideas, as teachers are more inclined to adopt changes that resonate with their established pedagogical frameworks.

Collective Participation

Darling-Hammond et al. (2017) and Desimone (2011) highlighted that PD provides opportunities for teachers to work actively and collaboratively, sharing their ideas, voices, experiences, and resources. Wayne et al. (2008) indicated that PD is effective for influencing teacher learning and teaching practice if teachers from the same school, department, content area, or grade level participate collectively. The collaborative environment among teachers from similar settings can increase their understanding of diverse perspectives in teaching and empower their growth as educators (Garet et al., 2001).
Using Models of Effective Practice

PD allows teachers to envision best teaching practices as they incorporate curricula or instructional models deemed effective. These models can include exemplary lesson plans, unit plans, sample student work, and recorded or written teaching practices (Darling-Hammond et al., 2017). When provided with useful models, teachers tend to be open to modifications to their teaching (Levitt, 2002).

Coaching and Expert Support

Many researchers have found that coaching by experts positively affects teachers’ knowledge, attitudes, and practices. The ongoing nature of coaching provides sustained and personalized support that helps teachers incorporate new teaching methods into lesson planning, along with useful instructional feedback (e.g., Desimone & Pak, 2017; Garet et al., 2001; Liao, 2018).

Feedback

Feedback and suggestions from PD facilitators and peers can ensure that teachers receive relevant guidance directly applicable to their classrooms and aligned with PD objectives. Continuous evaluation and timely feedback not only help in refining the PD process but also support teachers in implementing new strategies effectively (Desimone, 2009). Hattie and Timperley (2007) also emphasized that feedback is most effective when it is ongoing rather than a one-time event. Timely feedback allows teachers to reflect on their practice, make adjustments, and gradually build their expertise in new teaching strategies. Incorporating feedback mechanisms into PD programs also fosters a culture of continuous improvement among teachers. By regularly receiving feedback, teachers are more likely to engage in reflective practice, which is a key component of professional growth. The feedback that teachers receive as part of PD can help identify gaps in implementation and areas where additional support may be needed.
This targeted support can address challenges that teachers face in adopting new strategies, thereby increasing the likelihood of successful implementation and positive student outcomes (Darling-Hammond et al., 2017).

Duration and Dosage

A sufficient time span and an appropriate amount of time spent on PD programs help teachers learn new information and techniques and determine how such techniques can evolve or be adapted from existing teaching practices (Darling-Hammond et al., 2017). Scholars have argued that both the duration and dosage of PD are significant factors in eliciting meaningful learning experiences from participating teachers (e.g., Hill et al., 2013; Wayne et al., 2008). Duration refers to the length of time over which the PD program is implemented, whereas dosage pertains to the intensity of the PD activities experienced by the teachers (Kennedy, 2016; Van Driel et al., 2001). Generally, scholars have argued that a longer duration allows for sustained engagement and deepening of understanding, and a sufficient dosage ensures that teachers receive enough exposure to new knowledge, skills, and strategies for change in their teaching practices. However, the appropriate duration and dosage for PD differ depending on its goal and context, as shown in diverse studies. Garet et al. (2001) indicated that PD is more effective when it is sustained over time and includes a substantial number of contact hours. Desimone (2011) recommended that PD activities be spread over a semester and include at least 20 hours of contact time, either over the course of several months or through intensive experiences (e.g., summer workshops). Furthermore, Yoon et al. (2007) reported that an average of 49 hours is a substantial PD duration for a single program to positively affect student achievement. Banilower et al.
(2006), however, suggested that this number should be 100+ hours, including the activity length, the time spent working on and implementing the activity, and the time for administrators to provide feedback to teachers. Ramey et al. (2011) conducted an intervention aimed at revealing the impact of different durations of PD while maintaining the same dosage. Teachers participated in 120 hours of in-classroom coaching and were randomly assigned to either an immersion condition group, which involved 20 full days spread over 5 weeks, or a low-density condition group, receiving 1 full day (8 hours) of PD per week over 20 weeks. Classrooms in the immersion condition showed modest improvements across the school year, but classrooms in the low-density condition either remained stagnant or saw a decline in academic performance.

Technology Integration

Technology integration into learning has become a potent force for change, reshaping the way students engage with education (Lawless & Pellegrino, 2007). This transformation has been particularly pronounced in the realm of STEM education and has been extended to teachers’ PD (Polgampala et al., 2017). By implementing technology-enabled PD, STEM teachers can access tailored learning experiences and experiment with innovative teaching methods to captivate and engage students through interactive and dynamic experiences (Authors, 2021). Scholars have indicated that effective technology integration engages teachers through interactive and dynamic experiences. Technology-integrated PD allows teachers to become familiar with meaningfully applying technology and offering engaging and hands-on activities in a student-centered learning environment (Barak & Assal, 2018). Virtual simulations and robotics applications provide immersive environments.
This experiential learning approach enhances teachers’ confidence in adopting novel instructional strategies and unlocks the potential to promote STEM learning through active and coherent teaching practices (Authors, 2021; Yildirim et al., 2020). The meaning of technology integration can be interpreted in different ways. We defined technology integration as it applies to science, math, and engineering education.

Funding

Developing and implementing PD initiatives requires sufficient financial backing to address various crucial elements within these programs. Adequate funding supports essential aspects, including the creation of training materials and resources and compensation for instructors and participating teachers. Funds for PD seem to influence other structural facets of programs, such as duration (Hayes et al., 2020; McNaull, 2014). However, despite the acknowledgement of funding as a critical factor, empirical evidence regarding its direct impact on educational outcomes within PD remains limited. Although this study does not examine in detail the heterogeneity characterizing the relationship between financial resources and outcomes, it does shed light on the association between funding availability and PD outcomes. Expanding our understanding of the interplay between funding and PD outcomes is imperative for informed decision-making and effective resource allocation in PD. By recognizing the connections between financial resources and programmatic outcomes, stakeholders and policymakers can better strategize and optimize their investment in PD endeavors.

PD Outcomes Influencing Educational Changes

Outcome measures of PD discussed in the literature may be indicators for determining the effectiveness of PD. Desimone’s (2009) model illustrated “interactive, nonrecursive relationships between the critical features of PD, teacher knowledge and beliefs, classroom practices, and student outcomes” (p. 184).
Guskey (2000) provided a multilevel framework demonstrating PD’s outcome variables, including teachers’ knowledge and skills, application of knowledge, and student learning, and noted that these outcomes can be used for evaluating PD. Similarly, several scholars of the effects of PD on teachers, students, or both have reported multiple outcomes measured by diverse instruments and methods. By performing a comprehensive literature search and using Desimone’s conceptual PD model, we sorted PD outcomes into four categories: (a) teachers’ content and/or pedagogical knowledge; (b) teachers’ affective domains; (c) classroom practices; and (d) students’ outcomes.

Teachers’ Cognitive Components

As part of PD evaluation, teachers are often assessed to determine the extent to which their PD participation led to increased knowledge of the content they teach and increased PCK, including teaching and learning strategies appropriate to that content. Teachers require a certain degree of CK to teach their subjects effectively and to apply their knowledge and skill sets. PCK (Shulman, 1986) refers to the interconnected domains of teacher knowledge; PCK is often discussed as existing on a continuum wherein teachers acquire more PCK through appropriate training and experience. Thus, most PD programs emphasize improving teacher knowledge and applying new knowledge and skills to classroom practices. For example, Clary et al. (2017) investigated science knowledge retention at the Teacher Academy in the Natural Sciences (TANS) beyond a specific instructional year. The TANS targeted middle school science teachers of chemistry, geosciences, and physics; the study’s goals were to increase teacher CK and enhance student learning outcomes. Roth et al. (2019) studied the influences of two PD programs, Science Teacher Learning from Lesson Analysis (STeLLA) and Content Deepening (CD), on upper-elementary teachers’ knowledge.
The authors hypothesized and confirmed that the STeLLA program would improve science teachers’ CK, PCK, and classroom practice more than the CD program, which focused only on CK. Researchers have acknowledged the importance of measuring teachers’ knowledge. However, considering only cognitive factors is insufficient for PD evaluation and underestimates the significance of the changes teachers make. Many scholars (e.g., Desimone, 2009; Guskey, 2000) have argued that affective measures such as teacher perception and attitude toward PD are key to evaluating PD outcomes.

Teachers’ Affective Components

Researchers must consider affective factors when evaluating the impact of PD. Affective components include the feelings, moods, and emotions people experience in a given situation. Within PD programs, affective components such as a teacher’s attitudes, beliefs, self-efficacy, and confidence are measured to identify the effects of scaffolding and treatment. Based on affective–cognitive consistency theory, these affective components may change with the provision of new information, and this relationship further moderates the nature of teacher behaviors (Millar & Tesser, 1989). Authors (2021) designed and developed a PD program to help middle school teachers effectively integrate robotics in science and mathematics classrooms. The primary goal of the PD program was for teachers to increase their confidence and self-efficacy in robotics-integrated teaching, thereby enhancing their ability to develop lesson plans infused with robotics activities. Similarly, Hawley and Sinatra (2019) aimed to reduce the perceived conflict and negative emotions surrounding the relationship between faith and science for Christian teachers through a PD workshop. In the study, teachers reported decreases in misconceptions and increases in self-efficacy and positive emotions related to teaching evolution.
Teaching Practices

A wealth of literature has supported the impact of PD models on the quality of teachers’ instruction (e.g., Chitpin, 2011; Fishman et al., 2017). Desimone et al. demonstrated the positive impact of PD by focusing on specific instructional practices and teachers’ use of those practices in the classroom. The authors used three instructional methods in the PD (technology use, higher-order instructional methods such as working on interdisciplinary lessons, and alternative assessments), which elicited changes in instructional practice. Lotter et al. (2018) demonstrated that a PD model had a positive effect on teachers’ inquiry beliefs and self-efficacy as well as on their implementation of practice. Specifically, the electronic quality of inquiry protocol analysis from before and after the PD demonstrated a significant increase in teachers’ quality of inquiry instruction. As such, teacher participation in PD may be directly linked with the successful implementation of new teaching practices. Additionally, connecting PD to teacher practice may encourage teachers to think differently about their instruction and modify their teaching practices.

Student Outcomes

The ultimate goal of most PD programs is to empower students by fostering teacher growth and learning. Diverse PD studies have demonstrated the significant impact of professional guidance on students’ academic performance. As Darling-Hammond (1998) asserted, teachers teach what they know, and teacher knowledge exerts a substantial positive influence on student learning. For example, Yoon et al. (2017) reported findings from a 2-year PD study in which the professional guidance provided by the PD program resulted in a significantly enhanced classroom learning experience and improved student CK. Similarly, Furtak et al.
(2018) documented that the learning outcomes of the majority of students observed over a 4-year period progressed from lower to upper benchmarks across various learning progressions. It is worth noting that few scholars have focused on evaluating the effectiveness of PD through its impact on student outcomes in K–12 STEM education.

Study Purpose

Many studies on the effectiveness of PD have confined themselves to the perspective of either teachers or students and have not considered teacher and student outcomes simultaneously. Additionally, most PD metastudies have focused on specific academic subjects (e.g., only science) or grade levels (e.g., only elementary-grade students). By contrast, the current study incorporated diverse, measurable student and teacher outcomes to adequately determine the quality of PD programs. We have included a wide range of PD studies in K–12 STEM education to generate generalizable results and identify common practices or models that contribute to the effectiveness of PD. Therefore, our meta-analysis covers a broad range of results synthesized from individual studies published over the past decade concerning the impact of PD in STEM education. We also investigate whether and which characteristics of PD programs might moderate these effects. We selected the moderators through a literature review and theoretical rationale. Published metastudies related to PD programs, existing teacher change models, and the core characteristics affecting PD programs from prior research laid the foundation for the selection of moderators. We sought to address the following two primary research questions:

What are the effects of PD programs for STEM teachers in studies between 2010 and 2022?

What characteristics of PD programs (e.g., grade level, STEM subject, race/ethnicity, gender, funding) explain the degree of their effects in PD studies between 2010 and 2022?
Methods

Search Procedures

In this review, we examined experimental studies of PD for K–12 teachers teaching STEM subjects. We conducted an electronic search using four databases (ERIC, PsycINFO, Web of Science, and EduSource) for PD program studies published January 2010–May 2022. Search terms for title and abstract included the following: [“PD” or “professional development”] AND [“teacher”] AND [“STEM” or “STEAM” or “science” or “math*” or “technology” or “engineering”]. Three researchers reviewed an initial pool of 1,121 studies, comprising 296 studies from PsycINFO, 109 studies from ERIC, 324 studies from Web of Science, and 392 studies from EduSource. After we removed 30 duplicate studies identified by title, authors, and journal information, 1,091 studies remained after title and abstract screening. Of those, 302 were irrelevant to PD effectiveness, and we removed them based on a closer review of the full text. Additionally, we excluded 45 studies related to other grade levels, such as early childhood or higher education, and 543 qualitative studies. Of the remaining 201 studies, 83 failed to provide sufficient statistical data, such as mean, standard deviation, and sample size, which are essential for estimating effect sizes. After removing those studies, we were left with 118 articles, from which we initially calculated 946 effect sizes. However, we subsequently excluded eight effect sizes from five studies (Greene et al., 2013; Kim et al., 2012; Malanson et al., 2014; Robinson et al., 2014; Tobin et al., 2012) as outliers, using the interquartile range (IQR) method (Acuna & Rodriguez, 2004; Tukey, 1977). Under the IQR method, values below −2.328 or above 3.615 were flagged as outliers. This multistep search procedure resulted in a total of 938 effect sizes from the final 118 studies for meta-analysis (see Figure 2). In the search process, we implemented a rigorous triple-review approach for each search step.
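The screening counts and the IQR rule described in the search procedure can be checked with a short sketch. The study counts below are the ones reported in the text. The function names, the fence multiplier `k = 1.5`, and the quartile definition are illustrative assumptions, because the text does not state which multiplier or quartile convention produced the reported cutoffs of −2.328 and 3.615.

```python
from statistics import quantiles

# Record counts as reported in the search procedure.
DATABASE_HITS = {"PsycINFO": 296, "ERIC": 109, "Web of Science": 324, "EduSource": 392}
EXCLUSIONS = [
    ("duplicates", 30),
    ("irrelevant to PD effectiveness", 302),
    ("other grade levels", 45),
    ("qualitative studies", 543),
    ("insufficient statistical data", 83),
]


def screening_flow(hits, exclusions):
    """Return (step, studies remaining) after each exclusion step."""
    remaining = sum(hits.values())
    trail = [("initial pool", remaining)]
    for reason, n_removed in exclusions:
        remaining -= n_removed
        trail.append((reason, remaining))
    return trail


def tukey_fences(values, k=1.5):
    """Return the (lower, upper) outlier fences Q1 - k*IQR and Q3 + k*IQR.

    k = 1.5 is Tukey's conventional multiplier; it is an assumption here,
    as is the "inclusive" quartile definition.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return the values that fall outside the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


if __name__ == "__main__":
    for step, remaining in screening_flow(DATABASE_HITS, EXCLUSIONS):
        print(f"{step:35s} {remaining:5d}")
    # Effect sizes kept after outlier removal, from the counts in the text.
    print("effect sizes retained:", 946 - 8)
```

Running the flow reproduces the reported trail from 1,121 records to 118 studies, and 946 − 8 gives the 938 retained effect sizes.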
This method ensured a thorough and comprehensive evaluation of the data and results. We assessed the interrater reliability of the coding using Fleiss’s (1971) generalized kappa, which is computationally versatile, accommodating a variety of coding variables and raters. Three coders collected coding information related to basic study details (e.g., title, authors, publication year, journal names), statistical data (e.g., mean, standard deviation, sample size, correlations, t -value, F -ratio), and moderators (see Table 1). INSERT FIGURE 2 HERE Inclusion and Exclusion Criteria Types of Interventions The included interventions were K–12 STEM teachers’ PD. We focused on studies that addressed how PD programs have been implemented and the effects of PD programs on teacher quality (e.g., self-efficacy, CK, teaching practice) and student learning outcomes. We omitted studies on early childhood education teachers (e.g., Roman, 2019; Tsamir et al., 2014); practice/assistant/preservice teachers (e.g., Berisha & Vula, 2021; Karisan et al., 2019); and professors, instructors in higher education, or both (e.g., Derting et al., 2016). Participants The participants included a diverse group of teachers and their students, focusing specifically on the K–12 educational level. These teachers taught STEM subjects. We therefore excluded early childhood and preschool teachers, preservice teachers, and faculty and instructors in higher education. Study Design We included studies that used a quantitative research design. Specifically, we sought studies that reported sufficient statistics, enabling us to calculate the effect sizes of overall selected studies. Notably, the majority of the studies included in our analysis involved comparisons between control and treatment groups or examinations of sample groups through the predesign and postdesign. We excluded studies that adopted qualitative research approaches such as interviews or narrative research. 
Intervention Year and Settings We included studies of interventions conducted 2010–2022. Our aim was to gain insights into PD programs over the past decade. We included full-text peer-reviewed journal articles and excluded conference proceedings, grant proposals, reports, book reviews, dissertations, and theses. Because of language constraints, we included only articles published in English. Coding and PD Characteristics We created a spreadsheet to record the pool of potential studies and collect information from each study for the meta-analysis. The coding sheet included ID, sample size, methodology (i.e., experimental or nonexperimental design with pretests and posttests of treatment group), and quantitative data to calculate effect sizes, gender, race, funding, country, target sample (i.e., teacher, student, or both), grade level, technology use, PD dosage (i.e., the number of contact hours teachers spent in PD experiences), PD length (i.e., the time span over which those hours were spread), targeting outcomes in PD (i.e., a cognitive or affective factor), the main subject of the teachers participating in PD program, measurement tool, and type of PD treatment. We used these characteristics of PD programs as moderators to reveal how the characteristics can explain variations in each study’s effect size. As discussed in the literature review, the literature related to PD characteristics and prior PD meta-analysis studies provided insights into the “what,” “how,” and “who” components of PD. Based on the guidelines from the Campbell Collaboration (n.d.), the first 10 reviews were coded by three independent coders, and disagreements between the coders were resolved through discussion and consensus. Then the remaining reviews were coded by three coders independently based on revised coding criteria. In the initial coding of the first 10 studies, the Fleiss kappa score was 0.610, indicating moderate agreement among the coders. 
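As an illustration of the agreement statistic reported here, the following is a minimal sketch of Fleiss's (1971) generalized kappa. It assumes the codes have been tabulated as an items-by-categories matrix of rater counts; the function name and the toy data are hypothetical and are not taken from the study's coding sheet.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss's kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j (same rater count per item)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    n_cats = len(counts[0])

    # Proportion of all assignments that fall in each category.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters)
           for j in range(n_cats)]
    # Observed agreement among rater pairs for each item.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]

    p_bar = sum(p_i) / n_items       # mean observed agreement
    p_e = sum(p * p for p in p_j)    # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical example: three coders assign five studies to one of two codes.
toy_counts = [[3, 0], [2, 1], [0, 3], [3, 0], [1, 2]]
kappa = fleiss_kappa(toy_counts)
```

A value near 1 indicates near-perfect agreement, and a value near 0 indicates agreement no better than chance.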
The majority of disagreements occurred during the coding of statistical data and moderators, during which decisions were made regarding the reference group, the accurate final sample size, and the precise name of the measure. To resolve these discrepancies, criteria were clarified based on the observed disagreements. The subsequent coding results showed a Fleiss kappa score of 0.81, indicating high agreement (Viera & Garrett, 2005). Risk of Bias We evaluated the risk of bias that could have an impact on the magnitude of effect sizes using the Cochrane Collaboration tool (Higgins et al., 2011). The manual suggests evaluating five characteristics: selection bias (random sequence generation and allocation concealment); reporting bias (selective reporting); performance bias (blinding of participants and personnel); detection bias (blinding of outcome assessment); and attrition bias (incomplete outcome data). We also took into account the reporting of psychometric information to identify any potential bias arising from the measures used. Regarding selection bias, we evaluated whether the studies incorporated a control or comparison group, as well as whether they indicated the use of random assignment, random sampling, or any other sampling procedure (e.g., matched design) aimed at creating equivalent groups in the sampling process. In assessing reporting bias, we compared the measures outlined in the method section with the results presented in the results section. If measures mentioned in the method section were not reported in the results section, we inferred the presence of reporting bias.
In terms of performance bias, if the researchers provided any information about the intervention to the participants (e.g., informed consent), we inferred that the blinding of the trial may have been violated. Furthermore, if the study indicated that researchers were aware of participant allocation during data analysis, we inferred the potential presence of detection bias. We checked whether the studies accurately reported their attrition by providing pre- and postsample sizes or by directly mentioning the proportion of attrition out of the total sample. Additionally, we examined any reliability test results (such as Cronbach’s alpha or interrater reliability) or validity test results (such as factor analysis) for psychometric information. Statistical Analysis Data analysis consisted of four steps: estimating individual effect sizes using Hedges’ g, which provides bias-corrected estimates relative to Cohen’s d (Cooper et al., 2019); synthesizing effect size estimates by using robust variance estimation (RVE); conducting moderator analysis via meta-regression for continuous and categorical variables under the RVE; and assessing publication bias. We performed all analyses using RStudio (RStudio Team, 2020). Specifically, we used the 'robumeta' package (Fisher & Tipton, 2015) for RVE and Egger's regression, as well as the 'metafor' package (Viechtbauer, 2010) for the trim-and-fill analysis. When calculating effect sizes for repeated measures designs (prescore and postscore), most studies did not report a correlation value between prescore and postscore, even though this information is needed to calculate effect sizes and the corresponding sampling variance. Thus, as a sensitivity analysis, we used two conventional correlation values: 0.4 as a moderate level and 0.7 as a strong level. However, the results from the two correlation values did not differ substantially.
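To make the role of the assumed pre-post correlation concrete, the sketch below computes Hedges' g and its sampling variance for a single-group pretest-posttest comparison. It follows one common textbook formulation (standardizing by the within-group standard deviation recovered from the change-score standard deviation), which is an assumption here; the text does not state which repeated-measures formula was used, and the function name and input values are hypothetical.

```python
import math
from typing import Tuple


def hedges_g_prepost(m_pre: float, m_post: float, sd_pre: float,
                     sd_post: float, n: int, r: float) -> Tuple[float, float]:
    """Return (g, var_g) for a pre-post design with assumed correlation r."""
    # SD of the change scores implied by the two SDs and the correlation.
    sd_diff = math.sqrt(sd_pre ** 2 + sd_post ** 2 - 2 * r * sd_pre * sd_post)
    # Within-group SD recovered from the change-score SD.
    sd_within = sd_diff / math.sqrt(2 * (1 - r))
    d = (m_post - m_pre) / sd_within
    var_d = (1 / n + d ** 2 / (2 * n)) * 2 * (1 - r)
    # Small-sample correction factor, with df = n - 1 for a single group.
    j = 1 - 3 / (4 * (n - 1) - 1)
    return j * d, j ** 2 * var_d


# Hypothetical study: the same summary statistics under both assumed correlations.
g_moderate, var_moderate = hedges_g_prepost(10.0, 12.0, 4.0, 4.0, n=30, r=0.4)
g_strong, var_strong = hedges_g_prepost(10.0, 12.0, 4.0, 4.0, n=30, r=0.7)
```

Under this formulation, when the pretest and posttest standard deviations are equal, the assumed correlation leaves the point estimate unchanged and affects only the sampling variance, and therefore the weight a study receives in the synthesis.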
Thus, we used one of the correlation values, 0.4, as the inputted correlation value between the prescore and postscore. In the current study, to address the dependence among multiple effect sizes from a study, we used RVE (Hedges et al., 2010) for synthesizing effect size as well as moderator analyses and ultimately obtained 938 effect sizes among 118 primary studies, which indicates that one study reported more than one effect size. In this case, the multiple effect sizes from the same study could have a dependency on one another because they are from the same sample. If we ignore dependency and use the univariate random-effect model, the estimates may be biased. Unlike standard model-based methods such as multivariate meta-analysis, RVE can address dependency by improving standard error estimation. Although the generalized least squares (GLS; Gleser & Olkin, 2009) method was originally developed to address dependency, GLS requires correlation values among dependent effect sizes that are typically not reported in primary studies. Both the multilevel model (Van den Noortgate et al., 2013) and RVE are viable options that can address the dependency. A previous study (Author, 2019) found that both the multilevel model and RVE can yield unbiased results with large sample sizes, whereas the multilevel model may experience low convergence rates with smaller sample sizes. Given the pooled effect sizes across a total sample (here, 938 effect sizes), employing a multilevel model may not present challenges. However, during moderator analyses, missing data resulted in a decrease in the sample size. To address the issue of small sample size in moderator analyses, we opted to use RVE instead of the multilevel model. 
Moreover, RVE does not require the correlation structure when calculating standard errors and hypothesis tests; rather, it empirically estimates the standard errors using a sandwich estimator (Author, 2019), so it is recommended for meta-analyses with dependent effect sizes. We set statistical significance at .05 when degrees of freedom associated with the moderators were > 4 and at .01 when degrees of freedom were < 4 (Tanner-Smith et al., 2016). In addition to overall effect size, we conducted a moderator analysis to investigate the effects of potential moderators on PD programs, investigating a total of 16 moderating factors: (a) target sample for PD (i.e., teacher or student); (b) country (i.e., United States or non-United States); (c) technology use (i.e., technology was used or not used); (d) funding status (i.e., funded PD or nonfunded PD); (e) outcome type (i.e., cognitive or affective); (f) dosage (i.e., ≤ 24, 25–72, or ≥ 73 hours); (g) duration (i.e., ≤ 6, 7–12, 13–23, or ≥ 24 months); (h) targeted subject (i.e., science only, math only, technology only, or more than two subjects); and (i) grade level (i.e., K–elementary, middle school only, high school only, K–middle school, middle–high school, or K–12). In addition, we used seven core characteristics of PD as moderators: (j) active learning; (k) content-focused; (l) collective participation; (m) expert support; (n) feedback and reflection; (o) using models; and (p) coherence with goals. One example of the moderators is the measurable PD outcomes found in the studies, which centered on the cognitive and affective dimensions of PD that capture the characteristics of PD models. Cognitive outcomes in PD reflect the knowledge and reasoning of both teachers and students, whereas affective outcomes encompass teachers’ and students’ feelings, emotions, and attitudes toward the targeted objectives of PD. Funding status refers to the financial resources required for conducting PD-related studies.
We used categorical variables for dosage and duration due to the observed nonlinear relationship in the effectiveness of moderators. Thus, we categorized the moderators into meaningful ranges and conducted meta-regression employing dummy variables. Table 1 details this moderator information. We assessed publication bias using a trim-and-fill analysis (Duval & Tweedie, 2000) and the Egger regression analysis (Egger et al., 1997). When publication bias was detected, we used the trim-and-fill analysis to provide adjusted mean effect sizes (i.e., the estimate after including artificial effect sizes to make a symmetrical graph in the funnel plot). Egger’s test provides a statistical estimate of the degree and significance of asymmetry in the funnel plot, using a regression analysis with the effect sizes as the dependent variable and sampling variance as the independent variable. The nonsignificance of the coefficient for the sampling variance indicates that publication bias is not present; if the coefficient is significant, the value of the intercept indicates the adjusted pooled effect size (Egger et al., 1997). INSERT TABLE 1 HERE Results The 118 PD programs offered a unique combination of duration, purpose, content focus, and contextual factors highlighting the multifaceted nature of PD. In terms of duration and depth, these programs covered a wide range of training experiences, with some lasting only a few days and others extending across several years. The objectives of the programs varied; some primarily aimed at enhancing teachers’ CK, whereas others focused on improving teachers’ instructional practices or self-efficacy. Additionally, we aimed to encompass a variety of STEM-related content. Contextual factors could play a significant role in the diversity of PD. For example, when programs were conducted in different countries, the programs bring a distinct cultural and educational context. 
The use of technology within PD also varied, with some embracing technological tools and others not. Based on the results of the risk of bias assessment, we found that 44 studies (37.3%) incorporated a control/comparison group into their design, whereas only 34 studies (28.8%) indicated the use of a randomization process in their sampling procedure or allocation. Among those, only 27 studies (22.9% of the total) demonstrated both a control/comparison group and randomization, indicating a low risk of bias. In terms of reporting bias, all studies reported outcomes mentioned in the method section. Regarding performance bias, we observed that all studies provided intervention information to participants through informed consent; however, no study attempted to blind outcome assessment. Among the 113 studies employing pre- and posttest designs, 80 studies (70.8%) clearly reported pre- and posttest sample sizes. Among these 80 studies, 29 (36.3%) experienced attrition in the posttest sample. (The remaining 33 studies [29.2%] provided no information regarding pre- and posttest sample sizes.) Ninety studies (76.3%) reported the reliability or validity of the measures used. In terms of publication bias analysis, the results of the trim-and-fill analysis conducted without the eight outliers mentioned earlier (resulting in 938 effect sizes) demonstrated that no effect sizes needed to be imputed to obtain a symmetric distribution. This suggests a low likelihood of publication bias, as depicted in Figure 3. The results remained consistent when including the eight outliers (resulting in 946 effect sizes). Additionally, Egger’s regression analysis when including the eight outliers indicated a potential presence of publication bias ( b = 2.712, SE = 0.172, p < .05), albeit with a small positive adjusted pooled effect size (0.124, SE = 0.043, p < .05).
The results of Egger’s regression conducted without the eight outliers demonstrated that the coefficient of the sampling variance term is statistically significant ( b = 3.030, SE = 0.716, p < .05). This indicates the possible existence of publication bias. However, the adjusted pooled effect size (0.565, SE = 0.065, p < .001) remained moderate, positive, and statistically significant. When we included the eight outlying effect sizes, the overall effect size was 0.769 (SE = 0.057, p < .001) with a 95% confidence interval (0.656, 0.881). In contrast, when considering the data with only 938 effect sizes, the average overall effect size representing the effectiveness of the PD programs was g = 0.739 (SE = 0.052, p < .001) with a 95% confidence interval (0.637, 0.842), which is statistically significant. Thus, including the eight outlying effect sizes increased the overall effect size by 0.03. Consequently, the subsequent analyses used the data without the eight outliers. INSERT FIGURE 3 HERE Regarding research design, out of the 938 effect sizes, 65 were derived from studies employing an independent group design. Additionally, 227 effect sizes came from studies using a repeated measure group design while controlling for preexisting confounders through an equivalence test with a control group. The remaining 646 effect sizes came from studies that used a repeated measure group design without any control group. To assess the impact of research design on effect sizes, we conducted a sensitivity analysis and found no significant differences based on research design. Therefore, we included all effect sizes in our dataset irrespective of the type of research design. In the final sample, the I² statistic was 98.124, indicating that 98.124% of the variance in observed effects reflects variation in true effects rather than sampling error. The variance of true effects (tau-squared, τ²) was 0.505.
The prediction interval also does not include zero (95% prediction interval [0.565, 0.913]). The results demonstrated substantial variability across effect sizes beyond sampling error. Based on the results, we conducted a moderator analysis to explore the reasons for variability. First, because previous studies measured the effectiveness of PD programs along with teacher and student outcomes, we synthesized the overall effect size by targeting participants. According to the results (see Table 1), we found that the overall effect size based on students’ outcomes is 0.771 (SE = 0.095, p < .001), whereas the overall effect size based on teachers’ outcomes is 0.705 (SE = 0.059, p < .001). Both pooled effect sizes are statistically significant, but the difference between the two pooled effect sizes is not statistically significant. In terms of types of PD outcomes, the cognitive outcomes ( g̅ = 0.788, SE = 0.063, p < .001) have a higher pooled effect size than the affective outcomes ( g̅ = 0.696, SE = 0.081, p < .001), but this difference is not statistically significant. Second, in terms of study-level characteristics, we considered the funding for conducting PD studies and the country for each PD study published. According to the results (see Table 2), the pooled effect sizes are large and statistically significant regardless of funding support ( g̅ = 0.815, SE = 0.107, p < .001 for No; g̅ = 0.714, SE = 0.060, p < .001 for Yes) or the study’s origin country ( g̅ = 0.729, SE = 0.057, p < .001 for US; g̅ = 0.771, SE = 0.130, p < .001 for Other). Third, in exploring the reasons for variability across effect sizes, we considered the grade levels teachers taught. We found marginal differences between the middle school-only group and the K through middle school group ( t (10.05) = 2.03, p < .10) and between the high school-only and the kindergarten through middle school group ( t (12.76) = 1.86, p < .10).
Generally, if a study focused on a specific grade level, such as the middle school group ( g̅ = 0.856, SE = 0.108, p < .001) or high school group ( g̅ = 0.832, SE = 0.130, p < .001), as its target population, PD effectiveness was greater than in a group combining different grade levels (see Table 3). Next, we examined the impacts of technology-integrated PD and target STEM subjects on PD (see Table 4). We identified that studies that used technology ( g̅ = 0.738, SE = 0.100, p < .001) and studies that did not use technology ( g̅ = 0.750, SE = 0.062, p < .001) had similar effect sizes. There was no significant difference in the effectiveness of PD when technology was integrated. In terms of target subjects, we found that if a PD program considered science ( g̅ = 0.762, SE = 0.065, p < .001), technology ( g̅ = 0.763, SE = 0.284, p < .05), or more than two subjects ( g̅ = 0.728, SE = 0.099, p < .001), the effect sizes were larger than those that considered math ( g̅ = 0.572, SE = 0.136, p < .001); however, this difference was not statistically significant. To examine the relationship between program length and effectiveness, we considered dosage (i.e., total hours) and program duration. According to the results (see Table 4), the pooled effect size was large even when the program dosage was fewer than 80 hours ( g̅ = 0.773, SE = 0.110, p < .001), whereas the pooled effect size was marginally larger ( t (24.60) = 1.91, p < .10) when the dosage exceeded 80 hours ( g̅ = 1.121, SE = 0.182, p < .001). In addition, if the program duration was between 13 and 23 months ( g̅ = 0.781, SE = 0.214, p < .05) or longer than 24 months ( g̅ = 0.830, SE = 0.133, p < .001), the effectiveness of PD increased. However, these differences were not statistically significant. Finally, Table 5 presents the results of moderator analyses related to PD’s core characteristics.
We found that if the PD studies included components of active learning ( g̅ = 0.756, SE = 0.054, p < .001), collective participation ( g̅ = 0.771, SE = 0.107, p < .001), expert support ( g̅ = 0.746, SE = 0.066, p < .001), and feedback ( g̅ = 0.737, SE = 0.060, p < .001), the effect sizes tended to be larger than in studies that did not include these components, although the differences were not statistically significant. Both PD programs with and without a content focus show a strong positive effect on outcomes, but the effect size is somewhat higher for programs without a content focus. However, the lack of statistical significance in the meta-regression suggests that the difference between programs with and without content focus may not be robust across studies. This implies that while content focus is beneficial, it may not be the sole determining factor for PD effectiveness. PD programs with coherence had a smaller effect size (0.673) compared to those without coherence (0.870). Although coherence is generally regarded as beneficial for teacher learning, the data suggests that PD programs without a strong coherence component may still show higher overall effects. However, the difference between PD programs with and without coherence was found to be statistically insignificant, meaning that the variation in effect sizes could be due to chance rather than a meaningful influence of coherence. Similarly, while using models is typically a valuable and positive component of PD, the insignificant results suggest that incorporating models does not necessarily provide an advantage over PD that does not include modeling. INSERT TABLE 2 INSERT TABLE 3 INSERT TABLE 4 INSERT TABLE 5 Discussion Policymakers, school and district leaders, and researchers have become increasingly concerned with improving the quality of teacher PD, particularly in terms of its impact on student outcomes. 
A 2015 survey by the New Teacher Project (Klan, 2017) indicated that an average of $18,000 has been spent annually on PD for teachers, potentially $8 billion annually for the largest U.S. districts. Yet this investment might not be leading to the expected outcomes. The current study provides the most extensive meta-analytical evidence from 118 individual studies demonstrating that PD programs for STEM teachers effectively support teachers’ CK and pedagogical quality and improve students’ academic performance. The magnitude of the large effect size of PD programs found in this meta-analysis is larger than the results of other meta-analyses and systematic reviews synthesizing findings on the impact of PD on both teachers and students (e.g., Blank & de Las Alas, 2009; Egert et al., 2018). Egert et al. examined the effects of PD programs on external quality ratings and child development in preschool, preK, and kindergarten contexts. There was a significant medium-weighted effect size for in-service programs on quality ratings based on the meta-analysis. Blank and de Las Alas used 104 effect sizes from 16 studies to examine the effects of math and science PD on student achievement, and they observed a medium effect size in most of the studies. We used 118 PD studies to evaluate various cognitive and affective outcomes of teachers and students, highlighting that both features are essential in STEM PD programs. Little agreement exists that cognitive abilities and CK are most important for teachers’ practices. Teachers’ multiplex cognitive features could make finding the core knowledge for effective teaching difficult. Furthermore, empirical investigations in PD settings have rarely explored connections between the cognitive and affective types. The connection might be developed under the assumption that affective characteristics such as self-efficacy would be beneficial to improving learned knowledge from PD programs. 
Russell and Pratt’s (1980) model of affect detailed that affective dimensions are the mediating variables for cognitive processes and behaviors. Scholars have suggested that such affective factors mediate learning by increasing cognitive engagement (e.g., Park et al., 2014). Russell and Pratt’s framework is particularly useful in understanding the outcomes and benefits of PD. Most of the PD studies we analyzed focused on teacher outcomes ( n = 89) rather than student ones ( n = 41), so we assume that there is more difficulty in linking PD to student outcomes than in revealing the relationship between PD and teacher outcomes. Deducing and specifying a causal relationship linking teacher knowledge to student outcomes can be challenging when using indirect measurement methods (Dede, 2009). Factors Contributing to Effectiveness of STEM PD Dosage Among the diverse moderators related to PD programs’ effectiveness, a critical moderator is dosage. The present study demonstrated the significance of total contact hours in determining the effectiveness of PD. Our results identified a threshold indicating the minimum number of hours required for effective PD. PD programs lasting longer than 80 hours had stronger effects than those lasting less than 80 hours. This result indicated that more exposure to materials or training provided by a PD program could be advantageous for STEM teachers. This result aligns with findings from several prior studies. Zaslow et al. (2010) suggested that general models with a high dosage of PD tended to be associated with positive outcomes for both teachers and students. Yoon et al. (2007) found that the three studies in their review that had the least amount of PD (5–14 hours) revealed no statistically significant impacts on student learning, whereas PD programs with more than 14 contact hours had positive and significant effects on student outcomes.
Other authors suggested different findings regarding the dosage needed for positive impacts: 20 contact hours (Desimone, 2009), over 30 hours (Guskey & Yoon, 2009), and 49 hours (Darling-Hammond & Richardson, 2009). Although the dosage factor in PD needs to be further reported and analyzed, it is commonly argued that attending multiple PD sessions allows teachers to better reinforce, elaborate upon, and follow up with information presented during previous sessions; by contrast, short one-time sessions can often be disconnected from teachers’ needs, potentially leading to a lack of motivation and hindering meaningful changes (Cobb et al., 2003). Our results regarding PD duration revealed that it is an important moderator that affects PD; in contrast to dosage, however, there appears to be no specific duration threshold that increases its effectiveness. Three previous meta-analyses, Basma and Savage (2018), Kraft et al. (2018), and Lynch et al. (2019), found no correlation between an extended duration of PD and its outcomes. Additionally, Yin et al. (2015) suggested that PD programs spanning 2 or 3 years yielded negative effect sizes compared to those lasting only 1 year. These findings imply that simply extending the duration of PD does not guarantee significant outcomes in PD programs. Grade Level Another notable finding of this study is the benefit of homogeneous teacher groups according to grade levels. The effects of PD on homogeneous teacher groups (i.e., only K–elementary, middle school, or high school teachers) were significantly better than PD programs comprising teachers of heterogeneous groups with a wide range of grade levels. The current study thus suggests that a promising approach to PD is to engage teachers in similar grade levels to bring about significant positive changes.
The teachers’ homogeneous group composition based on grade levels can be a key factor in providing common experiences for the development of their CK and practices. In addition, in such homogeneous groups, grouping teachers with similar professional backgrounds and work experience can encourage maximum collaboration among teachers with a common goal. A teacher group comprising various grade levels would have different experiences with PD, with teachers possessing different prior knowledge, backgrounds, and expectations. The variation of effects between grade levels across outcomes implies the importance of collective participation in PD, as Garet et al. (2001) and Desimone (2009) suggested. Factors Not Contributing to Effectiveness of STEM PD In addition to the significant moderators that lead to PD success, the current study reveals examples of insignificant moderators, such as available funding for PD studies and technology use. Almost 75% of the studies included in this meta-analysis were funded to design and implement PD. This supports the argument that a large amount of funding is spent yearly on PD development and implementation (e.g., Lipsey et al., 2012). The current study also shows that 91 out of 118 (77.1%) STEM PD studies were funded, indicating that significant expenditure would be needed for the development and presentation of PD activities or for purchasing teacher time to participate in PD. Even though funding for conducting PD studies may not significantly moderate PD’s effectiveness, it does not necessarily imply that grant funding would not be beneficial or positively affect teacher and student outcomes. This finding should be further researched in future studies before we can conclude that financial resources for conducting PD studies are not a critical factor in their effectiveness. When coding the funding status in this study, we coded a study as funded only if its funding was clearly stated.
It is possible that some PD studies supported by funding omitted their funding names or organizations, thus causing us to count them as unfunded. Another possible argument is that funded studies have more conservative standards by which to evaluate PD programs, and unfunded studies’ results would overestimate the effectiveness of their programs. If we balanced the funded PD studies with additional unfunded studies, the interpretation regarding PD funding would be clearer. In addition, if we considered the evaluation of study quality for all unfunded and funded studies, this finding would be more robust. Recent studies have demonstrated that effective technology integration has the potential to promote STEM learning through reform-oriented teaching practices (e.g., Authors, 2021; Barak & Assal, 2018). In line with this trend, STEM PD programs tend to include technology features for improving STEM learning and teaching. This study demonstrates that one-third of the total studies incorporated technology to implement their STEM PD, while the remaining two-thirds did not use any form of technology. The current study did not identify technology as a significant moderator contributing to PD success. Nonetheless, the rapid advancement of educational technology has the potential to reshape the PD landscape; incorporating technological features thus warrants a more thorough evaluation. It will be imperative to conduct rigorous investigations into how technology-enhanced PD programs affect teaching and learning outcomes for students and teachers. Limitations and Future Research Despite the important findings and implications of the current study, some limitations can be identified. Although this study revealed certain factors that significantly contributed to the effectiveness of PD, it represents just the first step toward meaningful interventions. We examined the influence of various moderators, including contextual factors and core characteristics, on the effectiveness of PD.
Further investigation is required to generalize the effects of these moderators and other variables. Additionally, we did not explore the dynamic interactions among these moderators and the outcomes of PD. A meta-analytic structural equation modeling approach could be employed to investigate how these factors interact and affect the results of PD. PD developers and educators could then ensure a tailored and well-informed program design that maximizes its impact. Future research exploring these key factors and interactions could provide comprehensive insight into PD design.

Many PD studies employed a repeated measure design without a control group. Some authors employed equivalent tests to mitigate this limitation without adjusting for confounders, but the results from the dataset may have limited external validity. Based on the sensitivity analysis results, which compared control and treatment group designs with repeated measure designs, we included all effect sizes regardless of research design. However, readers should remain alert to potential external validity issues when interpreting the results. A control group is essential for conducting rigorous research on PD, but one is often difficult to implement in education research; withholding an intervention from a control group, especially an intervention with significant benefits, poses an ethical dilemma. Logistical constraints and limited resources could be additional reasons for omitting control groups. Establishing control groups can be logistically complex, involving challenges such as scheduling, participant coordination, and adherence to the research protocol. Limited resources may also come into play: sometimes, researchers must prioritize one group over the other because of financial constraints.
Furthermore, unlike in clinical trials where blinding participants and researchers is feasible (e.g., through placebo use), it is impractical or impossible to anonymize teacher evaluations in PD programs for teachers, which introduces a risk of bias in evaluation. Although we included performance bias in our coding, it does not provide useful information for assessing the risk of bias. Regarding attrition bias, approximately 70% of the studies included in the current meta-analysis reported the required pre- and post-intervention sample sizes, whereas the remaining 30% did not provide this information. Given the significance of sample size in PD studies and potential meta-analyses, it is strongly recommended that researchers report this information.

Finally, it is important to note that our study did not incorporate unpublished research, such as dissertations and theses, which could introduce publication bias. Although we have taken steps to account for this possibility through various analyses, it is crucial to acknowledge that our findings may lean toward an overestimation of true effectiveness. In future research, adding unpublished studies, or at least dissertations and theses, would ensure a more comprehensive review.

Conclusion

Teacher PD programs are a tool for preparing and supporting a high-quality workforce to improve overall student achievement. Most STEM teacher PD programs aim to improve diverse teacher and student outcomes, which are shared goals of the PD programs. A wealth of studies has addressed the effects of PD programs for K–12 STEM teachers; however, little research has systematically assessed the results of previous PD research.
This study examines the aggregated findings on the effects of PD for STEM teachers and the effects of its potential moderators; specifically, the variation of effects related to dosage hours and grade levels across outcomes suggests that researchers should consider the specific outcome that is most relevant for their intervention. By aggregating research findings across different STEM PD studies, this study helps educators and education policy communities better understand PD research and the ways PD can be designed and implemented most effectively. Future researchers will need to take a more rigorous approach regarding the caliber of studies to be conducted in terms of design quality and length, as well as the content and type of PD delivery undertaken.

Declarations

Funding Declaration: No funding was provided for the completion of this work.

Acknowledgement: NA

Data Availability: The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

Acuna, E., & Rodriguez, C. (2004). A meta-analysis study of outlier detection methods in classification. Technical paper, Department of Mathematics, University of Puerto Rico at Mayaguez, 1, 25.
Author (2019).
Authors (2021).
Banilower, E. R., Boyd, S. E., Pasley, J. D., & Weiss, I. R. (2006). Lessons from a decade of mathematics and science reform: A capstone report for the local systemic change through teacher enhancement initiative. Horizon Research, Inc.
Barak, M., & Assal, M. (2018). Robotics and STEM learning: Students’ achievements in assignments according to the P3 Task Taxonomy—Practice, problem solving, and projects. International Journal of Technology and Design Education, 28(1), 121–144.
Basma, B., & Savage, R. (2018). Teacher professional development and student literacy growth: A systematic review and meta-analysis. Educational Psychology Review, 30(2), 457–481.
Berisha, F., & Vula, E. (2021).
Developing pre-service teachers’ conceptualization of STEM and STEM pedagogical practices. Frontiers in Education, 6, Article 585075. https://doi.org/10.3389/feduc.2021.585075
Birman, B. F., Desimone, L., Porter, A. C., & Garet, M. S. (2000). Designing professional development that works. Educational Leadership, 57(8), 28–33.
Blank, R. K., & de Las Alas, N. (2009). Effects of teacher professional development on gains in student achievement. Council of Chief State School Officers.
Campbell Collaboration. (2021). Campbell systematic reviews: policies and guidelines. Version 1.8. https://onlinelibrary.wiley.com/pbassets/assets/18911803/ Campbell%20Policies%20and %20Guidelines%20_May3%202022-1653054593497.pdf
Chitpin, S. (2011). Can mentoring and reflection cause change in teaching practice? A professional development journey of a Canadian teacher educator. Professional Development in Education, 37(2), 225–240.
Clary, R. M., Dunne, J. A., Elder, A. D., Saebo, S., Beard, D. J., Wax, C. L., Winter, J., & Tucker, D. L. (2017). Optimizing online content instruction for effective hybrid teacher professional development programs. Journal of Science Teacher Education, 28(6), 507–521.
Cobb, P., Confrey, J., DiSessa, A., Lehrer, R., & Schauble, L. (2003). Design experiments in educational research. Educational Researcher, 32(1), 9–13.
Cooper, H., Hedges, L. V., & Valentine, J. C. (Eds.). (2019). The handbook of research synthesis and meta-analysis. Russell Sage Foundation.
Darling-Hammond, L. (1998). Teachers and teaching: Testing policy hypotheses from a national commission report. Educational Researcher, 27(1), 5–15.
Darling-Hammond, L., & Richardson, N. (2009). Research review/teacher learning: What matters. Educational Leadership, 66(5), 46–53.
Darling-Hammond, L., Hyler, M. E., & Gardner, M. (2017). Effective teacher professional development. Learning Policy Institute.
Dede, C. (2009). Immersive interfaces for engagement and learning.
Science, 323(5910), 66–69.
Derting, T. L., Ebert-May, D., Henkel, T. P., Maher, J. M., Arnold, B., & Passmore, H. A. (2016). Assessing faculty professional development in STEM higher education: Sustainability of outcomes. Science Advances, 2(3), e1501422.
Desimone, L. M. (2009). Improving impact studies of teachers’ professional development: Toward better conceptualizations and measures. Educational Researcher, 38(3), 181–199.
Desimone, L. M. (2011). A primer on effective professional development. Phi Delta Kappan, 92(6), 68–71.
Desimone, L. M., & Pak, K. (2017). Instructional coaching as high-quality professional development. Theory Into Practice, 56(1), 3–12.
Duval, S., & Tweedie, R. (2000). Trim and fill: A simple funnel‐plot–based method of testing and adjusting for publication bias in meta‐analysis. Biometrics, 56(2), 455–463.
Egert, F., Fukkink, R. G., & Eckhardt, A. G. (2018). Impact of in-service professional development programs for early childhood teachers on quality ratings and child outcomes: A meta-analysis. Review of Educational Research, 88(3), 401–433.
Egger, M., Smith, G., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. British Medical Journal, 315(7109), 629–634.
Fisher, Z., & Tipton, E. (2015). Robumeta: An R-package for robust variance estimation in meta-analysis (No. 02220; ArXiv Preprint). ArXiv. https://arxiv.org/abs/1506.02220
Fishman, E. J., Borko, H., Osborne, J., Gomez, F., Rafanelli, S., Reigh, E., Tseng, A., Million, S., & Berson, E. (2017). A practice-based professional development program to support scientific argumentation from evidence in the elementary classroom. Journal of Science Teacher Education, 28(3), 222–249.
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378–382.
Furtak, E. M., Circi, R., & Heredia, S. C. (2018).
Exploring alignment among learning progressions, teacher-designed formative assessment tasks, and student growth: Results of a four-year study. Applied Measurement in Education, 31(2), 143–156.
Garet, M. S., Porter, A. C., Desimone, L., Birman, B. F., & Yoon, K. S. (2001). What makes professional development effective? Results from a national sample of teachers. American Educational Research Journal, 38(4), 915–945.
Garet, M. S., Heppen, J., Walters, K., Smith, T., & Yang, R. (2016). Does content-focused teacher professional development work? Findings from three Institute of Education Sciences studies. National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences. https://ies.ed.gov/ncee/pubs/20174010/pdf/20174010.pdf
Gleser, L., & Olkin, I. (2009). Stochastically dependent effect sizes. The Handbook of Research Synthesis and Meta-analysis, 2, 357–376.
Gonzalez, K., Lynch, K., & Hill, H. C. (2022). A meta-analysis of the experimental evidence linking STEM classroom interventions to teacher knowledge, classroom instruction, and student achievement. EdWorkingPaper.
Greene, B. A., Lubin, I. A., Slater, J. L., & Walden, S. E. (2013). Mapping changes in science teachers’ content knowledge: Concept maps and authentic professional development. Journal of Science Education and Technology, 22, 287–299.
Guskey, T. R. (2000). Evaluating professional development. Corwin.
Guskey, T. R. (2002). Professional development and teacher change. Teachers and Teaching, 8(3), 381–391.
Guskey, T. R., & Sparks, D. (1996). Exploring the relationship between staff development and improvements in student learning. Journal of Staff Development, 17(4), 34–38.
Guskey, T. R., & Yoon, K. S. (2009). What works in professional development? Phi Delta Kappan, 90(7), 495–500.
Hawley, P. H., & Sinatra, G. M. (2019). Declawing the dinosaurs in the science classroom: Reducing Christian teachers’ anxiety and increasing their efficacy for teaching evolution.
Journal of Research in Science Teaching, 56(4), 375–401.
Hayes, K. N., Bae, C. L., O’Connor, D., & Seitz, J. C. (2020). Beyond funding: How organizational resources support science professional learning. American Journal of Education, 126(3), 389–422.
Hayes, K. N., Wheaton, M., & Tucker, D. (2019). Understanding teacher instructional change: The case of integrating NGSS and stewardship in professional development. Environmental Education Research, 25(1), 115–134.
Hedges, L. V., Tipton, E., & Johnson, M. (2010). Robust variance estimation in meta regression with dependent effect size estimates. Research Synthesis Methods, 1(1), 39–65.
Higgins, J. P., Altman, D. G., Gøtzsche, P. C., Jüni, P., Moher, D., Oxman, A. D., Savović, J., Schulz, K. F., & Sterne, J. A. (2011). The Cochrane Collaboration’s tool for assessing risk of bias in randomized trials. British Medical Journal, 343, 1–9.
Hill, H. C., Beisiegel, M., & Jacob, R. (2013). Professional development research: Consensus, crossroads, and challenges. Educational Researcher, 42(9), 476–487.
Karisan, D., Macalalag, A., & Johnson, J. (2019). The effect of methods courses on preservice teachers’ awareness and intentions of teaching science, technology, engineering, and mathematics (STEM) subjects. International Journal of Research in Education and Science, 5(1), 22–35.
Kelcey, B., Spybrook, J., Phelps, G., Jones, N., & Zhang, J. (2017). Designing large-scale multisite and cluster-randomized studies of professional development. The Journal of Experimental Education, 85(3), 389–410.
Kennedy, M. M. (2016). How does professional development improve teaching? Review of Educational Research, 86(4), 945–980.
Kim, H. J., Miller, H. R., Herbert, B., Pedersen, S., & Loving, C. (2012). Using a wiki in a scientist–teacher professional learning community: Impact on teacher perception changes. Journal of Science Education and Technology, 21, 440–452.
Klan, T. (2017, Nov).
How to create a cost-effective PD program that impresses. https://www.eschoolnews.com/2017/11/13/cost-effective-pd-program/
Kleickmann, T., Tröbst, S., Jonen, A., Vehmeyer, J., & Möller, K. (2016). The effects of expert scaffolding in elementary science professional development on teachers’ beliefs and motivations, instructional practices, and student achievement. Journal of Educational Psychology, 108(1), 21–42.
Knight, S. L., Parker, D., Zimmerman, W., & Ikhlief, A. (2014). Relationship between perceived and observed student-centred learning environments in Qatari elementary mathematics and science classrooms. Learning Environments Research, 17(1), 29–47.
Kraft, M. A., Blazar, D., & Hogan, D. (2018). The effect of teacher coaching on instruction and achievement: A meta-analysis of the causal evidence. Review of Educational Research, 88(4), 547–588.
Kuehnert, E., Cason, M., Young, J., & Pratt, S. (2019). A meta-analysis of reform-based professional development in STEM: Implications for effective praxis. International Journal of Technology in Education, 2(1), 60–68.
Lawless, K. A., & Pellegrino, J. W. (2007). Professional development in integrating technology into teaching and learning: Knowns, unknowns, and ways to pursue better questions and answers. Review of Educational Research, 77(4), 575–614.
Levitt, K. E. (2002). An analysis of elementary teachers’ beliefs regarding the teaching and learning of science. Science Education, 86(1), 1–22.
Liao, Y. C. (2018). Coaching in teacher professional development for technology integration: Examining teacher practices and perceptions (Doctoral dissertation, Indiana University).
Lindvall, J., & Ryve, A. (2019). Coherence and the positioning of teachers in professional development programs. A systematic review. Educational Research Review, 27, 140–154.
Lipsey, M. W., Puzio, K., Yun, C., Hebert, M. A., Steinka-Fry, K., Cole, M. W., Roberts, M., Anthony, K. S., & Busick, M. D. (2012).
Translating the statistical representation of the effects of education interventions into more readily interpretable forms. National Center for Special Education Research.
Lotter, C. R., Thompson, S., Dickenson, T. S., Smiley, W. F., Blue, G., & Rea, M. (2018). The impact of a practice-teaching professional development model on teachers’ inquiry instruction and inquiry efficacy beliefs. International Journal of Science and Mathematics Education, 16(2), 255–273.
Loucks-Horsley, S., Stiles, K. E., Mundry, S., Love, N., & Hewson, P. W. (2009). Designing professional development for teachers of science and mathematics. Corwin.
Lynch, K., Hill, H. C., Gonzalez, K. E., & Pollard, C. (2019). Strengthening the research base that informs STEM instructional improvement efforts: A meta-analysis. Educational Evaluation and Policy Analysis, 41(3), 260–293.
Maeng, J. L., Whitworth, B. A., Bell, R. L., & Sterling, D. R. (2020). The effect of professional development on elementary science teachers’ understanding, confidence, and classroom implementation of reform‐based science instruction. Science Education, 104(2), 326–353.
Malanson, K., Jacque, B., Faux, R., & Meiri, K. F. (2014). Modeling for fidelity: Virtual mentorship by scientists fosters teacher self-efficacy and promotes implementation of novel high school biomedical curricula. PLoS One, 9(12), e114929.
Marra, R. M., Arbaugh, F., Lannin, J., Abell, S., Ehlert, M., Smith, R., Merle-Johnson, D., & Rogers, M. P. (2011). Orientations to professional development design and implementation: Understanding their relationship to PD outcomes across multiple projects. International Journal of Science and Mathematics Education, 9(4), 793–816.
McNaull, A. (2014, March). Federal STEM educator professional development programs: A discussion of funding, approaches, and implementation. In APS April Meeting Abstracts (Vol. 2014, ID. S10-008).
Millar, M. G., & Tesser, A. (1989).
The effects of affective–cognitive consistency and thought on the attitude–behavior relation. Journal of Experimental Social Psychology, 25(2), 189–202.
Nadelson, L. S., Pfiester, J., Callahan, J., & Pyke, P. (2015). Who is doing the engineering, the student or the teacher? The development and use of a rubric to categorize level of design for the elementary classroom. Journal of Technology Education, 26(2), 22–45.
Park, B., Plass, J. L., & Brünken, R. (2014). Cognitive and affective processes in multimedia learning. Learning and Instruction, 29, 125–127.
Polgampala, A. S. V., Shen, H., & Huang, F. (2017). STEM teacher education and professional development and training: Challenges and trends. American Journal of Applied Psychology, 6(5), 93–97.
Ramey, S., Crowell, N. A., Ramey, C. T., Grace, C., Timraz, N., & Davis, L. E. (2011). The dosage of professional development for early childhood professionals: How the amount and density of professional development may influence its effectiveness. In The early childhood educator professional development grant: Research and practice (pp. 11–32). Emerald Group.
Robinson, A., Dailey, D., Hughes, G., & Cotabish, A. (2014). The effects of a science-focused STEM intervention on gifted elementary students’ science knowledge and skills. Journal of Advanced Academics, 25(3), 189–213.
Roman, A. F. (2019). Identifying the preschool teachers’ needs on transversal competences training career using the questionnaire. Educația Plus, 23(SP IS), 164–170.
Roth, K. J., Wilson, C. D., Taylor, J. A., Stuhlsatz, M. A., & Hvidsten, C. (2019). Comparing the effects of analysis-of-practice and content-based professional development on teacher and student outcomes in science. American Educational Research Journal, 56(4), 1217–1253.
RStudio Team (2020). RStudio: Integrated Development for R. RStudio, PBC, Boston, MA. http://www.rstudio.com
Russell, J. A., & Pratt, G. (1980).
A description of the affective quality attributed to environments. Journal of Personality and Social Psychology, 38(2), 311–322.
Shulman, L. S. (1986). Those who understand: Knowledge growth in teaching. Educational Researcher, 15(2), 4–14.
Tanner-Smith, E. E., Tipton, E., & Polanin, J. R. (2016). Handling complex meta-analytic data structures using robust variance estimates: A tutorial in R. Journal of Developmental and Life-Course Criminology, 2(1), 85–112.
Tobin, R. G., Crissman, S., Doubler, S., Gallagher, H., Goldstein, G., Lacy, S., Rogers, C. B., Schwartz, J., & Wagoner, P. (2012). Teaching teachers about energy: Lessons from an inquiry-based workshop for K–8 teachers. Journal of Science Education and Technology, 21, 631–639.
Tsamir, P., Tirosh, D., Levenson, E., Tabach, M., & Barkai, R. (2014). Developing preschool teachers’ knowledge of students’ number conceptions. Journal of Mathematics Teacher Education, 17, 61–83.
Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.
Van den Noortgate, W., López-López, J. A., Marín-Martínez, F., & Sánchez-Meca, J. (2013). Three-level meta-analyses of dependent effect sizes. Behavior Research Methods, 45, 576–594.
Van Driel, J. H., Beijaard, D., & Verloop, N. (2001). Professional development and reform in science education: The role of teachers’ practical knowledge. Journal of Research in Science Teaching, 38(2), 137–158.
Van Driel, J. H., Meirink, J. A., van Veen, K., & Zwart, R. C. (2012). Current trends and missing links in studies on teacher professional development in science education: A review of design features and quality of research. Studies in Science Education, 48(2), 129–160.
Van Veen, K., Zwart, R. C., & Meirink, J. A. (2012). What makes teacher professional development effective? A literature review. In M. Kooy & K. van Veen (Eds.), Teacher learning that matters (pp. 3–21). Routledge.
Viechtbauer, W. (2010).
Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48.
Viera, A. J., & Garrett, J. M. (2005). Understanding interobserver agreement: The kappa statistic. Family Medicine, 37(5), 360–363.
Wayne, A. J., Yoon, K. S., Zhu, P., Cronen, S., & Garet, M. S. (2008). Experimenting with teacher professional development: Motives and methods. Educational Researcher, 37(8), 469–479.
Webster-Wright, A. (2009). Reframing professional development through understanding authentic professional learning. Review of Educational Research, 79(2), 702–739.
Whitworth, B. A., Maeng, J. L., & Bell, R. L. (2018). Exploring practices of science coordinators participating in targeted professional development. Science Education, 102(3), 474–497.
Yildirim, B., Topalcengiz, E. S., Arikan, G., & Timur, S. (2020). Using virtual reality in the classroom: Reflections of STEM teachers on the use of teaching and learning tools. Journal of Education in Science Environment and Health, 6(3), 231–245.
Yin, Y., Olson, J., Olson, M., Solvin, H., & Brandon, P. R. (2015). Comparing two versions of professional development for teachers using formative assessment in networked mathematics classrooms. Journal of Research on Technology in Education, 47(1), 41–70.
Yoon, K. S., Duncan, T., Lee, S. W.-Y., Scarloss, B., & Shapley, K. (2007). Reviewing the evidence on how teacher professional development affects student achievement (Issues & Answers Report, REL 2007–No. 033). U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Southwest. https://files.eric.ed.gov/fulltext/ED498548.pdf
Yoon, S. A., Anderson, E., Koehler-Yom, J., Evans, C., Park, M., Sheldon, J., Schoenfeld, L., Wendel, D., Scheintaub, H., & Klopfer, E. (2017).
Teaching about complex systems is no simple matter: Building effective professional development for computer-supported complex systems instruction. Instructional Science, 45(1), 99–121.
Zaslow, M., Tout, K., Halle, T., Whittaker, J. V., & Lavelle, B. (2010). Toward the identification of features of effective professional development for early childhood educators. Office of Planning, Evaluation and Policy Development, U.S. Department of Education. https://eric.ed.gov/?id=ED527140

Tables

Tables 1 to 5 are available in the Supplementary Files section.

Additional Declarations

No competing interests reported.

Supplementary Files

Table1PDMETAnew.docx
Table2PDMETAnew.docx
Table3PDMETAnew.docx
Table4PDMETAnew.docx
Table5PDMETAnew.docx
AppendixCodingScheme5.6.25.docx
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-6602739","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":465517419,"identity":"313f4389-2391-4408-838f-5c9a01f8230d","order_by":0,"name":"Hyesun You","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA4ElEQVRIiWNgGAWjYBACAwST+QBDAoYgfi1sCSRr4THAIogFmEskH/7Mu4MhsV8i55vEg4rD8gzszdsk8GmxnJGWJs17hiFx5ozcbRIJZw4bNvAcK8OrxeBGjhkzb9v/xA03gFoS224zNkjkmBHQkv/5M28bA1BLzjOQFvsG+TeEtOQwSEO1sIG0JDZI8BDQcuaZmeTcNgbjmT3PjC0SzvxPbuNJK7bAq+V48uMPb9sYZPvZkx/e/FGRZtvPfnjjDXxaGAQSwJRjA0yADa9yEOA/AKbsCSocBaNgFIyCkQsAF8xK/H3smWkAAAAASUVORK5CYII=","orcid":"","institution":"University of Iowa","correspondingAuthor":true,"prefix":"","firstName":"Hyesun","middleName":"","lastName":"You","suffix":""},{"id":465517421,"identity":"d5e4d540-274b-4622-a151-dc252c39a575","order_by":1,"name":"Sunyoung Park","email":"","orcid":"","institution":"California Lutheran University","correspondingAuthor":false,"prefix":"","firstName":"Sunyoung","middleName":"","lastName":"Park","suffix":""},{"id":465517423,"identity":"c022dee8-e54d-4027-bf80-1223ece1d7c5","order_by":2,"name":"Minju Hong","email":"","orcid":"","institution":"Chung-Ang University","correspondingAuthor":false,"prefix":"","firstName":"Minju","middleName":"","lastName":"Hong","suffix":""}],"badges":[],"createdAt":"2025-05-06 
11:53:40","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6602739/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6602739/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":83942479,"identity":"618d89c2-2081-4e7d-8c06-932cf4a79ba7","added_by":"auto","created_at":"2025-06-04 19:05:16","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":34790,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eSimplified conceptual framework for effective PD. (Adapted from Desimone’s model (2009) based on previous PD effectiveness studies)\u003c/em\u003e\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-6602739/v1/8f076d4528660e6c4a8a3bbd.png"},{"id":83942480,"identity":"1c4b5f75-630b-4b2d-bb17-a66b3ef9be9a","added_by":"auto","created_at":"2025-06-04 19:05:16","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":41546,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eThe PRISMA flow chart\u003c/em\u003e\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-6602739/v1/5dfb4b45f27fe502016e5b3b.png"},{"id":83942837,"identity":"a63959c1-4b73-4ad6-bfcf-43a5298553f1","added_by":"auto","created_at":"2025-06-04 19:13:16","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":107253,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eFunnel Plot of Trim and Fill\u003c/em\u003e\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-6602739/v1/5ca149d74c06c662de210e0f.png"},{"id":91743314,"identity":"b6a56bb0-4665-41b9-b202-9e9ad4c51ace","added_by":"auto","created_at":"2025-09-19 
20:16:34","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1140409,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6602739/v1/f685b0af-f46b-444b-b90e-25a69bf04d2b.pdf"},{"id":83942484,"identity":"ad584c6a-d14f-4e27-9af4-be51d60da7ec","added_by":"auto","created_at":"2025-06-04 19:05:16","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":15495,"visible":true,"origin":"","legend":"","description":"","filename":"Table1PDMETAnew.docx","url":"https://assets-eu.researchsquare.com/files/rs-6602739/v1/4f6186b573152661074d0c88.docx"},{"id":83942487,"identity":"8081518a-0251-4ef4-98ca-193e81c66e30","added_by":"auto","created_at":"2025-06-04 19:05:16","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":15495,"visible":true,"origin":"","legend":"","description":"","filename":"Table1PDMETAnew.docx","url":"https://assets-eu.researchsquare.com/files/rs-6602739/v1/7dbd5f7b91a77c4f42759921.docx"},{"id":83942916,"identity":"a6d18a17-3e09-4aac-8617-444a56b0d242","added_by":"auto","created_at":"2025-06-04 19:21:16","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":15295,"visible":true,"origin":"","legend":"","description":"","filename":"Table2PDMETAnew.docx","url":"https://assets-eu.researchsquare.com/files/rs-6602739/v1/03dd36b18988eb126fefc9b1.docx"},{"id":83942489,"identity":"5e150a6f-c7d4-40d4-8cfd-c37a497a156e","added_by":"auto","created_at":"2025-06-04 
19:05:16","extension":"docx","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":16033,"visible":true,"origin":"","legend":"","description":"","filename":"Table3PDMETAnew.docx","url":"https://assets-eu.researchsquare.com/files/rs-6602739/v1/40cd62042aa93231d735dc54.docx"},{"id":83942491,"identity":"7b1560a5-a6b8-478a-b541-b9587ca5a23e","added_by":"auto","created_at":"2025-06-04 19:05:16","extension":"docx","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":17503,"visible":true,"origin":"","legend":"","description":"","filename":"Table4PDMETAnew.docx","url":"https://assets-eu.researchsquare.com/files/rs-6602739/v1/08100b48ec89de31306b8ab1.docx"},{"id":83942493,"identity":"3306749b-8fe1-41f9-9d02-2921be2b1630","added_by":"auto","created_at":"2025-06-04 19:05:16","extension":"docx","order_by":5,"title":"","display":"","copyAsset":false,"role":"supplement","size":18275,"visible":true,"origin":"","legend":"","description":"","filename":"Table5PDMETAnew.docx","url":"https://assets-eu.researchsquare.com/files/rs-6602739/v1/94119b8c811f68266985711c.docx"},{"id":83942496,"identity":"17363e59-1537-473a-acbd-911aadf34eb4","added_by":"auto","created_at":"2025-06-04 19:05:16","extension":"docx","order_by":6,"title":"","display":"","copyAsset":false,"role":"supplement","size":15658,"visible":true,"origin":"","legend":"","description":"","filename":"AppendixCodingScheme5.6.25.docx","url":"https://assets-eu.researchsquare.com/files/rs-6602739/v1/f94a2b8e376f509f6ab4c54e.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"\u003cp\u003e\u003cstrong\u003eA Meta-Analysis of the Effects of Teacher Professional Development in STEM Education: What Have We Done, and Where Are We Going?\u003c/strong\u003e\u003c/p\u003e","fulltext":[{"header":"Introduction","content":"\u003cp\u003eProfessional development (PD) functions as a catalyst for driving change in education and as a strategic approach to 
empowering teachers to enhance their professional knowledge and teaching practices (Guskey, 2002). School districts, funding agencies, and policymakers continue to invest considerable resources in high-quality PD programs, aiming to improve teacher effectiveness and student achievement; however, mixed findings exist regarding the effectiveness of these programs (Darling-Hammond et al., 2017). PD programs do not consistently lead to successful learning outcomes in the classroom, despite their intent to do so, because of challenges in their design and implementation (Van Driel et al., 2012, Webster-Wright, 2009). The science, technology, engineering, and mathematics (STEM) education fields are making significant efforts to establish high-quality PD programs based on educational policy and practice.\u0026nbsp;These PD programs aim to enhance STEM educators’ teaching skills, address challenges, foster innovation, promote collaboration, and ultimately improve STEM instruction and student learning.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIn conjunction with these efforts, it is essential to investigate the elements of effective PD and the extent to which PD programs in STEM education are effective. The literature on teacher effectiveness has provided a logic model demonstrating a link between PD intervention and student outcomes (e.g., academic achievement, motivation) in that they are mediated by teacher outcomes (e.g., teacher knowledge, instructional practices). Darling-Hammond et al. (2017) defined \u003cem\u003eeffective PD\u003c/em\u003e as structured career learning that results in changes in teacher practices and improvements in student learning outcomes. This definition implies a program’s effectiveness can be evaluated either by teachers’ outcomes alone or by a combination of teacher and student outcomes. 
In particular, the literature on high-quality PD programs in STEM education emphasizes improvement in teaching practices and uses varied measurable factors to reveal PD effectiveness, including teachers’ content knowledge (CK; Kelcey et al., 2017; Kleickmann et al., 2016); dispositions toward teaching (i.e., affective factors such as confidence, beliefs, and self-efficacy; Hayes et al., 2019; Nadelson et al., 2015); instructional practices (Knight et al., 2014; Maeng et al., 2020); and student outcomes (Lynch et al., 2019; Robinson et al., 2014; Yoon et al., 2017).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eResearchers have examined the effects of teacher development initiatives for STEM teachers and identified the core features that should be considered in PD. However, few recent meta-analyses have addressed the comprehensive effectiveness of PD programs\u0026nbsp;for K–12 STEM teachers. Lynch et al. (2019) conducted a meta-analysis to identify the STEM PD program content, activities, and formats related to specific student outcomes.\u0026nbsp;They presented results from 95 experimental and quasi-experimental preK–12 STEM PD programs published between January 2004 and March 2016 and found a pooled effect size of +0.21 standard deviations (95% CI [0.12, 0.28]). This is smaller than the effect sizes found in other meta-analyses. For example, Yoon et al. (2007) reported effects of 0.51 SD in science and 0.57 SD in math across studies published between 1986 and 2006.\u003c/p\u003e\n\u003cp\u003eGonzalez et al. (2022)\u0026nbsp;also studied the impacts of PD\u0026nbsp;interventions on both teacher knowledge and classroom instruction, and how these outcomes affect student achievement, drawing on 37 published experimental studies of preK–12 STEM education spanning 3 decades. Teachers who participated in STEM classroom interventions showed improvements in CK, Pedagogical Content Knowledge (PCK), and classroom instruction, with a pooled average impact estimate of +0.56 SD. 
Programs with more significant impacts on teacher practice yielded more prominent effects on student achievement, on average. Kuehnert et al. (2019) focused on the effects of reform-based PD on the PCK of K–12 math and science teachers. Their findings revealed a robust connection between PD and PCK; the effect size was significant (\u003cem\u003ed\u003c/em\u003e = 0.51, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001). Notably, the authors did not observe significant variation in study characteristics.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eRecent meta-analyses have evaluated the efficacy of PD by exclusively focusing on either the outcomes for teachers or those for students. These outcomes have been highly specific, and the studies considered for inclusion have been limited in number. Moreover, the meta-analyses have typically focused on mathematics and science teachers. Our study has a distinct scope and poses research questions that diverge from those addressed in previous STEM PD meta-analyses.\u0026nbsp;In the current study, we explored the effects of diverse PD outcomes, including cognitive and affective outcomes of teachers and students, thereby broadening the general understanding of the effects of PD. Furthermore, we examined the influence of potential moderators on PD outcomes: structural features of PD (e.g., duration, dosage), contextual factors of PD (e.g., funded vs. nonfunded PD, country), and process features (e.g., active learning, reflection opportunity). 
We considered the core characteristics of PD programs shown in the literature as moderators through which to investigate our hypothesis regarding whether the variation in effect size among studies is associated with differences in PD characteristics.\u003c/p\u003e\n\u003cp\u003eOur study not only adds novel findings to the existing literature on the quality and significance of teacher PD programs but also provides insight into evidence concerning the design and implementation of teacher PD, specifically, through exploring mechanisms by which PD programs in STEM education influence teacher knowledge, teacher practices, and student achievement.\u0026nbsp;\u003c/p\u003e"},{"header":"Literature Review","content":"\u003cdiv id=\"Sec2\" class=\"Section2\"\u003e \u003ch2\u003eTeacher Change Model\u003c/h2\u003e \u003cp\u003eThe theoretical framework of our study was guided by the principle that the positive effects of PD interventions can strengthen teacher instruction\u0026mdash;and therefore students\u0026rsquo; academic performance levels. Desimone (\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2009\u003c/span\u003e) proposed a comprehensive framework of how PD influences both teachers\u0026rsquo; instructional practice and student learning. The core theory of action includes four main steps: teachers participate in a PD program; teachers\u0026rsquo; participation increases their knowledge and skills, changes their attitudes and beliefs, or both; given their new knowledge and skills (or attitudes and beliefs), teachers improve their classroom instruction through changes in content, pedagogy, or both; and the changes in teachers\u0026rsquo; instructional practices promote student learning. 
In this framework, personal factors for both students and teachers\u0026mdash;along with contextual factors such as school characteristics, curriculum, and policies\u0026mdash;can mediate the influence of PD on teacher and student development.\u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e displays the simplified linear path model of the four steps whereby the conceptual model links the critical features of PD and increases in student achievement via the effects of intervening variables such as teacher knowledge and instructional practice. We constructed this model based on Desimone\u0026rsquo;s conceptual PD model (2009) and existing literature reviews of studies on PD effects (e.g., Egert et al., \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Yoon et al., \u003cspan citationid=\"CR93\" class=\"CitationRef\"\u003e2007\u003c/span\u003e). This adapted model allowed us to identify measurable PD outcomes, determine significantly influencing PD features, and explore PD effectiveness.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eINSERT\u003c/b\u003e FIGURE \u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e \u003cb\u003eHERE\u003c/b\u003e\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eCore Characteristics of High-Quality PD\u003c/h2\u003e \u003cp\u003eResearchers have placed significant emphasis on identifying the key components of effective PD and assessing the impact of these components on teacher knowledge, skills, and practices. We conducted a targeted search of PD studies, specifically focusing on key features that have a significant impact on PD outcomes. We systematically coded these features to categorize and label them. 
After reaching a point of saturation, we identified a total of nine essential features: content focus, active learning, coherence with goals, collective participation, using models of effective practice, coaching and expert support, feedback and reflection, duration and dosage, and technology integration.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eContent Focus\u003c/h3\u003e\n\u003cp\u003eContent comprises the knowledge and skills that teachers should possess to enhance their professional practices (Guskey \u0026amp; Sparks, \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e1996\u003c/span\u003e). Scholars have argued that PD programs that provide knowledge of content, pedagogical knowledge, and alternative teaching practices are more effective than those that do not. Furthermore, Garet et al. (\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2016\u003c/span\u003e), Desimone (\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2011\u003c/span\u003e), and Darling-Hammond and Richardson (\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2009\u003c/span\u003e) identified a focus on improving teachers\u0026rsquo; knowledge of the subjects they teach, and their skills in communicating this knowledge to students, as a characteristic of effective PD programs. For example, Garet et al. (\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2016\u003c/span\u003e) examined the impact of content-focused PD on teachers\u0026rsquo; practices and CK and provided evidence that PD for improving teachers\u0026rsquo; content knowledge and content-specific pedagogy significantly improved teachers\u0026rsquo; knowledge for 1 year. 
This study indicated a potential approach for concentrating PD efforts on enhancing knowledge and practice outcomes.\u003c/p\u003e\n\u003ch3\u003eActive Learning\u003c/h3\u003e\n\u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eAs a key component to making PD effective, many scholars have agreed that providing teachers with opportunities to participate in activities aligned with active learning is critical (e.g., Birman et al., \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2000\u003c/span\u003e; Van Veen et al., \u003cspan citationid=\"CR85\" class=\"CitationRef\"\u003e2012\u003c/span\u003e). Teachers\u0026rsquo; active engagement in PD significantly affects their knowledge, teaching practices, and student learning (Garet et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2001\u003c/span\u003e; Loucks-Horsley et al., \u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e2009\u003c/span\u003e; Marra et al., \u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e2011\u003c/span\u003e; Whitworth et al., \u003cspan citationid=\"CR90\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). Active learning within the PD context could include diverse activities such as doing simulated practices, developing lesson plans and rubrics, and leading argumentation (Birman et al., \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2000\u003c/span\u003e). Engaging teachers as active participants in PD can help them understand the importance of students\u0026rsquo; active learning.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e\n\u003ch3\u003eCoherence With Goals\u003c/h3\u003e\n\u003cp\u003eA systematic review by Lindvall and Ryve (\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e2019\u003c/span\u003e) identified three distinct ways coherence has been conceptualized in PD. 
\u003cem\u003eExternal coherence\u003c/em\u003e indicates coherence with elements outside the PD program itself, such as the curriculum, standards, assessments, or teachers\u0026rsquo; needs and knowledge. Research has shown a positive correlation between teacher development and PD programs that are coherent with local curriculum standards and benchmarks (Birman et al., \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2000\u003c/span\u003e; Desimone, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2009\u003c/span\u003e). When PD aligns with the curriculum, it helps teachers better understand how to implement instructional practices. Additionally, when PD activities mirror national, state, or district-wide reforms, it is more likely that what teachers learn in PD directly translates into improved student achievement. \u003cem\u003eInternal coherence\u003c/em\u003e emphasizes that different aspects of the PD program, including activities, content, and resources, should be self-consistent. \u003cem\u003eCreated coherence\u003c/em\u003e highlights a function of PD that can draw connections between different aspects of teachers\u0026rsquo; work rather than merely reinforcing existing coherence.\u003c/p\u003e \u003cp\u003eDesimone also defined coherence as a unified concept, describing it as \"the extent to which teacher learning aligns with teachers' knowledge and beliefs\" (p. 184). She argued that such coherence promotes a deeper integration of new ideas, as teachers are more inclined to adopt changes that resonate with their established pedagogical frameworks.\u003c/p\u003e\n\u003ch3\u003eCollective Participation\u003c/h3\u003e\n\u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eDarling-Hammond et al. 
(\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2017\u003c/span\u003e) and Desimone (\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2011\u003c/span\u003e) highlighted that PD provides opportunities for teachers to work actively and collaboratively, sharing their ideas, voices, experiences, and resources. Wayne et al. (\u003cspan citationid=\"CR88\" class=\"CitationRef\"\u003e2008\u003c/span\u003e) indicated that PD is effective for influencing teacher learning and teaching practice if teachers from the same school, department, content area, or grade level participate collectively. The collaborative environment among teachers from similar settings can increase their understanding of diverse perspectives in teaching and empower their growth as educators (Garet et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2001\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eUsing Models of Effective Practice\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003ePD allows teachers to envision best teaching practices as they incorporate curricula or instructional models deemed effective. These models can include exemplary lesson plans, unit plans, sample student work, and recorded or written teaching practices (Darling-Hammond et al., \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). When provided with useful models, teachers tend to be open to modifications to their teaching (Levitt, \u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e2002\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eCoaching and Expert Support\u003c/h3\u003e\n\u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eMany researchers have found that coaching by experts positively affects teachers\u0026rsquo; knowledge, attitudes, and practices. 
The ongoing nature of coaching provided sustained and personalized support that helped teachers incorporate new teaching methods into lesson planning, along with useful instructional feedback (e.g., Desimone \u0026amp; Pak, \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Garet et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2001\u003c/span\u003e; Liao, \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e2018\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e\n\u003ch3\u003eFeedback\u003c/h3\u003e\n\u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eFeedback and suggestions from PD facilitators and peers can ensure that teachers receive relevant guidance directly applicable to their classrooms and aligned with PD objectives. Continuous evaluation and timely feedback not only help in refining the PD process but also support teachers in implementing new strategies effectively (Desimone, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2009\u003c/span\u003e). Hattie and Timperley (2007) also emphasized that feedback is most effective when it is ongoing rather than a one-time event. Timely feedback allows teachers to reflect on their practice, make adjustments, and gradually build their expertise in new teaching strategies.\u003c/p\u003e \u003cp\u003eIncorporating feedback mechanisms into PD programs also fosters a culture of continuous improvement among teachers. By regularly receiving feedback, teachers are more likely to engage in reflective practice, which is a key component of professional growth. The feedback that teachers receive as part of PD can help identify gaps in implementation and areas where additional support may be needed. 
This targeted support can address challenges that teachers face in adopting new strategies, thereby increasing the likelihood of successful implementation and positive student outcomes (Darling-Hammond et al., \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2017\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eDuration and Dosage\u003c/h2\u003e \u003cp\u003eA sufficient time span and an appropriate amount of time spent on PD programs help teachers learn new information and techniques and determine how such techniques can evolve or be adapted from existing teaching practices (Darling-Hammond et al., \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). Scholars have argued that both the duration and dosage of PD are significant factors in eliciting meaningful learning experiences from participating teachers (e.g., Hill et al., \u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e2013\u003c/span\u003e; Wayne et al., \u003cspan citationid=\"CR88\" class=\"CitationRef\"\u003e2008\u003c/span\u003e). \u003cem\u003eDuration\u003c/em\u003e refers to the length of time over which the PD program is implemented, whereas \u003cem\u003edosage\u003c/em\u003e pertains to the intensity of the PD activities experienced by the teachers (Kennedy, \u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; Van Driel et al., \u003cspan citationid=\"CR84\" class=\"CitationRef\"\u003e2001\u003c/span\u003e). Generally, scholars have argued that a longer duration allows for sustained engagement and deepening of understanding, and a sufficient dosage ensures that teachers receive enough exposure to new knowledge, skills, and strategies for change in their teaching practices. However, the appropriate duration and dosage for PD differ depending on its goal and context, as shown in diverse studies. Garet et al. 
(\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2001\u003c/span\u003e) indicated that PD is more effective when it is sustained over time and includes a substantial number of contact hours. Desimone (\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2011\u003c/span\u003e) recommended that PD activity should be spread over a semester, including at least 20 hours of contact time either over the course of several months or through intensive experiences (e.g., summer workshops). Furthermore, Yoon et al. (\u003cspan citationid=\"CR93\" class=\"CitationRef\"\u003e2007\u003c/span\u003e) reported that an average of 49 hours is a substantial PD duration for a single program to positively affect student achievement. Banilower et al. (\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2006\u003c/span\u003e), however, suggested that this number should be 100\u0026thinsp;+\u0026thinsp;hours, including the activity length, the time spent working on and implementing the activity, and the time for administrators to provide feedback to teachers.\u003c/p\u003e \u003cp\u003eRamey et al. (\u003cspan citationid=\"CR70\" class=\"CitationRef\"\u003e2011\u003c/span\u003e) conducted an intervention aimed at revealing the impact of different durations of PD while maintaining the same dosage. Teachers participated in 120 hours of in-classroom coaching and were randomly assigned to either an immersion condition group, which involved 20 full days spread over 5 weeks, or a low-density condition group, receiving 1 full day (8 hours) of PD per week over 20 weeks. 
The immersion condition\u0026rsquo;s classrooms showed modest improvements across the school year, but the low-density conditions either remained stagnant or saw a decline in academic performance.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eTechnology Integration\u003c/h2\u003e \u003cp\u003e \u003cdiv class=\"BlockQuote\"\u003e \u003cp\u003eTechnology integration into learning has become a potent force for change, reshaping the way students engage with education (Lawless \u0026amp; Pellegrino, \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e2007\u003c/span\u003e). This transformation has been particularly pronounced in the realm of STEM education and has been extended to teachers\u0026rsquo; PD (Polgampala et al., \u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). By implementing technology-enabled PD, STEM teachers can access tailored learning experiences and experiment with innovative teaching methods to captivate and engage students through interactive and dynamic experiences (Authors, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Scholars have indicated that effective technology integration engages teachers through interactive and dynamic experiences. Technology-integrated PD allows teachers to become familiar with meaningfully applying technology and offering engaging and hands-on activities in a student-centered learning environment (Barak \u0026amp; Assal, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2018\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eVirtual simulations and robotics applications provide immersive environments. 
This experiential learning approach enhances teachers\u0026rsquo; confidence in adopting novel instructional strategies and unlocks the potential to promote STEM learning through active and coherent teaching practices (Authors, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Yildirim et al., \u003cspan citationid=\"CR91\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). The meaning of technology integration can be interpreted in different ways. We defined \u003cem\u003etechnology integration\u003c/em\u003e as it applies to science, math, and engineering education.\u003c/p\u003e \u003c/div\u003e \u003c/p\u003e \u003c/div\u003e\u003ch2\u003eFunding\u003c/h2\u003e\n\u003cp\u003eDeveloping and implementing PD initiatives requires sufficient financial backing to address various crucial elements within these programs. Adequate funding supports essential aspects, including the creation of training materials and resources and compensation for instructors and participating teachers. Funds for PD seem to influence other structural facets of programs, such as duration (Hayes et al., 2020; McNaull 2014).\u003c/p\u003e\n\u003cp\u003eHowever, despite the acknowledgement of funding as a critical factor, empirical evidence regarding its direct impact on educational outcomes within PD remains limited. While this study will not delve into the intricate heterogeneity characterizing the relationship between financial resources and outcomes, it does shed light on the association between funding availability and PD outcomes. Expanding our understanding of the interplay between funding and PD outcomes is imperative for informed decision-making and effective resource allocation in PD. 
By recognizing the connections between financial resources and programmatic outcomes, stakeholders and policymakers can better strategize and optimize their investment in PD endeavors.\u003c/p\u003e\n\u003ch2\u003ePD Outcomes Influencing Educational Changes\u003c/h2\u003e\n\u003cp\u003eOutcome measures of PD discussed in the literature may be indicators for determining the effectiveness of PD. Desimone’s (2009) model illustrated “interactive, nonrecursive relationships between the critical features of PD, teacher knowledge and beliefs, classroom practices, and student outcomes” (p. 184). Guskey (2000) provided a multilevel framework demonstrating PD’s outcome variables, including teachers’ knowledge and skills, application of knowledge, and student learning, and noted that these outcomes can be used for evaluating PD. Similarly, several scholars of the effects of PD on teachers, students, or both have reported multiple outcomes measured by diverse instruments and methods. Through performing\u0026nbsp;a comprehensive literature search and using Desimone’s conceptual PD model, we sorted PD outcomes into four categories: (a) teachers’ content and/or pedagogical knowledge; (b) teachers’ affective domains; (c) classroom practices; and (d) students’ outcomes.\u003c/p\u003e\n\u003ch3\u003eTeachers’ Cognitive Components\u003c/h3\u003e\n\u003cp\u003eAs part of the evaluation purpose of PD, teachers are often assessed to determine the extent to which their PD participation led to increased knowledge of the content they teach and PCK, including teaching and learning strategies appropriate to the content they teach. 
Teachers require a certain degree of CK to teach their subjects effectively and to apply their knowledge and skill sets in the classroom.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003ePCK (Shulman, 1986) refers to the interconnected domains of teacher knowledge; PCK is often discussed as existing on a continuum wherein teachers acquire more PCK through appropriate training and experience. Thus, most PD programs emphasize improving teacher knowledge and applying new knowledge and skills to their classroom practices. For example, Clary et al. (2017) investigated science knowledge retention at the Teacher Academy in the Natural Sciences (TANS) beyond a specific instructional year. The TANS targeted middle school science teachers of chemistry, geosciences, and physics; the study’s goals were to increase teacher CK and enhance student learning outcomes. Roth et al. (2019) studied the influences of two PD programs, Science Teacher Learning from Lesson Analysis (STeLLA) and Content Deepening (CD), on upper-elementary teachers’ knowledge. The authors hypothesized and confirmed that the STeLLA program would improve science teachers’ CK, PCK, and classroom practice more than the CD program, which focused only on CK.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eResearchers have acknowledged the importance of measuring teachers’ knowledge. However, considering only cognitive factors is insufficient for PD evaluation and underestimates the significance of the changes teachers make. Many scholars (Desimone, 2009; Guskey, 2000) have argued that affective measures such as teacher perception and attitude toward PD are key to evaluating PD outcomes.\u003c/p\u003e\n\u003ch3\u003eTeachers’ Affective Components\u003c/h3\u003e\n\u003cp\u003eResearchers must consider affective factors when evaluating the impact of PD. Affective components include the feelings, moods, and emotions people have experienced in a certain situation. 
Within PD programs, affective components, such as a teacher’s attitudes, beliefs, self-efficacy, and confidence, are measured to identify the effects of scaffolding and treatment. Based on affective–cognitive consistency theory, these affective components may change with the provision of new information, and this relationship further moderates the nature of teacher behaviors (Millar \u0026amp; Tesser, 1989).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAuthors (2021) designed and developed a PD program to help middle school teachers effectively integrate robotics in science and mathematics classrooms. The primary goal of the PD program was for teachers to increase their confidence and self-efficacy in robotics-integrated teaching, thereby enhancing their ability to develop lesson plans infused with robotics activities. Similarly, Hawley and Sinatra (2019) aimed to reduce the perceived conflict and negative emotions surrounding the relationship between faith and science for Christian teachers through a PD workshop. In the study, teachers reported decreases in misconceptions and increases in self-efficacy and positive emotions related to teaching evolution.\u0026nbsp;\u003c/p\u003e\n\u003ch3\u003eTeaching Practices\u003c/h3\u003e\n\u003cp\u003eA wealth of literature supports the impact of PD models on the quality of teachers’ instruction (e.g., Chitpin, 2011; Fishman et al., 2017). Desimone et al. demonstrated the positive impact of PD by focusing on the specific instructional practices and teachers’ use of those practices in the classroom. The authors used three instructional methods in the PD: technology use, higher-order instructional methods (e.g., working on interdisciplinary lessons), and alternative assessments. These methods elicited changes in instructional practice.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eLotter et al. (2018) demonstrated a positive effect on teachers’ beliefs and self-efficacy of inquiry as well as the implementation of practice under a PD model. 
Specifically, the electronic quality of inquiry protocol analysis from before and after the PD demonstrated a significant increase in teachers’ quality of inquiry instruction. As such, teacher participation in PD may be directly linked with the successful implementation of new teaching practices. Additionally, connecting PD to teacher practice may encourage teachers to think differently about their instruction and modify their teaching practices.\u003c/p\u003e\n\u003ch3\u003eStudent Outcomes\u003c/h3\u003e\n\u003cp\u003eThe ultimate goal of most PD programs is to empower students by fostering teacher growth and learning. Diverse PD studies have demonstrated the significant impact of professional guidance on students’ academic performance. As Darling-Hammond (1998) asserted, teachers teach what they know, and teacher knowledge exerts a substantial positive influence on student learning. For example, Yoon et al. (2017) reported the findings of a 2-year PD study that revealed the professional guidance provided by the PD program resulted in a significantly enhanced classroom learning experience and improved student CK. Similarly, Furtak et al. (2018) documented that the learning outcomes of the majority of students observed over a 4-year period progressed from lower to upper benchmarks across various learning progressions. It is worth noting that few scholars have focused on evaluating the effectiveness of PD through its impact on student outcomes in K–12 STEM education.\u003c/p\u003e\n\u003ch1\u003eStudy Purpose\u003c/h1\u003e\n\u003cp\u003eMany studies on the effectiveness of PD have confined themselves to the perspective of either teachers or students and have not considered teacher and student outcomes simultaneously. Additionally, most PD metastudies have focused on specific academic subjects (e.g., only science) or grade levels (e.g., only elementary-grade students). 
By contrast, the current study incorporated diverse, measurable student and teacher outcomes to adequately determine the quality of PD programs. We have included a wide range of PD studies in K–12 STEM education to generate generalizable results and identify common practices or models that contribute to the effectiveness of PD.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTherefore, our meta-analysis covers a broad range of results synthesized from individual studies published over the past decade concerning the impact of PD in STEM education. We also investigate whether and which characteristics of PD programs might moderate these effects. We selected the moderators through a literature review and theoretical rationale. Published metastudies related to PD programs, existing teacher change models, and the core characteristics affecting PD programs from prior research laid the foundation for the selection of moderators.\u003c/p\u003e\n\u003cp\u003eWe sought to address the following two primary research questions:\u0026nbsp;\u003c/p\u003e\n\u003col\u003e\n \u003cli\u003eWhat are the effects of PD programs for STEM teachers in studies between 2010 and 2022?\u003c/li\u003e\n \u003cli\u003eWhat characteristics of PD programs (e.g., grade level, STEM subject, race/ethnicity, gender, funding) explain the degree of their effects in PD studies between 2010 and 2022?\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Methods","content":"\u003ch2\u003eSearch Procedures\u003c/h2\u003e\n\u003cp\u003eIn this review, we examined experimental studies of PD for K–12 teachers teaching STEM subjects. We conducted an electronic search using four databases (ERIC, PsycINFO, Web of Science, and EduSource) for PD program studies published January 2010–May 2022. 
Search terms for title and abstract included the following: [“PD” or “professional development”] AND [“teacher”] AND [“STEM” or “STEAM” or “science” or “math*” or “technology” or “engineering”].\u003c/p\u003e\n\u003cp\u003eThree researchers reviewed an initial pool of 1,121 studies, comprising 296 studies from PsycINFO, 109 studies from ERIC, 324 studies from Web of Science, and 392 studies from EduSource. After removing 30 duplicated studies based on the titles, authors, and journal information, 1,091 studies remained based on their titles or abstract content. Of those, 302 were irrelevant to PD effectiveness, and we removed them based on a closer review of the full text. Additionally, we excluded 45 studies related to other grade levels, such as early childhood or higher education, and 543 qualitative studies. Of the remaining 201 studies, 83 failed to provide sufficient statistical data, such as mean, standard deviation, and sample size, which are essential for estimating effect sizes. After removing those studies, we were left with 118 articles, from which we initially calculated 946 effect sizes. However, we subsequently excluded eight effect sizes from five studies (Greene et al., 2013; Kim et al., 2012; Malanson et al., 2014; Robinson et al., 2014; Tobin et al., 2012) as outliers, using the interquartile range (IQR) method (Acuna \u0026amp; Rodriguez, 2004; Tukey, 1977). The IQR method detects values that fall below −2.328 or exceed 3.615 as outliers. This multistep search procedure resulted in a total of 938 effect sizes from the final 118 studies for meta-analysis (see Figure 2).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIn the search process, we implemented a rigorous triple-review approach for each search step. This method ensured a thorough and comprehensive evaluation of the data and results. 
We assessed the interrater reliability of the coding using Fleiss’s (1971) generalized kappa, which is computationally versatile, accommodating a variety of coding variables and raters. Three coders collected coding information related to basic study details (e.g., title, authors, publication year, journal names), statistical data (e.g., mean, standard deviation, sample size, correlations, \u003cem\u003et\u003c/em\u003e-value, \u003cem\u003eF\u003c/em\u003e-ratio), and moderators (see Table 1).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eINSERT FIGURE 2 HERE\u003c/strong\u003e\u003c/p\u003e\n\u003ch2\u003eInclusion and Exclusion Criteria\u003c/h2\u003e\n\u003ch3\u003eTypes of Interventions\u003c/h3\u003e\n\u003cp\u003eThe included interventions were K–12 STEM teachers’ PD. We focused on studies that addressed how PD programs have been implemented and the effects of PD programs on teacher quality (e.g., self-efficacy, CK, teaching practice) and student learning outcomes. We omitted studies on early childhood education teachers (e.g., Roman, 2019; Tsamir et al., 2014); practice/assistant/preservice teachers (e.g., Berisha \u0026amp; Vula, 2021; Karisan et al., 2019); and professors, instructors in higher education, or both (e.g., Derting et al., 2016).\u003c/p\u003e\n\u003ch3\u003eParticipants\u003c/h3\u003e\n\u003cp\u003eThe participants included a diverse group of teachers and their students, focusing specifically on the K–12 educational level. These teachers taught STEM subjects. We therefore excluded early childhood and preschool teachers, preservice teachers, and faculty and instructors in higher education.\u003c/p\u003e\n\u003ch3\u003eStudy Design\u003c/h3\u003e\n\u003cp\u003eWe included studies that used a quantitative research design. Specifically, we sought studies that reported sufficient statistics, enabling us to calculate the effect sizes of overall selected studies. 
Notably, the majority of the studies included in our analysis involved comparisons between control and treatment groups or examinations of sample groups through a pretest–posttest design. We excluded studies that adopted qualitative research approaches such as interviews or narrative research.\u0026nbsp;\u003c/p\u003e\n\u003ch3\u003eIntervention Year and Settings\u003c/h3\u003e\n\u003cp\u003eWe included studies of interventions conducted between 2010 and 2022. Our aim was to gain insights into PD programs over the past decade. We included full-text peer-reviewed journal articles and excluded conference proceedings, grant proposals, reports, book reviews, dissertations, and theses. Because of language constraints, we included only articles published in English.\u003c/p\u003e\n\u003ch2\u003eCoding and PD Characteristics\u003c/h2\u003e\n\u003cp\u003eWe created a spreadsheet to record the pool of potential studies and collect information from each study for the meta-analysis. The coding sheet included ID, sample size, methodology (i.e., experimental or nonexperimental design with pretests and posttests of treatment group), quantitative data to calculate effect sizes, gender, race, funding, country, target sample (i.e., teacher, student, or both), grade level, technology use, PD dosage (i.e., the number of contact hours teachers spent in PD experiences), PD length (i.e., the time span over which those hours were spread), targeted outcomes in PD (i.e., a cognitive or affective factor), the main subject of the teachers participating in the PD program, measurement tool, and type of PD treatment. We used these characteristics of PD programs as moderators to examine how they explain variation in effect sizes across studies. As discussed in the literature review, the literature related to PD characteristics and prior PD meta-analysis studies provided insights into the “what,” “how,” and “who” components of PD. 
Based on the guidelines from the Campbell Collaboration (n.d.), the first 10 studies were coded by three independent coders, and disagreements between the coders were resolved through discussion and consensus. Then the remaining studies were coded by three coders independently based on revised coding criteria. In the initial coding of the first 10 studies, the Fleiss kappa score was 0.610, indicating moderate agreement among the coders. The majority of disagreements occurred during the coding of the statistical data and moderators, which required decisions about the reference group, the accurate final sample size, and the precise name of the measure. To resolve these discrepancies, criteria were clarified based on the observed disagreements. The coding results showed a Fleiss kappa score of 0.81, indicating high agreement (Viera \u0026amp; Garrett, 2005).\u003c/p\u003e\n\u003ch2\u003eRisk of Bias\u003c/h2\u003e\n\u003cp\u003eWe evaluated the risk of bias that could have an impact on the magnitude of effect sizes using the Cochrane Collaboration tool (Higgins et al., 2011). The manual suggests evaluating five characteristics: selection bias (random sequence generation and allocation concealment); reporting bias (selective reporting); performance bias (blinding of participants and personnel); detection bias (blinding of outcome assessment); and attrition bias (incomplete outcome data). We also took into account the reporting of psychometric information to identify any potential bias arising from the measures used.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eRegarding selection bias, we evaluated whether the studies incorporated a control or comparison group, as well as whether they indicated the use of random assignment, random sampling, or any other sampling procedure (e.g., matched design) aimed at creating equivalent groups in the sampling process. 
In assessing reporting bias, we compared the measures outlined in the method section with the results presented in the results section. If measures mentioned in the method section were not reported in the results section, we inferred the presence of reporting bias. In terms of performance bias, if the researchers provided any information about the intervention to the participants (e.g., informed consent), we inferred that the blinding of the trial may have been violated. Furthermore, if the study indicated that researchers were aware of participant allocation during data analysis, we inferred the potential presence of detection bias. We checked whether the studies accurately reported their attrition by providing pre- and posttest sample sizes or by directly mentioning the proportion of attrition out of the total sample. Additionally, we examined any reliability test results (such as Cronbach’s alpha or interrater reliability) or validity test results (such as factor analysis) for psychometric information.\u003c/p\u003e\n\u003ch2\u003eStatistical Analysis\u003c/h2\u003e\n\u003cp\u003eData analysis consisted of four steps: estimating individual effect sizes using Hedges’ \u003cem\u003eg\u003c/em\u003e, which, unlike Cohen’s \u003cem\u003ed\u003c/em\u003e, includes a correction for small-sample bias (Cooper et al., 2019); synthesizing effect size estimates by using robust variance estimation (RVE); conducting moderator analyses via meta-regression for continuous and categorical variables under RVE; and assessing publication bias. We performed all analyses using RStudio (RStudio Team, 2020). 
Specifically, we used the 'robumeta' package (Fisher \u0026amp; Tipton, 2015) for robust variance estimation (RVE) and Egger's regression, as well as the 'metafor' package (Viechtbauer, 2010) for the trim-and-fill analysis.\u003c/p\u003e\n\u003cp\u003eWhen calculating effect sizes for repeated-measures designs (prescore and postscore), most studies did not report a correlation value between prescore and postscore, even though this information is needed to calculate effect sizes and the corresponding sampling variance. Thus, as a sensitivity analysis, we used two conventional correlation values: 0.4 as a moderate level and 0.7 as a strong level. The results obtained with the two correlation values did not differ substantially. Thus, we used one of the correlation values, 0.4, as the imputed correlation between the prescore and postscore.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIn the current study, we obtained 938 effect sizes from 118 primary studies, which indicates that studies often reported more than one effect size. To address the dependence among multiple effect sizes from a study, we used RVE (Hedges et al., 2010) for synthesizing effect sizes as well as for moderator analyses. Multiple effect sizes from the same study can depend on one another because they come from the same sample. If we ignore this dependency and use the univariate random-effects model, the estimates may be biased. Unlike standard model-based methods such as multivariate meta-analysis, RVE can address dependency by improving standard error estimation. Although the generalized least squares (GLS; Gleser \u0026amp; Olkin, 2009) method was originally developed to address dependency, GLS requires correlation values among dependent effect sizes that are typically not reported in primary studies. Both the multilevel model (Van den Noortgate et al., 2013) and RVE are viable options that can address the dependency. 
A previous study (Author, 2019) found that both the multilevel model and RVE can yield unbiased results with large sample sizes, whereas the multilevel model may experience low convergence rates with smaller sample sizes. Given the pooled effect sizes across a total sample (here, 938 effect sizes), employing a multilevel model may not present challenges. However, during moderator analyses, missing data resulted in a decrease in the sample size. To address the issue of small sample size in moderator analyses, we opted to use RVE instead of the multilevel model.\u003c/p\u003e\n\u003cp\u003eMoreover, RVE does not require the correlation structure when calculating standard errors and hypothesis tests; rather, it empirically estimates the standard errors using a sandwich estimator (Author, 2019), so it is recommended for meta-analyses with dependent effect sizes. We set statistical significance at .05 when degrees of freedom associated with the moderators were \u0026gt; 4 and at .01 when degrees of freedom were \u0026lt; 4 (Tanner-Smith et al., 2016).\u003c/p\u003e\n\u003cp\u003eIn addition to estimating the overall effect size, we conducted a moderator analysis of 16 potential moderating factors: (a) target sample for PD (i.e., teacher or student); (b) country (i.e., United States or non-United States); (c) technology use (i.e., technology was used or not used); (d) funding status (i.e., funded PD or nonfunded PD); (e) outcome type (i.e., cognitive or affective); (f) dosage (i.e., ≤ 24, 25–72, or ≥ 73 hours); (g) duration (i.e., ≤ 6, 7–12, 13–23, or ≥ 24 months); (h) targeted subject (i.e., science only, math only, technology only, or more than two subjects); and (i) grade level (i.e., K–elementary, middle school only, high school only, K–middle school, middle–high school, or K–12). 
In addition, we used seven core characteristics of PD as moderators: (j) active learning; (k) content-focused; (l) collective participation; (m) expert support; (n) feedback and reflection; (o) using models; and (p) coherence with goals. One example of the moderators is measurable PD outcomes found in the studies, which centered on cognitive and affective dimensions of PD, that capture the characteristics of PD models. Cognitive outcomes in PD reflect the knowledge and reasoning of both teachers and students, whereas affective outcomes encompass teachers’ and students’ feelings, emotions, and attitudes toward the targeted objectives of PD. \u003cem\u003eFunding status\u003c/em\u003e refers to the financial resources required for conducting PD-related studies. We used categorical variables for dosage and duration due to the observed nonlinear relationship in the effectiveness of moderators. Thus, we categorized the moderators into meaningful ranges and conducted meta-regression employing dummy variables. Table 1 details this moderator information.\u003c/p\u003e\n\u003cp\u003eWe assessed publication bias using a trim-and-fill analysis (Duval \u0026amp; Tweedie, 2000) and the Egger regression analysis (Egger et al., 1997). When publication bias was detected, we used the trim-and-fill analysis to provide adjusted mean effect sizes (i.e., the estimate after including artificial effect sizes to make a symmetrical graph in the funnel plot). Egger’s test provides a statistical estimate of the degree and significance of asymmetry in the funnel plot, using a regression analysis with the effect sizes as the dependent variable and sampling variance as the independent variable. 
The nonsignificance of the coefficient for the sampling variance indicates that publication bias is not present; if the coefficient is significant, the value of the intercept indicates the adjusted pooled effect size (Egger et al., 1997).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eINSERT TABLE 1 HERE\u003c/strong\u003e\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003eThe 118 PD programs offered a unique combination of duration, purpose, content focus, and contextual factors highlighting the multifaceted nature of PD. In terms of duration and depth, these programs covered a wide range of training experiences, with some lasting only a few days and others extending across several years. The objectives of the programs varied; some primarily aimed at enhancing teachers’ CK, whereas others focused on improving teachers’ instructional practices or self-efficacy.\u003c/p\u003e\n\u003cp\u003eAdditionally, we aimed to encompass a variety of STEM-related content. Contextual factors could play a significant role in the diversity of PD. For example, when programs were conducted in different countries, the programs bring a distinct cultural and educational context. The use of technology within PD also varied, with some embracing technological tools and others not.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eBased on the results of the risk of bias assessment, we found that 44 studies (37.3%) incorporated a control/comparison group into their design, whereas only 34 studies (28.8%) indicated the use of a randomization process in their sampling procedure or allocation. Among those, only 27 studies (22.9% of the total) demonstrated both a control/comparison group and randomization, indicating a low risk of bias. In terms of reporting bias, all studies reported outcomes mentioned in the method section. 
Regarding performance bias, we observed that all studies provided intervention information to participants through informed consent; however, no study attempted to blind outcome assessment. Among the 113 studies employing pre- and posttest designs, 80 studies (70.8%) clearly reported pre- and posttest sample sizes. Among these 80 studies, 29 (36.3%) experienced attrition in the posttest sample. The remaining 33 studies (29.2%) provided no information regarding pre- and posttest sample sizes. Ninety studies (76.3%) reported the reliability or validity of the measures they employed.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIn terms of publication bias analysis, the results of the trim-and-fill analysis conducted without the eight outliers mentioned earlier (resulting in 938 effect sizes) demonstrated that no effect sizes were imputed to produce a symmetric distribution. This suggests a low likelihood of publication bias, as depicted in Figure 3. The results remained consistent when including the eight outliers (resulting in 946 effect sizes). By contrast, Egger’s regression analysis including the eight outliers indicated a potential presence of publication bias (\u003cem\u003eb\u003c/em\u003e = 2.712, SE = 0.172, \u003cem\u003ep\u003c/em\u003e \u0026lt; .05), albeit with a small positive effect (\u0026nbsp;\u0026nbsp;= 0.124, SE = 0.043, \u003cem\u003ep\u003c/em\u003e \u0026lt; .05). The results of Egger’s regression conducted without the eight outliers demonstrated that the coefficient of the sampling variance term was statistically significant (\u003cem\u003eb\u003c/em\u003e = 3.030, SE = 0.716, \u003cem\u003ep\u003c/em\u003e \u0026lt; .05). This indicates the possible existence of publication bias. 
However, the adjusted pooled effect size (\u0026nbsp;\u0026nbsp;= 0.565, SE = 0.065, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001) remained moderate, positive, and statistically significant.\u003c/p\u003e\n\u003cp\u003eWhen we included the eight outlying effect sizes, the overall effect size was 0.769 (SE = 0.057, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001) with a 95% confidence interval (0.656, 0.881). In contrast, when considering the data with only 938 effect sizes, the average overall effect size representing the effectiveness of the PD programs was \u003cem\u003eg\u003c/em\u003e = 0.739 (SE = 0.052, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001) with a 95% confidence interval (0.637, 0.842), which is statistically significant. Thus, including the eight outlying effect sizes increased the overall effect size by 0.03. Consequently, the subsequent analyses used the data without the eight outliers.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eINSERT FIGURE 3 HERE\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eRegarding research design, out of the 938 effect sizes, 65 were derived from studies employing an independent group design. Additionally, 227 effect sizes came from studies using a repeated-measures group design while controlling for preexisting confounders through an equivalence test with a control group. The remaining 646 effect sizes came from studies that used a repeated-measures group design without any control group. To assess the impact of research design on effect sizes, we conducted a sensitivity analysis and found no significant differences based on research design. 
Therefore, we included all effect sizes in our dataset irrespective of the type of research design.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIn the final sample, the \u003cem\u003eI\u003csup\u003e2\u003c/sup\u003e\u003c/em\u003e statistic was 98.124, indicating that 98.124% of the variance in observed effects reflects variation in true effects rather than chance. The variance of true effects (tau-squared, \u003cem\u003eτ\u003csup\u003e2\u003c/sup\u003e\u003c/em\u003e) was 0.505. The prediction interval also does not include zero (95% prediction interval [0.565, 0.913]). These results indicate substantial variability across effect sizes beyond sampling error. Based on the results, we conducted a moderator analysis to explore the reasons for variability.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eFirst, because previous studies measured the effectiveness of PD programs through both teacher and student outcomes, we synthesized the overall effect size by target participants. According to the results (see Table 1), we found that the overall effect size based on students’ outcomes is 0.771 (SE = 0.095, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001), whereas the overall effect size based on teachers’ outcomes is 0.705 (SE = 0.059, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001). Both pooled effect sizes are statistically significant, but the difference between the two pooled effect sizes is not statistically significant. In terms of types of PD outcomes, the cognitive outcomes (\u003cem\u003eg̅\u003c/em\u003e = 0.788, SE = 0.063, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001) have a higher pooled effect size than the affective outcomes (\u003cem\u003eg̅\u003c/em\u003e = 0.696, SE = 0.081, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001), but this difference is not statistically significant.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eSecond, in terms of study-level characteristics, we considered the funding for conducting PD studies and the country for each PD study published. 
According to the results (see Table 2), the pooled effect sizes are large and statistically significant regardless of funding support (\u003cem\u003eg̅\u003c/em\u003e = 0.815, SE = 0.107, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001 for No; \u003cem\u003eg̅\u003c/em\u003e = 0.714, SE = 0.060, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001 for Yes) or the study’s origin country (\u003cem\u003eg̅\u003c/em\u003e = 0.729, SE = 0.057, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001 for US; \u003cem\u003eg̅\u003c/em\u003e = 0.771, SE = 0.130, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001 for Other).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThird, in exploring the reasons for variability across effect sizes, we considered the grade levels teachers taught. We found marginal differences between the middle school-only group and the K–middle school group (\u003cem\u003et\u003c/em\u003e(10.05) = 2.03, \u003cem\u003ep\u003c/em\u003e \u0026lt; .10) and between the high school-only group and the K–middle school group (\u003cem\u003et\u003c/em\u003e(12.76) = 1.86, \u003cem\u003ep\u003c/em\u003e \u0026lt; .10). Generally, if a study focused on a specific grade level as its target population, such as the middle school group (\u003cem\u003eg̅\u003c/em\u003e = 0.856, SE = 0.108, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001) or high school group (\u003cem\u003eg̅\u003c/em\u003e = 0.832, SE = 0.130, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001), the pooled effect size was larger than in studies that combined different grade levels (see Table 3).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eNext, we examined the impacts of technology-integrated PD and target STEM subjects on PD (see Table 4). We found that studies that used technology (\u003cem\u003eg̅\u003c/em\u003e = 0.738, SE = 0.100, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001) and studies that did not use technology (\u003cem\u003eg̅\u003c/em\u003e = 0.750, SE = 0.062, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001) had similar effect sizes. 
There was no significant difference in the effectiveness of PD when technology was integrated. In terms of target subjects, we found that if a PD program considered science (\u003cem\u003eg̅\u003c/em\u003e = 0.762, SE = 0.065, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001), technology (\u003cem\u003eg̅\u003c/em\u003e = 0.763, SE = 0.284, \u003cem\u003ep\u003c/em\u003e \u0026lt; .05), or more than two subjects (\u003cem\u003eg̅\u003c/em\u003e = 0.728, SE = 0.099, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001), the effect sizes were larger than those of programs that considered math (\u003cem\u003eg̅\u003c/em\u003e = 0.572, SE = 0.136, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001); however, this difference was not statistically significant.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTo examine the relationship between program length and effectiveness, we considered dosage (i.e., total hours) and program duration. According to the results (see Table 4), the pooled effect size was substantial even when the program dosage was fewer than 80 hours (\u003cem\u003eg̅\u003c/em\u003e = 0.773, SE = 0.110, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001), whereas the pooled effect size was larger, at a marginal level of significance (\u003cem\u003et\u003c/em\u003e(24.60) = 1.91, \u003cem\u003ep\u003c/em\u003e \u0026lt; .10), when the program exceeded 80 hours (\u003cem\u003eg̅\u003c/em\u003e = 1.121, SE = 0.182, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001). In addition, if the program lasted between 13 and 23 months (\u003cem\u003eg̅\u003c/em\u003e = 0.781, SE = 0.214, \u003cem\u003ep\u003c/em\u003e \u0026lt; .05) or longer than 24 months (\u003cem\u003eg̅\u003c/em\u003e = 0.830, SE = 0.133, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001), the effectiveness of PD increased. However, these differences were not statistically significant.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eFinally, Table 5 presents the results of moderator analyses related to PD’s core characteristics. 
We found that if the PD studies included components of active learning (\u003cem\u003eg̅\u0026nbsp;\u003c/em\u003e= 0.756, SE = 0.054, \u003cem\u003ep\u0026nbsp;\u003c/em\u003e\u0026lt; .001), collective participation (\u003cem\u003eg̅\u0026nbsp;\u003c/em\u003e= 0.771, SE = 0.107, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001), expert support (\u003cem\u003eg̅\u0026nbsp;\u003c/em\u003e= 0.746, SE = 0.066, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001), and feedback (\u003cem\u003eg̅\u003c/em\u003e = 0.737, SE = 0.060, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001), the effect sizes tended to be larger than in studies that did not include these components, although the differences were not statistically significant. Both PD programs with and without a content focus show a strong positive effect on outcomes, but the effect size is somewhat higher for programs without a content focus. However, the lack of statistical significance in the meta-regression suggests that the difference between programs with and without content focus may not be robust across studies. This implies that while content focus is beneficial, it may not be the sole determining factor for PD effectiveness. PD programs with coherence had a smaller effect size (0.673) compared to those without coherence (0.870). Although coherence is generally regarded as beneficial for teacher learning, the data suggests that PD programs without a strong coherence component may still show higher overall effects. However, the difference between PD programs with and without coherence was found to be statistically insignificant, meaning that the variation in effect sizes could be due to chance rather than a meaningful influence of coherence. 
Similarly, while using models is typically a valuable and positive component of PD, the insignificant results suggest that incorporating models does not necessarily provide an advantage over PD that does not include modeling.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eINSERT TABLE 2\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eINSERT TABLE 3\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eINSERT TABLE 4\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eINSERT TABLE 5\u003c/strong\u003e\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003ePolicymakers, school and district leaders, and researchers have become increasingly concerned with improving the quality of teacher PD, particularly in terms of its impact on student outcomes. A 2015 survey by the New Teacher Project (Klan, 2017) indicated that an average of $18,000 has been spent annually on PD for teachers, potentially $8 billion annually for the largest U.S. districts. Yet this investment might not be leading to the expected outcomes.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe current study provides the most extensive meta-analytical evidence from 118 individual studies demonstrating that PD programs for STEM teachers effectively support teachers’ CK and pedagogical quality and improve students’ academic performance. The magnitude of the large effect size of PD programs found in this meta-analysis is larger than the results of other meta-analyses and systematic reviews synthesizing findings on the impact of PD on both teachers and students (e.g., Blank \u0026amp; de Las Alas, 2009; Egert et al., 2018). Egert et al. examined the effects of PD programs on external quality ratings and child development in preschool, preK, and kindergarten contexts. There was a significant medium-weighted effect size for in-service programs on quality ratings based on the meta-analysis. 
Blank and de Las Alas used 104 effect sizes from 16 studies to examine the effects of math and science PD on student achievement, and they observed a medium effect size in most of the studies.\u003c/p\u003e\n\u003cp\u003eWe used 118 PD studies to evaluate various cognitive and affective outcomes of teachers and students, highlighting that both types of outcomes are essential in STEM PD programs. Little agreement exists on whether cognitive abilities and CK are the most important factors in teachers’ practices. The multifaceted nature of teachers’ cognition could make it difficult to identify the core knowledge for effective teaching. Furthermore, empirical investigations in PD settings have rarely explored connections between the cognitive and affective outcome types. Such a connection might be developed under the assumption that affective characteristics such as self-efficacy would be beneficial to improving the knowledge learned from PD programs. Russell and Pratt’s (1980) model of affect detailed that affective dimensions are the mediating variables for cognitive processes and behaviors. Scholars have suggested that such affective factors mediate learning by increasing cognitive engagement (e.g., Park et al., 2014). Russell and Pratt’s framework is particularly useful in understanding the outcomes and benefits of PD.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eMost of the PD studies we analyzed focused on teacher outcomes (\u003cem\u003en\u003c/em\u003e = 89) rather than student ones (\u003cem\u003en\u003c/em\u003e = 41), so we assume that linking PD to student outcomes is more difficult than revealing the relationship between PD and teacher outcomes. 
Deducing and specifying a causal relationship linking teacher knowledge to student outcomes can be challenging when using indirect measurement methods (Dede, 2009).\u003c/p\u003e\n\u003ch2\u003eFactors Contributing to Effectiveness of STEM PD\u003c/h2\u003e\n\u003ch3\u003eDosage\u003c/h3\u003e\n\u003cp\u003eAmong the diverse moderators related to PD programs’ effectiveness, a critical moderator is dosage. The present study highlighted the role of total contact hours in the effectiveness of PD. Our results suggested a threshold in the number of contact hours associated with larger PD effects. PD programs lasting longer than 80 hours had stronger effects than those lasting fewer than 80 hours, a difference that was marginally significant (\u003cem\u003ep\u003c/em\u003e \u0026lt; .10). This result indicated that more exposure to materials or training provided by a PD program could be advantageous for STEM teachers.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThis result aligns with findings from several prior studies. Zaslow et al. (2010) suggested that general models with a high dosage of PD tended to be associated with positive outcomes for both teachers and students. Yoon et al. 
(2007) found that the three studies in their review that had the least amount of PD (5–14 hours) revealed no statistically significant impacts on student learning, whereas PD programs with more than 14 contact hours had positive and significant effects on student outcomes.\u0026nbsp;Other authors have proposed different dosage thresholds for positive impacts: 20 contact hours (Desimone, 2009), over 30 hours (Guskey \u0026amp; Yoon, 2009), and 49 hours (Darling-Hammond \u0026amp; Richardson, 2009).\u0026nbsp;Although the dosage factor in PD needs to be further reported and analyzed, it is commonly argued\u0026nbsp;that attending multiple PD sessions allows teachers to better reinforce, elaborate upon, and follow up with information presented during previous sessions; by contrast, short one-time sessions can often be disconnected from teachers’ needs, potentially leading to a lack of motivation and hindering meaningful changes (Cobb et al., 2003).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eOur results revealed that PD duration is also an important moderator of PD effects; in contrast to dosage, however,\u0026nbsp;there appears to be no specific duration threshold that increases effectiveness. Three previous meta-analyses, Basma and Savage (2018), Kraft et al. (2018), and Lynch et al. (2019), found no correlation between an extended duration of PD and its outcomes. Additionally, Yin et al. (2015) suggested that PD programs spanning 2 or 3 years yielded negative effect sizes compared to those lasting only 1 year. These findings imply that simply extending the duration of PD does not guarantee significant outcomes in PD programs.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e\u003cem\u003eGrade Level\u003c/em\u003e\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAnother notable finding of this study is the benefit of homogeneous teacher groups\u0026nbsp;according to grade levels. 
The effects of PD on homogeneous teacher groups (i.e., only K–elementary, middle school, or high school teachers) were significantly larger than those of PD programs comprising heterogeneous groups of teachers spanning a wide range of grade levels. The current study thus suggests that a promising approach to PD is to engage teachers in similar grade levels to bring about significant positive changes. A homogeneous group composition based on grade levels can be a key factor in providing common experiences for the development of teachers’ CK and practices.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIn addition, in such homogeneous groups, grouping teachers with similar professional backgrounds and work experience can encourage maximum collaboration among teachers with a common goal. A teacher group comprising various grade levels would have different experiences with PD, with teachers possessing different prior knowledge, backgrounds, and expectations. The variation of effects between grade levels across outcomes implies the importance of collective participation in PD, as Garet et al. (2001) and Desimone (2009) suggested.\u003c/p\u003e\n\u003ch2\u003eFactors Not Contributing to Effectiveness of STEM PD\u003c/h2\u003e\n\u003cp\u003eIn addition to the significant moderators that lead to PD success, the current study reveals examples of nonsignificant moderators, such as available funding for PD studies and technology use. More than 75% of the studies included in this meta-analysis were funded to design and implement PD. This supports the argument that a large amount of funding is spent yearly on PD development and implementation (e.g., Lipsey et al., 2012).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eSpecifically, 91 out of 118 (77.1%) STEM PD studies were funded, indicating that significant expenditure would be needed for the development and presentation of PD activities or for purchasing teacher time to participate in PD. 
Even though funding for conducting PD studies may not significantly moderate PD’s effectiveness, it does not necessarily imply that grant funding would not be beneficial or positively affect teacher and student outcomes. This finding should be further researched in future studies before we can conclude that financial resources for conducting PD studies are not a critical factor in their effectiveness.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eWhen coding the funding status in this study, we coded a study as funded only if its funding was clearly stated. It is possible that some PD studies supported by funding omitted their funding names or organizations, thus causing us to count them as unfunded. Another possible argument is that funded studies have more conservative standards by which to evaluate PD programs, and unfunded studies’ results would overestimate the effectiveness of their PD programs. If we balanced the funded PD studies with additional unfunded studies, the interpretation regarding PD funding would be clearer. In addition, if we considered the evaluation of study quality for all unfunded and funded studies, this finding would be more robust.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eRecent studies have demonstrated that effective technology integration has the potential to promote STEM learning through reform-oriented teaching practices (e.g.,\u0026nbsp;Authors, 2021; Barak \u0026amp; Assal, 2018). In line with this trend, STEM PD programs tend to include technology features for improving STEM learning and teaching. This study demonstrates that one-third of the total studies incorporated technology to implement their STEM PD, while the remaining two-thirds did not use any form of technology. The current study did not identify technology as a significant moderator contributing to PD success. Nonetheless, the rapid advancement of educational technology has the potential to reshape the PD landscape; incorporating technological features thus warrants a more thorough evaluation. 
It will be imperative to conduct rigorous investigations into how technology-enhanced PD programs affect teaching and learning outcomes for students and teachers.\u003c/p\u003e\n\u003ch1\u003eLimitations and Future Research\u003c/h1\u003e\n\u003cp\u003eDespite the important findings and implications of the current study, some limitations can be identified. Although this study revealed certain factors that significantly contributed to the effectiveness of PD, it represents just the first step toward meaningful interventions. We examined the influence of various moderators, including contextual factors and core characteristics, on the effectiveness of PD. Further investigation is required to generalize the effects of these moderators and other variables. Additionally, we did not explore the dynamic interactions among these moderators and the outcomes of PD. A meta-analytic structural equation modeling approach can be employed to investigate how these factors interact and affect the results of PD. PD developers and educators can then ensure a tailored and well-informed program design that maximizes its impact. Future research exploring these key factors and interactions could provide comprehensive insight into PD design.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eMany PD studies employed a repeated measure design without a control group. Some authors employed equivalent tests to mitigate this limitation without adjusting for confounders, but the results from the dataset may have limited external validity. Based on the sensitivity analysis results, which compared control and treatment group designs with repeated measure designs, we included all effect sizes regardless of research design. However, readers interpreting the results should remain alert to potential external validity issues. 
A control group is essential for conducting rigorous research on PD, but it is often difficult to implement one in education research; withholding an intervention from a control group, especially one with significant benefits, poses an ethical dilemma. Logistical constraints and limitations of resources could be additional reasons for omitting control groups. Establishing control groups can often be logistically complex, involving challenges such as scheduling, participant coordination, and adherence to the research protocol. Another factor that may come into play is limited resources. Sometimes, researchers must prioritize one group over the other because of financial constraints.\u003c/p\u003e\n\u003cp\u003eFurthermore, unlike in clinical trials where blinding participants and researchers is feasible (e.g., through placebo use), it is impractical or impossible to anonymize teacher evaluations in PD programs for teachers, which introduces a risk of bias in evaluation. Although we included performance bias in our coding, it does not provide useful information for assessing the risk of bias. Regarding attrition bias, approximately 70% of the studies included in the current meta-analysis reported the required pre- and postsample sizes, whereas the remaining 30% did not provide this information. Given the significance of sample size in PD studies and potential meta-analyses, it is strongly recommended that researchers report this information.\u003c/p\u003e\n\u003cp\u003eFinally, it is important to note that our study did not incorporate unpublished research, such as dissertations and theses, which could potentially introduce a publication bias. Although we have taken steps to account for this possibility through various analyses, it is crucial to acknowledge that our findings may lean toward an overestimation of true effectiveness. 
In future research, adding unpublished studies, or at least dissertations and theses, would ensure a more comprehensive review.\u0026nbsp;\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eTeacher PD programs are a tool for preparing and supporting a high-quality workforce to improve overall student achievement. Most STEM teacher PD programs aim to improve diverse teacher and student outcomes, which are shared goals of the PD programs. A wealth of studies has addressed the effects of PD programs for K\u0026ndash;12 STEM teachers; however, little research has systematically assessed the results of previous PD research. This study examines the merged findings on the effects of PD for STEM teachers and the effects of its potential moderators; specifically,\u0026nbsp;the variation of effects related to the dosage hours and grade levels across outcomes suggests that researchers should consider the specific outcome that is most relevant for their intervention.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThis study helps educators and education policy communities better understand PD research and the ways PD can be designed and implemented most effectively by aggregating research findings across different STEM PD studies. Future researchers will need to take a more rigorous approach regarding the caliber of studies to be conducted in terms of design quality and length, as well as the content and type of PD delivery undertaken.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003eFunding Declaration: No funding was provided for the completion of this work.\u003c/p\u003e\n\u003ch2\u003eAcknowledgement\u003c/h2\u003e\n\u003cp\u003eNA\u003c/p\u003e\n\u003ch2\u003eData Availability\u003c/h2\u003e\n\u003cp\u003eThe datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAcuna, E., \u0026amp; Rodriguez, C. (2004). 
\u003cem\u003eA meta-analysis study of outlier detection methods in classification\u003c/em\u003e. Technical paper, Department of Mathematics, University of Puerto Rico at Mayaguez, \u003cem\u003e1\u003c/em\u003e, 25.\u003c/li\u003e\n\u003cli\u003eAuthor (2019).\u003c/li\u003e\n\u003cli\u003eAuthors (2021).\u003c/li\u003e\n\u003cli\u003eBanilower, E. R., Boyd, S. E., Pasley, J. D., \u0026amp; Weiss, I. R. (2006)\u003cem\u003e. Lessons from a decade of mathematics and science reform: A capstone report for the local systemic change through teacher enhancement initiative. \u003c/em\u003eHorizon Research, Inc.\u003c/li\u003e\n\u003cli\u003eBarak, M., \u0026amp; Assal, M. (2018). Robotics and STEM learning: Students\u0026rsquo; achievements in assignments according to the P3 Task Taxonomy\u0026mdash;Practice, problem solving, and projects. \u003cem\u003eInternational Journal of Technology and Design Education\u003c/em\u003e, \u003cem\u003e28\u003c/em\u003e(1), 121\u0026ndash;144.\u003c/li\u003e\n\u003cli\u003eBasma, B., \u0026amp; Savage, R. (2018). Teacher professional development and student literacy growth: A systematic review and meta-analysis. \u003cem\u003eEducational Psychology Review\u003c/em\u003e, \u003cem\u003e30\u003c/em\u003e(2), 457\u0026ndash;481.\u003c/li\u003e\n\u003cli\u003eBerisha, F., \u0026amp; Vula, E. (2021). Developing pre-service teachers\u0026rsquo; conceptualization of STEM and STEM pedagogical practices. \u003cem\u003eFrontiers in Education\u003c/em\u003e,\u003cem\u003e 6\u003c/em\u003e, Article 585075. https://doi.org/10.3389/feduc.2021.585075\u003c/li\u003e\n\u003cli\u003eBirman, B. F., Desimone, L., Porter, A. C., \u0026amp; Garet, M. S. (2000). Designing professional development that works. \u003cem\u003eEducational Leadership\u003c/em\u003e,\u003cem\u003e \u003c/em\u003e\u003cem\u003e57\u003c/em\u003e(8), 28\u0026ndash;33.\u003c/li\u003e\n\u003cli\u003eBlank, R. K., \u0026amp; de Las Alas, N. (2009). 
\u003cem\u003eEffects of teacher professional development on gains in student achievement.\u003c/em\u003e Council of Chief State School Officers.\u003c/li\u003e\n\u003cli\u003eCampbell Collaboration. (2021). Campbell systematic reviews: policies and guidelines. Version 1.8. https://onlinelibrary.wiley.com/pbassets/assets/18911803/ Campbell%20Policies%20and %20Guidelines%20_May3%202022-1653054593497.pdf\u003c/li\u003e\n\u003cli\u003eChitpin, S. (2011). Can mentoring and reflection cause change in teaching practice? A professional development journey of a Canadian teacher educator. \u003cem\u003eProfessional Development in Education\u003c/em\u003e, \u003cem\u003e37\u003c/em\u003e(2), 225\u0026ndash;240.\u003c/li\u003e\n\u003cli\u003eClary, R. M., Dunne, J. A., Elder, A. D., Saebo, S., Beard, D. J., Wax, C. L., Winter, J., \u0026amp; Tucker, D. L. (2017). Optimizing online content instruction for effective hybrid teacher professional development programs. \u003cem\u003eJournal of Science Teacher Education\u003c/em\u003e, \u003cem\u003e28\u003c/em\u003e(6), 507\u0026ndash;521.\u003c/li\u003e\n\u003cli\u003eCobb, P., Confrey, J., DiSessa, A., Lehrer, R., \u0026amp; Schauble, L. (2003). Design experiments in educational research. \u003cem\u003eEducational Researcher\u003c/em\u003e, \u003cem\u003e32\u003c/em\u003e(1), 9\u0026ndash;13.\u003c/li\u003e\n\u003cli\u003eCooper, H., Hedges, L. V., \u0026amp; Valentine, J. C. (Eds.). (2019). \u003cem\u003eThe handbook of research synthesis and meta-analysis\u003c/em\u003e\u003cem\u003e.\u003c/em\u003e Russell Sage Foundation.\u003c/li\u003e\n\u003cli\u003eDarling-Hammond, L. (1998). Teachers and teaching: Testing policy hypotheses from a national commission report. \u003cem\u003eEducational Researcher\u003c/em\u003e, \u003cem\u003e27\u003c/em\u003e(1), 5\u0026ndash;15.\u003c/li\u003e\n\u003cli\u003eDarling-Hammond, L., \u0026amp; Richardson, N. (2009). Research review/teacher learning: What matters. 
\u003cem\u003eEducational Leadership\u003c/em\u003e,\u003cem\u003e 66\u003c/em\u003e(5), 46\u0026ndash;53.\u003c/li\u003e\n\u003cli\u003eDarling-Hammond, L., Hyler, M. E., \u0026amp; Gardner, M. (2017). \u003cem\u003eEffective teacher professional development\u003c/em\u003e\u003cem\u003e.\u003c/em\u003e Learning Policy Institute. \u003c/li\u003e\n\u003cli\u003eDede, C. (2009). Immersive interfaces for engagement and learning. \u003cem\u003eScience\u003c/em\u003e, \u003cem\u003e323\u003c/em\u003e(5910), 66\u0026ndash;69.\u003c/li\u003e\n\u003cli\u003eDerting, T. L., Ebert-May, D., Henkel, T. P., Maher, J. M., Arnold, B., \u0026amp; Passmore, H. A. (2016). Assessing faculty professional development in STEM higher education: Sustainability of outcomes. \u003cem\u003eScience Advances\u003c/em\u003e, \u003cem\u003e2\u003c/em\u003e(3), e1501422.\u003c/li\u003e\n\u003cli\u003eDesimone, L. M. (2009). Improving impact studies of teachers\u0026rsquo; professional development: Toward better conceptualizations and measures. \u003cem\u003eEducational Researcher\u003c/em\u003e,\u003cem\u003e 38\u003c/em\u003e(3), 181\u0026ndash;199.\u003c/li\u003e\n\u003cli\u003eDesimone, L. M. (2011). A primer on effective professional development. \u003cem\u003ePhi Delta Kappan\u003c/em\u003e,\u003cem\u003e \u003c/em\u003e\u003cem\u003e92\u003c/em\u003e(6), 68\u0026ndash;71.\u003c/li\u003e\n\u003cli\u003eDesimone, L. M., \u0026amp; Pak, K. (2017). Instructional coaching as high-quality professional development. \u003cem\u003eTheory Into Practice\u003c/em\u003e,\u003cem\u003e 56\u003c/em\u003e(1), 3\u0026ndash;12.\u003c/li\u003e\n\u003cli\u003eDuval, S., \u0026amp; Tweedie, R. (2000). Trim and fill: A simple funnel‐plot\u0026ndash;based method of testing and adjusting for publication bias in meta‐analysis. \u003cem\u003eBiometrics\u003c/em\u003e,\u003cem\u003e 56\u003c/em\u003e(2), 455\u0026ndash;463.\u003c/li\u003e\n\u003cli\u003eEgert, F., Fukkink, R. G., \u0026amp; Eckhardt, A. G. (2018). 
Impact of in-service professional development programs for early childhood teachers on quality ratings and child outcomes: A meta-analysis. \u003cem\u003eReview of Educational Research\u003c/em\u003e,\u003cem\u003e \u003c/em\u003e\u003cem\u003e88\u003c/em\u003e(3), 401\u0026ndash;433.\u003c/li\u003e\n\u003cli\u003eEgger, M., Smith, G., Schneider, M., \u0026amp; Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. \u003cem\u003eBritish Medical Journal, 315\u003c/em\u003e(7109), 629\u0026ndash;634. \u003c/li\u003e\n\u003cli\u003eFisher, Z., \u0026amp; Tipton, E. (2015). \u003cem\u003eRobumeta: An R-package for robust variance estimation in meta-analysis\u003c/em\u003e (No. 02220; ArXiv Preprint). ArXiv. https://arxiv.org/abs/1506.02220\u003c/li\u003e\n\u003cli\u003eFishman, E. J., Borko, H., Osborne, J., Gomez, F., Rafanelli, S., Reigh, E., Tseng, A., Million, S., \u0026amp; Berson, E. (2017). A practice-based professional development program to support scientific argumentation from evidence in the elementary classroom. \u003cem\u003eJournal of Science Teacher Education, 28\u003c/em\u003e(3), 222\u0026ndash;249.\u003c/li\u003e\n\u003cli\u003eFleiss, J. L. (1971). Measuring nominal scale agreement among many raters. \u003cem\u003ePsychological Bulletin\u003c/em\u003e, \u003cem\u003e76\u003c/em\u003e(5), 378\u0026ndash;382.\u003c/li\u003e\n\u003cli\u003eFurtak, E. M., Circi, R., \u0026amp; Heredia, S. C. (2018). Exploring alignment among learning progressions, teacher-designed formative assessment tasks, and student growth: Results of a four-year study. \u003cem\u003eApplied Measurement in Education\u003c/em\u003e, \u003cem\u003e31\u003c/em\u003e(2), 143\u0026ndash;156.\u003c/li\u003e\n\u003cli\u003eGaret, M. S., Porter, A. C., Desimone, L., Birman, B. F., \u0026amp; Yoon, K. S. (2001). What makes professional development effective? Results from a national sample of teachers. 
\u003cem\u003eAmerican Educational Research Journal\u003c/em\u003e\u003cem\u003e, 38\u003c/em\u003e(4), 915\u0026ndash;945.\u003c/li\u003e\n\u003cli\u003eGaret, M. S., Heppen, J., Walters, K., Smith, T., \u0026amp; Yang, R. (2016). Does content-focused teacher professional development work? Findings from three Institute of Education Sciences studies. \u003cem\u003eNational Center for Education Evaluation and Regional Assistance, Institute of Education Sciences. \u003c/em\u003ehttps://ies.ed.gov/ncee/pubs/20174010/pdf/20174010.pdf\u003c/li\u003e\n\u003cli\u003eGleser, L. \u0026amp; Olkin, I. (2009). Stochastically dependent effect sizes. \u003cem\u003eThe Handbook of Research Synthesis and Meta-analysis\u003c/em\u003e, \u003cem\u003e2\u003c/em\u003e, 357\u0026ndash;376.\u003c/li\u003e\n\u003cli\u003eGonzalez K., Lynch K., Hill H. C. (2022). \u003cem\u003eA meta-analysis of the experimental evidence linking STEM classroom interventions to teacher knowledge, classroom instruction, and student achievement\u003c/em\u003e. EdWorkingPaper.\u003c/li\u003e\n\u003cli\u003eGreene, B. A., Lubin, I. A., Slater, J. L., \u0026amp; Walden, S. E. (2013). Mapping changes in science teachers\u0026rsquo; content knowledge: Concept maps and authentic professional development. \u003cem\u003eJournal of Science Education and Technology\u003c/em\u003e, \u003cem\u003e22\u003c/em\u003e, 287\u0026ndash;299.\u003c/li\u003e\n\u003cli\u003eGuskey, T. R. (2000). \u003cem\u003eEvaluating professional development\u003c/em\u003e. Corwin.\u003c/li\u003e\n\u003cli\u003eGuskey, T. R. (2002). Professional development and teacher change. \u003cem\u003eTeachers and Teaching, 8\u003c/em\u003e(3), 381\u0026ndash;391.\u003c/li\u003e\n\u003cli\u003eGuskey, T. R., \u0026amp; Sparks, D. (1996). Exploring the relationship between staff development and improvements in student learning. \u003cem\u003eJournal of Staff Development, 17\u003c/em\u003e(4), 34\u0026ndash;38.\u003c/li\u003e\n\u003cli\u003eGuskey, T. 
R., \u0026amp; Yoon, K. S. (2009). What works in professional development? \u003cem\u003ePhi Delta Kappan\u003c/em\u003e, \u003cem\u003e90\u003c/em\u003e(7), 495\u0026ndash;500.\u003c/li\u003e\n\u003cli\u003eHawley, P. H., \u0026amp; Sinatra, G. M. (2019). Declawing the dinosaurs in the science classroom: Reducing Christian teachers\u0026rsquo; anxiety and increasing their efficacy for teaching evolution. \u003cem\u003eJournal of Research in Science Teaching\u003c/em\u003e, \u003cem\u003e56\u003c/em\u003e(4), 375\u0026ndash;401.\u003c/li\u003e\n\u003cli\u003eHayes, K. N., Bae, C. L., O\u0026rsquo;Connor, D., \u0026amp; Seitz, J. C. (2020). Beyond funding: How organizational resources support science professional learning. \u003cem\u003eAmerican Journal of Education\u003c/em\u003e, \u003cem\u003e126\u003c/em\u003e(3), 389\u0026ndash;422.\u003c/li\u003e\n\u003cli\u003eHayes, K. N., Wheaton, M., \u0026amp; Tucker, D. (2019). Understanding teacher instructional change: The case of integrating NGSS and stewardship in professional development. \u003cem\u003eEnvironmental Education Research\u003c/em\u003e,\u003cem\u003e \u003c/em\u003e\u003cem\u003e25\u003c/em\u003e(1), 115\u0026ndash;134.\u003c/li\u003e\n\u003cli\u003eHedges, L. V., Tipton, E., \u0026amp; Johnson, M. (2010). Robust variance estimation in meta regression with dependent effect size estimates. \u003cem\u003eResearch Synthesis Methods\u003c/em\u003e,\u003cem\u003e 1\u003c/em\u003e(1), 39\u0026ndash;65.\u003c/li\u003e\n\u003cli\u003eHiggins, J. P., Altman, D. G., G\u0026oslash;tzsche, P. C., J\u0026uuml;ni, P., Moher, D., Oxman, A. D., Savović, J., Schulz, K. F., \u0026amp; Sterne, J. A. (2011). The Cochrane Collaboration\u0026rsquo;s tool for assessing risk of bias in randomized trials. \u003cem\u003eBritish Medical Journal\u003c/em\u003e, 343. 1\u0026ndash;9.\u003c/li\u003e\n\u003cli\u003eHill, H. C., Beisiegel, M., \u0026amp; Jacob, R. (2013). Professional development research: Consensus, crossroads, and challenges. 
\u003cem\u003eEducational Researcher\u003c/em\u003e, \u003cem\u003e42\u003c/em\u003e(9), 476\u0026ndash;487.\u003c/li\u003e\n\u003cli\u003eKarisan, D., Macalalag, A., \u0026amp; Johnson, J. (2019). The effect of methods courses on preservice teachers\u0026rsquo; awareness and intentions of teaching science, technology, engineering, and mathematics (STEM) subjects. \u003cem\u003eInternational Journal of Research in Education and Science\u003c/em\u003e,\u003cem\u003e 5\u003c/em\u003e(1), 22\u0026ndash;35.\u003c/li\u003e\n\u003cli\u003eKelcey, B., Spybrook, J., Phelps, G., Jones, N., \u0026amp; Zhang, J. (2017). Designing large-scale multisite and cluster-randomized studies of professional development. \u003cem\u003eThe Journal of Experimental Education\u003c/em\u003e,\u003cem\u003e \u003c/em\u003e\u003cem\u003e85\u003c/em\u003e(3), 389\u0026ndash;410.\u003c/li\u003e\n\u003cli\u003eKennedy, M. M. (2016). How does professional development improve teaching? \u003cem\u003eReview of Educational Research\u003c/em\u003e, \u003cem\u003e86\u003c/em\u003e(4), 945\u0026ndash;980.\u003c/li\u003e\n\u003cli\u003eKim, H. J., Miller, H. R., Herbert, B., Pedersen, S., \u0026amp; Loving, C. (2012). Using a wiki in a scientist\u0026ndash;teacher professional learning community: Impact on teacher perception changes. \u003cem\u003eJournal of Science Education and Technology\u003c/em\u003e, \u003cem\u003e21\u003c/em\u003e, 440\u0026ndash;452.\u003c/li\u003e\n\u003cli\u003eKlan, T. (2017, Nov). How to create a cost-effective PD program that impresses. https://www.eschoolnews.com/2017/11/13/cost-effective-pd-program/\u003c/li\u003e\n\u003cli\u003eKleickmann, T., Tr\u0026ouml;bst, S., Jonen, A., Vehmeyer, J., \u0026amp; M\u0026ouml;ller, K. (2016). The effects of expert scaffolding in elementary science professional development on teachers\u0026rsquo; beliefs and motivations, instructional practices, and student achievement. 
\u003cem\u003eJournal of Educational Psychology\u003c/em\u003e,\u003cem\u003e \u003c/em\u003e\u003cem\u003e108\u003c/em\u003e(1), 21\u0026ndash;42.\u003c/li\u003e\n\u003cli\u003eKnight, S. L., Parker, D., Zimmerman, W., \u0026amp; Ikhlief, A. (2014). Relationship between perceived and observed student-centred learning environments in Qatari elementary mathematics and science classrooms. \u003cem\u003eLearning Environments Research\u003c/em\u003e,\u003cem\u003e \u003c/em\u003e\u003cem\u003e17\u003c/em\u003e(1), 29\u0026ndash;47.\u003c/li\u003e\n\u003cli\u003eKraft, M. A., Blazar, D., \u0026amp; Hogan, D. (2018). The effect of teacher coaching on instruction and achievement: A meta-analysis of the causal evidence. \u003cem\u003eReview of Educational Research\u003c/em\u003e, \u003cem\u003e88\u003c/em\u003e(4), 547\u0026ndash;588.\u003c/li\u003e\n\u003cli\u003eKuehnert, E., Cason, M., Young, J., \u0026amp; Pratt, S. (2019). A meta-analysis of reform-based professional development in STEM: Implications for effective praxis. \u003cem\u003eInternational Journal of Technology in Education\u003c/em\u003e,\u003cem\u003e 2\u003c/em\u003e(1), 60\u0026ndash;68.\u003c/li\u003e\n\u003cli\u003eLawless, K. A., \u0026amp; Pellegrino, J. W. (2007). Professional development in integrating technology into teaching and learning: Knowns, unknowns, and ways to pursue better questions and answers. \u003cem\u003eReview of Educational Research\u003c/em\u003e,\u003cem\u003e 77\u003c/em\u003e(4), 575\u0026ndash;614.\u003c/li\u003e\n\u003cli\u003eLevitt, K. E. (2002). An analysis of elementary teachers\u0026rsquo; beliefs regarding the teaching and learning of science. \u003cem\u003eScience Education\u003c/em\u003e, \u003cem\u003e86\u003c/em\u003e(1), 1\u0026ndash;22.\u003c/li\u003e\n\u003cli\u003eLiao, Y. C. (2018). 
\u003cem\u003eCoaching in teacher professional development for technology integration: Examining teacher practices and perceptions\u003c/em\u003e (Doctoral dissertation, Indiana University).\u003c/li\u003e\n\u003cli\u003eLindvall, J., \u0026amp; Ryve, A. (2019). Coherence and the positioning of teachers in professional development programs. A systematic review. \u003cem\u003eEducational Research Review\u003c/em\u003e, \u003cem\u003e27\u003c/em\u003e, 140\u0026ndash;154.\u003c/li\u003e\n\u003cli\u003eLipsey, M. W., Puzio, K., Yun, C., Hebert, M. A., Steinka-Fry, K., Cole, M. W., Roberts, M., Anthony, K. S., \u0026amp; Busick, M. D. (2012). \u003cem\u003eTranslating the statistical representation of the effects of education interventions into more readily interpretable forms.\u003c/em\u003e National Center for Special Education Research.\u003c/li\u003e\n\u003cli\u003eLotter, C. R., Thompson, S., Dickenson, T. S., Smiley, W. F., Blue, G., \u0026amp; Rea, M. (2018). The impact of a practice-teaching professional development model on teachers\u0026rsquo; inquiry instruction and inquiry efficacy beliefs. \u003cem\u003eInternational Journal of Science and Mathematics Education\u003c/em\u003e, \u003cem\u003e16\u003c/em\u003e(2), 255\u0026ndash;273.\u003c/li\u003e\n\u003cli\u003eLoucks-Horsley, S., Stiles, K. E., Mundry, S., Love, N., \u0026amp; Hewson, P. W. (2009). \u003cem\u003eDesigning professional development for teachers of science and mathematics\u003c/em\u003e\u003cem\u003e.\u003c/em\u003e Corwin.\u003c/li\u003e\n\u003cli\u003eLynch, K., Hill, H. C., Gonzalez, K. E., \u0026amp; Pollard, C. (2019). Strengthening the research base that informs STEM instructional improvement efforts: A meta-analysis. 
\u003cem\u003eEducational Evaluation and Policy Analysis\u003c/em\u003e, \u003cem\u003e41\u003c/em\u003e(3), 260\u0026ndash;293.\u003c/li\u003e\n\u003cli\u003eMaeng, J. L., Whitworth, B. A., Bell, R. L., \u0026amp; Sterling, D. R. (2020). The effect of professional development on elementary science teachers\u0026rsquo; understanding, confidence, and classroom implementation of reform‐based science instruction\u003cem\u003e.\u003c/em\u003e\u003cem\u003e Science Education\u003c/em\u003e,\u003cem\u003e 104\u003c/em\u003e(2), 326\u0026ndash;353.\u003c/li\u003e\n\u003cli\u003eMalanson, K., Jacque, B., Faux, R., \u0026amp; Meiri, K. F. (2014). Modeling for fidelity: Virtual mentorship by scientists fosters teacher self-efficacy and promotes implementation of novel high school biomedical curricula. \u003cem\u003ePLoS One\u003c/em\u003e,\u003cem\u003e 9\u003c/em\u003e(12), e114929.\u003c/li\u003e\n\u003cli\u003eMarra, R. M., Arbaugh, F., Lannin, J., Abell, S., Ehlert, M., Smith, R., Merle-Johnson, D., \u0026amp; Rogers, M. P. (2011). Orientations to professional development design and implementation: Understanding their relationship to PD outcomes across multiple projects. \u003cem\u003eInternational Journal of Science and Mathematics Education\u003c/em\u003e,\u003cem\u003e \u003c/em\u003e\u003cem\u003e9\u003c/em\u003e(4), 793\u0026ndash;816.\u003c/li\u003e\n\u003cli\u003eMcNaull, A. (2014, March). Federal STEM educator professional development programs: A discussion of funding, approaches, and implementation. In \u003cem\u003eAPS April Meeting Abstracts\u003c/em\u003e (Vol. 2014, ID. S10-008).\u003c/li\u003e\n\u003cli\u003eMillar, M. G., \u0026amp; Tesser, A. (1989). The effects of affective\u0026ndash;cognitive consistency and thought on the attitude\u0026ndash;behavior relation. \u003cem\u003eJournal of Experimental Social Psychology\u003c/em\u003e, \u003cem\u003e25\u003c/em\u003e(2), 189\u0026ndash;202.\u003c/li\u003e\n\u003cli\u003eNadelson, L. 
S., Pfiester, J., Callahan, J., \u0026amp; Pyke, P. (2015). Who is doing the engineering, the student or the teacher? The development and use of a rubric to categorize level of design for the elementary classroom. \u003cem\u003eJournal of Technology Education\u003c/em\u003e, \u003cem\u003e26\u003c/em\u003e(2), 22\u0026ndash;45.\u003c/li\u003e\n\u003cli\u003ePark, B., Plass, J. L., \u0026amp; Br\u0026uuml;nken, R. (2014). Cognitive and affective processes in multimedia learning. \u003cem\u003eLearning and Instruction\u003c/em\u003e, \u003cem\u003e29\u003c/em\u003e, 125\u0026ndash;127.\u003c/li\u003e\n\u003cli\u003eRamey, S., Crowell, N. A., Ramey, C. T., Grace, C., Timraz, N., \u0026amp; Davis, L. E. (2011). The dosage of professional development for early childhood professionals: How the amount and density of professional development may influence its effectiveness. In \u003cem\u003eThe early childhood educator professional development grant: Research and practice\u003c/em\u003e (pp. 11\u0026ndash;32). Emerald Group.\u003c/li\u003e\n\u003cli\u003ePolgampala, A. S. V., Shen, H., \u0026amp; Huang, F. (2017). STEM teacher education and professional development and training: Challenges and trends. \u003cem\u003eAmerican Journal of Applied Psychology\u003c/em\u003e, \u003cem\u003e6\u003c/em\u003e(5), 93\u0026ndash;97.\u003c/li\u003e\n\u003cli\u003eRobinson, A., Dailey, D., Hughes, G., \u0026amp; Cotabish, A. (2014). The effects of a science-focused STEM intervention on gifted elementary students\u0026rsquo; science knowledge and skills. \u003cem\u003eJournal of Advanced Academics\u003c/em\u003e,\u003cem\u003e \u003c/em\u003e\u003cem\u003e25\u003c/em\u003e(3), 189\u0026ndash;213.\u003c/li\u003e\n\u003cli\u003eRoman, A. F. (2019). Identifying the preschool teachers\u0026rsquo; needs on transversal competences training career using the questionnaire. 
\u003cem\u003eEducația Plus\u003c/em\u003e,\u003cem\u003e 23\u003c/em\u003e(SP IS), 164\u0026ndash;170.\u003c/li\u003e\n\u003cli\u003eRoth, K. J., Wilson, C. D., Taylor, J. A., Stuhlsatz, M. A., \u0026amp; Hvidsten, C. (2019). Comparing the effects of analysis-of-practice and content-based professional development on teacher and student outcomes in science. \u003cem\u003eAmerican Educational Research Journal\u003c/em\u003e, \u003cem\u003e56\u003c/em\u003e(4), 1217\u0026ndash;1253.\u003c/li\u003e\n\u003cli\u003eRStudio Team. (2020). \u003cem\u003eRStudio: Integrated development for R\u003c/em\u003e. RStudio, PBC. http://www.rstudio.com\u003c/li\u003e\n\u003cli\u003eRussell, J. A., \u0026amp; Pratt, G. (1980). A description of the affective quality attributed to environments. \u003cem\u003eJournal of Personality and Social Psychology\u003c/em\u003e, \u003cem\u003e38\u003c/em\u003e(2), 311\u0026ndash;322.\u003c/li\u003e\n\u003cli\u003eShulman, L. S. (1986). Those who understand: Knowledge growth in teaching. \u003cem\u003eEducational Researcher\u003c/em\u003e,\u003cem\u003e 15\u003c/em\u003e(2), 4\u0026ndash;14.\u003c/li\u003e\n\u003cli\u003eTanner-Smith, E. E., Tipton, E., \u0026amp; Polanin, J. R. (2016). Handling complex meta-analytic data structures using robust variance estimates: A tutorial in R. \u003cem\u003eJournal of Developmental and Life-Course Criminology\u003c/em\u003e,\u003cem\u003e 2\u003c/em\u003e(1), 85\u0026ndash;112.\u003c/li\u003e\n\u003cli\u003eTobin, R. G., Crissman, S., Doubler, S., Gallagher, H., Goldstein, G., Lacy, S., Rogers, C. B., Schwartz, J., \u0026amp; Wagoner, P. (2012). Teaching teachers about energy: Lessons from an inquiry-based workshop for K\u0026ndash;8 teachers. \u003cem\u003eJournal of Science Education and Technology\u003c/em\u003e,\u003cem\u003e 21\u003c/em\u003e, 631\u0026ndash;639.\u003c/li\u003e\n\u003cli\u003eTsamir, P., Tirosh, D., Levenson, E., Tabach, M., \u0026amp; Barkai, R. (2014). 
Developing preschool teachers\u0026rsquo; knowledge of students\u0026rsquo; number conceptions. \u003cem\u003eJournal of Mathematics Teacher Education\u003c/em\u003e,\u003cem\u003e 17\u003c/em\u003e, 61\u0026ndash;83.\u003c/li\u003e\n\u003cli\u003eTukey, J. W. (1977). \u003cem\u003eExploratory data analysis\u003c/em\u003e. Addison-Wesley.\u003c/li\u003e\n\u003cli\u003eVan Driel, J. H., Meirink, J. A., van Veen, K., \u0026amp; Zwart, R. C. (2012). Current trends and missing links in studies on teacher professional development in science education: A review of design features and quality of research. \u003cem\u003eStudies in Science Education\u003c/em\u003e,\u003cem\u003e 48\u003c/em\u003e(2), 129\u0026ndash;160.\u003c/li\u003e\n\u003cli\u003eVan den Noortgate, W., L\u0026oacute;pez-L\u0026oacute;pez, J. A., Mar\u0026iacute;n-Mart\u0026iacute;nez, F., \u0026amp; S\u0026aacute;nchez-Meca, J. (2013). Three-level meta-analyses of dependent effect sizes. \u003cem\u003eBehavior Research Methods\u003c/em\u003e, \u003cem\u003e45\u003c/em\u003e, 576\u0026ndash;594.\u003c/li\u003e\n\u003cli\u003eVan Driel, J. H., Beijaard, D., \u0026amp; Verloop, N. (2001). Professional development and reform in science education: The role of teachers\u0026rsquo; practical knowledge. \u003cem\u003eJournal of Research in Science Teaching\u003c/em\u003e, \u003cem\u003e38\u003c/em\u003e(2), 137\u0026ndash;158.\u003c/li\u003e\n\u003cli\u003eVan Veen, K., Zwart, R. C., \u0026amp; Meirink, J. A. (2012). What makes teacher professional development effective? A literature review. In M. Kooy \u0026amp; K. van Veen (Eds.), \u003cem\u003eTeacher learning that matters\u003c/em\u003e (pp. 3\u0026ndash;21). Routledge.\u003c/li\u003e\n\u003cli\u003eViechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. \u003cem\u003eJournal of Statistical Software\u003c/em\u003e, \u003cem\u003e36\u003c/em\u003e(3), 1\u0026ndash;48.\u003c/li\u003e\n\u003cli\u003eViera, A. J., \u0026amp; Garrett, J. M. (2005). 
Understanding interobserver agreement: The kappa statistic. \u003cem\u003eFamily Medicine\u003c/em\u003e,\u003cem\u003e 37\u003c/em\u003e(5), 360\u0026ndash;363.\u003c/li\u003e\n\u003cli\u003eWayne, A. J., Yoon, K. S., Zhu, P., Cronen, S., \u0026amp; Garet, M. S. (2008). Experimenting with teacher professional development: Motives and methods. \u003cem\u003eEducational Researcher\u003c/em\u003e,\u003cem\u003e 37\u003c/em\u003e(8), 469\u0026ndash;479.\u003c/li\u003e\n\u003cli\u003eWebster-Wright, A. (2009). Reframing professional development through understanding authentic professional learning. \u003cem\u003eReview of Educational Research, 79\u003c/em\u003e(2), 702\u0026ndash;739.\u003c/li\u003e\n\u003cli\u003eWhitworth, B. A., Maeng, J. L., \u0026amp; Bell, R. L. (2018). Exploring practices of science coordinators participating in targeted professional development. \u003cem\u003eScience Education\u003c/em\u003e,\u003cem\u003e \u003c/em\u003e\u003cem\u003e102\u003c/em\u003e(3), 474\u0026ndash;497.\u003c/li\u003e\n\u003cli\u003eYildirim, B., Topalcengiz, E. S., Arikan, G., \u0026amp; Timur, S. (2020). Using virtual reality in the classroom: Reflections of STEM teachers on the use of teaching and learning tools. \u003cem\u003eJournal of Education in Science Environment and Health\u003c/em\u003e,\u003cem\u003e 6\u003c/em\u003e(3), 231\u0026ndash;245.\u003c/li\u003e\n\u003cli\u003eYin, Y., Olson, J., Olson, M., Solvin, H., \u0026amp; Brandon, P. R. (2015). Comparing two versions of professional development for teachers using formative assessment in networked mathematics classrooms. \u003cem\u003eJournal of Research on Technology in Education\u003c/em\u003e, \u003cem\u003e47\u003c/em\u003e(1), 41\u0026ndash;70.\u003c/li\u003e\n\u003cli\u003eYoon, K. S., Duncan, T., Lee, S. W.-Y., Scarloss, B., \u0026amp; Shapley, K. (2007). 
Reviewing the evidence on how teacher professional development affects student achievement (Issues \u0026amp; Answers Report, REL 2007\u0026ndash;No. 033). U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Southwest. https://files.eric.ed.gov/fulltext/ED498548.pdf\u003c/li\u003e\n\u003cli\u003eYoon, S. A., Anderson, E., Koehler-Yom, J., Evans, C., Park, M., Sheldon, J., Schoenfeld, L., Wendel, D., Scheintaub, H., \u0026amp; Klopfer, E. (2017). Teaching about complex systems is no simple matter: Building effective professional development for computer-supported complex systems instruction. \u003cem\u003eInstructional Science\u003c/em\u003e,\u003cem\u003e \u003c/em\u003e\u003cem\u003e45\u003c/em\u003e(1), 99\u0026ndash;121.\u003c/li\u003e\n\u003cli\u003eZaslow, M., Tout, K., Halle, T., Whittaker, J. V., \u0026amp; Lavelle, B. (2010). Toward the identification of features of effective professional development for early childhood educators. Office of Planning, Evaluation and Policy Development, U.S. Department of Education. 
https://eric.ed.gov/?id=ED527140\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Tables","content":"\u003cp\u003eTables 1 to 5 are available in the Supplementary Files section.\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"professional development, STEM teachers, K–12, meta-analysis","lastPublishedDoi":"10.21203/rs.3.rs-6602739/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6602739/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThis study includes meta-analytic evidence from 118 studies published 2010\u0026ndash;2022 demonstrating that professional development (PD) programs for science, engineering, technology, and math teachers effectively support teacher content knowledge and pedagogical quality and improve student academic performance. We explored the overarching impact of PD programs and analyzed how various characteristics contribute to the observed effects of these programs. To identify relevant studies, we searched four databases and focused on peer-reviewed English language journal articles available in full text. 
A selection of articles was made that used quantitative research designs and provided adequate data to estimate effect sizes. Subsequently, we assessed the potential risk of bias to evaluate the quality of the selected studies. The significant effect size (0.739, 95%CI [0.637, 0.842]) of PD programs found in our meta-analysis aligns with that of previous meta-analyses and systematic reviews that have synthesized findings on the impact of PD at the teacher and student levels. Substantial heterogeneity of the effect sizes was moderated by PD dosage hours and grade levels. The results indicated that the pooled effect size was much larger when the program length was greater than 80 hours. Additionally, if a study focused on a specific grade level, the magnitude of the effectiveness of PD was larger than that in a combination of different grade-level groups. By using aggregations of research findings across various studies, the present overview will aid educators and education policy communities in better understanding PD research and how it can be designed and implemented most effectively.\u003c/p\u003e","manuscriptTitle":"A Meta-Analysis of the Effects of Teacher Professional Development in STEM Education: What Have We Done, and Where Are We Going?","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-06-04 19:05:11","doi":"10.21203/rs.3.rs-6602739/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"5d80c17f-e875-46da-ac68-f8984ddbd2ae","owner":[],"postedDate":"June 4th, 
2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2025-09-19T20:08:27+00:00","versionOfRecord":[],"versionCreatedAt":"2025-06-04 19:05:11","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6602739","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6602739","identity":"rs-6602739","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
