Teacher AI Literacy for Multilingual Learner Instruction: Scale Development and Initial Validity Evidence

doi:10.21203/rs.3.rs-9215817/v1

2026 · doi:10.21203/rs.3.rs-9215817/v1

preprint OA: closed CC-BY-4.0

Research Article
Showrav Chowdhury
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9215817/v1
This work is licensed under a CC BY 4.0 License. Status: Posted. Version 1 posted.
Abstract
Higher education institutions need a methodologically sound measure of teachers’ ability to use generative AI (GenAI) in the classroom. Current measures focus on general competencies and do not address the specific pedagogical requirements, instructional judgments and ethical considerations of multilingual learner (ML) instruction. The purpose of this study is to develop a measure of teacher AI literacy for ML instruction, to test and analyze its factor structure, and to examine its relationships with teachers’ responsible-use intentions and vignette-based decision making.
Using a convergent mixed-methods design, this study surveys 32 in-service teachers. Descriptive statistics and reliability coefficients are calculated for all quantitative scales, and an exploratory factor analysis (EFA) is conducted on the AI literacy scale. Pearson correlations are calculated to examine relationships among the quantitative variables, and regression models are estimated to predict responsible-use intentions and vignette-based decision making. Thematic analysis is used to analyze the qualitative responses. The results indicate moderate to high levels of AI literacy, high internal consistency reliability and high levels of responsible-use intentions. The EFA yields a three-factor solution explaining 69.1% of the variance. AI literacy is positively correlated with both responsible-use intentions and vignette-based decision making. These findings provide empirical support for developing domain-specific assessments and indicate that AI literacy for ML instruction is more than general AI confidence.
Keywords: teacher AI literacy; multilingual learner instruction; scale development; validity evidence; mixed methods; TESOL-aligned pedagogy
1. Introduction
Generative artificial intelligence tools are increasingly integrated into regular instructional practice, where teachers use them to plan lessons, draft responses to student work, differentiate teaching resources and provide language support in multilingual learner (ML) classrooms. The U.S. Department of Education’s Office of Educational Technology (OET) emphasizes AI’s potential to improve teaching and learning, but it also stresses the risks associated with the accuracy, bias, transparency and privacy of AI-generated instructional products, especially when those products influence high-stakes educational decisions or exacerbate inequities (OET, 2023). At the same time, educating ML students is a national imperative for U.S. educational institutions because it is tied to equity, accountability and workforce preparedness. Instruction for ML students must simultaneously promote language acquisition and rigorous content-area learning. The research synthesis literature shows that teacher expertise, specifically pedagogical and assessment practice informed by an understanding of language, is one of the primary factors shaping opportunities and long-term outcomes for ML students (NASEM, 2017). This study responds to a practical problem: higher education institutions increasingly need a valid and useful tool to assess teachers’ ability to use AI responsibly for ML instruction, not simply their overall confidence with technology.
1.1 Problem and national relevance
Teachers’ ability to use technology effectively is critical to the success of AI in education, because AI reaches students through the learning activities that educators design. Educators are expected to translate the language of policy and of AI tools into meaningful and engaging learning opportunities for their students. This role is especially important for multilingual students, who require specific types of scaffolding to build their language skills. Educators must set clearly defined language objectives and provide feedback that develops students’ academic language while maintaining the content rigor of the subject being taught (NASEM, 2017; WIDA, 2020). Educators must also consider the ethical implications of implementing AI tools, which add further layers of instructional decision making.
In the United States, the Office of Educational Technology (OET) has identified AI as a growing area with the potential to augment instructional supports while also creating risks, including fabricated or inaccurate output, inconsistent performance, a lack of clarity about how AI reaches its decisions and potential violations of an institution’s responsibility to protect students’ rights to privacy and accessibility (OET, 2023). Using AI for ML instruction raises both the likelihood and the consequences of these risks. ML instruction places heavy emphasis on language-related tasks such as translation, paraphrasing, vocabulary support and feedback on writing or speaking, and these tasks depend on linguistic accuracy, register and culturally relevant meaning. If educators treat AI output as accurate and reliable without applying pedagogical judgment, ML students may receive incorrect explanations or feedback that sounds fluent yet subtly alters the intended meaning of the original message. Conversely, when educators apply pedagogical judgment, AI may help increase access to comprehensible input and support differentiated practice with formative feedback at scale (Miao & Holmes, 2023). The national relevance of AI in education also extends beyond its direct impact on instruction: public educational institutions face significant governance and compliance implications as they adopt AI-enabled tools. Public institutions are bound by federal regulations that protect student education records (e.g., the Family Educational Rights and Privacy Act, FERPA), and AI-enabled tools complicate data sharing, data storage and third-party access to student data when educators upload student work (U.S. Department of Education, n.d.).
Online services used with young learners raise further concerns about data collection and parental consent, which underscores the need for educators to understand the tools they select for classroom use and to establish clear classroom norms. Higher education institutions in the U.S. also need measures of educators’ capacity so that they can provide sustained professional learning, monitor growth in that capacity and avoid a one-size-fits-all approach to AI training. The 2024 National Educational Technology Plan identifies a “digital design divide” tied to unequal access to sustained professional learning and capacity building for educators, suggesting that the effectiveness of new technologies depends on the human infrastructure that supports instructional design (OET, 2024). A practical, validated measure of AI literacy for ML instruction would allow institutions to identify educators’ strengths and weaknesses, focus support on those areas and assess whether the professional development they provide leads to safer and more instructionally effective use of AI.
1.2 Gap
Although AI literacy is discussed more and more, existing measures and frameworks usually assess general AI knowledge or digital competence without addressing the specific teaching requirements of ML contexts. A teacher can therefore score high on general AI literacy yet still lack the capacity to design AI-based learning activities, to judge whether AI outputs are linguistically and culturally valid, or to apply the safeguards that matter when MLs are the primary users of AI-based support (OECD, 2023). The second gap concerns the ethics of classroom AI use. High-level guidance emphasizes that AI should be used in a human-centered manner that respects rights.
However, teachers need practical knowledge that connects ethical principles to everyday instructional decisions: for example, when it is better not to use AI, how to communicate AI use to students and families, how to prevent overreliance on AI and how to ensure fairness and transparency (OSTP, 2022). In addition, if institutions use scales to place teachers into professional development (PD) pathways, compare teachers’ needs across grade bands or track growth after training, the resulting scores must be interpreted through a validity argument supported by evidence from multiple sources. Current testing standards emphasize that validity is not a property of the instrument alone but of the interpretations and uses of its scores, which must be supported by evidence such as content representation, response processes, internal structure and relationships to other variables (AERA et al., 2014).
1.3 Purpose and contributions
The goal of this research is to design and test a valid and useful instrument for measuring teacher AI literacy for multilingual learner instruction, defined here as the extent to which teachers can identify, evaluate and use AI tools in their instruction of MLs in ways aligned with TESOL pedagogy and ethics. To meet this goal, a convergent mixed-methods design is employed. The research makes three interrelated contributions: (1) a theoretical construct that links AI literacy, TESOL pedagogical practice and classroom ethics; (2) a theoretically grounded and empirically tested measure of teacher AI literacy, with a scoring system to support professional development planning and tracking; and (3) guidelines for implementing AI responsibly in classrooms that prioritize the quality of teachers’ instructional decisions.
1.4 Research questions
RQ1: What factor structure best represents AI literacy for ML instruction?
RQ2: Does the scale predict responsible AI-use intentions and instructional decision quality?
RQ3: Do measurement properties hold across grade bands and contexts?
2. Conceptual foundation and construct definition
This study defines teacher AI literacy for multilingual learner instruction as a situated form of professional knowledge and judgment that arises from the confluence of instructional expertise in teaching language and content, technology integration knowledge, and the ethical and governance competencies required for responsible use of AI in educational settings. The conceptual foundation begins with the well-established claim that successful teaching requires much more than generic pedagogy or discrete content knowledge. Effective teaching depends on specialized, contextualized knowledge that enables teachers to translate disciplinary and linguistic goals into forms learners can engage with. Shulman’s formulation of pedagogical content knowledge (PCK) holds that teachers select representations, explanations and examples that make content learnable for specific students (Shulman, 1986). As digital tools have become central to instruction, researchers have extended this line of thinking to conceptualize how teachers coordinate technology, pedagogy and content rather than treating technology as an independent skill set. The TPACK framework posits that technology integration is most meaningful when teachers understand how technology complements their pedagogical choices and content demands within a specific context (Mishra & Koehler, 2006).
These ideas are particularly salient for AI-enabled tools, which can shape not only how teachers represent content but also the language, feedback, explanations and examples that students experience as authoritative. The U.S. Department of Education’s Office of Educational Technology notes that AI systems can make decisions autonomously and produce output that may be inaccurate, inappropriate or biased; governance and educator judgment are therefore central to the safe and responsible educational use of AI (OET, 2023). The construct is further supported by emerging literature that views AI literacy not as programming proficiency but as a set of competencies needed to understand, evaluate and use AI systems in human activity. Long and Magerko define AI literacy in terms of practical competencies (recognizing limitations, interpreting system behavior and using AI in an informed manner) and emphasize that design and educational contexts shape what it means to be AI-literate (Long & Magerko, 2020). For teachers, AI literacy takes on pedagogical significance: it must be expressed through instructional planning, adaptation and assessment routines that protect learners while advancing learning goals. WIDA’s English Language Development Standards Framework foregrounds equity for multilingual learners and treats the integration of language and content objectives as a central tenet of instruction (WIDA, 2020). National reports also highlight that educators’ decisions about the amount of scaffolding, the development of academic language and the structuring of opportunities to learn are critical to the outcomes of ML instruction (NASEM, 2017).
Because AI tools can mediate the texts, tasks and feedback that shape language exposure and participation, teacher capacity must include the ability to link AI-assisted activities to TESOL-informed instructional intentions rather than using AI as a generic productivity aid. This study therefore uses the following working definition: AI literacy for ML instruction is teachers’ ability to select, evaluate and use AI tools to support ML learning through TESOL-aligned pedagogy and ethical safeguards. The definition combines instructional efficacy with responsible use, and it also bounds what the scale is and is not intended to measure. First, the construct is not equivalent to general comfort with technology or frequency of tool use; a teacher can use AI frequently without evidence of pedagogical alignment or rigorous evaluation procedures. Second, the construct does not measure the deeper technical expertise associated with developing AI; it targets the professional competencies teachers need to use AI tools as instructional resources and decision aids. Third, the construct does not replace broader assessments of ML pedagogy; it focuses on the AI-mediated enactment of ML-supportive instruction. Above all, the construct explicitly includes ethical safeguards, because classroom AI use raises issues of student privacy, transparency and fairness. In the U.S. education system, protecting students’ education records and personally identifiable information is a foundational compliance requirement (U.S. Department of Education, n.d.). International and national frameworks further emphasize that AI governance must address bias, accountability, transparency and human-centered safeguards (OSTP, 2022).
For MLs, these concerns are intensified because language, accent, dialect and immigration-related vulnerabilities can interact with biased outputs or surveillance-like uses, making ethics inseparable from pedagogy in a defensible construct definition.
2.2 Domain model
The four dimensions of the proposed initial domain model are derived from the working definition and will be further refined during the qualitative component of this research. They serve as an explicit framework for developing measurement items and for subsequently examining the scale’s structure through factor analysis.
1) Pedagogical Integration of AI Tools for MLs: This dimension represents teachers’ capacity to use AI tools to advance ML learning objectives through scaffolding, differentiation and feedback relevant to language acquisition. It is rooted in the PCK and TPACK frameworks, in which teachers translate curricular objectives into learning opportunities while using tools appropriately (Shulman, 1986; Mishra & Koehler, 2006). It therefore includes designing AI-supported activities aligned with specific language objectives, selecting texts at an appropriate level of linguistic complexity for each student, and using AI-produced supports (sentence frames, glossaries, modeled responses) in ways that maintain the academic rigor of traditional assignments while preserving students’ experience of productive struggle. The dimension also includes the capacity to judge whether AI-based feedback advances language development rather than merely correcting student language in ways that limit learner agency or mask developmental trajectories.
This focus on integrating language and content instruction for ML students is consistent with WIDA’s vision of equitable ML instruction (WIDA, 2020).
2) Evaluation and Design of AI Tasks/Prompts: This dimension reflects teachers’ ability to design AI-assisted tasks and to verify AI outputs through systematic verification routines. It rests on the practical reality that AI-generated text can be fluent but incorrect and that AI systems can automate decisions that embed bias or produce inappropriate outputs (OET, 2023). Competency in this area includes choosing prompt types that elicit pedagogically useful output, iteratively revising prompts for clarity and specificity, and evaluating outputs for accuracy, relevance and alignment with learning objectives. When evaluating outputs for MLs, teachers must also judge linguistic appropriateness (vocabulary load, syntactic complexity, pragmatic fit) and ensure that AI-generated examples do not contain cultural insensitivity, deficit narratives or misleading models of language use. Because this dimension concerns interpretive judgment rather than tool proficiency, its items should reflect the routine practices and decision criteria teachers apply when using AI for materials, examples, translations or feedback.
3) Ethics, Fairness, Privacy and Compliance Issues Related to ML Instruction: This dimension reflects teachers’ understanding and enactment of measures that protect students’ rights and interests, ensure responsible data use, mitigate bias in AI systems, ensure transparency and assign credit appropriately. It draws on both the normative and the policy aspects of the construct.
AI governance frameworks identify the potential for AI systems to discriminate against certain groups, to operate non-transparently and to exploit users’ personal data, and they call for human-centered protections and accountability structures (OSTP, 2022). In educational settings, privacy laws and regulations restrict what student data vendors may collect and store and limit how teachers may access and share that data. Teachers who work with ML students face additional ethical challenges when using AI systems, because student data entered into these systems can reveal a learner’s identity, home language or immigration status. Confidentiality and purpose limitation are therefore practical requirements rather than abstract ideals. This dimension accordingly includes familiarity with common compliance boundaries, strategies for limiting data exposure, and fairness-oriented practices such as monitoring AI-assisted recommendations and feedback for biased language, stereotyping or unfair treatment of ML students.
4) Professional Judgment and Classroom Governance: This dimension examines teachers’ capacity to govern AI use in the classroom, including deciding when to use AI, establishing norms for student use, communicating expectations and rationales to students and families, and documenting their reasoning. The U.S. Department of Education notes that educators already recognize these risks and emphasizes that educators must remain in the loop as AI-enabled decisions enter classrooms (OET, 2023). Governance competency includes setting clear boundaries for acceptable student use, creating assignments that preserve academic integrity while using AI productively, and being transparent with students and families about when and why AI is used.
This dimension also encompasses teachers’ professional responsibilities to document their rationale for choosing tools, to link AI use to instructional goals and to take part in institution-wide decision-making. In ML contexts, governance must also be linguistically responsive: norms and rationales must be accessible to students at varying proficiency levels, and policies must not unintentionally penalize ML students for seeking language assistance while still maintaining fair academic expectations. Together, the four dimensions are intended to provide a comprehensive model of teachers’ capacity to use AI effectively and responsibly in ML instruction. The model can support assessments that are useful for teachers’ professional growth while remaining grounded in a theory of what effective AI-supported ML instruction entails.
2.3 Validity argument
The purpose of this study is to create an instrument that higher education institutions and educators can use to inform professional development decisions. The validity argument must therefore specify how scores will be interpreted, how they will be used and the limits of those interpretations and uses. Contemporary validity theory (Messick, 1995; Kane, 2013) holds that validity is the degree to which empirical evidence and theoretical rationale support the interpretation and use of test scores. Validity is not a property of the test itself but of the inferences drawn from its scores in a particular context. The Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014) provide criteria for evaluating validity evidence. Specifically, they describe five sources of validity evidence (test content, response processes, internal structure, relations to other variables and consequences of testing) and treat fairness as a fundamental concern that applies across all of them.
Proposed Score Interpretation and Use: The scale score is interpreted as a teacher’s current level of AI literacy for ML instruction, specifically their competence across the four identified dimensions. The primary intended uses are to provide formative information about professional development needs, to guide specific professional development opportunities, to monitor teacher growth and to evaluate programs at the aggregate level.
Argument-Based Structure of the Validity Claim: An argument-based approach to validation makes explicit the inferential chain from observed responses to the meaning and use of scores, and it requires that the assumptions underlying each link be supported by evidence. In simplified form, the inferential chain for the proposed scale is: (1) teachers interpret each item as intended and respond with stable, practice-relevant judgment; (2) item scores combine into coherent latent dimensions consistent with the domain model; (3) composite and subscale scores function equivalently across relevant subgroups and show no systematic bias; and (4) scores are associated with external indicators of responsible AI-use intentions and instructional decision quality. If these links hold, they provide evidence for the scale’s practical utility in directing professional development efforts.
Consequential Considerations: As Messick (1995) points out, evidence about the consequences of using an assessment is an essential part of the validity argument, particularly when the assessment directs actions that affect the distribution of resources or labels teachers’ capacity. Although the consequences of using the proposed scale are relatively limited, they still need to be considered. The scale is intended to promote equity in building capacity to use AI with MLs.
Claims supporting the consequential aspect of validity include transparency in score reporting, clear guidelines for interpreting results and explicit cautions against using scores for high-stakes purposes without sufficient additional evidence.
3. Methods
3.1 Overall design
This study uses a convergent mixed-methods design in which a single survey collects both quantitative and qualitative data. Creswell and Plano Clark (2018) describe the convergent design as well suited to studies that seek both psychometric evidence about an instrument’s structure and relations to other variables and contextual explanations of how participants interpret and act on the construct in their practice. In this study, the quantitative strand estimates the internal structure and reliability of the new measure of teacher AI literacy for multilingual learner instruction, as well as its associations with teachers’ reported intentions to use AI responsibly and with scenario-based instructional decision quality. The qualitative strand consists of three open-ended prompts embedded in the same survey, intended to capture how teachers describe using AI to support ML instruction, how they verify or adapt AI output, and which ethical concerns and support needs shape their responsible use of AI. This methodology is consistent with an argument-based view of validation, in which evidence is collected to support the intended interpretations and uses of scores rather than treating validity as a characteristic of the instrument.
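The abstract reports Pearson correlations and regression models linking AI literacy to responsible-use intentions and vignette-based decision making. A minimal sketch of both computations follows, assuming one composite score per teacher on each measure. The helper names (`pearson_r`, `ols_fit`) and the six score pairs are hypothetical illustrations, not study data, and the study’s actual models may include additional predictors.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two equal-length score vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()          # center both variables
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def ols_fit(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, R^2).
    X may be one predictor (1-D) or several (n x k)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # [intercept, slope(s)]
    resid = y - X @ beta
    ss_tot = (y - y.mean()) @ (y - y.mean())
    return beta, float(1.0 - (resid @ resid) / ss_tot)

# Hypothetical composite scores on a 1-5 metric for six teachers (NOT study data).
ai_literacy = [3.2, 3.8, 4.1, 2.9, 4.5, 3.6]
responsible_use = [3.9, 4.2, 4.4, 3.5, 4.8, 4.0]

r = pearson_r(ai_literacy, responsible_use)
beta, r2 = ols_fit(ai_literacy, responsible_use)
print(f"r = {r:.2f}, intercept = {beta[0]:.2f}, slope = {beta[1]:.2f}, R^2 = {r2:.2f}")
```

With a single predictor, R² equals r², which gives a quick consistency check; passing an n × k array as `X` fits a model with several predictors.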
The quantitative component provides evidence about internal structure, reliability and relationships to other variables, whereas the qualitative responses support the interpretation of scores by illustrating how teachers conceptualize competent and risky AI use in ML instruction and by identifying contextual constraints that may explain the quantitative results.
3.2 Instrument development and construct operationalization
The instrument operationalizes the construct of AI literacy for ML instruction, defined as teachers’ ability to select, evaluate and use AI tools to support multilingual learners’ learning through TESOL-aligned pedagogy with ethical safeguards. This operationalization draws on domain-specific frameworks of teacher knowledge that integrate pedagogy, content and tool affordances (Mishra & Koehler, 2006); on research that treats AI literacy as a competence-based understanding of practice rather than programming alone (Long & Magerko, 2020); and on scale development guidelines that emphasize clear construct definition and iterative refinement of items for representativeness (Boateng et al., 2018). Items are written to capture observable instructional judgments and routines that teachers can reasonably self-report, and they are specific to ML contexts. All items use a consistent five-point Likert response format to reduce respondent burden. The survey also embeds open-ended response-process questions, which allow the researcher to take a pilot validation stance: the qualitative data provide insight into item meaning and potential sources of misunderstanding, while the quantitative analysis provides evidence of structure and reliability.
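Reliability evidence of the kind described above is commonly summarized with Cronbach’s alpha, α = k/(k − 1) × (1 − Σ s²ᵢ / s²ₓ), where k is the number of items, s²ᵢ is the variance of item i and s²ₓ is the variance of the summed scale score. The sketch below computes the conventional coefficient on raw item scores; it illustrates the formula and is not presented as the author’s procedure. Because 3.4 plans ordinal estimators, an ordinal alpha or omega based on polychoric correlations may be preferred in practice. The response matrix is hypothetical.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x k_items) matrix of item scores."""
    items = np.asarray(items, dtype=float)
    n, k = items.shape
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = items.var(axis=0, ddof=1)        # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # sample variance of the summed score
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

# Hypothetical five-point responses: 5 respondents x 4 items (NOT study data).
responses = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 2],
    [4, 4, 3, 4],
]
print(round(cronbach_alpha(responses), 3))  # → 0.961
```

Alpha would be computed separately for each subscale once the factor structure is settled, since it assumes the items in the set measure a single dimension.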
3.3 Participants and sampling
The study population consists of in-service teachers at higher education institutions; eligibility is determined by a screening question asking whether respondents currently teach classes that include MLs. Participants are recruited by distributing the survey through online education forums and professional networks. The data set contains 32 complete responses. Because non-probability samples risk over-representing people interested in both AI and continuing professional development, the study describes the sample in detail and interprets the findings cautiously. Contextual information, including respondents’ roles, years of teaching experience and the estimated percentage of MLs in the focus class, is collected to support exploratory subgroup analyses related to RQ3.
3.4 Measures
All measures are administered through an online survey consisting of five blocks: screening and teaching context, the core AI literacy scale, responsible AI-use intentions, scenario-based decision quality vignettes and open-ended qualitative prompts. Item responses are coded so that higher values represent greater endorsement of competent practice. Where internal structure evidence supports them, domain subscale scores are calculated as the mean of their constituent items. Because Likert items are ordinal, all psychometric analyses are planned with estimators designed for ordinal data rather than treating the items as continuous, normally distributed variables.
3.5 Procedure
The survey begins with an informed consent statement, and participants may withdraw from the study at any point. Screening procedures are also applied to evaluate the quality of the survey data.
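The coding rule in 3.4 (higher values mean more competent practice; subscale scores as item means) and the data-quality screening mentioned in 3.5 can be sketched together as follows. This is a hypothetical illustration: the source does not say whether any items are negatively worded, which items belong to which domain, or what completion-time cut-off is used, so the reverse-worded item, the domain mapping and the 180-second threshold are all assumptions. Straight-lining is checked on the raw answers, before recoding, because it describes identical responses regardless of item wording.

```python
import numpy as np

LIKERT_MIN, LIKERT_MAX = 1, 5  # five-point response format

def reverse_code(x, lo=LIKERT_MIN, hi=LIKERT_MAX):
    """Flip a negatively worded item so higher values mean more competent practice."""
    return hi + lo - np.asarray(x, dtype=float)

def flag_inattentive(raw_items, seconds, min_seconds=180):
    """Flag straight-lining (no variation across items) or implausibly fast completion.
    min_seconds is a hypothetical cut-off, not a value reported by the study."""
    raw_items = np.asarray(raw_items, dtype=float)
    straight_lined = np.nanstd(raw_items, axis=1) == 0
    too_fast = np.asarray(seconds, dtype=float) < min_seconds
    return straight_lined | too_fast

def subscale_means(items, domains):
    """Per-respondent mean of each domain's items, skipping missing responses."""
    items = np.asarray(items, dtype=float)
    return {name: np.nanmean(items[:, cols], axis=1) for name, cols in domains.items()}

# Hypothetical responses: 4 respondents x 6 items (NOT study data).
raw = np.array([
    [4, 5, 2, 4, 4, 5],
    [3, 3, 3, 3, 3, 3],       # identical answer to every item
    [5, 4, 1, 5, np.nan, 4],  # one missing response
    [2, 3, 4, 2, 3, 2],
])
seconds = [420, 510, 95, 600]  # the third respondent finished very quickly

keep = ~flag_inattentive(raw, seconds)  # screen on raw answers, before recoding

coded = raw.copy()
coded[:, 2] = reverse_code(raw[:, 2])   # item index 2 is a hypothetical reverse-worded item

# Hypothetical mapping of item columns to two of the four domains.
domains = {"pedagogical_integration": [0, 1, 2], "evaluation_design": [3, 4, 5]}
scores = subscale_means(coded[keep], domains)
print(keep.tolist())
print({name: np.round(vals, 2).tolist() for name, vals in scores.items()})
```

Recording how many rows `flag_inattentive` marks, rather than silently dropping them, matches the plan to document the number of cases meeting the removal criteria.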
3.6 Quantitative analysis plan
The quantitative analysis provides evidence on the psychometric properties of the main scale (RQ1), assesses relationships with external criterion measures (RQ2) and explores how the scale functions across contexts (RQ3). Analytic techniques are chosen to suit ordinal, Likert-type data and the small sample. Before analysis, item-level descriptive statistics and missingness are examined. Straight-lining and implausibly fast completion are flagged as possible indicators of inattentive responding, and the number of cases meeting the removal criteria is documented.

3.7 Qualitative analysis plan
Thematic analysis is used to identify patterns of meaning in teachers' short written responses to the open-ended prompts. As a flexible qualitative method, it allows meanings and patterns to be explored as they emerge.
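The inattentive-response screen described in 3.6 can be illustrated with a short sketch. It assumes straight-lining means giving the identical answer to every Likert item and uses a hypothetical speed cut-off of 30% of the sample's median completion time; the study does not report its actual thresholds. Cases are flagged and counted rather than silently dropped, matching the plan to document how many meet the removal criteria.

```python
import numpy as np


def flag_inattentive(responses, seconds, speed_fraction=0.3):
    """Flag possible inattentive responding.

    straight: the same answer was given to every Likert item (straight-lining).
    too_fast: completion time below speed_fraction x the sample median
    (a hypothetical cut-off; the source does not specify one).
    """
    responses = np.asarray(responses)
    seconds = np.asarray(seconds, dtype=float)
    straight = np.all(responses == responses[:, [0]], axis=1)
    too_fast = seconds < speed_fraction * np.median(seconds)
    return straight, too_fast


# Invented responses (4 respondents x 4 items) and completion times in seconds.
responses = [[3, 3, 3, 3],
             [4, 2, 5, 3],
             [5, 5, 5, 5],
             [4, 3, 4, 5]]
seconds = [600, 75, 540, 500]
straight, too_fast = flag_inattentive(responses, seconds)
flagged = straight | too_fast
# Document counts rather than silently dropping cases.
print(f"straight-lining: {straight.sum()}, too fast: {too_fast.sum()}, "
      f"meeting removal criteria: {flagged.sum()}")
```

A uniformly high answer pattern can also reflect genuine agreement, so flags like these are best reviewed alongside other indicators before any case is removed.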
Thematic analysis begins with initial coding. Coding follows a hybrid approach: a deductive codebook derived from the four-domain model provides the initial organization, and inductive codes capture emerging themes that the four domains do not anticipate, such as time constraints, policy ambiguity and perceived risks to student information privacy. The qualitative outcomes describe how teachers enact AI literacy in ML contexts, how they check the accuracy of AI output, what ethical concerns they perceive in ML settings and what institutional supports they believe would help them use AI responsibly with their students.

3.8 Mixed-methods integration plan
Integration follows a convergent design and a merging strategy to develop meta-conclusions from the qualitative and quantitative data for each research question. Its primary output is a joint display that groups quantitative results by domain and pairs them with qualitative themes describing how teachers understand or implement the same domain in their classrooms.

3.9 Ethics and transparency
The study complies with established ethical guidelines for low-risk, survey-based educational research. Participation is voluntary. The written informed consent statement explains the purpose of the study and how respondent confidentiality is maintained. No identifiable student or institutional data are collected, all data are stored securely and results are reported only in aggregate.

4. Results
4.1 Sample description
The quantitative data set comprises 32 participants. Respondents' roles fall into four categories: ESL/EFL specialist (n = 11, 34.4%), content teacher (n = 2, 6.3%), instructor (n = 12, 37.5%) and other (n = 7, 21.9%). Teaching experience ranges from 0 to 7 years, with the largest groups reporting 2 years (n = 7, 21.9%) and 3 years (n = 8, 25.0%). The percentage of MLs taught varies considerably: 0-10% (n = 5, 15.6%), 11-25% (n = 7, 21.9%), 26-50% (n = 10, 31.3%), 51-75% (n = 3, 9.4%) and 76-100% (n = 7, 21.9%). Most respondents report no prior AI-related training or professional development (n = 27, 84.4%), and the same number report that their institution does not have a clear AI policy (n = 27, 84.4%). The thematic analysis of open-ended responses includes all 32 respondents.

4.2 Descriptive results and internal consistency
Across the items that make up the composite AI Literacy Total Score, respondents show moderate to high endorsement of AI-supported instructional practices for MLs (M = 3.39, SD = 0.67 on a 1-5 scale). Item-level patterns suggest the strongest support for practices that use AI responsibly and with sensitivity to MLs. For example, respondents strongly endorse using AI while ensuring that language learning objectives are appropriately aligned with ML proficiency levels (M = 3.88, SD = 0.75), and they report similarly strong intentions to use AI responsibly.
The 12-item AI Literacy Scale shows high internal consistency (Cronbach's alpha = .88). The four Responsible AI-Use Intentions items also show good internal consistency (alpha = .82), and their composite mean (M = 3.96, SD = 0.63) indicates strong intentions to use AI responsibly. Among these items, respondents most strongly endorse avoiding sharing sensitive student data (M = 4.25, SD = 0.62) and verifying the accuracy of AI-generated content before classroom use (M = 4.09, SD = 0.89), followed by adhering to institutional policies even when inconvenient (M = 3.94, SD = 0.67) and teaching students about responsible AI use as part of instruction (M = 3.59, SD = 1.01). For the criterion measure, vignette-based decision-quality total scores average M = 3.38 (SD = 1.19): respondents choose the best-practice option in many scenarios, with considerable individual variability.

4.3 Scale structure: exploratory factor analysis
Exploratory factor analysis (principal axis factoring with oblimin rotation) indicates acceptable sampling adequacy (KMO = .655) and a statistically significant Bartlett's test of sphericity, supporting factorability. Three factors with eigenvalues greater than 1 emerge, together explaining about 69.1% of the variance. The first factor comprises items representing ML-oriented instructional use and appropriateness checks. The second factor comprises items related to reflective and pedagogical governance of AI use.
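The reliability and factorability statistics reported in 4.2 and 4.3 follow standard formulas: Cronbach's alpha = k/(k - 1) x (1 - sum of item variances / variance of the total score); Bartlett's chi-square = -(n - 1 - (2p + 5)/6) x ln|R| with p(p - 1)/2 degrees of freedom, where R is the item correlation matrix; and KMO = sum of squared correlations / (sum of squared correlations + sum of squared partial correlations) over off-diagonal item pairs. The sketch below applies them to simulated data. It is an illustration under stated assumptions, not the author's analysis script (which is deposited on figshare), and it uses Pearson correlations, whereas polychoric correlations would be closer to the ordinal estimators planned in 3.4.

```python
import numpy as np
from scipy import stats


def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)


def bartlett_sphericity(items):
    """Test H0: the item correlation matrix is an identity matrix."""
    x = np.asarray(items, dtype=float)
    n, p = x.shape
    r = np.corrcoef(x, rowvar=False)
    _, logdet = np.linalg.slogdet(r)  # log|R|, numerically stable
    chi_sq = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) / 2
    return chi_sq, df, stats.chi2.sf(chi_sq, df)


def kmo(items):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    r = np.corrcoef(np.asarray(items, dtype=float), rowvar=False)
    inv = np.linalg.inv(r)
    # Partial (anti-image) correlations from the inverse correlation matrix.
    partial = -inv / np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    off = ~np.eye(r.shape[0], dtype=bool)
    r2 = (r[off] ** 2).sum()
    a2 = (partial[off] ** 2).sum()
    return r2 / (r2 + a2)


# Simulated one-factor data: 200 respondents, 4 items (not the study data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
items = latent + 0.5 * rng.normal(size=(200, 4))
chi_sq, df, p_value = bartlett_sphericity(items)
print(f"alpha={cronbach_alpha(items):.2f} KMO={kmo(items):.2f} "
      f"Bartlett chi2({df:.0f})={chi_sq:.1f}, p={p_value:.2g}")
```

With a pilot sample of 32, these quantities carry wide sampling error, which is one reason the paper treats its structure evidence as preliminary.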
The third factor comprises the responsible-use intention items together with one instructional item. Overall, the factor solution suggests that respondents' answers about AI are organized around (a) ML instructional application and evaluation, (b) reflective/ethical governance and classroom norms for AI use and (c) responsible-use intentions and alignment with institutional policy.

4.4 Validity evidence: associations with intentions and decision quality
The observed associations among AI literacy, responsible-use intentions and vignette decision quality support initial validity expectations. Spearman correlations show a moderate positive relationship between Total AI Literacy and Total Responsible AI-Use Intentions (ρ = .56, p = .001): teachers with higher AI literacy scores also report stronger intentions to verify output, protect student data and follow policy. Total AI Literacy is also positively correlated with Vignette Decision Quality (ρ = .38, p = .032): higher reported AI literacy is associated with a greater likelihood of selecting best-practice instructional decisions in the applied scenarios. Total Responsible-Use Intentions are strongly positively correlated with Vignette Decision Quality (ρ = .63, p < .001): teachers who express stronger commitment to responsible use also make stronger applied decisions in the ML-relevant vignettes. Regression analyses provide additional support. Total AI Literacy predicts Total Responsible AI-Use Intentions (R² = .32): higher self-reported literacy is associated with stronger intentions to use AI responsibly.
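The association and group-comparison statistics used in 4.4 and 4.5 can be sketched as below. The p-value for a Spearman coefficient follows from ρ and n through the usual t approximation, t = ρ x sqrt((n - 2)/(1 - ρ²)) with n - 2 degrees of freedom; for n = 32 it gives p ≈ .001 for ρ = .56, p ≈ .03 for ρ = .38 and p < .001 for ρ = .63, consistent with the reported values. The R² of a one-predictor OLS regression equals the squared Pearson correlation, so R² = .32 corresponds to r ≈ .57. The Mann-Whitney function returns U for the first group and a tie-corrected normal-approximation Z without continuity correction. All group scores and composites below are invented for illustration, and the function names are hypothetical; with groups as small as n = 4 or n = 5, an exact or permutation-based p-value is usually preferable to the normal approximation.

```python
import numpy as np
from scipy import stats


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of average ranks (handles ties)."""
    return np.corrcoef(stats.rankdata(x), stats.rankdata(y))[0, 1]


def rho_p_value(rho, n):
    """Two-sided p-value from t = rho * sqrt((n - 2) / (1 - rho**2)), n - 2 df."""
    t = rho * np.sqrt((n - 2) / (1 - rho ** 2))
    return 2 * stats.t.sf(abs(t), df=n - 2)


def simple_r_squared(x, y):
    """R^2 = 1 - SS_residual / SS_total for an OLS fit of y on x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()


def mann_whitney_u_z(x, y):
    """U for sample x and its normal-approximation Z
    (tie-corrected, no continuity correction)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = np.concatenate([x, y])
    u1 = stats.rankdata(pooled)[:n1].sum() - n1 * (n1 + 1) / 2
    _, counts = np.unique(pooled, return_counts=True)
    tie_term = (counts ** 3 - counts).sum() / (n * (n - 1))
    sigma = np.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    return u1, (u1 - n1 * n2 / 2) / sigma


# Reported correlations (n = 32) checked against the t approximation.
for rho in (0.56, 0.38, 0.63):
    print(f"rho={rho:.2f}: p={rho_p_value(rho, 32):.4f}")

# Made-up composites, for illustration only.
literacy = [2.5, 3.0, 3.1, 3.4, 3.8, 4.0, 4.2, 4.6]
intentions = [3.0, 3.5, 3.2, 4.0, 3.9, 4.5, 4.1, 4.8]
print(spearman_rho(literacy, intentions), simple_r_squared(literacy, intentions))

# Made-up role-group scores with one tied value across groups.
larger_group = [3.5, 4.0, 4.2, 3.9, 4.5, 3.1, 3.6]
smaller_group = [2.8, 3.1, 3.4, 2.5]
u1, z = mann_whitney_u_z(larger_group, smaller_group)
print(f"U={u1:.1f}, Z={z:.2f}, p={2 * stats.norm.sf(abs(z)):.3f}")
```

The hand-written functions are meant to make the formulas visible; in practice `scipy.stats.spearmanr` and `scipy.stats.mannwhitneyu` (with `use_continuity=False, method="asymptotic"`) give matching results.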
Total AI Literacy also predicts Vignette Decision Quality (R² = .13). Although this relationship is smaller than that between literacy and intentions, it is statistically significant and indicates that self-reported literacy predicts applied best-practice selections.

4.5 Preliminary subgroup checks
Because subgroups are unevenly sized and several categories contain few participants, the subgroup analyses should be considered exploratory. A Mann-Whitney U test on the dichotomized role grouping finds a statistically significant difference in Total AI Literacy between the two role groups (U = 33.00, Z = -2.63, p = .009), with the larger group (n = 25) scoring higher than the smaller group (n = 7). Differences in Total Decision Quality are not statistically significant for either the ML exposure grouping (p = .793) or the prior AI professional development grouping (p = .172). These analyses have limited power, however, given the small high-ML-exposure group (n = 4) and "any AI PD" group (n = 5), so the results should be treated as preliminary indicators that may not generalize.

4.6 Qualitative thematic findings from open-ended responses
Thematic analysis of the open-ended responses reveals five major themes that elaborate and extend the quantitative patterns described above. First, teachers view AI as a tool for ML-specific scaffolding and differentiation, particularly vocabulary support, simplifying complex content and reducing cognitive load while maintaining disciplinary rigor. One respondent reports using AI to generate Tier 2 vocabulary, then revising the output because it is a bit too academic and adding simplified language and primary-language translations to support beginning-level MLs.
Another respondent describes taking an AI-generated math explanation and breaking it into five short, bulleted steps to reduce cognitive load because the original output was too dense. Second, respondents stress the importance of verification, safety and accuracy checks before classroom use, mirroring the strongly endorsed verification intentions in the quantitative data. A science teacher reports fact-checking AI-generated lab summaries against a textbook to make sure the model is not hallucinating, citing an incident in which the model described a physically impossible experiment. Similarly, another respondent requests multiple explanations to test their consistency: "If the model provides conflicting explanations, I know I will need to exercise caution when using the output". Third, teachers repeatedly cite cultural sensitivity, bias and loss of linguistic nuance as significant concerns, especially for multilingual and international students. Participants describe staying vigilant for "stereotypes" and "missing perspectives" and express frustration at AI's inability to preserve pragmatic meaning. Another respondent frames this as a broader equity issue, noting that AI may impose a standard American format that ignores diverse rhetorical styles. Fourth, participants identify privacy, data governance and institutional clarity as preconditions for responsible AI use in education, mirroring the quantitative emphasis on protecting sensitive student data. One participant reports being genuinely terrified about what happens to student data once it is entered into those boxes and asks for a clearly defined departmental "Green List" of trusted tools that protect student privacy.
Another respondent directly links responsible AI implementation to institutional safeguards, arguing that institutions should provide "designated, trustworthy AI tools that prioritize student privacy" together with clear guidelines for students and faculty. Fifth, respondents describe pedagogical boundaries that limit dependency on AI, often connected to academic integrity and the student learning process. One teacher worries that students are losing prewriting skills such as brainstorming and outlining and suggests a policy requiring handwritten notes or early drafts alongside any AI-assisted work. Another warns that excessive reliance on AI can lead to fossilized grammar errors and advocates training on hybrid assignments in which AI is limited to the outlining phase rather than the final draft.

4.7 Mixed-methods integration
The qualitative themes converge with the quantitative response profile. Quantitatively, respondents report strong intentions to verify AI content and to avoid sharing sensitive student data; qualitatively, the same commitments appear in descriptions of fact-checking routines, textbook triangulation and explicit privacy concerns. The positive associations among AI literacy scores, responsible-use intentions and vignette decision quality are further contextualized by teachers' accounts that effective AI use for MLs requires both instructional adaptation (simplification, chunking, added visual aids, register adjustment) and professional judgment (bias checks, safety screening, policy adherence). The qualitative findings extend the quantitative results by explaining why these practices matter in ML contexts: the threat of cultural and pragmatic mismatches, the potential erasure of multilingual rhetorical identities and the need for institutional support for equitable and responsible classroom integration of AI.

5. Discussion and implications
5.1 Answering the research questions directly
The domain model of AI literacy for ML instruction shows an interpretable, correlated multi-factor structure that resembles the proposed domain model while suggesting practical refinements. Rather than fully separate domains, the data support a more consolidated structure in which (a) pedagogical ML application, (b) critical evaluation and professional judgment and (c) responsible-use orientation and governance appear as closely related dimensions. The qualitative strand supports this consolidation: teachers describe AI literacy as "doing" (designing ML scaffolds), "checking" (verification and appropriateness routines) and "governing" (privacy, norms, bounded-use decisions). Together, the two strands indicate that AI literacy for ML instruction can be conceptualized as an integrated competency system rather than merely general AI confidence.

Higher AI literacy scores are associated with stronger responsible-use intentions and better vignette decision quality. This association is echoed by teachers who describe robust verification routines, privacy safeguards and classroom norms, as well as firmer boundaries on when and how AI can be used with MLs, consistent with their higher intention scores and better scenario-based decisions. These results are nonetheless best viewed as preliminary, directional validity evidence given the pilot sample size, self-report measurement and cross-sectional design.

Evidence on the instrument's portability across contexts is likewise only preliminary and exploratory.
The sample includes educators across roles and levels of ML exposure. Exploratory subgroup checks suggest a difference in AI literacy between role groups but no detectable differences in decision quality by ML exposure or prior AI professional development; subgroup sizes, however, are too small to support conclusions. The most conservative statement is therefore that the instrument has potential for use across contexts, and the equivalence of its meaning and measurement across contexts must be tested in larger samples.

5.2 Contributions to U.S. education
This research contributes a domain-specific measure of AI literacy for ML instruction, grounded in TESOL pedagogy, ethics and governance, for U.S. multilingual education contexts. The central question in multilingual education is not simply whether teachers feel comfortable using AI tools, but whether they can use them to help students develop language and access content while protecting student data and privacy, promoting fair and transparent use of AI systems by students and faculty and supporting ethical decision-making about AI in classrooms. Operationalizing AI literacy as ML-related competence (scaffolding design, verification and bias detection, privacy-aware decision-making and classroom governance) therefore helps institutions move beyond general measures of technology readiness toward instructionally and ethically actionable indicators.

5.3 Practical implications
The results suggest several immediately applicable uses for the instrument in teacher education, professional development and program evaluation, provided it is treated as a developmental assessment rather than a static measure.
Baseline → targeted PD → growth monitoring
Use the instrument to establish a baseline of teacher capacity, identify the areas of greatest need and reassess after an intervention to gauge how much teachers have grown. The qualitative results illustrate what growth looks like: verifying, adapting and governing AI use with MLs through more deliberate routines and more consistent protection of student data.

Dimensional PD mapping
Because the quantitative and qualitative analyses suggest a doing/checking/governing structure, professional development can be organized into modules aligned with these dimensions:
Verification Routines and Quality Control: fact-checking, cross-tool triangulation, readability checks, detecting fabricated references and auditing outputs for pragmatic and register fit.
Designing ML Scaffolding for Use with AI: transforming AI drafts into leveled supports (sentence frames, vocabulary supports, chunking, bilingual scaffolds) while preserving disciplinary rigor and alignment with language objectives.
Safeguards for Privacy, Bias and Fairness: de-identification practices, tool vetting, stereotype and bias screening, cultural responsiveness checks and management of translation and nuance risks (e.g., dialect and low-resource language issues).
Establishing Classroom Norms and Governance: transparency and disclosure practices distinguishing "AI as Tutor" from "AI as Ghostwriter", assignment designs that preserve productive struggle and learner voice, and documentation and reflection habits for professional accountability.
Organized this way, the instrument supports precision PD: educators and administrators can design targeted programs instead of one-size-fits-all training and allocate resources according to each educator's specific competence gaps and the level of risk their students face from AI use.
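One way to operationalize precision PD is to route each teacher to the modules tied to their weakest dimensions. The sketch below is hypothetical throughout: it assumes subscale means on the 1-5 scale for three dimensions labelled doing, checking and governing, an illustrative assignment of the four modules named above to those dimensions and a cut-off of 3.0. None of these are specified by the study, and any real cut-off would need empirical standard setting before use.

```python
# Hypothetical mapping from the doing/checking/governing dimensions to the
# PD modules listed in 5.3; module names follow the text, the mapping is illustrative.
PD_MODULES = {
    "doing": ["Designing ML Scaffolding for Use with AI"],
    "checking": ["Verification Routines and Quality Control"],
    "governing": ["Safeguards for Privacy, Bias and Fairness",
                  "Establishing Classroom Norms and Governance"],
}


def recommend_modules(profile, cutoff=3.0):
    """Return PD modules for dimensions scoring below `cutoff`
    (a hypothetical threshold), weakest dimension first."""
    weak = sorted((score, dim) for dim, score in profile.items() if score < cutoff)
    return [module for _, dim in weak for module in PD_MODULES[dim]]


# Invented subscale means for one teacher.
teacher = {"doing": 3.8, "checking": 2.6, "governing": 2.9}
print(recommend_modules(teacher))
```

Risk level (for example, the share of MLs in a teacher's classes) could be added as a second sort key, in line with the text's suggestion to weigh student exposure alongside competence gaps.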
5.4 Equity and ethics implications
ML contexts require more than generic attention to ethics, because students' language status can increase their vulnerability to privacy violations, to biased or stereotyped content and to pragmatic and cultural mismatches in their interactions with AI. The qualitative findings highlight two potential sources of inequity during implementation: (1) AI's standardization of language may erase multilingual voices and cultural nuance, and (2) unequal access to high-quality tools and instruction may widen existing opportunity gaps. The instrument should therefore be used as a capacity-building tool that helps educators assess their ability to protect students' rights and that informs advocacy for the policy, infrastructure and resources they need. Ethical use of the instrument also requires transparency: educators and institutions should treat AI literacy development as ongoing professional growth, with clear expectations, supportive coaching and safeguards against surveillance-style evaluation.

5.5 Limitations and future research
The study relies on a single survey administration. Because AI literacy and responsible-use intentions are self-reported, the results may be affected by social desirability bias and by respondents' interpretation of individual items. The vignette decision-quality measure has its own limitations: the best options are expert-defined, and decision quality is highly context-dependent. Future studies should recruit larger, more diverse samples to enable robust CFA, evaluate alternative models (higher-order or bifactor structures) and test measurement invariance across roles, policy contexts and levels of ML exposure.
Future studies could also examine sensitivity to change through pre/post designs around targeted PD, and strengthen the validity argument by linking scores to classroom artifacts (lesson plans, prompts, feedback samples, modified AI outputs) and, where feasible, to observational indicators of verification and governance routines.

6. Conclusion
The study finds preliminary support for each of its hypotheses and initial evidence for the reliability and validity of the proposed scale as an assessment of teacher AI literacy for ML instruction. Overall, it supports the feasibility and potential utility of the measure while identifying several methodological and theoretical challenges for future work. The findings provide preliminary evidence of the scale's coherence and interpretability and tentative indications of its applicability across contexts. Teachers with higher AI literacy scores tend to report stronger intentions to use AI responsibly and to make higher-quality decisions about whether and how to use AI in vignettes depicting common ML classroom scenarios. Taken together, the results offer preliminary evidence for the internal structure of the measure and for its relationships with two key aspects of teacher practice: responsible AI-use intentions and instructional decision-making quality. The study thus provides a starting point for further research on measuring teacher AI literacy. More specifically, it supports using the instrument as a baseline before professional development, as a means of evaluating the effects of targeted professional development and as a way to track growth in teacher AI literacy over time.
Although the study provides initial validity evidence for the scale and preliminary insight into what teacher AI literacy looks like in practice, many questions remain about how to measure teacher AI literacy, how it relates to other constructs and how it affects teacher practice and student learning. Future studies should include larger and more diverse samples of teachers, additional sources of validity evidence and more rigorous tests of scale stability and invariance.

Declarations
Data and/or code availability
De-identified data, codebook and analysis scripts are available in figshare at https://doi.org/10.6084/m9.figshare.31427369
Ethical statement
This study involved an anonymous, minimal-risk survey of adult educators. No directly identifying information was collected, and participation was voluntary. Before beginning the survey, participants reviewed an informed-consent statement describing the purpose of the study, procedures, risks and benefits, and their right to stop at any time without penalty. Data were stored securely and analyzed in de-identified form, and only aggregated results are reported.
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Funding
The authors received no external financial support for the research, authorship and/or publication of this article.
Acknowledgments
The authors thank the educators who participated in the survey and shared their experiences and perspectives on responsible generative AI use in multilingual learner instruction. The authors also thank the colleagues who offered constructive feedback on the study materials and instrument wording during development. Any remaining errors or limitations are the responsibility of the authors. No external grant funding was reported for this work.
References
American Educational Research Association (AERA), American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing.
Boateng, G. O., Neilands, T. B., Frongillo, E. A., Melgar-Quiñonez, H. R., & Young, S. L. (2018). Best practices for developing and validating scales for health, social, and behavioral research: A primer. Frontiers in Public Health, 6, 1–18. https://doi.org/10.3389/fpubh.2018.00149
Creswell, J. W., & Plano Clark, V. L. (2018). Designing and conducting mixed methods research (3rd ed.). SAGE Publications.
DeVellis, R. F. (2017). Scale development: Theory and applications (4th ed.). SAGE.
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73. https://doi.org/10.1111/jedm.12000
Long, D., & Magerko, B. (2020). What is AI literacy? Competencies and design considerations. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 1–16. https://doi.org/10.1145/3313831.3376727
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741–749. https://doi.org/10.1037/0003-066X.50.9.741
Miao, F., & Holmes, W. (2023). Guidance for generative AI in education and research. UNESCO. https://www.unesco.org/en/articles/guidance-generative-ai-education-and-research
Mishra, P., & Koehler, M. J. (2006). Technological pedagogical content knowledge: A framework for teacher knowledge. Teachers College Record, 108(6), 1017–1054. https://doi.org/10.1111/j.1467-9620.2006.00684.x
National Academies of Sciences, Engineering, and Medicine. (2017). Promoting the educational success of children and youth learning English: Promising futures. The National Academies Press. https://doi.org/10.17226/24677
Nowell, L. S., Norris, J. M., White, D. E., & Moules, N. J. (2017). Thematic analysis: Striving to meet the trustworthiness criteria. International Journal of Qualitative Methods, 16, 1–13. https://doi.org/10.1177/1609406917733847
OECD. (2023). OECD digital education outlook 2023: Towards an effective digital education ecosystem. OECD Publishing. https://doi.org/10.1787/c74f03de-en
Office of Educational Technology, U.S. Department of Education. (2023, December 31). A call to action for closing the digital access, design, and use divides: 2024 National Educational Technology Plan.
Shulman, L. S. (1986). Those who understand: Knowledge growth in teaching. Educational Researcher, 15(2), 4–14. https://doi.org/10.2307/1175860
U.S. Department of Education, Office of Educational Technology (OET). (2023). Artificial intelligence and the future of teaching and learning: Insights and recommendations.
U.S. Department of Education. (n.d.). FERPA (Family Educational Rights and Privacy Act). https://studentprivacy.ed.gov/ferpa
White House Office of Science and Technology Policy (OSTP). (2022, October). Blueprint for an AI Bill of Rights: Making automated systems work for the American people. The White House. https://www.whitehouse.gov/ostp/ai-bill-of-rights/
WIDA. (2020). WIDA English language development standards framework, 2020 edition: Kindergarten–grade 12. Board of Regents of the University of Wisconsin System.

Additional Declarations
The authors declare no competing interests.
Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-9215817","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":611600384,"identity":"cfa81863-9ea9-4746-af4b-e0e145a52e91","order_by":0,"name":"Showrav Chowdhury","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABJElEQVRIiWNgGAWjYDACCQiVwMDA2MzAYPCfh5+9Acg3sCBWSwGzjGTPAZAWCWK0MDAzMHxgtjG4kYAkjgXIz24+Jl1RcziPf3ZzswGDARuPwc3nVzf8KJBg4G/vTsCmxeDOsTTJM8cOF0vcOdicwGDAwyN5O6fsZg/QYRJnzm7AqkUix+xmA1taYsONxOYDQC4P3+2ctBs8QC0GErlYtcjPAGn5l5Y4H6LFgIfh5pm0m3/waGG4AdTS2GaTuAGoBeiwBB6BG+zHbuOzxeBGWvrPxj6bYkOgFoMEgwM8kj05bLdlgC7E5Rf5GcmHDRu+SeTJ3Uh/LPHhzwF7fvbjz26++WMjx9/ei91hKCABTPIYgEnCyhGA/QEpqkfBKBgFo2D4AwAcjGaENpJCUwAAAABJRU5ErkJggg==","orcid":"https://orcid.org/0009-0000-1356-0232","institution":"University of Louisiana at Lafayette","correspondingAuthor":true,"prefix":"","firstName":"Showrav","middleName":"","lastName":"Chowdhury","suffix":""}],"badges":[],"createdAt":"2026-03-24 
20:13:39","currentVersionCode":1,"declarations":{"humanSubjects":true,"vertebrateSubjects":false,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":true,"humanSubjectConsent":true,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-9215817/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-9215817/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":105566112,"identity":"a90f043c-e4e6-4af9-8287-e9c5bd898f21","added_by":"auto","created_at":"2026-03-27 12:55:21","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":920567,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-9215817/v1/87005f5c-08e5-4891-82a5-71c6c1c84964.pdf"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003e\u003cstrong\u003eTeacher AI Literacy for Multilingual Learner Instruction: Scale Development and Initial Validity Evidence\u003c/strong\u003e\u003c/p\u003e","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eNow-a-days generative artificial intelligence tools begin to integrate into regular instructional practice to aid with lesson planning, provide draft responses to student work, assist with differentiation of teaching resources and provide language-based support in multilingual learner (ML) classrooms. At the same time the U.S. Department of Education, Office of Educational Technology emphasizes the potential for AI to impact teaching and learning but also emphasizes the risk associated with the accuracy, bias, transparency and privacy of AI generated instructional products especially where those products influence high stakes educational decision-making and/or exacerbate inequities (OET, 2023). However, educating ML students is a national imperative in U.S. 
educational institutions because it is tied to equity, accountability and workforce preparedness. Instruction for ML students must simultaneously promote language acquisition and rigorous content-area learning. Research syntheses show that teacher expertise, specifically expertise in pedagogical and assessment practices informed by an understanding of language, is one of the primary factors shaping opportunities and long-term outcomes for ML students (NASEM, 2017). This study responds to a practical problem: higher education institutions increasingly need a valid and useful tool for assessing teachers\u0026rsquo; ability to use AI responsibly for ML instruction, not merely their overall confidence with technology.\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003e1.1 Problem and national relevance\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eTeachers\u0026rsquo; ability to use technology effectively is critical to the success of AI in education, because AI reaches students through the learning activities that educators design. Educators are expected to translate the language of policy and the capabilities of AI tools into meaningful, engaging learning opportunities for their students. This role has grown in importance with the increasing presence of multilingual students, who require specific kinds of scaffolding to build their language skills. Educators therefore need clearly defined language objectives and must give feedback that develops students\u0026rsquo; academic language while maintaining the content rigor of the subject being taught (NASEM, 2017; WIDA, 2020).\u003c/p\u003e\n\u003cp\u003eEducators must also consider the ethical implications of implementing AI tools. 
AI adds several layers of decision-making for educators. In the United States, the Office of Educational Technology (OET) has identified AI as a growing area with the potential to augment instructional supports while also creating risks. These include hallucinated misinformation, inconsistent performance, a lack of clarity about how AI reaches its outputs and potential conflicts with an institution\u0026rsquo;s responsibility to protect students\u0026rsquo; privacy and accessibility rights (OET, 2023). Applying AI to ML instruction raises both the likelihood and the consequences of these risks. ML instruction relies heavily on language-related tasks such as translation, paraphrasing, vocabulary support and feedback on writing or speaking, and these tasks depend on linguistic accuracy, register and culturally relevant meaning. If educators treat AI output as accurate and reliable without applying pedagogical judgment, ML students may receive incorrect explanations, or feedback that sounds fluent yet subtly alters the intended meaning of the original message. Conversely, when educators apply pedagogical judgment, AI may help increase access to comprehensible input, differentiated practice and formative feedback at scale (Miao \u0026amp; Holmes, 2023).\u003c/p\u003e\n\u003cp\u003eThe national relevance of AI in education also extends beyond its direct impact on instruction. Public educational institutions face significant governance and compliance implications as they implement AI-enabled tools. 
Public institutions are bound by federal regulations that protect student education records. AI-enabled tools complicate data sharing, data storage and third-party access to student data when educators upload student work (U.S. Department of Education, n.d.). Online services used with young learners raise further concerns about data collection and parental consent, underscoring the need for educators to understand the tools they select for classroom use and to establish clear classroom norms.\u003c/p\u003e\n\u003cp\u003eHigher education institutions in the U.S. also need measures of educators\u0026rsquo; capacity so that they can provide sustained professional learning, monitor growth in that capacity and avoid a one-size-fits-all approach to AI training. The 2024 National Educational Technology Plan identifies a \u0026ldquo;digital design divide\u0026rdquo; tied to unequal access to sustained professional learning and capacity building for educators, suggesting that the effectiveness of new technologies depends on the human infrastructure that supports instructional design (OET, 2024). A practical, validated measure of AI literacy for ML instruction would allow institutions to identify areas of strength and weakness, focus support on those areas and assess whether professional development leads to improved, safer and more instructionally effective use of AI.\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003e1.2 Gap\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eAlthough AI literacy is discussed increasingly often, existing measures and frameworks usually assess general AI knowledge or digital competence without addressing the specific teaching requirements of ML contexts. 
Thus, a teacher can score highly on general AI literacy yet still lack three capacities: designing AI-supported learning activities, judging whether AI outputs are linguistically and culturally valid, and applying the safeguards that matter when MLs are the primary users of AI-based support (OECD, 2023).\u003c/p\u003e\n\u003cp\u003eThe second gap relates to the ethics of using AI in classrooms. High-level guidance emphasizes that AI should be used in human-centered, rights-respecting ways. Teachers, however, need practical knowledge that connects ethical principles to daily instructional decisions. Examples include when to avoid using AI, how to communicate AI use to students and families, how to prevent overreliance on AI and how to ensure fairness and transparency (OSTP, 2022). In addition, institutions may use a scale to place teachers into professional development (PD) pathways, compare teachers\u0026rsquo; needs across grade bands or track teacher growth after training. In those cases, the resulting scores must be interpreted within a validity argument supported by evidence from multiple sources. Current testing standards emphasize that validity is not a property of the instrument alone but of the interpretations and uses of its scores. Those interpretations and uses must be supported by evidence such as content representation, response processes, internal structure and relationships to other variables (AERA et al., 2014).\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003e1.3 Purpose and contributions\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eThe goal of this research project is to design and test a valid and useful tool for measuring teacher AI literacy for multilingual learner instruction. 
This construct is understood as the extent to which teachers can identify, evaluate and use AI tools in their instruction of MLs in ways aligned with TESOL pedagogy and ethics. To meet these goals, the study employs a convergent mixed-methods design. The research makes three interrelated contributions:\u003c/p\u003e\n\u003col start=\"1\" type=\"1\"\u003e\n \u003cli\u003ea theoretical construct that links AI literacy, TESOL pedagogical practice and classroom ethics;\u003c/li\u003e\n \u003cli\u003ea theoretically grounded and empirically tested measure of teacher AI literacy, with a scoring system to support professional development planning and tracking; and\u003c/li\u003e\n \u003cli\u003eguidelines for responsible classroom AI implementation that prioritize the quality of teachers\u0026rsquo; instructional decisions.\u003c/li\u003e\n\u003c/ol\u003e\n\u003ch3\u003e\u003cstrong\u003e1.4 Research questions\u003c/strong\u003e\u003c/h3\u003e\n\u003col start=\"1\" type=\"1\"\u003e\n \u003cli\u003eWhat factor structure best represents AI literacy for ML instruction?\u003c/li\u003e\n \u003cli\u003eDoes the scale predict responsible AI use intentions and instructional decision quality?\u003c/li\u003e\n \u003cli\u003eDo measurement properties hold across grade bands and contexts?\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"2 Conceptual foundation and construct definition","content":"\u003cp\u003eThis research defines teacher AI literacy for multilingual learner instruction as a situated form of professional knowledge and judgment. It arises from the confluence of three elements: instructional expertise in teaching language and content, knowledge of technology integration, and the ethical and governance competencies required for the responsible use of AI in educational settings. The conceptual foundation for this work begins with the well-established claim that successful teaching requires much more than generic pedagogy or discrete content knowledge. 
Rather, effective teaching depends on specialized, contextualized knowledge that enables teachers to translate disciplinary and linguistic goals into forms that learners can engage with. Shulman’s formulation of Pedagogical Content Knowledge (PCK) posits that teachers select representations, explanations and examples that make content learnable for specific students (Shulman, 1986).\u003c/p\u003e\n\u003cp\u003eAs digital tools have become increasingly central to instruction, researchers have extended this line of thinking by conceptualizing how teachers coordinate technology, pedagogy and content, rather than treating technology as a separate skillset. The TPACK framework posits that technology integration is most relevant when teachers understand how technology complements their pedagogical choices and content demands within a specific context (Mishra \u0026amp; Koehler, 2006). These ideas are particularly salient for AI-enabled tools. Such tools can shape not only how teachers represent content but also the language input, feedback, explanations and examples that students experience as authoritative. The U.S. Department of Education’s Office of Educational Technology notes that AI systems can make decisions autonomously and produce output that may be inaccurate, inappropriate or biased. Governance and educator judgment are therefore central to the safe and responsible educational use of AI (OET, 2023). The construct is further supported by emerging literature that treats AI literacy not as programming proficiency but as the set of competencies needed to understand, evaluate and use AI systems in human activity. 
Long and Magerko define AI literacy in terms of practical competencies (recognizing limits, interpreting system behavior and using AI in an informed manner). They also emphasize that design and educational contexts shape what it means to be AI-literate (Long \u0026amp; Magerko, 2020). For teachers, AI literacy becomes pedagogically significant: it must be expressed through instructional planning, adaptation and assessment routines that protect learners while advancing learning goals. WIDA’s English Language Development Standards Framework foregrounds equity for multilingual learners and treats the integration of language and content objectives as a central tenet of instruction (WIDA, 2020). National guidance likewise identifies three critical factors in ML outcomes: how much scaffolding educators provide, how they develop academic language and how they structure opportunities to learn (NASEM, 2017). AI tools can mediate the texts, tasks and feedback that affect language exposure and participation. Teacher capacity must therefore include the ability to link AI-assisted activities to TESOL-informed instructional intentions rather than using AI as a generic productivity aid.\u003c/p\u003e\n\u003cp\u003eTherefore, this study uses the following working definition: AI literacy for ML instruction is teachers’ ability to select, evaluate and utilize AI tools to support ML learning with TESOL-aligned pedagogy and ethical safeguards. This definition combines instructional efficacy with responsible use, and it also bounds what the scale is and is not intended to measure. First, the construct is not equivalent to general comfort with technology or frequency of tool use; a teacher can use AI frequently without evidence of pedagogical alignment or rigorous evaluation procedures. 
Second, the construct is not intended to measure the deeper technical expertise involved in developing AI. It targets the professional competencies teachers need to use AI tools as instructional resources and decision aids. Finally, the construct does not replace broader assessments of ML pedagogy; it focuses on the AI-mediated enactment of ML-supportive instruction.\u003c/p\u003e\n\u003cp\u003eAbove all, the construct explicitly includes ethical safeguards, because the use of AI in classrooms raises issues of student privacy, transparency and fairness. In the U.S. education system, protecting students’ education records and personally identifiable information is a foundational compliance requirement (U.S. Department of Education, n.d.). International and national frameworks further emphasize that AI governance must address bias, accountability, transparency and human-centered safeguards (OSTP, 2022). For MLs, these concerns are intensified because language, accent, dialect and immigration-related vulnerabilities can interact with biased outputs or surveillance-like uses. This makes ethics inseparable from pedagogy in a defensible construct definition.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e2.2 Domain model\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe four dimensions of the proposed initial domain model are derived from the working definition and will be refined further during the qualitative component of this research. They nonetheless serve as an explicit framework for developing measurement items and for subsequently examining their structure through factor analysis.\u003c/p\u003e\n\u003cp\u003e1) \u003cstrong\u003ePedagogical Integration of AI Tools for MLs\u003c/strong\u003e: This dimension represents teachers’ capacity to use AI tools to advance learning objectives for MLs (scaffolding, differentiation and feedback relevant to language acquisition). 
This dimension is rooted in the Pedagogical Content Knowledge (PCK) and Technological Pedagogical Content Knowledge (TPACK) frameworks. In these frameworks, teachers translate curricular objectives into learning opportunities for their students while using tools appropriately (Shulman, 1986; Mishra \u0026amp; Koehler, 2006). The dimension therefore covers three practices:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003edesigning AI-supported activities aligned with specific language objectives;\u003c/li\u003e\n \u003cli\u003eselecting texts at an appropriate level of linguistic complexity for each student; and\u003c/li\u003e\n \u003cli\u003eusing AI-produced supports (sentence frames, glossaries, modeled responses) in ways that preserve academic rigor while still allowing students to engage in productive struggle.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eIt also includes the capacity to judge whether AI-based feedback supports language development rather than merely correcting student language in ways that limit learner agency or mask developmental progress. This focus on integrating language and content instruction is consistent with WIDA’s vision of equitable ML instruction (WIDA, 2020).\u003c/p\u003e\n\u003cp\u003e2) \u003cstrong\u003eEvaluation and Design of AI Tasks/Prompts\u003c/strong\u003e: This dimension reflects teachers’ ability to design AI-assisted tasks and to check AI outputs through systematic verification routines. It rests on the practical reality that AI-generated text can be fluent but incorrect, and that AI systems can automate decisions that incorporate bias or generate unacceptable outputs (OET, 2023). Competency in this area includes selecting prompt types that elicit pedagogically useful output, iteratively revising prompts for clarity and specificity, and evaluating AI output for accuracy, relevance and alignment with learning objectives. 
When evaluating outputs intended for MLs, teachers must also judge their linguistic appropriateness (vocabulary load, syntactic complexity, pragmatic fit). They must also ensure that AI-generated examples do not contain cultural insensitivities, deficit narratives or misleading models of language use. Because this dimension concerns interpretive judgment rather than tool proficiency, its items should reflect the routine practices and decision criteria teachers apply when using AI for materials, examples, translations or feedback.\u003c/p\u003e\n\u003cp\u003e3) \u003cstrong\u003eEthical, Fairness, Privacy and Compliance Issues Related to ML Instruction\u003c/strong\u003e: This dimension reflects teachers’ understanding and enactment of measures that protect students’ rights and interests, ensure responsible use of data, mitigate bias in AI systems, ensure transparency and assign credit properly. It draws on both normative and policy perspectives. AI governance frameworks have identified the potential for AI systems to discriminate against certain groups, to operate non-transparently and to exploit users’ personal data. These risks necessitate human-centered protections and accountability structures (OSTP, 2022). In educational settings, privacy laws and regulations restrict the types of student data that vendors can collect and store and limit how teachers can access and share that data. Teachers who work with ML students also face distinctive ethical challenges when using AI systems, because AI use can potentially reveal a learner’s identity, home language or immigration status. Confidentiality and purpose limitation are therefore practical requirements rather than abstract ideals in these situations. 
Thus, this dimension includes familiarity with common compliance boundaries and strategies for limiting data exposure. It also covers fairness-oriented practices such as monitoring AI-assisted recommendations and feedback for biased language, stereotyping or unfair treatment of ML students.\u003c/p\u003e\n\u003cp\u003e4) \u003cstrong\u003eProfessional Judgment and Classroom Governance\u003c/strong\u003e: The fourth dimension concerns teachers’ capacity to manage classroom AI use. This includes deciding when to use AI, establishing norms for student use, communicating expectations and rationales to students and families, and documenting their reasoning. The U.S. Department of Education notes that educators already understand the risks and emphasizes that educators must remain in the loop as AI-enabled decisions enter classrooms (OET, 2023). Governance competency includes three elements:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003esetting clear boundaries for acceptable student use;\u003c/li\u003e\n \u003cli\u003ecreating assignments that maintain academic integrity while using AI productively; and\u003c/li\u003e\n \u003cli\u003ebeing transparent with students and families about when and why AI is used.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThis dimension also encompasses teachers’ professional responsibility to document their rationale for choosing tools, link AI use to instructional goals and participate in institution-wide decision-making. In ML contexts, governance must also be linguistically responsive. Norms and rationales must be accessible to students at varying proficiency levels. Policies must not unintentionally penalize ML students for seeking language assistance, while still maintaining fair academic expectations.\u003c/p\u003e\n\u003cp\u003eTogether, the four dimensions are designed to provide a comprehensive model of teachers’ capacity to teach effectively with AI in ML instruction. 
The model can support the development of assessments that are useful for teachers’ professional growth while remaining grounded in a theory of what it means to use AI effectively in ML instruction.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e2.3 Validity argument\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe purpose of this study is to create an instrument that higher education institutions and educators can use to inform professional development decisions. The validity argument must therefore specify how results are to be interpreted, how they will be used and the limits of those interpretations and uses. Contemporary validity theory (Kane, 2013; Messick, 1995) holds that validity is the degree to which empirical evidence and theoretical rationales support the interpretation and use of test scores. Validity is thus a property not of the test itself but of the inferences drawn from test scores within a particular context.\u003c/p\u003e\n\u003cp\u003eThe Standards for Educational and Psychological Testing (AERA, APA, \u0026amp; NCME, 2014) provide criteria for evaluating validity evidence. They identify five sources of validity evidence and treat fairness in testing as a closely related foundational concern:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003etest content;\u003c/li\u003e\n \u003cli\u003eresponse processes;\u003c/li\u003e\n \u003cli\u003einternal structure;\u003c/li\u003e\n \u003cli\u003erelations to other variables; and\u003c/li\u003e\n \u003cli\u003econsequences of testing.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003e\u003cstrong\u003eProposed Score Interpretation and Use\u003c/strong\u003e: The proposed interpretation is that the scale score represents a teacher’s current level of AI literacy for ML instruction, specifically their competence along the four identified dimensions. 
The primary intended uses are to provide formative information about professional development needs, inform the selection of specific professional development opportunities, monitor teacher growth and evaluate programs at the aggregate level.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eArgument-Based Structure of the Validity Claim\u003c/strong\u003e: An argument-based approach to validation makes explicit the inferential chain from observed responses to the meaning and use of scores. It also requires that the assumptions underlying each link be supported by evidence. In simplified form, the inferential chain for the proposed scale is:\u003c/p\u003e\n\u003col start=\"1\" type=\"1\"\u003e\n \u003cli\u003eteachers interpret each item as intended and respond using stable, practice-relevant judgment;\u003c/li\u003e\n \u003cli\u003eitem scores combine to reflect coherent latent dimensions consistent with the domain model;\u003c/li\u003e\n \u003cli\u003ecomposite and subscale scores function equivalently across relevant subgroups and do not exhibit systematic bias; and\u003c/li\u003e\n \u003cli\u003escores are associated with external indicators of responsible AI-use intentions and instructional decision quality.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eTogether, these links would support the practical utility of the scale for directing professional development efforts.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsequential Considerations\u003c/strong\u003e: As Messick (1995) points out, evidence about the consequences of using an assessment is essential. This is particularly true when the assessment directs actions that affect the distribution of resources or label teachers’ capacity. The stakes of the proposed scale are relatively low, but its consequences still need to be considered, because the scale is intended to promote equitable capacity building for using AI with MLs. 
Claims supporting the consequential aspects of validity include transparency in score reporting, clear guidelines for interpreting results and explicit cautions against using scores for high-stakes purposes without sufficient additional evidence.\u003c/p\u003e"},{"header":"3. Methods","content":"\u003ch3\u003e\u003cstrong\u003e3.1 Overall design\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eThis study employs a convergent mixed-methods design in which a single survey collects both quantitative and qualitative data. Creswell \u0026amp; Plano Clark (2018) note that a convergent mixed-methods design suits studies that seek two kinds of evidence. The first is psychometric evidence about an instrument\u0026rsquo;s structure and relations to other variables. The second is contextual explanation of how participants interpret and act on the construct in practice. In this study, the quantitative strand estimates the internal structure and reliability of the new measure of Teacher AI Literacy for multilingual learner instruction. It also estimates the measure\u0026rsquo;s associations with teachers\u0026rsquo; reported intentions to use AI responsibly and with scenario-based instructional decision quality. The qualitative strand consists of three open-ended prompts embedded in the same survey. They are intended to capture how teachers describe using AI to support ML instruction, how they verify or adapt AI output, and what ethical concerns and support needs shape their responsible use of AI.\u003c/p\u003e\n\u003cp\u003eThis methodology is consistent with an argument-based view of validation, in which evidence is collected to support the intended interpretations and uses of scores rather than treating validity as a property of the instrument. 
The quantitative component provides evidence on internal structure, reliability and relationships to other variables. The qualitative responses support interpretation of score meaning by illustrating how teachers conceptualize competent and risky AI use in ML instruction, and by identifying contextual constraints that may explain the quantitative results.\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003e3.2 Instrument development and construct operationalization\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eThe instrument operationalizes AI literacy for ML instruction as teachers\u0026rsquo; ability to select, evaluate and use AI tools to support ML learning with TESOL-aligned pedagogy and ethical safeguards. This operationalization draws on three sources:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003edomain-specific frameworks of teacher knowledge that integrate pedagogy, content and tool affordances (Mishra \u0026amp; Koehler, 2006);\u003c/li\u003e\n \u003cli\u003eresearch framing AI literacy as a competence-based understanding of practice rather than programming alone (Long \u0026amp; Magerko, 2020); and\u003c/li\u003e\n \u003cli\u003escale development guidelines emphasizing construct definition and iterative refinement for item representativeness (Boateng et al., 2018).\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eItems are written to capture observable instructional judgments and routines that teachers can reasonably self-report, and they are specific to ML contexts. All items use a consistent five-point Likert response format to reduce respondent burden. Finally, the survey embeds open-ended response-process questions, which allow the researcher to adopt a pilot validation stance. 
The qualitative data can provide insight into item meaning and potential sources of misunderstanding, while the quantitative analysis provides evidence of structure and reliability.\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003e3.3 Participants and sampling\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eThe study population consists of in-service teachers at higher education institutions. Eligibility is determined by a screening question asking whether respondents currently teach classes that include MLs. Surveys are distributed, and potential participants recruited, through online education forums and professional networks. The data set contains 32 responses from participants who completed the survey. Non-probability samples risk over-representing people interested in both AI and continuing professional development. The study therefore describes the sample in detail and interprets the findings with caution. Contextual information is also collected to support exploratory subgroup analysis for the third research question. This information includes respondent roles, years of teaching experience and the estimated percentage of MLs in the focus class.\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003e3.4 Measures\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eAll measures are administered through an online survey consisting of five blocks:\u003c/p\u003e\n\u003col start=\"1\" type=\"1\"\u003e\n \u003cli\u003escreening and teaching context;\u003c/li\u003e\n \u003cli\u003ethe core AI literacy scale;\u003c/li\u003e\n \u003cli\u003eresponsible AI-use intentions;\u003c/li\u003e\n \u003cli\u003escenario-based decision-quality vignettes; and\u003c/li\u003e\n \u003cli\u003eopen-ended qualitative prompts.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eEach item is coded so that higher values represent greater endorsement of competent practice. Where internal-structure evidence supports them, domain subscales are calculated as the mean of their constituent items. 
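\u003c/p\u003e\n\u003cp\u003eThe scoring rule above can be illustrated with a minimal sketch. The sketch makes three assumptions. First, it uses hypothetical item identifiers and domain assignments, since the item keys are not listed in the text. Second, it assumes a five-point response scale. Third, it assumes that any negatively worded items would be reverse-coded so that higher values always indicate more competent practice; whether the instrument contains such items is not stated. The same item matrix is used to flag straight-lining, one of the inattentive-response indicators named in the quantitative analysis plan. This is illustrative only and is not the scoring code used in the study.\u003c/p\u003e

```python
from statistics import mean

# Hypothetical item identifiers grouped by domain; the real item keys and
# domain assignments are not given in the text, so these are placeholders.
DOMAINS = {
    'pedagogical_integration': ['pi1', 'pi2', 'pi3'],
    'evaluation_design': ['ed1', 'ed2', 'ed3'],
}
# Assumption: negatively worded items (if any) are listed here for reversal.
REVERSED = {'ed3'}
SCALE_MIN, SCALE_MAX = 1, 5  # five-point Likert format


def recode(item, value):
    # Reverse-code flagged items so higher always means more competent practice.
    if item in REVERSED:
        return SCALE_MIN + SCALE_MAX - value
    return value


def subscale_scores(response):
    # Each domain subscale is the mean of its constituent (recoded) items.
    return {
        domain: mean(recode(item, response[item]) for item in items)
        for domain, items in DOMAINS.items()
    }


def is_straight_lined(response):
    # Flag respondents who gave the identical raw answer to every scale item.
    raw = [response[item] for items in DOMAINS.values() for item in items]
    return len(set(raw)) == 1


example = {'pi1': 4, 'pi2': 5, 'pi3': 3, 'ed1': 2, 'ed2': 3, 'ed3': 2}
print(subscale_scores(example))
print(is_straight_lined(example))
```

\u003cp\u003eIn the example response, ed3 (raw 2) is reversed to 4, giving subscale means of 4 and 3, and the respondent is not flagged. The straight-lining check deliberately uses raw answers. Identical raw answers across positively and negatively worded items are a stronger sign of inattention than identical recoded scores.\u003c/p\u003e\n\u003cp\u003e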
Because Likert items are ordinal, all psychometric analyses are planned with estimators designed for ordinal data rather than treating the items as continuous, normally distributed variables.\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003e3.5 Procedure\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eThe survey begins with an informed consent statement, and participants may withdraw from the study at any point. Screening procedures are also used to evaluate the quality of the survey data.\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003e3.6 Quantitative analysis plan\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eThe quantitative analysis has three aims:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003eprovide evidence on the psychometric properties of the main scale (RQ1);\u003c/li\u003e\n \u003cli\u003eassess relationships with external criterion measures (RQ2); and\u003c/li\u003e\n \u003cli\u003eexplore how the scale functions across contexts (RQ3).\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eAnalytic techniques are chosen to suit ordinal Likert-type data and the available sample size. Before analysis, descriptive statistics and missingness are examined for each item. Straight-lining and implausibly fast completion are flagged as possible indicators of inattentive responding, and the number of cases meeting the removal criteria is documented.\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003e3.7 Qualitative analysis plan\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eThematic analysis is planned to identify patterns of meaning in teachers\u0026rsquo; short written responses to the open-ended prompts. As a flexible qualitative method, thematic analysis allows the researcher to explore meanings and patterns as they are identified. 
The analysis begins by moving the data into initial coding. Coding follows a hybrid approach. A deductive codebook derived from the four-domain model provides the initial organization. Inductive codes capture emerging themes not anticipated by the four domains, such as time constraints, policy ambiguity or perceived risks to students\u0026rsquo; information privacy. The qualitative outcomes describe four aspects of teachers\u0026rsquo; practice:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003ehow teachers use AI to support instruction in ML contexts;\u003c/li\u003e\n \u003cli\u003ehow they verify the accuracy of AI output;\u003c/li\u003e\n \u003cli\u003ewhat ethical concerns they perceive when working with MLs; and\u003c/li\u003e\n \u003cli\u003ewhat institutional supports they believe would help them use AI responsibly with MLs.\u003c/li\u003e\n\u003c/ul\u003e\n\u003ch3\u003e\u003cstrong\u003e3.8 Mixed-methods integration plan\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eIntegration follows the convergent design and uses a merging strategy to develop meta-conclusions from the qualitative and quantitative data for each research question. The primary integration output is a joint display. It groups quantitative results by domain and pairs them with qualitative themes describing how teachers understand or enact the same domain in their classrooms.\u003c/p\u003e\n\u003ch3\u003e\u003cstrong\u003e3.9 Ethics and transparency\u003c/strong\u003e\u003c/h3\u003e\n\u003cp\u003eThe study complies with established ethics guidelines for low-risk educational survey research. Participation is voluntary. The written informed consent statement explains the purpose of the study and how respondents\u0026rsquo; identities are kept confidential. 
No identifiable student or institutional data are collected. All data are stored securely and reported only in aggregate form.\u003c/p\u003e"},{"header":"4. Results","content":"\u003cp\u003e\u003cstrong\u003e4.1 Sample description\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe quantitative dataset comprises 32 participants. Participants fall into four role categories: ESL/EFL specialist (n = 11, 34.4%), content teacher (n = 2, 6.3%), instructor (n = 12, 37.5%) and other (n = 7, 21.9%). Teaching experience ranges from 0 to 7 years, with many respondents reporting 2 years (n = 7, 21.9%) or 3 years (n = 8, 25.0%). Multilingual Learner (ML) exposure varies considerably. The share of MLs respondents have taught is 0-10% (n = 5, 15.6%), 11-25% (n = 7, 21.9%), 26-50% (n = 10, 31.3%), 51-75% (n = 3, 9.4%) or 76-100% (n = 7, 21.9%). Most respondents report no prior training or professional development related to AI (n = 27, 84.4%). The same number (n = 27, 84.4%) report that their institution does not have a clear AI-related policy. The thematic analysis of open-ended responses includes all 32 respondents.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e4.2 Descriptive results and internal consistency\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe composite AI Literacy Total Score, computed from the Teacher AI Literacy items, indicates moderate-to-high endorsement of AI-supported instructional practices for MLs (M = 3.39; SD = 0.67 on a 1-5 scale). 
Item-level responses show the strongest support for practices that use AI responsibly and with sensitivity to MLs. For example, respondents strongly endorse using AI to align language learning objectives with MLs’ proficiency levels (M = 3.88; SD = 0.75). They also report strong intentions to use AI responsibly. Internal consistency for the 12-item AI Literacy Scale is high (Cronbach’s alpha = .88). The four Responsible AI Use Intentions items also show good internal consistency (Cronbach’s alpha = .82). Their composite mean (M = 3.96; SD = 0.63) indicates strong intentions to use AI responsibly. Among these items, respondents most strongly endorse avoiding sharing sensitive student data (M = 4.25; SD = 0.62) and verifying the accuracy of AI-generated content before classroom use (M = 4.09; SD = 0.89). These are followed by adhering to institutional policies even when inconvenient (M = 3.94; SD = 0.67) and teaching students about responsible AI use as part of instruction (M = 3.59; SD = 1.01). 
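The Cronbach’s alpha values above follow the usual formula:

alpha = k / (k - 1) × (1 - sum of item variances / variance of the summed score)

Here k is the number of items. The Python sketch below applies this formula to toy data, not the study's responses. It computes the conventional covariance-based alpha. It is not an ordinal (polychoric) variant of the kind implied by the ordinal estimators planned in the method section.

```python
from statistics import variance


def cronbach_alpha(rows):
    # rows: one list of item scores per respondent (all lists the same length)
    k = len(rows[0])
    items = list(zip(*rows))                      # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])  # variance of summed scale scores
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Toy data: 3 respondents x 2 items
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))  # perfectly consistent items
print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))  # partly inconsistent items
```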
For the criterion measure, vignette-based decision-quality total scores (M = 3.38; SD = 1.19) indicate that respondents choose best-practice options in many scenarios, with considerable individual variability.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e4.3 Scale structure: exploratory factor analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe exploratory factor analysis uses principal axis factoring with oblimin rotation. Sampling adequacy is acceptable (KMO = .655), and Bartlett’s test of sphericity is statistically significant, indicating that the data are suitable for factoring. Three factors with eigenvalues greater than 1 emerge, together explaining about 69.1% of the variance. The first factor comprises items on ML-oriented instructional use and appropriateness checks. The second comprises items on reflective and pedagogical governance of AI use. The third comprises the responsible-use intention items together with one instructional item. Overall, the solution suggests that respondents’ views on AI are organized around (a) ML instructional applications and evaluation, (b) reflective/ethical governance and classroom norms for AI use and (c) responsible-use intentions and alignment with institutional policies.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e4.4 Validity evidence: associations with intentions and decision quality\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe associations observed among AI literacy, responsible-use intentions and vignette decision quality support the initial validity expectations. 
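The adequacy check and factor-retention rule in Section 4.3 can be sketched with NumPy. This sketch assumes Pearson correlations and uses simulated data. It computes the overall KMO statistic from the correlation and partial-correlation matrices and applies the Kaiser rule (retain factors with eigenvalues greater than 1). It does not perform the principal axis factoring, oblimin rotation or Bartlett's test the study reports. Variance explained here is based on initial eigenvalues, so it will generally differ from the variance explained after a PAF extraction.

```python
import numpy as np


def kmo_and_kaiser(data):
    # data: n_respondents x k_items array of item scores
    R = np.corrcoef(data, rowvar=False)             # k x k Pearson correlation matrix
    inv_R = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv_R))
    partial = -inv_R / np.outer(d, d)               # partial (anti-image) correlations
    off_diag = ~np.eye(R.shape[0], dtype=bool)      # mask selecting pairs i != j
    r2 = np.sum(R[off_diag] ** 2)
    p2 = np.sum(partial[off_diag] ** 2)
    kmo = r2 / (r2 + p2)                            # overall Kaiser-Meyer-Olkin statistic
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]  # initial eigenvalues, largest first
    n_factors = int(np.sum(eigvals > 1.0))          # Kaiser criterion
    pct_var = 100 * eigvals[:n_factors].sum() / R.shape[0]  # trace of R equals k
    return kmo, eigvals, n_factors, pct_var


# Hypothetical data: two latent factors, three items each, 500 respondents
rng = np.random.default_rng(0)
f = rng.normal(size=(500, 2))
data = np.hstack([f[:, [0]] + 0.5 * rng.normal(size=(500, 3)),
                  f[:, [1]] + 0.5 * rng.normal(size=(500, 3))])
kmo, eigvals, n_factors, pct_var = kmo_and_kaiser(data)
print(round(kmo, 3), n_factors, round(pct_var, 1))
```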
Spearman correlations show a moderate positive relationship between Total AI Literacy and Total Responsible AI-use intentions (ρ = .56, p = .001). Respondents with higher AI literacy scores also report stronger intentions to verify output, protect student data and follow policy. Total AI Literacy is also positively correlated with Vignette Decision Quality (ρ = .38, p = .032). Teachers reporting higher AI literacy are therefore more likely to select best-practice instructional decisions in the applied scenarios. Total Responsible-use intentions are strongly positively correlated with Vignette Decision Quality (ρ = .63, p \u0026lt; .001). Teachers expressing stronger commitment to responsible use also make stronger applied decisions in the ML-relevant vignettes. Regression analyses provide additional support for these relationships. Total AI Literacy predicts Total Responsible AI-use intentions (R² = .32), with higher self-reported literacy associated with stronger responsible-use intentions. Total AI Literacy also predicts Vignette Decision Quality (R² = .13). This relationship is smaller than that between literacy and intentions, but it is statistically significant and shows that self-reported literacy predicts applied best-practice selections.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e4.5 Preliminary subgroup checks\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eBecause several subgroups are small and unevenly sized, the subgroup analyses should be considered exploratory.
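The correlation and regression statistics in Section 4.4 can be reproduced in outline with the Python standard library. In this sketch, Spearman's rho is computed as the Pearson correlation of average ranks, so tied scores share a rank. For a regression with a single predictor, R² equals the squared Pearson correlation of the raw scores. The example scores are hypothetical, and p-values are omitted.

```python
from statistics import mean


def average_ranks(values):
    # Rank values 1..n; tied values receive the mean of the positions they occupy
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def simple_regression_r2(x, y):
    # With one predictor, the OLS R^2 equals the squared Pearson r
    return pearson(x, y) ** 2


literacy = [2.8, 3.1, 3.4, 3.9, 4.2]      # hypothetical scale scores
intentions = [3.0, 3.5, 3.25, 4.0, 4.5]
print(spearman_rho(literacy, intentions), simple_regression_r2(literacy, intentions))
```

Assuming each reported regression has a single predictor, this relationship explains why the reported R² values sit near the squared rank correlations (.56² ≈ .31 versus R² = .32; .38² ≈ .14 versus R² = .13). The values are related but not identical, because R² is based on raw scores rather than ranks.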
A Mann-Whitney U test on the dichotomized role grouping variable finds a statistically significant difference in Total AI Literacy between the two role groups (U = 33.00, Z = -2.63, p = .009). The larger role group (n = 25) scores higher than the smaller group (n = 7). Differences in Total Decision Quality are not statistically significant for either the ML exposure grouping (p = .793) or the prior AI professional development grouping (p = .172). However, these comparisons have limited power because the High ML Exposure group (n = 4) and the “any AI PD” group (n = 5) are small. The results should therefore be treated as preliminary indicators that may not generalize.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e4.6 Qualitative thematic findings from open-ended responses\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe thematic analysis of the open-ended responses reveals five major themes that elaborate and extend the quantitative patterns described above. First, teachers view AI as a tool for ML-specific scaffolding and differentiation, particularly vocabulary support, simplifying complex content and reducing cognitive load while maintaining disciplinary rigor. One respondent reports using AI to generate Tier 2 vocabulary but revising the output because it is a bit too academic, adding simplified language and primary-language translations to support beginning-level MLs. Another respondent describes taking a dense AI-generated math explanation and breaking it down into five short, bulleted steps to reduce cognitive load.\u003cbr\u003eSecond, respondents stress the importance of verification, safety and accuracy checks before classroom use. This mirrors the highly endorsed verification intentions in the quantitative data. 
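The Mann-Whitney comparison in Section 4.5 can be sketched with the normal approximation. This minimal illustration counts U directly over cross-group pairs. As a simplifying assumption, it applies no tie correction to the variance, so its z can differ from statistical software that corrects for ties. The function names are hypothetical.

```python
from math import erfc, sqrt


def u_to_z(u, n1, n2):
    # Normal approximation to the distribution of U, without tie correction
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = erfc(abs(z) / sqrt(2))  # two-sided p-value from the standard normal
    return z, p


def mann_whitney_u(group_a, group_b):
    # U counts cross-group pairs in which the group_a value is larger (ties count 0.5)
    u_a = sum(1.0 if a > b else 0.5 if a == b else 0.0
              for a in group_a for b in group_b)
    u = min(u_a, len(group_a) * len(group_b) - u_a)  # the smaller U is conventionally reported
    z, p = u_to_z(u, len(group_a), len(group_b))
    return u, z, p


print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
print(u_to_z(33.0, 25, 7))  # the U and group sizes reported in Section 4.5
```

Plugging in the reported U = 33.00 with groups of 25 and 7 gives z ≈ -2.48 without tie correction. The reported Z = -2.63 is larger in magnitude. That is what would be expected if the software applied a tie correction, which shrinks the standard deviation of U.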
A science teacher reports fact-checking AI-generated lab summaries against a textbook to make sure the model is not hallucinating. The teacher cites an incident in which the model described an experiment that is not physically possible. Similarly, another respondent requests multiple explanations to test their consistency, stating: “If the model provides conflicting explanations, I know I will need to exercise caution when using the output”.\u003cbr\u003eThird, teachers repeatedly cite cultural sensitivity, bias and loss of linguistic nuance as significant concerns, especially for multilingual and international students. Participants describe staying vigilant for “stereotypes” and “missing perspectives” and express frustration with AI’s inability to preserve pragmatic meaning. Another respondent frames this as a broader equity issue, citing the possibility that AI imposes a standard American format that does not account for diverse rhetorical styles.\u003cbr\u003eFourth, participants cite privacy, data governance and institutional clarity as necessary conditions for responsible AI use in education. This mirrors the quantitative emphasis on protecting sensitive student data. One participant states that he is genuinely terrified about what happens to student data after it is entered into those boxes. He asks for a clearly defined departmental “Green List” of trusted tools that protect student privacy. Another respondent directly links responsible AI implementation to institutional safeguards, stating that institutions should provide “designated, trustworthy AI tools that prioritize student privacy” along with clear guidelines for both students and faculty.\u003cbr\u003eFifth, respondents describe pedagogical boundaries that limit dependence on AI, often connected to academic integrity and the student learning process. 
For example, one teacher worries that students are losing prewriting skills such as brainstorming and outlining. The teacher suggests a policy requiring handwritten notes or early drafts alongside any AI-assisted work. Another respondent is concerned that excessive reliance on AI can lead to fossilized grammar errors and advocates training on hybrid assignments in which AI is limited to the outlining phase rather than the final draft.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e4.7 Mixed-methods integration\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe qualitative themes converge with the quantitative response profile. Quantitatively, respondents report strong intentions to verify AI content and to avoid sharing sensitive student data. Qualitatively, the same commitments appear in descriptions of fact-checking routines, textbook triangulation and explicit privacy concerns. Teachers’ accounts further contextualize the positive associations among AI literacy scores, responsible-use intentions and vignette decision quality. In those accounts, effective AI use for MLs requires both instructional adaptation (simplification, chunking, added visual aids, register adjustment) and professional judgment (bias checks, safety screening, policy adherence). The qualitative findings also extend the quantitative results by explaining why these practices matter in ML contexts. They point to the threat of cultural and pragmatic mismatch, the potential erasure of multilingual rhetorical identities and the need for institutional supports for equitable and responsible classroom integration of AI.\u003c/p\u003e"},{"header":"5. Discussion and implications","content":"\u003cp\u003e\u003cstrong\u003e5.1 Answering the research questions directly\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe measure of AI literacy for ML instruction shows an interpretable, correlated multi-factor structure that resembles the proposed domain model. 
However, the results also point to practical refinements of the model. Rather than fully separate, distinct domains, the data support a more consolidated structure. In this structure, (a) pedagogical ML application, (b) critical evaluation and professional judgment and (c) responsible-use orientation and governance appear as closely related dimensions. The qualitative strand supports this consolidation. Teachers describe AI literacy as “doing” (designing ML scaffolds), “checking” (verification and appropriateness routines) and “governing” (privacy, norms, bounded-use decisions). Taken together, the evidence from both strands indicates that AI literacy for ML instruction can be conceptualized as a single, integrated competency system rather than as general AI confidence.\u003c/p\u003e\n\u003cp\u003eHigher AI literacy scores are associated with stronger responsible-use intentions and better vignette decision quality. The qualitative data are consistent with this association. Teachers describe robust verification routines, privacy safeguards and classroom norms. They also describe deliberate boundaries around when and how AI can be used with MLs, which are consistent with their higher intention scores and better scenario-based decisions. However, given the pilot sample size, self-report measurement and cross-sectional design, these results are best considered preliminary, directional evidence of validity.\u003c/p\u003e\n\u003cp\u003eThe study provides only preliminary, exploratory information about the instrument’s portability across contexts. The sample includes educators across roles and ML exposure levels. Exploratory subgroup checks suggest a possible difference in AI literacy by role. They show no significant differences in decision quality by ML exposure or prior AI professional development. 
However, the subgroups are too small to support firm conclusions. The most conservative statement is therefore that the instrument has potential for use in different contexts. Whether its meaning and measurement are equivalent across contexts must be tested in larger samples.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e5.2 Contributions to U.S. education\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis research contributes a domain-specific measure of AI literacy for ML instruction in U.S. multilingual education contexts, grounded in TESOL pedagogy, ethics and governance. In multilingual education, the central question is not simply whether teachers feel comfortable using an AI tool. It is whether they can use it to help students develop language and access content while protecting student data and privacy. It also covers promoting fair and transparent use of AI systems by students and faculty and supporting ethical decision-making about AI in classrooms. 
Operationalizing AI literacy as ML-related competence therefore helps institutions move beyond general measures of technology readiness toward instructionally and ethically actionable indicators. That competence covers scaffolding design, verification and bias detection, privacy-aware decision-making and classroom governance.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e5.3 Practical implications\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThese results suggest several immediate uses for the instrument in teacher education, professional development and program evaluation, provided it is treated as a developmental assessment rather than a static measure.\u003c/p\u003e\n\u003cp\u003eBaseline → targeted PD → growth monitoring\u003c/p\u003e\n\u003cp\u003eUse the instrument to establish a baseline of teacher capacity, identify the areas of greatest need and reassess after the intervention to determine how much teachers have grown. The qualitative results also illustrate what growth looks like. Teachers verify, adapt and govern their use of AI with MLs through more deliberate routines and more consistent protection of student data.\u003c/p\u003e\n\u003cp\u003eDimensional PD Mapping\u003c/p\u003e\n\u003cp\u003eBecause the quantitative and qualitative analyses suggest a doing/checking/governing structure, professional development can be organized into modules aligned with these dimensions.\u003c/p\u003e\n\u003col\u003e\n \u003cli\u003eVerification Routines and Quality Control: fact-checking, cross-tool triangulation, readability checks, detecting fabricated references and auditing outputs for pragmatic/register fit.\u003c/li\u003e\n \u003cli\u003eDesigning ML Scaffolding for Use with AI: transforming AI drafts into leveled supports (sentence frames, vocabulary supports, chunking, bilingual scaffolds) while preserving disciplinary rigor and alignment to language objectives.\u003c/li\u003e\n 
\u003cli\u003eSafeguards for Privacy, Bias and Fairness: de-identification practices, tool vetting, stereotype/bias screening, cultural responsiveness checks and management of translation/nuance risks (dialect and low-resource language issues).\u003c/li\u003e\n \u003cli\u003eEstablishing Classroom Norms and Governance: transparency and disclosure practices (e.g., “AI as Tutor” versus “AI as Ghostwriter”), assignment designs that preserve productive struggle and learner voice, and documentation/reflection habits for professional accountability.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eOrganized this way, the instrument supports precision PD. Educators and administrators can design targeted training rather than one-size-fits-all programs. They can also allocate resources according to educators’ competence gaps and the level of risk associated with their students’ exposure to AI and ML.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e5.4 Equity and ethics implications\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe researchers emphasize that ML contexts raise the ethical stakes. Students’ language status can increase their vulnerability to privacy violations, biased or stereotyped content and pragmatic or cultural mismatches in their interactions with AI. The qualitative findings highlight two potential sources of inequity to consider during implementation: (1) AI-driven language standardization that may erase multilingual voices and cultural nuance and (2) unequal access to high-quality tools and instruction, which could widen existing opportunity gaps. The instrument should therefore be used as a capacity-building tool. It should help educators assess their ability to protect students’ rights and inform advocacy for the policies, infrastructure and resources they need. 
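Section 5.3 describes a cycle of baseline, targeted PD and growth monitoring, plus a mapping of dimensions to PD modules. The Python sketch below suggests one way to operationalize it. It averages item scores within each doing/checking/governing dimension, recommends modules for dimensions below a cutoff and reports change between two administrations. The item-to-dimension map, the 3.0 cutoff and the dimension-to-module assignments are all hypothetical; the study does not specify them.

```python
from statistics import mean

# Hypothetical assignment of the doing/checking/governing dimensions to the PD modules in Section 5.3
MODULES = {
    'doing': ['Designing ML Scaffolding for Use with AI'],
    'checking': ['Verification Routines and Quality Control'],
    'governing': ['Safeguards for Privacy, Bias and Fairness',
                  'Establishing Classroom Norms and Governance'],
}


def dimension_means(scores, item_dimension):
    # scores: {item_id: Likert score}; item_dimension: {item_id: dimension name}
    by_dim = {}
    for item, score in scores.items():
        by_dim.setdefault(item_dimension[item], []).append(score)
    return {dim: mean(vals) for dim, vals in by_dim.items()}


def recommend_modules(scores, item_dimension, threshold=3.0):
    # Recommend modules for dimensions below a hypothetical cutoff, weakest dimension first
    means = dimension_means(scores, item_dimension)
    weak = sorted((m, d) for d, m in means.items() if m < threshold)
    return [module for _, dim in weak for module in MODULES[dim]]


def growth(pre, post, item_dimension):
    # Change in each dimension mean between the baseline and follow-up administrations
    before = dimension_means(pre, item_dimension)
    after = dimension_means(post, item_dimension)
    return {dim: after[dim] - before[dim] for dim in before}


item_dimension = {'q1': 'doing', 'q2': 'doing', 'q3': 'checking',
                  'q4': 'checking', 'q5': 'governing', 'q6': 'governing'}
baseline = {'q1': 4, 'q2': 4, 'q3': 2, 'q4': 3, 'q5': 2, 'q6': 2}
follow_up = {'q1': 4, 'q2': 5, 'q3': 4, 'q4': 4, 'q5': 3, 'q6': 3}
print(recommend_modules(baseline, item_dimension))
print(growth(baseline, follow_up, item_dimension))
```

Ordering recommendations from the weakest dimension upward is one simple way to reflect the precision-PD idea. A real program would also weigh the student-risk considerations the section mentions.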
Ethical use of the instrument also requires transparency. Educators and institutions should treat developing AI literacy as part of ongoing professional growth, with clear expectations, supportive coaching and safeguards against surveillance-style evaluation.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e5.5 Limitations and future research\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe study relies on a single survey administration. AI literacy and responsible-use intentions are self-reported, so the results may be influenced by social desirability bias and by how respondents interpret individual items. In addition, the vignette decision-quality measure depends on expert-defined best options, and what counts as a good decision is highly context-dependent. Future studies should recruit larger, more diverse samples to enable robust CFA and evaluate alternative models (higher-order or bifactor structures). They should also test measurement invariance across roles, policy contexts and ML exposure levels. Future work could also examine sensitivity to change through pre/post designs around targeted PD. It could strengthen the validity argument by linking scores to classroom artifacts (lesson plans, prompts, feedback samples, modified AI outputs) and, where feasible, to observational indicators of verification and governance routines.\u003c/p\u003e"},{"header":"6. Conclusion","content":"\u003cp\u003eThe study finds initial support for each of the hypotheses stated earlier and preliminary evidence of the scale’s reliability and validity as an assessment of teacher AI literacy. 
Overall, the study supports the feasibility and potential utility of the proposed measure of teacher AI literacy. It also identifies several important methodological and theoretical challenges for future studies to address.\u003c/p\u003e\n\u003cp\u003eThese findings provide preliminary evidence of the scale’s coherence and interpretability and of its potential applicability across contexts. Teachers with higher self-reported AI literacy tend to report stronger intentions to use AI responsibly in their teaching. They also tend to make higher-quality decisions about whether and how to use AI in vignettes depicting common ML classroom scenarios. The study thus offers preliminary evidence for the internal structure of the measure. It also supports the measure's relationships with two key aspects of teacher practice: responsible AI-use intentions and instructional decision-making quality. The study provides a starting point for further research on measuring teacher AI literacy and suggests several avenues for that work. Specifically, the results suggest that the instrument could serve as a baseline measure before professional development and as a means of evaluating the effects of targeted professional development. It could also serve as a way to track the growth of teacher AI literacy over time.\u003c/p\u003e\n\u003cp\u003eThe study provides initial validity evidence for the scale and preliminary insights into what teacher AI literacy looks like in practice. Even so, many questions remain about how to measure teacher AI literacy, how it relates to other constructs and how it affects teacher practice and student learning. 
Future studies should include larger and more diverse samples of teachers, additional sources of validity evidence and more rigorous tests of scale stability and invariance.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eData and/or Code availability\u003c/strong\u003e:\u003c/p\u003e\n\u003cp\u003eDe-identified data, codebook and analysis scripts are available in figshare at https://doi.org/10.6084/m9.figshare.31427369\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthical statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study involved an anonymous, minimal-risk survey of adult educators. No directly identifying information was collected and participation was voluntary. Prior to beginning the survey, participants reviewed an informed-consent statement describing the purpose of the study, procedures, risks/benefits and their right to stop at any time without penalty. Data were stored securely and analyzed in de-identified form and only aggregated results are reported.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConflict of interest\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors received no external financial support for this research, authorship, and/or publication of this article.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgments\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors thank the educators who participated in the survey and shared their experiences and perspectives on responsible generative AI use in multilingual learner instruction. The authors also appreciate the colleagues who offered constructive feedback on the study materials and instrument wording during development. 
Any remaining errors or limitations are the responsibility of the authors. No external grant funding was reported for this work.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n \u003cli\u003e\u003cem\u003eA call to action for closing the digital access, design, and use divides. 2024 National Educational Technology Plan.\u003c/em\u003e Office of Educational Technology, US Department of Education. (2023, December 31).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eAmerican Educational Research Association (AERA), American Psychological Association, \u0026amp; National Council on Measurement in Education. (2014). \u003cem\u003eStandards for educational and psychological testing\u003c/em\u003e.\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eBoateng, G. O., Neilands, T. B., Frongillo, E. A., Melgar-Qui\u0026ntilde;onez, H. R., \u0026amp; Young, S. L. (2018). Best practices for developing and validating scales for health, social, and behavioral research: A primer. \u003cem\u003eFrontiers in Public Health\u003c/em\u003e, \u003cem\u003e6\u003c/em\u003e, 1\u0026ndash;18. https://doi.org/10.3389/fpubh.2018.00149\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eCreswell, J. W., \u0026amp; Plano Clark, V. L. (2018). \u003cem\u003eDesigning and conducting mixed methods research\u003c/em\u003e (3rd ed.). SAGE Publications.\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eDeVellis, R. F. (2017). \u003cem\u003eScale development: Theory and applications\u003c/em\u003e (4th ed.). SAGE.\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eKane, M. T. (2013). Validating the interpretations and uses of test scores. \u003cem\u003eJournal of Educational Measurement\u003c/em\u003e, \u003cem\u003e50\u003c/em\u003e(1), 1\u0026ndash;73. https://doi.org/10.1111/jedm.12000\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eLong, D., \u0026amp; Magerko, B. (2020). What is AI literacy? Competencies and design considerations. 
\u003cem\u003eProceedings of the 2020 CHI Conference on Human Factors in Computing Systems\u003c/em\u003e, 1\u0026ndash;16. https://doi.org/10.1145/3313831.3376727\u003c/li\u003e\n \u003cli\u003eMessick, S. (1995). Validity of psychological assessment: Validation of inferences from persons\u0026rsquo; responses and performances as scientific inquiry into score meaning. \u003cem\u003eAmerican Psychologist\u003c/em\u003e, \u003cem\u003e50\u003c/em\u003e(9), 741\u0026ndash;749. https://doi.org/10.1037/0003-066X.50.9.741\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eMiao. (n.d.). \u003cem\u003eGuidance for generative AI in Education and research\u003c/em\u003e. UNESCO.org.\u0026nbsp;https://www.unesco.org/en/articles/guidance-generative-ai-education-and-research\u0026nbsp; \u0026nbsp;\u003c/li\u003e\n \u003cli\u003eMishra, P., \u0026amp; Koehler, M. J. (2006). Technological pedagogical content knowledge: A framework for teacher knowledge. \u003cem\u003eTeachers College Record, 108\u003c/em\u003e(6), 1017\u0026ndash;1054. https://doi.org/10.1111/j.1467-9620.2006.00684.x\u003c/li\u003e\n \u003cli\u003eNational Academies of Sciences, Engineering, and Medicine. (2017). \u003cem\u003ePromoting the educational success of children and youth learning English: Promising futures\u003c/em\u003e. The National Academies Press. https://doi.org/10.17226/24677\u0026nbsp;\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eNowell, L. S., Norris, J. M., White, D. E., \u0026amp; Moules, N. J. (2017). Thematic analysis: Striving to meet the trustworthiness criteria.\u0026nbsp;\u003cem\u003eInternational Journal of Qualitative Methods\u003c/em\u003e, \u003cem\u003e16\u003c/em\u003e, 1\u0026ndash;13.\u0026nbsp;https://doi.org/10.1177/1609406917733847\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eOECD. (2023). \u003cem\u003eOECD digital education outlook 2023: Towards an effective digital education ecosystem\u003c/em\u003e. OECD Publishing. 
https://doi.org/10.1787/c74f03de-en\u003c/li\u003e\n \u003cli\u003eShulman, L. S. (1986). Those who understand: Knowledge growth in teaching. \u003cem\u003eEducational Researcher\u003c/em\u003e, \u003cem\u003e15\u003c/em\u003e(2), 4\u0026ndash;14. https://doi.org/10.2307/1175860\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eU.S. Department of Education, Office of Educational Technology (OET). (2023). \u003cem\u003eArtificial intelligence and the future of teaching and learning: Insights and recommendations\u003c/em\u003e. \u0026nbsp;\u003c/li\u003e\n \u003cli\u003eU.S. Department of Education. (n.d.). \u003cem\u003eFERPA (Family Educational Rights and Privacy Act)\u003c/em\u003e. https://studentprivacy.ed.gov/ferpa\u0026nbsp;\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eWhite House Office of Science and Technology Policy (OSTP). (2022, October). \u003cem\u003eBlueprint for an AI Bill of Rights: Making automated systems work for the American people\u003c/em\u003e. The White House. https://www.whitehouse.gov/ostp/ai-bill-of-rights/\u0026nbsp;\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eWIDA. (2020). \u003cem\u003eWIDA English language development standards framework, 2020 edition: Kindergarten\u0026ndash;grade 12\u003c/em\u003e. Board of Regents of the University of Wisconsin System. 
Keywords: teacher AI literacy, multilingual learner instruction, scale development, validity evidence, mixed-methods, TESOL-aligned pedagogy

Posted March 26th, 2026 (Version 1, Research Square).


License: CC-BY-4.0