Comparing Videoconferencing and Human-to-Machine Modes in Speaking Assessment: Holistic Ratings, Analytical Measures, Psychological Factors, and Washback Effects

Research Square, Research Article

Koki Sekitani, Ryotaro Mitsuta

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7530331/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

This study investigates the differences between videoconferencing and human-to-machine modes in speaking assessment, focusing on holistic ratings, analytical measures, psychological factors, and washback effects (impacts of a test on the teaching and learning). Thirty-eight Japanese learners of English completed both test modes and a questionnaire.
They received higher holistic rating scores in the semi-direct mode than in the videoconferencing mode. The semi-direct mode exhibited superior syntactic complexity but sacrificed accuracy and fluency. Participants strongly preferred the videoconferencing mode, and felt that it fostered better learning behaviors, whereas the semi-direct mode encouraged concrete learning strategies focused on accuracy and prepared templates.

Keywords: speaking test, videoconferencing mode, semi-direct mode, CAF measures, complexity, accuracy, fluency, psychological factors, washback effect

Introduction

The integration of technology into language testing has accelerated rapidly in recent years, especially with the growing demand for remote testing solutions during the COVID-19 pandemic (Nakatsuhara et al., 2021). Many English proficiency tests are now offered in computer-based formats or have fully transitioned to digital delivery (Alzahrani, 2020; Qian, 2009). Researchers have examined the comparability of speaking tests across different modes of delivery, including face-to-face, videoconferencing, and human-to-machine (semi-direct) formats. For instance, studies have found general score equivalence between face-to-face and semi-direct tests such as the ACTFL OPI and SOPI (e.g., Kenyon & Tschirner, 2000; Stansfield & Kenyon, 1992), as well as between videoconferencing and face-to-face IELTS interviews (Nakatsuhara et al., 2017, 2021). Mullooly and Glasson (2023), analyzing data from the same Cambridge research series, reported subtle functional differences across modes despite overall score similarity. However, while mode equivalence has been explored in selected pairwise comparisons, direct empirical comparisons between videoconferencing and semi-direct speaking tests remain rare. Additionally, language output is known to be sensitive not only to delivery mode but also to task type (e.g., storytelling vs.
question-based tasks), which is often confounded with mode (Glasson, 2022; Zhang & Jin, 2021). Prior research has shown that task familiarity and interactivity shape both linguistic complexity and interactional competence, especially when test takers encounter unfamiliar or less scaffolded formats (Glasson, 2022). Furthermore, differences in mode can influence interactional dynamics. For example, latency in videoconferencing can hinder natural turn-taking, potentially affecting discourse features and performance judgments (Seuren et al., 2021). These findings suggest that delivery mode and task type jointly shape the speaking construct being assessed, underscoring the need for systematic comparisons. Given the widespread adoption of both videoconferencing and semi-direct speaking assessments for large-scale testing and placement purposes, it is essential to examine their comparability not only in terms of scores but also in terms of language output and test-taker perceptions. This study therefore investigates two speaking tests in Japan—the Standard Speaking Test (SST; videoconference-based) and Telephone Standard Speaking Test (TSST; semi-direct, telephone-based)—to examine how mode and format influence holistic ratings, analytical measures, and test-takers’ psychological and attitudinal responses. Modes for speaking assessment Speaking assessments can be delivered in three formats: direct, semi-direct, and indirect (Clark, 1979; O’Loughlin, 2001; Qian, 2009). In an indirect test, the examiner measures underlying skills related to speaking without eliciting actual speech. For example, Lado (1961) once proposed assessing pronunciation via a written test, a discrete-point technique typical of mid-20th-century language exams (Shohamy, 1998). Such methods are now largely considered outdated due to validity concerns (O’Loughlin, 2001). Hence, truly indirect speaking tests have fallen out of common use. 
In contrast, the direct mode requires the test-taker to speak with a live interlocutor, performing the target speaking skills in real time (Hughes, 2003). Typically, this takes the form of a face-to-face interview or conversation with an examiner (Luoma, 2004). Many educators consider the direct interview format to closely approximate real-life communication, although the interaction in a test setting is more structured than an ordinary conversation (van Lier, 1989; Glasson, 2022). Direct speaking tests have historically been viewed as the most authentic and valid means of assessing oral ability, often enjoying high face validity with both test-takers and score users. For instance, Clark (1979) argued that the face-to-face interview yields the truest measure of speaking proficiency. Moreover, because direct tests involve reciprocal interaction, they can engage the test-taker’s interactional competence—the ability to co-construct discourse through turn-taking, topic management, feedback, and other interactive skills (Galaczi & Taylor, 2018). This interactional dimension is an important component of speaking proficiency that is not directly accessed by more mechanistic test formats. Semi-direct testing delivers the prompts through an audio/video recording or computer program instead of a human interlocutor (Clark, 1979). In a semi-direct exam, test-takers respond to pre-recorded or on-screen prompts, and their spoken responses are recorded for later evaluation by raters. Many modern speaking tests use this format: for example, the TOEFL iBT Speaking section and Cambridge Linguaskill Speaking test both present tasks via computer and require candidates to speak into a microphone, after which the recordings are scored (either by human examiners or by automated systems). Semi-direct formats make it practical to administer speaking tests to large groups under standardized conditions. 
However, because the candidate is essentially speaking monologically (to a microphone or computer) rather than interacting with a person, this mode cannot capture the co-constructed nature of conversation. Important interactional features like real-time turn-taking or negotiation of meaning are absent in semi-direct tasks. This lack of a live interlocutor has raised questions about the authenticity of semi-direct tests and whether they fully elicit a test-taker’s communicative competence. Semi-direct speaking tests are nevertheless widely used in high-stakes contexts due to their logistical efficiency and scoring consistency. Recently, videoconferencing has emerged as a popular medium for speaking assessments, blending aspects of direct and semi-direct modes. In a video-mediated test, the examiner and candidate engage in a real-time spoken interaction, but via webcams and microphones from different locations. This format grew rapidly during the COVID-19 pandemic as institutions sought remote testing solutions (Nakatsuhara et al., 2021). A videoconference interview preserves the live, synchronous dialogue of a direct face-to-face test while offering practical advantages such as reduced travel and easier scheduling (Nakatsuhara et al., 2017). In fact, we classify the videoconferencing format as a type of direct speaking test in our study, since it retains a person-to-person interaction (albeit through a screen) that allows for spontaneous back-and-forth exchange. Figure 1 illustrates how the videoconferencing format fits into the framework of speaking test modes alongside traditional in-person (direct) and semi-direct approaches. By enabling remote yet interactive oral exams, videoconferencing is expanding the reach of direct speaking assessment while still tapping into test-takers’ interactional skills. 
Holistic ratings Previous research on speaking assessment has distinguished between different test delivery modes, particularly semi-direct computer-delivered formats and videoconferencing interviews. These represent distinct points on the interaction continuum. The semi-direct format elicits monologic responses, aligning with psycholinguistic perspectives that emphasize individual processing (Van Moere, 2012). In contrast, the VC mode involves real-time dialogue, tapping socio-interactional competence, where shared understanding is co-constructed (Roever & Kasper, 2018). While both modes are operationally common, the interactions are fundamentally different. The semi-direct mode lacks real-time response or negotiation of meaning, whereas VC interviews involve live interlocutors, enabling intersubjectivity. The presence or absence of interaction and non-verbal cues (e.g., facial expressions) may affect speaking performance and rating outcomes. Research has shown that test-takers achieve similar holistic scores across formats. Nakatsuhara et al. (2021) found no statistically significant differences in IELTS Speaking scores across face-to-face and computer-mediated modes, though test-takers could only ask for clarification in live interviews. However, even with equivalent scores, delivery mode factors such as latency or absence of listener feedback could influence raters’ perceptions. For example, slight video delay in VC may disrupt turn-taking, while semi-direct tests lack responsiveness. The rating model also affects outcomes. Khabbazbashi and Galaczi (2020) showed that holistic scoring yielded different CEFR levels compared to analytic or part-by-part scoring, sometimes by 30–50% of cases. Holistic scoring may overgeneralize performance, masking task-specific strengths. Thus, we used holistic ratings as they reflect operational practice, but we also examined other evidence. 
CAF measures In addition to holistic scoring, prior studies have employed complexity, accuracy, and fluency (CAF) measures to provide more detailed insights into speaking performance. These objective indices capture features such as clause length, lexical diversity, error ratios, and pause time. They are not subject to rater interpretation and provide finer-grained insight into linguistic performance. CAF helps identify whether modes elicit different language patterns. For instance, the monologic semi-direct mode may encourage longer utterances, while the interactive VC format might lead to more real-time adjustments or use of formulaic language. Differences would appear in CAF results. CAF analysis also adds construct validity. Yan and Staples (2023), in an IELTS study, found that CAF measures distinguished proficiency levels and informed rater scale development. In our study, similar CAF profiles across modes would support construct comparability; differences may indicate unique demands in each mode. This dual approach—holistic scores and CAF indices—helps determine whether the presence of interaction in VC interviews affects language output or rating. It also addresses a research gap regarding how semi-direct and VC formats compare in both scoring and performance features. Psychological factors Examinees’ psychological reactions to different speaking test modes have been investigated from various perspectives, including anxiety, perceived difficulty, and mode preference. Krashen’s (1985) Affective Filter Hypothesis suggests that negative emotional states such as anxiety can hinder language performance. In several studies, examinees were found to prefer face-to-face speaking tests over technology-mediated formats (e.g., James, 1988; Kiddle & Kormos, 2011; McNamara, 1987; Shohamy et al., 1993). 
Regarding comparisons between face-to-face and videoconferencing modes, Glasson and Devine (2023) found that examinees favored the face-to-face mode as they could express their English ability more effectively; however, the effect sizes in that study were small. Anxiety has been proposed as a key reason for mode preference. Qian (2009) suggested that the physical presence of an interlocutor in face-to-face settings may help reduce psychological barriers. Du and Zhang (2022) argued that the affective benefits of interlocutor presence persist in videoconferencing modes, as indicated by positive affective responses. Conversely, Song (2014) noted that the absence of interlocutors in semi-direct modes may lower anxiety for some examinees. Glasson and Devine (2023) reported that examinees with lower proficiency levels tended to feel greater anxiety in face-to-face interactions. Perceptions of test difficulty also vary by mode. Elder et al. (2002) emphasized that perceived task difficulty is multidimensional and correlates with performance outcomes. Zhou (2012) reported that examinees who viewed computer-based semi-direct tests as easier generally performed better. In contrast, Glasson and Devine (2023) found no significant differences in perceived difficulty between face-to-face and videoconferencing modes. However, their study did not use identical tasks across modes, which may have influenced the comparability of perceptions. Overall, while test scores across modes may be comparable, differences in affective responses and perceived difficulty have been documented, suggesting that test mode can influence examinee experience. Washback effects Language tests can exert washback effects on learning and teaching, as learners often adjust their behavior to meet test demands. Murayama (2006) calls this phenomenon “adaptation to the test,” noting it can lead to problematic study habits and even threaten test validity. 
Empirical research has examined washback effects in language testing (see Baba, 2019, for a review). In speaking assessments, this influence can lead examinees to adopt test-focused strategies that result in formulaic or contrived performances. For example, Luk (2010) observed that peer-group oral test discourse became “ritualized, contrived and colluded,” as candidates aimed to “maintain the impression of being effective interlocutors for scoring purposes rather than for authentic communication.” Similarly, Lam (2015) found that what passes as interaction in some speaking tests may actually be a canned performance pre-rehearsed during preparation. Such findings illustrate how test preparation and testwiseness can distort the natural flow of communication, with examinees “talking to score” (i.e. prioritizing performance for the rating criteria) and even delivering rehearsed responses. Glasson (2022) further highlights that the issue is not whether assessment prompts contrived interaction, but how different task types modulate this effect. His conversation-analytic study showed that examinees performed markedly differently on a conventional face-to-face task versus a novel online task, raising questions about broadening assessment formats to better elicit interactional competence. These findings underscore a broader concern: speaking assessments, particularly semi-direct or monologic formats, may not fully capture interactional competence as defined in socio-interactional models (Roever & Ikeda, 2022). The direct (face-to-face and videoconferencing) and semi-direct modes rest on fundamentally different assumptions about what constitutes speaking ability—individual production versus interactive negotiation. To better understand how these underlying constructs influence test performance and perception, comparing the modes directly is essential. 
Notably, no prior studies have directly compared videoconferencing and semi-direct modes, highlighting a critical research gap.

Study purpose

This study addresses the following research questions (RQs):

RQ1: How do holistic ratings, analytical measures, and their relationships differ between the videoconferencing and semi-direct modes of a speaking test?

RQ2: How do test-takers’ psychological states and washback effects differ between the videoconferencing and semi-direct modes of a speaking test?

We employ the SST and TSST for the videoconferencing and semi-direct modes, respectively, for two primary reasons: 1) Our study is situated in Japan, where these tests are tailored to precisely assess the majority of Japanese learners, whose proficiency typically ranges from CEFR A1 to B1. 2) Using the SST and TSST allows us to directly compare our findings with those of Zhou (2015a), who also used these tests.

Method

Participants

Thirty-eight senior high school students (girls: 31, boys: 7; M_age = 16.55 years, SD = 0.71) in Tokyo, Japan, who were learning English as a foreign language, voluntarily participated in the study.

Examiners

All speaking performances were rated by official examiners trained in the SST and TSST scoring protocols.

Instruments

Videoconferencing mode

We used the SST for the videoconferencing mode. The SST was originally a face-to-face interview test that assesses English learners’ speaking ability. Developed by the American Council for the Teaching of Foreign Languages (ACTFL) and ALC Press in 1997, it was the first independent speaking test in Japan and was modelled on the ACTFL Oral Proficiency Interview (OPI). The SST is a 10–15-minute interview using a structured conversation between a certified interviewer and examinee; it consists of five stages (ACTFL-ALC Press, 2000).
It begins with a casual chat about general topics, such as occupation and hobbies (Stage 1: warm-up), followed by three tasks: picture description (Stage 2), role-playing (Stage 3), and storytelling (Stage 4). After each task, the interviewer asks task-related questions and concludes the interview with additional casual chat (Stage 5, wind-down). This test allows for structured but flexible conversations. The interviewers assess the examinees’ level from Novice to Advanced and select tasks and topics accordingly. They are trained in elicitation techniques for accurate ratings and maintain their skills through ongoing training, testing, and norming procedures. Semi-direct mode We employed the TSST for the semi-direct mode. Developed by ALC Press in 2004 as a practical alternative to the SST, the TSST is an automated speaking test administered over the telephone, allowing for more flexible scheduling. The test is available at any time and can be taken using a landline. It consists of ten pre-set monologic tasks randomly selected from a large pool of recorded tasks. Each task is designed to elicit performance appropriate to the speech functions, discourse type, content, and context of the target level. Six tasks are aimed at the intermediate level, and four at the advanced level. Examples of tasks include “Please describe (something)” and “Please talk about the last time you (did something).” Respondents have 45 seconds to complete their answers. While both tests aim to assess oral proficiency, their task types differ in nature and interactional demands: the SST involves dynamic, interpersonal communication, whereas the TSST elicits more structured, monologic responses. The main differences between the tests are summarized in Table 1. 
Table 1 Summary of major features of the SST and TSST

| Test | Standard Speaking Test (SST) | Telephone Standard Speaking Test (TSST) |
|---|---|---|
| Mode | Face-to-face (online video) | Telephone-based |
| Task type | Picture description; role-playing; storytelling | Answering questions on general topics (Questions 1–6: Intermediate; Questions 7–10: Advanced) |
| Task delivery | Adaptive | Non-adaptive |
| Test time | Approximately 15 minutes | Ten 45-second questions |

Rating (both tests): Nine-level holistic scale based on five categories of assessment criteria (global tasks or functions, text type ability, accuracy, contexts, and content areas); two or three trained raters.

Data collection

The 38 participants completed both the SST and TSST on the same day. Half of the participants were randomly assigned to complete the SST first, and the other half to complete the TSST first, to counterbalance order effects. Technical difficulties prevented some participants from completing the test or following the intended schedule, and they were excluded from the analysis. Thus, 33 participants (28 girls, 5 boys) were included in the analysis. The final sample included 18 participants who completed the SST first, and 15 who completed the TSST first. After completing both tests, participants answered questionnaires via Google Forms to collect data on psychological factors and washback effects.

Holistic ratings of the SST

All SST interviews are recorded and rated by two raters using a nine-point holistic rating scale. Scores are assigned for Stages 1–4 as well as for overall performance. If the two raters disagreed on the overall score, a third rater’s opinion was sought. The final rating was the one agreed upon by two of the three raters.
The SST scale evaluates English language proficiency on the basis of five categories of assessment criteria (ACTFL-ALC Press, 2000): (a) global tasks or functions (asking and answering simple questions, narrating, describing in major timeframes), (b) text type ability (from words and sentences to complex sentences, paragraphs, and extended discourse), (c) accuracy (the combination of grammar, vocabulary, pronunciation, fluency), (d) contexts (from common everyday situations to more complex social situations), and (e) content areas (from personal topics to a wide range of general interest topics). A holistic rating was assigned to each stage, and the overall level was determined by considering all assessment categories. Holistic ratings of the TSST Three raters evaluated each examinee’s responses to the TSST tasks using criteria identical to those of the SST evaluation. Each rater scored responses to five of the ten tasks, independently assigning a holistic level based on the examinee’s performance. The first and second raters scored odd and even numbered tasks, respectively, whereas the third rater scored both odd and even numbered items. No weighting or other adjustments were applied among each task. Then the raters assigned a level that represented the examinees’ overall performance on five tasks. The final rating of the TSST was based on the ratings given by the raters. Level Descriptions Each level of the SST and TSST shares the same criteria: Levels 1 and 2 are equivalent to CEFR A1; levels 3 and 4 to A2; levels 5 and 6 to B1; levels 7 and 8 to B2; and level 9 to C1 or higher. For example, “a Level-4 speaker can maintain simple communication by talking about familiar topics and asking simple questions. A speaker at this level can connect simple short sentences to convey his/her thoughts, but fluency is disturbed doing so. With effort, the speaker can manage to respond to what has been asked, but he/she still cannot actively interact. 
The speaker’s pronunciation and word choices may still be influenced by his/her native language, but the impact is insignificant and listeners used to non-native English speakers would not have trouble understanding him/her” (ALC EDUCATION INC, 2025).

CAF measures

We transcribed the recorded SST and TSST protocols based on the Kakiokoshi Kihondanwa Tag Fuyo Guideline Version 2.1.3 [Guidelines for Transcription and Tag Attachment Version 2.1.3] (Isahara et al., 2004). Fillers, repeated utterances, and other spoken sounds that did not appear to be directly related to the spoken content were also transcribed. We aimed to measure complexity, accuracy, and fluency (CAF), the three major indices of second language proficiency (Housen & Kuiken, 2009). More specifically, following Koizumi and In’nami (2014), who investigated the factor structure of Japanese EFL learners’ CAF using structural equation modelling, we calculated four measures: the number of clauses per AS-unit¹ for syntactic complexity (SC), the number of error-free clauses per clause for accuracy (A), the number of words per minute for speed fluency (F1), and the number of disfluency markers (i.e., filler words, repeated utterances, self-corrections, and cut-off utterances) per minute for repair fluency (F2).

To ensure a fair comparison between the SST and TSST, test-takers’ responses to the prelude questions in the SST, such as “Where do you live?”, were excluded from the analysis. Prelude questions were designed to elicit one-word or short-phrase responses, such as “Tokyo,” giving the interviewer sufficient context to ask more in-depth, targeted questions, such as, “Can you talk about your neighborhood in Tokyo?” These follow-up questions, in turn, were designed to elicit more complex and detailed utterances. By excluding responses to the prelude questions in the SST from the analysis, we created a more balanced comparison with the speech data of the TSST, which does not include prelude questions.
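The four CAF indices defined above are simple ratios, so they can be sketched directly from hand-coded counts. The following is a minimal illustration, not the authors' scoring procedure: the `SpeechSample` fields, the `caf_measures` helper, and the example counts are all hypothetical, and the sketch assumes that clauses, AS-units, error-free clauses, and disfluency markers have already been annotated in the transcript.

```python
from dataclasses import dataclass


@dataclass
class SpeechSample:
    """Hand-coded counts for one examinee's transcript (hypothetical fields)."""
    words: int
    clauses: int
    error_free_clauses: int
    as_units: int
    disfluency_markers: int  # fillers, repetitions, self-corrections, cut-offs
    speaking_seconds: float


def caf_measures(s: SpeechSample) -> dict:
    """Return the four indices described in the text as plain ratios."""
    minutes = s.speaking_seconds / 60.0
    return {
        "SC": s.clauses / s.as_units,            # clauses per AS-unit
        "A": s.error_free_clauses / s.clauses,   # error-free clauses per clause
        "F1": s.words / minutes,                 # words per minute
        "F2": s.disfluency_markers / minutes,    # disfluency markers per minute
    }


# Made-up counts for illustration only; not data from the study.
sample = SpeechSample(
    words=300,
    clauses=60,
    error_free_clauses=42,
    as_units=48,
    disfluency_markers=65,
    speaking_seconds=240.0,
)
result = caf_measures(sample)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Note that a higher F2 means more disfluency per minute, so F2 runs in the opposite direction from the other three indices when read as a quality measure.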
Psychological factors We used four scales: test-taking anxiety, liking for the test, perceptions of test difficulty, and perceptions of test validity. All scales were adapted from Zhou (2012) to allow results to be compared. The “test-taking anxiety” scale comprises three items and measures the degree to which learners felt nervous before and during the test. The “liking for the test” scale consists of four items and measures the extent to which learners prefer the test. The “perceptions of test difficulty” scale includes five items and measures the extent to which learners perceive difficulty and confidence. The “perceptions of test validity” scale comprises three items and measures the extent to which learners believe the test accurately assesses their speaking ability and fairness. All items were rated on a five-point scale ranging from one (“strongly disagree”) to five (“strongly agree”). Participants responded to these items for both the SST and TSST. Washback effects Participants responded to the following open-ended question: “If a test similar to the SST were introduced as part of university entrance English exams, do you think it would influence test-takers’ attitudes towards learning English, their learning behavior, and their motivation? Please describe your experience of taking SST.” They then answered the same question regarding the TSST. These questions were displayed and answered in Japanese. Analyses We conducted paired t tests to examine differences in holistic rating scores, CAF measures, and psychological factors between the two modes. Additionally, we employed correlation and multiple regression analyses to investigate the relationships between holistic rating scores and CAF measures. Furthermore, we analyzed participants’ written responses to the open-ended questions to explore their opinions on how these tests might influence test-takers’ attitudes toward learning English, their learning behavior, and their motivation. 
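For multi-item questionnaire scales such as the four described in this section, internal consistency is commonly summarized with Cronbach's alpha. The sketch below shows the standard formula on made-up five-point responses. The data, the `reverse_code` helper, and the choice to recode reverse-keyed items as 6 minus the raw score before computing alpha are illustrative assumptions, not details reported by the authors.

```python
from statistics import variance

LIKERT_MAX = 5  # five-point scale: 1 = strongly disagree, 5 = strongly agree


def reverse_code(score: int) -> int:
    """Recode a reverse-keyed item so that high values mean the same as other items."""
    return LIKERT_MAX + 1 - score


def cronbach_alpha(items: list) -> float:
    """Cronbach's alpha; items[i][j] is participant j's response to item i."""
    k = len(items)
    totals = [sum(responses) for responses in zip(*items)]  # per-participant sums
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses from five participants to a three-item scale.
item_1 = [4, 5, 3, 4, 2]
item_2 = [4, 4, 3, 5, 2]
item_3_raw = [2, 1, 3, 2, 4]  # reverse-keyed item, as collected
item_3 = [reverse_code(x) for x in item_3_raw]

alpha = cronbach_alpha([item_1, item_2, item_3])
print(f"alpha = {alpha:.3f}")
```

When alpha is low for a scale, averaging its items into one score is hard to justify, and item-level comparisons are the more cautious choice.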
Results

Holistic rating scores of the two speaking test modes

Tables 2 and 3 summarize the key findings, including descriptive statistics and the results of the paired t tests and correlation analysis.

Table 2 Cross-tabulation of SST and TSST ratings (rows: SST holistic rating scores; columns: TSST holistic rating scores)

| SST level | TSST 3 | TSST 4 | TSST 5 | TSST 6 | TSST 7 | TSST 8 | Row total |
|---|---|---|---|---|---|---|---|
| 3 Novice high | 1 | 3 | 0 | 0 | 0 | 0 | 4 |
| 4 Intermediate low | 0 | 13 | 2 | 0 | 0 | 0 | 15 |
| 5 Intermediate low plus | 0 | 0 | 7 | 1 | 0 | 0 | 8 |
| 6 Intermediate middle | 0 | 0 | 1 | 4 | 1 | 0 | 6 |
| 7 Intermediate middle plus | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 Intermediate high | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Column total | 1 | 16 | 10 | 5 | 1 | 0 | 33 |

Table 3 Holistic rating score agreement

| Statistic | Value |
|---|---|
| SST rating score, mean (SD) | 4.48 (0.94) |
| TSST rating score, mean (SD) | 4.67 (0.89) |
| n (%) of SST > TSST | 1 (3.0%) |
| n (%) of SST = TSST | 25 (75.8%) |
| n (%) of SST < TSST | 7 (21.2%) |
| t value (paired) | 2.248 |
| p value | .032 |
| Effect size (Δ) | 0.20 (small effect) |
| Correlation coefficient (r) | 0.872 |

The mean holistic rating score was higher for the TSST than for the SST, indicating that the semi-direct mode received higher evaluations than the videoconferencing mode when using the same holistic criteria.

CAF measures of the two speaking test modes

Table 4 displays the descriptive statistics and the results of the paired t test for the measures of syntactic complexity (SC), accuracy (A), speed fluency (F1), and repair fluency (F2) for the SST and TSST. For subsequent analyses, responses to the prelude questions were excluded, as previously discussed.

Table 4 Descriptive statistics for CAF measures in SST and TSST

| CAF | Test | Analytical scope | Mean | SD | Min. | Max. | t | p | d |
|---|---|---|---|---|---|---|---|---|---|
| SC | SST | Responses to prelude questions included | 1.13 | 0.09 | 1.02 | 1.35 | | | |
| SC | SST | Responses to prelude questions excluded | 1.19 | 0.13 | 1.03 | 1.57 | −10.158 | < .001 | −1.768 |
| SC | TSST | All responses | 1.49 | 0.22 | 1.11 | 1.98 | | | |
| A | SST | Responses to prelude questions included | 0.77 | 0.07 | 0.62 | 0.90 | | | |
| A | SST | Responses to prelude questions excluded | 0.68 | 0.11 | 0.46 | 0.85 | 3.456 | .002 | 0.602 |
| A | TSST | All responses | 0.62 | 0.11 | 0.42 | 0.85 | | | |
| F1 | SST | Responses to prelude questions included | 83.67 | 15.36 | 61.41 | 129.20 | | | |
| F1 | SST | Responses to prelude questions excluded | 83.58 | 17.18 | 59.67 | 133.93 | 17.537 | < .001 | 3.053 |
| F1 | TSST | All responses | 60.93 | 18.04 | 38.40 | 113.20 | | | |
| F2 | SST | Responses to prelude questions included | 20.27 | 7.63 | 7.60 | 40.51 | | | |
| F2 | SST | Responses to prelude questions excluded | 21.25 | 8.10 | 8.02 | 43.37 | 9.623 | < .001 | 1.675 |
| F2 | TSST | All responses | 13.04 | 5.50 | 4.27 | 28.00 | | | |

Relationship between holistic rating scores and CAF measures

To examine the relationship between the holistic rating scores and each CAF measure, we conducted correlation analyses for both the SST and TSST. We then conducted multiple regression analyses using the holistic rating score as the dependent variable and the CAF measures as the predictor variables. Based on the correlation analyses, repair fluency was excluded because of the lack of significant positive correlations with the holistic rating score. The results are presented in Tables 5 and 6.
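As a cross-check on the holistic rating results, the summary statistics in Table 3 can be recomputed from the cell counts in Table 2, because each cell gives the exact SST and TSST level of every examinee in it. The sketch below is an illustration rather than the authors' analysis script. It assumes the paired t statistic is computed on TSST minus SST differences, and it tests the inference (not stated in the text) that the d values in Table 4 correspond to t divided by the square root of n, with n = 33.

```python
from math import sqrt
from statistics import mean, stdev

# Table 2 cell counts keyed by (SST level, TSST level); empty cells omitted.
CROSSTAB = {
    (3, 3): 1, (3, 4): 3,
    (4, 4): 13, (4, 5): 2,
    (5, 5): 7, (5, 6): 1,
    (6, 5): 1, (6, 6): 4, (6, 7): 1,
}

# Expand the counts into one (SST, TSST) pair per examinee.
pairs = [pair for pair, count in CROSSTAB.items() for _ in range(count)]
sst = [s for s, _ in pairs]
tsst = [t for _, t in pairs]
n = len(pairs)


def paired_t(x, y):
    """Paired-samples t statistic for the differences y - x (assumed sign convention)."""
    diffs = [b - a for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def pearson_r(x, y):
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


agreement = {
    "SST > TSST": sum(s > t for s, t in pairs),
    "SST = TSST": sum(s == t for s, t in pairs),
    "SST < TSST": sum(s < t for s, t in pairs),
}
t_stat = paired_t(sst, tsst)
r = pearson_r(sst, tsst)

# Table 4 t values (prelude responses excluded), used to check d = t / sqrt(n).
TABLE4_T = {"SC": -10.158, "A": 3.456, "F1": 17.537, "F2": 9.623}
d_from_t = {name: t_value / sqrt(n) for name, t_value in TABLE4_T.items()}

print(f"n = {n}")
print(f"SST mean (SD): {mean(sst):.2f} ({stdev(sst):.2f})")
print(f"TSST mean (SD): {mean(tsst):.2f} ({stdev(tsst):.2f})")
print(agreement)
print(f"paired t = {t_stat:.3f}, r = {r:.3f}")
print({name: round(d, 3) for name, d in d_from_t.items()})
```

Under these assumptions the recomputed means, standard deviations, agreement counts, t value, and correlation agree with Table 3 to the reported precision, and t divided by the square root of n reproduces the d column of Table 4.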
Table 5 Correlations between holistic rating scores and CAF measures in SST and TSST

SST

| | [SC] | [A] | [F1] | [F2] |
|---|---|---|---|---|
| Level | .578*** | .550*** | .740*** | −.279 |
| [SC] | | .343 | .468** | .061 |
| [A] | | | .281 | −.197 |
| [F1] | | | | −.324 |

TSST

| | [SC] | [A] | [F1] | [F2] |
|---|---|---|---|---|
| Level | .606*** | .610*** | .716*** | −.065 |
| [SC] | | .667*** | .427* | .110 |
| [A] | | | .369* | −.025 |
| [F1] | | | | −.155 |

Note: *** p < .001, ** p < .01, * p < .05

Table 6 Multiple regression analyses for SST and TSST

| Test | Predictor | B | SE | 95% CI lower | 95% CI upper | β |
|---|---|---|---|---|---|---|
| SST | [SC] | 1.515+ | 0.852 | −0.228 | 3.257 | .209 |
| SST | [A] | 2.707** | 0.904 | 0.858 | 4.557 | .324 |
| SST | [F1] | 0.030*** | 0.006 | 0.017 | 0.043 | .551 |
| TSST | [SC] | 0.740 | 0.591 | −0.469 | 1.949 | .186 |
| TSST | [A] | 2.241+ | 1.120 | −0.050 | 4.532 | .290 |
| TSST | [F1] | 0.026*** | 0.006 | 0.014 | 0.038 | .530 |

SST: R² = .706, Adj R² = .676. TSST: R² = .669, Adj R² = .634.

Note: *** p < .001, ** p < .01, + p < .10

Psychological factors

To confirm internal consistency, we calculated Cronbach’s alpha coefficients for each of the four scales of both the SST and TSST. However, we did not obtain values as high as those reported by Zhou (2012), and our alpha coefficients ranged from 0.345 to 0.719. Therefore, we calculated means for each item for both test modes and performed paired t tests for each item. Table 7 presents descriptive statistics and the results of the t tests.

Table 7 Psychological factors associated with SST and TSST

| Scale | No. | Item | SST M | SST SD | TSST M | TSST SD | t | p | d |
|---|---|---|---|---|---|---|---|---|---|
| Test-taking anxiety | 1 | I felt nervous before the SST/TSST. | 4.12 | 0.86 | 3.85 | 1.06 | 1.47 | .152 | 0.26 |
| Test-taking anxiety | 2 | I felt nervous when I was taking the SST/TSST. | 4.03 | 1.19 | 3.67 | 1.27 | 1.40 | .172 | 0.24 |
| Test-taking anxiety | 3 | I would have performed better if I had not got nervous. | 2.91 | 1.33 | 2.94 | 1.22 | −0.14 | .889 | −0.02 |
| Liking for the test | 4 | I like the format of the SST/TSST. | 4.30 | 0.77 | 2.64 | 1.19 | 7.42 | <.001 | 1.29 |
| Liking for the test | 5 | The SST/TSST was interesting. | 4.45 | 0.56 | 3.39 | 1.22 | 4.53 | <.001 | 0.79 |
| Liking for the test | 6 | Taking the SST/TSST was not a pleasant experience. (R) | 1.76 | 0.90 | 2.64 | 1.11 | −4.54 | <.001 | −0.79 |
| Liking for the test | 7 | I am used to the format of the SST/TSST. | 2.76 | 1.32 | 1.76 | 0.71 | 4.20 | <.001 | 0.73 |
| Perceptions of test difficulty | 8 | I believe I did well on the tasks. (R) | 2.55 | 1.20 | 1.82 | 0.77 | 3.81 | <.001 | 0.66 |
| Perceptions of test difficulty | 9 | I felt confident when I did the SST/TSST. (R) | 2.21 | 1.02 | 1.97 | 0.73 | 1.35 | .187 | 0.24 |
| Perceptions of test difficulty | 10 | I felt the SST/TSST was difficult. | 3.85 | 0.80 | 4.00 | 0.66 | −0.90 | .377 | −0.16 |
| Perceptions of test difficulty | 11 | I would have performed better if the format of the SST/TSST had been different. | 2.09 | 0.91 | 3.03 | 1.10 | −4.84 | <.001 | −0.84 |
| Perceptions of test difficulty | 12 | I would have performed better if I had known the topics of the test tasks better. | 3.42 | 1.15 | 3.55 | 1.15 | −0.61 | .545 | −0.11 |
| Perceptions of test validity | 13 | I believe the format and the content of the SST/TSST was fair. | 4.18 | 0.92 | 3.85 | 1.03 | 1.46 | .155 | 0.25 |
| Perceptions of test validity | 14 | I believe I had enough opportunity to show my ability to speak English. | 4.06 | 0.93 | 3.58 | 1.15 | 2.48 | .018 | 0.43 |
| Perceptions of test validity | 15 | The SST/TSST reflects accurately how well I speak English. | 4.18 | 1.01 | 3.73 | 1.13 | 1.90 | .066 | 0.33 |

Note: (R) marks reverse-worded items.

Washback effects

Responses to the open-ended question totaled 2,407 words for the SST and 1,880 words for the TSST. The experience of taking the SST and TSST seems to have led to different opinions about how these tests would influence test-takers’ attitudes towards learning English, their learning behavior, and their motivation. Below are sample comments for the SST translated into English:

If a test with such a format were introduced into examinations, I believe that school education would shift towards a focus on conversation, truly providing students with the opportunity to learn English. I think that when people realize that English is just a language and that its fundamental nature is to communicate with others, they will focus more on learning it for that purpose. This shift will not only help them pass university entrance examinations but also enable them to acquire English skills that are useful in everyday life. I think knowing that their English is understood can be motivating for those who are not very good at English.
When I actually tried it, I became flustered and mixed up words and tenses that I usually understood, so I think if it were introduced, it would change the way I focused on learning. Not only that, but if we were to prepare for something like SST, I think we would develop more practical skills that are different from the grammar, vocabulary, listening, and reading we are currently learning. So, I think it should be implemented.

The following are examples of feedback on the TSST, translated into English:

In the case of the TSST, because it is not face-to-face, examinees tend to focus more on accuracy than on conversations with facial expressions. While I felt that emotional expressions were easier to convey on the SST, I was less nervous on the TSST. Similar to the SST, it is necessary to focus on learning speaking skills and developing the ability to organize thoughts. I believe that the TSST imposes a greater burden on test-takers, leading them to spend more time studying English. I don’t think that this has a particular impact. With computer-based processing, even if you have more to say, your answer forcibly ends when your time is up. Because you cannot see the other person’s expressions and do not know whether your message has been conveyed, it is difficult to know specifically what you need to improve. Unlike SST, TSST has predetermined questions, and the conversation does not delve deeper into what I am saying. Therefore, I think it’s becoming common to believe that preparing a rough template in advance is sufficient to complete the test.

When comparing test-takers’ attitudes towards learning English, their learning behavior, and motivation between the videoconferencing SST and semi-direct TSST modes, distinct trends emerged. The SST was perceived to enhance practical speaking abilities and real-time conversational skills, leading to increased student engagement and motivation. Participants expect that this mode will motivate them to actively seek out opportunities for conversation, thereby enhancing “usable” English proficiency beyond exam preparation. Conversely, the TSST, which is perceived as lacking interactive communication, prompts a strategic and accuracy-focused approach to speaking practice. It may reduce anxiety by eliminating the dynamics of human interaction, but it may also lead to a formulaic study approach, with less emphasis on responsiveness and adaptability. The mode is seen as less stressful and more straightforward for some learners, yet potentially less effective in cultivating comprehensive communication skills.

Discussion

Holistic ratings and analytical measures (RQ1)

Our analysis revealed a significant difference in holistic rating scores between the SST and the TSST. Previous research has suggested equivalence between face-to-face and semi-direct modes (Luoma, 1997; Kenyon & Tschirner, 2000; Shohamy, 1994; Stansfield & Kenyon, 1992; Zhou, 2012) and between face-to-face and videoconferencing modes (Clark & Hooshmand, 1992; Kim & Craig, 2012; Nakatsuhara et al., 2017). Therefore, similar results were expected in this study when comparing videoconferencing and semi-direct modes. However, differences were identified, although the effect size was not large (d = 0.391). Using the same rating criteria, participants received higher scores in the semi-direct mode than in the videoconferencing mode. In terms of analytical measures, syntactic complexity was superior in the semi-direct mode, whereas accuracy, speed fluency, and repair fluency were higher in the videoconferencing mode. These findings are consistent with Alzahrani (2020), who observed similar outcomes when comparing face-to-face and semi-direct modes.
Although Alzahrani suggested that non-native speakers might avoid grammatical mistakes in front of “native English speakers,” the first language of the examiners in the videoconferencing SST mode in this study was Japanese. Therefore, the trade-off effect provides a more convincing explanation for these findings. Specifically, constructing more syntactically complex sentences requires additional time to consider sentence structure (the formulator stage; see Footnote 2) during the planning phase, potentially reducing fluency. Furthermore, the use of more complex sentences increases the likelihood of errors.

Multiple regression analyses indicated that fluency was the most significant predictor of holistic ratings in both the SST and TSST. This trend was particularly pronounced in the TSST, where syntactic complexity was not a significant predictor and accuracy was only marginally significant (p < .10). This finding is noteworthy given that test-takers in the TSST generally produced more syntactically complex utterances, but these did not contribute significantly to holistic ratings. Given that examinees’ fluency was generally lower in the TSST, one possible explanation is that fluency significantly influenced holistic ratings up to a certain level. Other measures, such as syntactic complexity and accuracy, only become meaningful when fluency is sufficiently high, particularly because fluency is crucial for the volume of content in utterances under time constraints. However, the number of participants in this study was not large enough for reliable correlational analysis; the confidence intervals were relatively wide, so firm conclusions cannot be drawn.

Psychological factors and washback effects (RQ2)

The questionnaire results indicated that participants generally exhibited a strong preference for videoconferencing over the semi-direct mode.
This aligns with most previous studies (e.g., James, 1988; Kiddle & Kormos, 2011; McNamara, 1987; Shohamy et al., 1993; Stansfield et al., 1990; Qian, 2009), which have consistently found that test-takers prefer to interact with their interlocutors, whether face-to-face or via videoconferencing (Du & Zhang, 2022). Several attempts have been made to explain this preference through anxiety, yielding mixed opinions. Some studies suggest that the presence of examiners alleviates examinees’ anxiety (Glasson & Devine, 2023; Qian, 2009), while others argue the opposite (Song, 2014). Although some participants in this study agreed with Song’s view, the quantitative analysis did not reveal significantly different levels of anxiety between the two testing modes. In terms of perceptions of test difficulty, the results showed that test-takers felt they were better able to demonstrate their abilities in videoconferencing mode. This is reflected in the higher mean scores for item 8, “I believe I did well on the tasks,” for the SST, and item 11, “I would have performed better if the format of the SST/TSST had been different” for the TSST. Additionally, participants felt that the videoconferencing mode tested their speaking ability more fairly, as evidenced by a higher mean score on item 14, “I believe I had enough opportunity to show my ability to speak English.” Participants’ comments provide further insight into these results. They generally felt that videoconferencing facilitated interaction between examinees and examiners, creating a natural conversational environment that reflected real-life communication. They also mentioned the presence of responses such as nodding, facial expressions, and short responses, which provided feedback on whether their utterances made sense or successfully conveyed their intended meaning.
In contrast, such reactions were absent in the TSST semi-direct mode, leading participants to feel that structuring their output required careful planning and more complex sentence structures. They also noted time constraints that seemed to necessitate well-organized speech. These comments align with the results showing that syntactic complexity was higher in the semi-direct mode, whereas fluency and accuracy were higher in the videoconferencing mode. Interestingly, the holistic rating score was higher in the semi-direct mode, which contradicts Zhou’s (2012) observation that examinees who perceived the test as less difficult tended to achieve higher scores. These findings indicate that, while examinees generally prefer and feel more confident in the direct mode, believing it to be more valid than the semi-direct mode, they tend to produce more complex outputs in the latter. This emphasis on complexity, at the expense of fluency and accuracy, resulted in higher holistic rating scores in the semi-direct mode. Even more intriguing, although fluency was the primary and significant predictor of holistic rating scores in the semi-direct mode, fluency levels were generally lower in this mode.

Several factors may have contributed to these results. Zhou (2015a) suggested that raters might assign higher scores to candidates who are perceived as disadvantaged by needing to speak to a machine. This hypothesis warrants further investigation into rater behavior and the psychological factors that influence rater assessments in semi-direct testing environments. Moreover, the comparison between face-to-face and semi-direct modes differs significantly from that between videoconferencing and semi-direct modes. Although many studies (e.g., Clark & Hooshmand, 1992; Kim & Craig, 2012; Lee, 2023; Nakatsuhara et al., 2017) have reported equivalent ratings between face-to-face and videoconferencing modes, Mullooly and Glasson (2023) and Nakatsuhara et al.
(2017) observed notable differences in the functional outputs of test-takers. Therefore, simultaneous comparative analyses across face-to-face, videoconferencing, and semi-direct testing modes must be conducted to gain a more precise and comprehensive understanding. Washback effects show distinct differences between videoconferencing and semi-direct modes. Examinees’ comments suggest that while both testing modes generally encourage learners to improve their speaking skills, the videoconferencing mode is particularly favored for promoting conversational practice, as test-takers perceive it as a more natural communicative setting. This aligns with Brooks and Swain (2015), Fan (2014), Kiddle and Kormos (2011), and Qian (2009), who found that a lack of interaction in computer-delivered tests contributes to negative perceptions. The semi-direct mode, which is perceived as structured and less interactive, fosters a strategic and accuracy-focused speaking approach. This is likely to lead learners to focus on practicing specific topics within set time limits, which some find challenging. What kind of learning do learners think each mode promotes? Specifically, what types of beliefs would each mode generate and how would these beliefs shape learning strategies? Responses on the questionnaire suggest that if videoconferencing testing were introduced, learners would emphasize the importance of real-life use, fostering a desire to seek more opportunities for conversation. Conversely, if semi-direct testing were introduced, learners would prioritize accuracy and strategic learning, such as organizing thoughts in a short time and memorizing rough templates for expected questions in advance. Although our discussion generally suggests that the direct videoconferencing mode is more encouraging, the semi-direct mode appears to foster more specific learning behaviors. The semi-direct mode may provide clearer instructions about what learners should do. 
However, as Murayama (2006) cautions, “test formats” can favor certain learning strategies over true effectiveness. He demonstrated that some learners believe in the effectiveness of rote learning, even when presented with multiple-choice questions requiring deep understanding. Therefore, when learners prepare for a speaking test, the formulaic strategies that they believe suit the semi-direct mode might be less effective than they expect, or might be equally useful for the direct videoconferencing mode. The types of beliefs formed and learning strategies fostered by these different testing modes should be considered to ensure that learners maintain their belief that these strategies truly fit each mode. If learners find that their strategies do not fit the mode, their beliefs and learning strategies may change. Long-term observations are therefore necessary.

Conclusion and implications

Our results indicate that examinees received better holistic rating scores in the semi-direct mode than in the videoconferencing mode. Analytical measures revealed that speech in the semi-direct mode showed greater syntactic complexity at the expense of accuracy and fluency. However, fluency was the most significant predictor of holistic rating scores, a tendency that was particularly pronounced in the semi-direct mode.

Regarding psychological factors, despite better performance evaluations in the semi-direct mode, examinees strongly preferred the direct videoconferencing mode. They exhibited greater confidence in their performance and a sense of demonstrating their full abilities. A key factor appears to be the presence of interaction with an interlocutor, which examinees perceived as more engaging and responsive. They noted that real-time feedback (such as backchannels, nods, and facial expressions) helped them feel understood. Additionally, they believed that the videoconferencing mode fairly and accurately tested their speaking ability.
Regarding washback effects, participants felt that the videoconferencing mode fostered better learning behaviors by encouraging active engagement and spontaneous speech. In contrast, they associated the semi-direct mode with more concrete and preparatory strategies, such as focusing on grammatical accuracy and mentally preparing rough templates before responding. These differing tendencies suggest that each test mode may promote distinct learning orientations: the direct mode supports real-time communication skills, while the semi-direct mode encourages planned production. This distinction has implications for language instruction and assessment design. Educators and test developers should consider how task design influences learner behavior and ensure that assessments elicit the kinds of language use they aim to measure. This study had several methodological limitations. First, our sample was skewed, as most participants were female. Therefore, caution is needed when generalizing the results. Second, the sample size was not sufficiently large for correlation-based analyses, resulting in wider confidence intervals in the multiple regression analyses. Third, the scales for the psychological factors used in this study did not have sufficient internal consistency or replicate Zhou’s (2012) factor structures. Future research should closely examine and improve the constructs of each scale using larger and more diverse participant samples. Fourth, the analytical measures did not show the same relationships as in previous studies (e.g., Koizumi & In’nami, 2014). Speed fluency did not correlate positively with repair fluency. Future studies should further investigate the construct of CAF measures. Finally, the results on washback effects were based on participants’ feedback comments collected immediately after taking both tests, reflecting a single point in time. 
For more precise and detailed insights, longitudinal studies are required to understand how learners’ perceptions of the tests result in the formation of beliefs and learning strategies over time. Changes in beliefs and learning strategies over time and repeated test experiences should be considered.

Abbreviations

CAF: Complexity, accuracy, and fluency
ACTFL: American Council on the Teaching of Foreign Languages
IELTS: International English Language Testing System
OPI: Oral Proficiency Interview
SOPI: Simulated Oral Proficiency Interview
TOEFL iBT: Test of English as a Foreign Language Internet-based Test
SST: Standard Speaking Test
TSST: Telephone Standard Speaking Test
CEFR: Common European Framework of Reference for Languages
RQ1: Research question 1
RQ2: Research question 2
A: Accuracy
F1: Speed fluency
F2: Repair fluency
AS-unit: Analysis of speech unit

Declarations

Ethics approval and consent to participate
All study participants provided informed consent, and the study design was approved by the Hiroshima Jogakuin University Research Ethics Review Committee (approval number: 2019-14).

Consent for publication
Not applicable.

Funding
This work was supported by the ALC Language Education Research Support Program of ALC Press Inc., awarded to the first and second authors. ALC Press Inc. also granted permission to compare the two tests, both of which were developed and administered by the company.

Author Contribution
KS: conceptualization; data curation; formal analysis; funding acquisition; investigation; methodology; project administration; resources; software; validation; visualization; writing, original draft; writing, review and editing. RM: funding acquisition; project administration; resources; software; writing, review and editing.

Acknowledgement
We are grateful to all the participants for their contributions to this study. We would also like to offer special thanks to Saki Onchi, Kayo Shimizu, Nene Nagashima, and Asuna Ueno for helping us with data entry.
Data Availability
The datasets generated and analyzed during the current study are not publicly available due to the test provider’s operational policy and the protection of examinees’ and examiners’ personal information, but are available from the corresponding author on reasonable request.

References

ACTFL-ALC Press. (2000). SST standard speaking test manual. ACTFL-ALC Press.
ALC EDUCATION INC. (2025). Level descriptions and professional applications. https://tsst.alc.co.jp/biz/en/level/. Accessed 4 June 2025.
Alzahrani, N. A. (2020). A comparative study of oral proficiency in direct (OPI) and semi-direct (VOCI) testing modes: Measures of complexity, accuracy, and fluency (Publication No. 27834926) [Doctoral dissertation, Oklahoma State University]. ProQuest Dissertations Publishing. https://hdl.handle.net/11244/325439
Baba, S. (2019). How to produce beneficial washback effect by using high-stakes testing? Proposal from educational psychology. JLTA Journal, 22, 44–64. https://doi.org/10.20622/jltajournal.22.0_44
Brooks, L., & Swain, M. (2015). Students’ voices: The challenge of measuring speaking for academic contexts. In B. Spolsky, O. Inbar, & M. Tannenbaum (Eds.), Challenges for language education and policy: Making space for people (pp. 65–80). Routledge.
Brown, A. (1993). The role of test-taker feedback in the development process: Test takers’ reactions to a tape-mediated test of proficiency in spoken Japanese. Language Testing, 10, 277–304.
Clark, J. L. D. (1979). Direct versus semi-direct tests of speaking proficiency. In E. J. Briere & F. B. Hinofotis (Eds.), Concepts in language testing: Some recent studies (pp. 35–49). TESOL.
Clark, J. L. D., & Hooshmand, D. (1992). “Screen-to-screen” testing: An exploratory study of oral proficiency interviewing using video conferencing. System, 20(3), 293–304. https://doi.org/10.1016/0346-251X(92)90041-Z
Du, Y., & Zhang, F. (2022). Examinees’ affective preference for online speaking assessment: Synchronous vs asynchronous. Chinese Language Teaching Methodology and Technology, 5(1), 29–46. https://engagedscholarship.csuohio.edu/cltmt/vol5/iss1/3
Elder, C., Iwashita, N., & McNamara, T. (2002). Estimating the difficulty of oral proficiency tasks: What does the test-taker have to offer? Language Testing, 19(4), 347–368. https://doi.org/10.1191/0265532202lt235oa
Fan, J. (2014). Chinese test takers’ attitudes towards the Versant English test: A mixed-methods approach. Language Testing in Asia, 4(1), 1–17. https://doi.org/10.1186/s40468-014-0006-9
Fan, J., & Ji, P. (2014). Test candidates’ attitudes and their test performance: The case of the Fudan English test. University of Sydney Papers in TESOL, 9, 1–35. http://faculty.edfac.usyd.edu.au/projects/usp_in_tesol/pdf/volume09/Article01.pdf
Foster, P., Tonkyn, A., & Wigglesworth, G. (2000). Measuring spoken language: A unit for all reasons. Applied Linguistics, 21, 354–375. https://doi.org/10.1093/applin/21.3.354
Galaczi, E. D., & Taylor, L. (2018). Interactional competence: Conceptualisations, operationalisations, and outstanding questions. Language Assessment Quarterly, 15(3), 219–236. https://doi.org/10.1080/15434303.2018.1453816
Glasson, N. (2022). Is the devil you know better? Testwiseness and eliciting evidence of interactional competence in familiar versus unfamiliar triadic speaking tasks. Studies in Language Assessment, 11(2), 58–97. https://doi.org/10.58379/ttfe6660
Glasson, N., & Devine, A. (2023). The eye of the stakeholder: perceptions of remote speaking. In J. Savage, E. Galaczi, & H.-W. Lee (Eds.), Research Notes Issue 86 (pp. 53–69). Cambridge University Press & Assessment. https://www.cambridgeenglish.org/english-research-group/published-research/research-notes/
Housen, A., & Kuiken, F. (2009). Complexity, accuracy, and fluency in second language acquisition. Applied Linguistics, 30, 461–473. https://doi.org/10.1093/applin/amp048
Hughes, A. (2003). Testing for language teachers (2nd ed.). Cambridge University Press.
Isahara, H., Uchimoto, K., & Izumi, E. (2004). Nihon jin 1200 nin no eigo speaking corpus [English speaking corpus of 1200 Japanese]. ALC Press.
James, G. (1988). Development of an oral proficiency component in a test of English for academic purposes. In A. Hughes (Ed.), Testing English for university study (ELT Documents 127) (pp. 111–133). Modern English Publications and the British Council.
Kenyon, D. M., & Tschirner, E. (2000). The rating of direct and semi-direct Oral Proficiency Interviews: Comparing performance at lower proficiency levels. The Modern Language Journal, 84(1), 85–101. https://doi.org/10.1111/0026-7902.00054
Kiddle, T., & Kormos, J. (2011). The effect of mode of response on a semi-direct test of oral proficiency. Language Assessment Quarterly, 8(4), 342–360. https://doi.org/10.1080/15434303.2011.613503
Kim, J., & Craig, D. A. (2012). Validation of a videoconferenced speaking test. Computer Assisted Language Learning, 25(3), 257–275. https://doi.org/10.1080/09588221.2011.649482
Koizumi, R., & In’nami, Y. (2014). Modeling complexity, accuracy, and fluency of Japanese learners of English: A structural equation modeling approach. JALT Journal, 36(1), 25–42. https://doi.org/10.37546/JALTJJ36.1-2
Khabbazbashi, N., & Galaczi, E. D. (2020). A comparison of holistic, analytic, and part marking models in speaking assessment. Language Testing, 37(3), 333–360. https://doi.org/10.1177/0265532219881360
Krashen, S. (1985). The input hypothesis: Issues and implications. Longman.
Lado, R. (1961). Language testing. Longman.
Lam, D. M. K. (2015). Contriving authentic interaction: Task implementation and engagement in school-based speaking assessment in Hong Kong. In G. Yu & Y. Jin (Eds.), Assessing Chinese learners of English: Language constructs, consequences and conundrums (pp. 38–60). Palgrave Macmillan.
Lee, H. (2023). Looking into an innovative test mode in paired speaking from the perspective of scores. In J. Savage, E. Galaczi, & H.-W. Lee (Eds.), Research Notes Issue 86 (pp. 13–32). Cambridge University Press & Assessment. https://www.cambridgeenglish.org/english-research-group/published-research/research-notes/
Levelt, W. J. M. (1989). Speaking: From intention to articulation. MIT Press.
Luk, J. (2010). Talking to score: Impression management in L2 oral assessment and the co-construction of a test discourse genre. Language Assessment Quarterly, 7(1), 25–53. https://doi.org/10.1080/15434300903473997
Luoma, S. (1997). Comparability of a tape-mediated and a face-to-face test of speaking: A triangulation study [Unpublished Licentiate thesis]. University of Jyvaskyla, Finland. https://jyx.jyu.fi/handle/123456789/11733
Luoma, S. (2004). Assessing speaking. Cambridge University Press.
McNamara, T. F. (1987). Assessing the language proficiency of health professionals: Recommendations for the reform of the Occupational English Test. A report submitted to the Council on Overseas Professional Qualifications. University of Melbourne, Department of Russian and Language Studies.
Mullooly, J., & Glasson, N. (2023). Functional differences across modes in speaking performance. Cambridge English Research Notes, 86, 45–54. https://www.cambridgeenglish.org/Images/702820-research-notes-86.pdf
Murayama, K. (2006). Adaptation to the test: A review of problems and perspectives. The Japanese Journal of Educational Psychology, 54(2), 265–279. https://doi.org/10.5926/jjep1953.54.2_265
Nakatsuhara, F., Inoue, C., Berry, V., & Galaczi, E. (2017). Exploring the use of videoconferencing technology in the assessment of spoken language: A mixed-methods study. Language Assessment Quarterly, 14(1), 1–18. https://doi.org/10.1080/15434303.2016.1263637
Nakatsuhara, F., Inoue, C., Berry, V., & Galaczi, E. (2021). Video-conferencing speaking tests: Do they measure the same construct as face-to-face tests? Assessment in Education: Principles, Policy & Practice, 28(4), 369–388. https://doi.org/10.1080/0969594X.2021.1951163
O’Loughlin, K. J. (1995). Lexical density in candidate output on direct and semi-direct versions of an oral proficiency test. Language Testing, 12, 217–237. https://doi.org/10.1177/026553229501200205
O’Loughlin, K. J. (1997). Direct and semi-direct tests of spoken language (Unpublished doctoral thesis). University of Melbourne.
O’Loughlin, K. J. (2001). The equivalence of direct and semi-direct speaking tests. Cambridge University Press.
Qian, D. D. (2009). Comparing direct and semi-direct modes for speaking assessment: Affective effects on test takers. Language Assessment Quarterly, 6(2), 113–125. https://doi.org/10.1080/15434300902800059
Roever, C., & Ikeda, N. (2022). What scores from monologic speaking tests can(not) tell us about interactional competence. Language Testing, 39(1), 7–29. https://doi.org/10.1177/02655322211003332
Roever, C., & Kasper, G. (2018). Speaking in turns and sequences: Interactional competence as a target construct in testing speaking. Language Testing, 35(3), 331–355. https://doi.org/10.1177/0265532218758128
Seuren, L. M., Wherton, J., Greenhalgh, T., & Shaw, S. E. (2021). Whose turn is it anyway? Latency and the organization of turn-taking in video-mediated interaction. Journal of Pragmatics, 172, 63–78. https://doi.org/10.1016/j.pragma.2020.11.005
Shohamy, E. (1994). The validity of direct versus semi-direct oral tests. Language Testing, 11(2), 99–123. https://doi.org/10.1177/026553229401100202
Shohamy, E. (1998). Alternative assessment in language testing: Applying a multiplism approach. In E. Li & G. James (Eds.), Testing and evaluation in second language education (pp. 99–114). HKUST Language Centre. http://lc.ust.hk/~center/workingpaper.html
Shohamy, E., Donitze-Schmidt, S., & Waizer, R. (1993). The effect of the elicitation method on the language samples obtained on oral tests. Paper presented at the Annual Language Testing Colloquium, Cambridge, UK.
Stansfield, C. W., & Kenyon, D. M. (1992). Research on the comparability of the oral proficiency interview and the simulated oral proficiency interview. System, 20(3), 347–364. https://doi.org/10.1016/0346-251X(92)90045-5
Stansfield, C. W., Kenyon, D. M., Paiva, R., Doyle, F., Ulsh, I., & Cowles, M. A. (1990). The development and validation of the Portuguese speaking test. Hispania, 73(3), 641–651. https://doi.org/10.2307/343942
Song, J. (2014). A study of ESL students’ performance and perceptions in face-to-face and virtual-world group oral tests [Unpublished doctoral thesis]. The University of Texas at Austin.
Van Moere, A. (2012). A psycholinguistic approach to oral language assessment. Language Testing, 29(3), 325–344. https://doi.org/10.1177/0265532211424478
Yan, X., & Staples, S. (2023). Investigating the cognitive and social aspects of IELTS speaking performances across proficiency levels: Comparing the CAF-based and register-linguistic analyses (IELTS Research Reports Online Series, No. 2023/3). IELTS Partners. https://ielts.org/researchers/our-research/research-reports/investigating-the-cognitive-and-social-aspects-of-ielts-speaking-performances-across-proficiency-levels-comparing-the-caf-based-and-register-linguistic-analyses
Zhang, L., & Jin, Y. (2021). Assessing interactional competence in the computer-based CET-SET: An investigation of the use of communication strategies. Assessment in Education: Principles, Policy & Practice, 28(1), 1–22. https://doi.org/10.1080/0969594X.2021.1976107
Zhou, Y. (2012). Test-takers’ affective reactions to a computer-delivered speaking test and their test performance. In M. Minegishi, O. Hieda, E. Hayatsu, & Y. Kawaguchi (Eds.), Working papers in corpus-based linguistics and language education, No. 9 (pp. 295–310). Tokyo University of Foreign Studies. http://cblle.tufs.ac.jp/assets/files/publications/working_papers_09/section/295-310.pdf
Zhou, Y. (2015a). Comparing ratings of a face-to-face and telephone-mediated speaking test. JACET Journal, 59, 33–52. https://dl.ndl.go.jp/pid/10501826/1/1
Zhou, Y. (2015b). Computer-delivered or face-to-face: Effects of delivery mode on the testing of second language speaking. Language Testing in Asia, 5(2), 1–16. https://doi.org/10.1186/s40468-014-0012-y
Zhou, Y., & Yoshitomi, A. (2019). Test-taker perception of and test performance on computer-delivered speaking tests: The mediational role of test-taking motivation. Language Testing in Asia, 9(10), 1–19. https://doi.org/10.1186/s40468-019-0086-7

Footnotes

1. The AS-unit (Analysis of speech unit) is an augmented version of the T-unit, an independent clause, or a dependent clause connected to or embedded in an independent clause (Foster et al., 2000). Examples of one T-unit are: (a) “I like birds,” (b) “I liked the movie we saw yesterday,” and (c) “If it rains tomorrow, I will go to see a movie.” The AS-unit builds on the T-unit, including independent phrases that do not contain verbs, such as (d) “At the museum.” Examples (a) to (d) all contain one AS-unit, whereas (e) “I have a bird and its name is Pupu,” contains two AS-units because it has two independent clauses connected by the coordinating conjunction “and.”

2. According to Levelt (1989), the conceptualizer is responsible for generating the communicative intention and encoding it into some kind of coherent conceptual plan. In the next component, the formulator, the organized preverbal plan activates the items in the lexicon that best correspond to the different chunks of the intended message, which will, in turn, be responsible for transforming it into a linguistic structure.

Additional Declarations
No competing interests reported.
Cite Share Download PDF Status: Posted Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-7530331","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":514419835,"identity":"0d32020d-6043-4bc2-a7e3-7a0ea602a953","order_by":0,"name":"Koki Sekitani","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABJklEQVRIie2QMUvDQBSAXymcy9WsKUjiTzi5VelfuRBIl4hrhwyBQrKUumYQ/QudnF95cF3i3sFBEeoal+Jgi9fWBodEO4rkO7h7vHsf790BNDT8SRig2blZbVCDXQ6/7lStgqWSi8OUXcVGaSXi97HcNPGoiB5PgBOJ59sPR8yGU+IRuFYMi6cKReQaEfWCQyfxlXcvpMko4hrOMoR+VVNh92NERhwsLtEo3mQeCgoZtCYAgV012N2rUdZ75WajXBUUrqFXp8CcIU4To3RGUnnxtgvQZQJenSLyQOHDmDjj2hdKS9nNtaDV2PYzqn6Lm2pZDJbUs7hP3ffIcY5nw5e3bHl+cZ2OgqofK2H74BS3hxmpzYOfjG994zI80ocpDQ0NDf+cT/vAbMUPbc9dAAAAAElFTkSuQmCC","orcid":"","institution":"Toyo Eiwa 
University","correspondingAuthor":true,"prefix":"","firstName":"Koki","middleName":"","lastName":"Sekitani","suffix":""},{"id":514419836,"identity":"2254e46b-c75a-4b6b-ab2e-a1c5a253083d","order_by":1,"name":"Ryotaro Mitsuta","email":"","orcid":"","institution":"The United Graduate School of Education, Tokyo Gakugei University","correspondingAuthor":false,"prefix":"","firstName":"Ryotaro","middleName":"","lastName":"Mitsuta","suffix":""}],"badges":[],"createdAt":"2025-09-03 21:38:04","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7530331/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7530331/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":91832378,"identity":"74fe8648-d7c7-4c5e-854e-8ee99b3ac365","added_by":"auto","created_at":"2025-09-22 09:19:47","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":99180,"visible":true,"origin":"","legend":"","description":"","filename":"AnonymizedManuscript.docx","url":"https://assets-eu.researchsquare.com/files/rs-7530331/v1/940af0e6e83d0e627a8ed0a4.docx"},{"id":91832264,"identity":"b5a7c6b6-ca49-4f8e-bfc3-84374bebde81","added_by":"auto","created_at":"2025-09-22 09:19:01","extension":"json","order_by":1,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":4128,"visible":true,"origin":"","legend":"","description":"","filename":"63af2f94c208453eaa2298c62eeb1a80.json","url":"https://assets-eu.researchsquare.com/files/rs-7530331/v1/0a3af2bfc9550f0019c66fbb.json"},{"id":91832267,"identity":"4f8cbd0b-7926-48b7-9720-cff2d0cdb3fb","added_by":"auto","created_at":"2025-09-22 
09:19:01","extension":"xml","order_by":2,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":176508,"visible":true,"origin":"","legend":"","description":"","filename":"63af2f94c208453eaa2298c62eeb1a801enriched.xml","url":"https://assets-eu.researchsquare.com/files/rs-7530331/v1/3627c4467e5f59b3ab81c11e.xml"},{"id":91832266,"identity":"b22f7252-7c52-469d-8e03-9e1c91a81118","added_by":"auto","created_at":"2025-09-22 09:19:01","extension":"png","order_by":3,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":10058,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7530331/v1/5e763eb4447ab727d8c0b6b5.png"},{"id":91832265,"identity":"775869ee-2687-4050-9cb6-00ac271605eb","added_by":"auto","created_at":"2025-09-22 09:19:01","extension":"png","order_by":4,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":8610,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7530331/v1/0b8920095c3eafe4f102d376.png"},{"id":91834694,"identity":"aca4f2f1-f6b2-4255-960b-bc7ffb2fa8f4","added_by":"auto","created_at":"2025-09-22 09:27:01","extension":"xml","order_by":5,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":173445,"visible":true,"origin":"","legend":"","description":"","filename":"63af2f94c208453eaa2298c62eeb1a801structuring.xml","url":"https://assets-eu.researchsquare.com/files/rs-7530331/v1/b61722649038590c1ca93696.xml"},{"id":91836043,"identity":"a4caa55f-0023-4d62-a58a-97da50874541","added_by":"auto","created_at":"2025-09-22 
09:35:01","extension":"html","order_by":6,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":188676,"visible":true,"origin":"","legend":"","description":"","filename":"earlyproof.html","url":"https://assets-eu.researchsquare.com/files/rs-7530331/v1/2c42b5f8ed4985ba7af38473.html"},{"id":91832263,"identity":"32dcd8b0-183b-4aab-b286-06a20bfebd04","added_by":"auto","created_at":"2025-09-22 09:19:01","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":24238,"visible":true,"origin":"","legend":"\u003cp\u003eModes of speaking tests\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7530331/v1/fc498c41cc76a37c04e2e01f.png"},{"id":94986306,"identity":"0b0918ea-0ea2-47f4-b537-0263237dec22","added_by":"auto","created_at":"2025-11-03 07:00:10","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1309975,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7530331/v1/19726345-9f07-4989-b194-855451a1adcd.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Comparing Videoconferencing and Human-to-Machine Modes in Speaking Assessment: Holistic Ratings, Analytical Measures, Psychological Factors, and Washback Effects","fulltext":[{"header":"Introduction","content":"\u003cp\u003eThe integration of technology into language testing has accelerated rapidly in recent years, especially with the growing demand for remote testing solutions during the COVID-19 pandemic (Nakatsuhara et al., 2021). Many English proficiency tests are now offered in computer-based formats or have fully transitioned to digital delivery (Alzahrani, 2020; Qian, 2009). 
Researchers have examined the comparability of speaking tests across different modes of delivery, including face-to-face, videoconferencing, and human-to-machine (semi-direct) formats. For instance, studies have found general score equivalence between face-to-face and semi-direct tests such as the ACTFL OPI and SOPI (e.g., Kenyon \u0026amp; Tschirner, 2000; Stansfield \u0026amp; Kenyon, 1992), as well as between videoconferencing and face-to-face IELTS interviews (Nakatsuhara et al., 2017, 2021). Mullooly and Glasson (2023), analyzing data from the same Cambridge research series, reported subtle functional differences across modes despite overall score similarity.\u003c/p\u003e\n\u003cp\u003eHowever, while mode equivalence has been explored in selected pairwise comparisons, direct empirical comparisons between videoconferencing and semi-direct speaking tests remain rare. Additionally, language output is known to be sensitive not only to delivery mode but also to task type (e.g., storytelling vs. question-based tasks), which is often confounded with mode (Glasson, 2022; Zhang \u0026amp; Jin, 2021). Prior research has shown that task familiarity and interactivity shape both linguistic complexity and interactional competence, especially when test takers encounter unfamiliar or less scaffolded formats (Glasson, 2022).\u003c/p\u003e\n\u003cp\u003eFurthermore, differences in mode can influence interactional dynamics. For example, latency in videoconferencing can hinder natural turn-taking, potentially affecting discourse features and performance judgments (Seuren et al., 2021). 
These findings suggest that delivery mode and task type jointly shape the speaking construct being assessed, underscoring the need for systematic comparisons.\u003c/p\u003e\n\u003cp\u003eGiven the widespread adoption of both videoconferencing and semi-direct speaking assessments for large-scale testing and placement purposes, it is essential to examine their comparability not only in terms of scores but also in terms of language output and test-taker perceptions. This study therefore investigates two speaking tests in Japan\u0026mdash;the Standard Speaking Test (SST; videoconference-based) and Telephone Standard Speaking Test (TSST; semi-direct, telephone-based)\u0026mdash;to examine how mode and format influence holistic ratings, analytical measures, and test-takers\u0026rsquo; psychological and attitudinal responses.\u003c/p\u003e\n\u003ch2\u003eModes for speaking assessment\u003c/h2\u003e\n\u003cp\u003eSpeaking assessments can be delivered in three formats: direct, semi-direct, and indirect (Clark, 1979; O\u0026rsquo;Loughlin, 2001; Qian, 2009). In an indirect test, the examiner measures underlying skills related to speaking without eliciting actual speech. For example, Lado (1961) once proposed assessing pronunciation via a written test, a discrete-point technique typical of mid-20th-century language exams (Shohamy, 1998). Such methods are now largely considered outdated due to validity concerns (O\u0026rsquo;Loughlin, 2001). Hence, truly indirect speaking tests have fallen out of common use.\u003c/p\u003e\n\u003cp\u003eIn contrast, the direct mode requires the test-taker to speak with a live interlocutor, performing the target speaking skills in real time (Hughes, 2003). Typically, this takes the form of a face-to-face interview or conversation with an examiner (Luoma, 2004). 
Many educators consider the direct interview format to closely approximate real-life communication, although the interaction in a test setting is more structured than an ordinary conversation (van Lier, 1989; Glasson, 2022). Direct speaking tests have historically been viewed as the most authentic and valid means of assessing oral ability, often enjoying high face validity with both test-takers and score users. For instance, Clark (1979) argued that the face-to-face interview yields the truest measure of speaking proficiency. Moreover, because direct tests involve reciprocal interaction, they can engage the test-taker\u0026rsquo;s interactional competence\u0026mdash;the ability to co-construct discourse through turn-taking, topic management, feedback, and other interactive skills (Galaczi \u0026amp; Taylor, 2018). This interactional dimension is an important component of speaking proficiency that is not directly accessed by more mechanistic test formats.\u003c/p\u003e\n\u003cp\u003eSemi-direct testing delivers the prompts through an audio/video recording or computer program instead of a human interlocutor (Clark, 1979). In a semi-direct exam, test-takers respond to pre-recorded or on-screen prompts, and their spoken responses are recorded for later evaluation by raters. Many modern speaking tests use this format: for example, the TOEFL iBT Speaking section and Cambridge Linguaskill Speaking test both present tasks via computer and require candidates to speak into a microphone, after which the recordings are scored (either by human examiners or by automated systems). Semi-direct formats make it practical to administer speaking tests to large groups under standardized conditions. However, because the candidate is essentially speaking monologically (to a microphone or computer) rather than interacting with a person, this mode cannot capture the co-constructed nature of conversation. 
Important interactional features like real-time turn-taking or negotiation of meaning are absent in semi-direct tasks. This lack of a live interlocutor has raised questions about the authenticity of semi-direct tests and whether they fully elicit a test-taker\u0026rsquo;s communicative competence. Semi-direct speaking tests are nevertheless widely used in high-stakes contexts due to their logistical efficiency and scoring consistency.\u003c/p\u003e\n\u003cp\u003eRecently, videoconferencing has emerged as a popular medium for speaking assessments, blending aspects of direct and semi-direct modes. In a video-mediated test, the examiner and candidate engage in a real-time spoken interaction, but via webcams and microphones from different locations. This format grew rapidly during the COVID-19 pandemic as institutions sought remote testing solutions (Nakatsuhara et al., 2021). A videoconference interview preserves the live, synchronous dialogue of a direct face-to-face test while offering practical advantages such as reduced travel and easier scheduling (Nakatsuhara et al., 2017). In fact, we classify the videoconferencing format as a type of direct speaking test in our study, since it retains a person-to-person interaction (albeit through a screen) that allows for spontaneous back-and-forth exchange. Figure 1 illustrates how the videoconferencing format fits into the framework of speaking test modes alongside traditional in-person (direct) and semi-direct approaches. By enabling remote yet interactive oral exams, videoconferencing is expanding the reach of direct speaking assessment while still tapping into test-takers\u0026rsquo; interactional skills.\u003c/p\u003e\n\u003ch2\u003eHolistic ratings\u003c/h2\u003e\n\u003cp\u003ePrevious research on speaking assessment has distinguished between different test delivery modes, particularly semi-direct computer-delivered formats and videoconferencing interviews. These represent distinct points on the interaction continuum. 
The semi-direct format elicits monologic responses, aligning with psycholinguistic perspectives that emphasize individual processing (Van Moere, 2012). In contrast, the VC mode involves real-time dialogue, tapping socio-interactional competence, where shared understanding is co-constructed (Roever \u0026amp; Kasper, 2018).\u003c/p\u003e\n\u003cp\u003eWhile both modes are operationally common, the interactions are fundamentally different. The semi-direct mode lacks real-time response or negotiation of meaning, whereas VC interviews involve live interlocutors, enabling intersubjectivity. The presence or absence of interaction and non-verbal cues (e.g., facial expressions) may affect speaking performance and rating outcomes. Research has shown that test-takers achieve similar holistic scores across formats. Nakatsuhara et al. (2021) found no statistically significant differences in IELTS Speaking scores across face-to-face and computer-mediated modes, though test-takers could only ask for clarification in live interviews. However, even with equivalent scores, delivery mode factors such as latency or absence of listener feedback could influence raters\u0026rsquo; perceptions. For example, slight video delay in VC may disrupt turn-taking, while semi-direct tests lack responsiveness.\u003c/p\u003e\n\u003cp\u003eThe rating model also affects outcomes. Khabbazbashi and Galaczi (2020) showed that holistic scoring yielded different CEFR levels compared to analytic or part-by-part scoring, sometimes by 30\u0026ndash;50% of cases. Holistic scoring may overgeneralize performance, masking task-specific strengths. Thus, we used holistic ratings as they reflect operational practice, but we also examined other evidence.\u003c/p\u003e\n\u003ch2\u003eCAF measures\u003c/h2\u003e\n\u003cp\u003eIn addition to holistic scoring, prior studies have employed complexity, accuracy, and fluency (CAF) measures to provide more detailed insights into speaking performance. 
These objective indices capture features such as clause length, lexical diversity, error ratios, and pause time. They are not subject to rater interpretation and provide finer-grained insight into linguistic performance.\u003c/p\u003e\n\u003cp\u003eCAF helps identify whether modes elicit different language patterns. For instance, the monologic semi-direct mode may encourage longer utterances, while the interactive VC format might lead to more real-time adjustments or use of formulaic language. Differences would appear in CAF results.\u003c/p\u003e\n\u003cp\u003eCAF analysis also adds construct validity. Yan and Staples (2023), in an IELTS study, found that CAF measures distinguished proficiency levels and informed rater scale development. In our study, similar CAF profiles across modes would support construct comparability; differences may indicate unique demands in each mode.\u003c/p\u003e\n\u003cp\u003eThis dual approach\u0026mdash;holistic scores and CAF indices\u0026mdash;helps determine whether the presence of interaction in VC interviews affects language output or rating. It also addresses a research gap regarding how semi-direct and VC formats compare in both scoring and performance features.\u003c/p\u003e\n\u003ch2\u003ePsychological factors\u003c/h2\u003e\n\u003cp\u003eExaminees\u0026rsquo; psychological reactions to different speaking test modes have been investigated from various perspectives, including anxiety, perceived difficulty, and mode preference. Krashen\u0026rsquo;s (1985) Affective Filter Hypothesis suggests that negative emotional states such as anxiety can hinder language performance. In several studies, examinees were found to prefer face-to-face speaking tests over technology-mediated formats (e.g., James, 1988; Kiddle \u0026amp; Kormos, 2011; McNamara, 1987; Shohamy et al., 1993). 
Regarding comparisons between face-to-face and videoconferencing modes, Glasson and Devine (2023) found that examinees favored the face-to-face mode as they could express their English ability more effectively; however, the effect sizes in that study were small.\u003c/p\u003e\n\u003cp\u003eAnxiety has been proposed as a key reason for mode preference. Qian (2009) suggested that the physical presence of an interlocutor in face-to-face settings may help reduce psychological barriers. Du and Zhang (2022) argued that the affective benefits of interlocutor presence persist in videoconferencing modes, as indicated by positive affective responses. Conversely, Song (2014) noted that the absence of interlocutors in semi-direct modes may lower anxiety for some examinees. Glasson and Devine (2023) reported that examinees with lower proficiency levels tended to feel greater anxiety in face-to-face interactions.\u003c/p\u003e\n\u003cp\u003ePerceptions of test difficulty also vary by mode. Elder et al. (2002) emphasized that perceived task difficulty is multidimensional and correlates with performance outcomes. Zhou (2012) reported that examinees who viewed computer-based semi-direct tests as easier generally performed better. In contrast, Glasson and Devine (2023) found no significant differences in perceived difficulty between face-to-face and videoconferencing modes. 
However, their study did not use identical tasks across modes, which may have influenced the comparability of perceptions.\u003c/p\u003e\n\u003cp\u003eOverall, while test scores across modes may be comparable, differences in affective responses and perceived difficulty have been documented, suggesting that test mode can influence examinee experience.\u003c/p\u003e\n\u003ch2\u003eWashback effects\u003c/h2\u003e\n\u003cp\u003eLanguage tests can exert \u003cem\u003ewashback\u0026nbsp;\u003c/em\u003e\u003cem\u003eeffects\u003c/em\u003e on learning and teaching, as learners often adjust their behavior to meet test demands. Murayama (2006) calls this phenomenon \u0026ldquo;adaptation to the test,\u0026rdquo; noting it can lead to problematic study habits and even threaten test validity.\u003c/p\u003e\n\u003cp\u003eEmpirical research has examined washback effects in language testing (see Baba, 2019, for a review). In speaking assessments, this influence can lead examinees to adopt test-focused strategies that result in formulaic or contrived performances. For example, Luk (2010) observed that peer-group oral test discourse became \u0026ldquo;ritualized, contrived and colluded,\u0026rdquo; as candidates aimed to \u0026ldquo;maintain the impression of being effective interlocutors for scoring purposes rather than for authentic communication.\u0026rdquo; Similarly, Lam (2015) found that what passes as interaction in some speaking tests may actually be a \u003cem\u003ecanned\u003c/em\u003e performance pre-rehearsed during preparation. Such findings illustrate how test preparation and \u003cem\u003etestwiseness\u003c/em\u003e can distort the natural flow of communication, with examinees \u0026ldquo;talking to score\u0026rdquo; (i.e. prioritizing performance for the rating criteria) and even delivering rehearsed responses. 
Glasson (2022) further highlights that the issue is not \u003cem\u003ewhether\u003c/em\u003e assessment prompts contrived interaction, but how different task types modulate this effect. His conversation-analytic study showed that examinees performed markedly differently on a conventional face-to-face task versus a novel online task, raising questions about broadening assessment formats to better elicit interactional competence.\u003c/p\u003e\n\u003cp\u003eThese findings underscore a broader concern: speaking assessments, particularly semi-direct or monologic formats, may not fully capture interactional competence as defined in socio-interactional models (Roever \u0026amp; Ikeda, 2022). The direct (face-to-face and videoconferencing) and semi-direct modes rest on fundamentally different assumptions about what constitutes speaking ability\u0026mdash;individual production versus interactive negotiation. To better understand how these underlying constructs influence test performance and perception, comparing the modes directly is essential. 
Notably, no prior studies have directly compared videoconferencing and semi-direct modes, highlighting a critical research gap.\u0026nbsp;\u003c/p\u003e\n\u003ch2\u003eStudy purpose\u003c/h2\u003e\n\u003cp\u003eThis study addresses the following research questions (RQs):\u003c/p\u003e\n\u003cul class=\"decimal_type\"\u003e\n \u003cli\u003eRQ1: How do holistic ratings, analytical measures, and their relationships differ between face-to-face and semi-direct modes of a speaking test?\u003c/li\u003e\n \u003cli\u003eRQ2: How do the psychological state and washback effects of test-takers differ between face-to-face and semi-direct modes of a speaking test?\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eWe employ the SST and TSST for videoconferencing and semi-direct modes, respectively, for two primary reasons: 1) Our study is contextualized in Japan, where these tests are tailored to precisely assess the majority of Japanese learners, whose proficiency typically ranges from CEFR A1 to B1. 2) Using the SST and TSST allows us to directly compare our findings with those of Zhou (2015a), who also used these tests.\u003c/p\u003e"},{"header":"Method","content":"\u003ch2\u003eParticipants\u003c/h2\u003e\n\u003cp\u003eThirty-eight senior high school students (girls: 31, boys: 7; \u003cem\u003eM\u003c/em\u003e_age = 16.55 years, \u003cem\u003eSD\u003c/em\u003e = 0.71) in Tokyo, Japan, who were learning English as a foreign language, voluntarily participated in the study.\u003c/p\u003e\n\u003ch2\u003eExaminers\u003c/h2\u003e\n\u003cp\u003eAll speaking performances were rated by official examiners trained in the SST and TSST scoring protocols.\u0026nbsp;\u003c/p\u003e\n\u003ch2\u003eInstruments\u003c/h2\u003e\n\u003ch3\u003eVideoconferencing mode\u0026nbsp;\u003c/h3\u003e\n\u003cp\u003eWe used the SST for the videoconferencing mode. The SST is originally a face-to-face interview test that assesses English learners\u0026rsquo; speaking ability. 
Developed by the American Council for the Teaching of Foreign Languages (ACTFL) and ALC Press in 1997, it was the first independent speaking test in Japan and was modelled on the ACTFL Oral Proficiency Interview (OPI). The SST is a 10\u0026ndash;15-minute interview using a structured conversation between a certified interviewer and examinee; it consists of five stages (ACTFL-ALC Press, 2000). It begins with a casual chat about general topics, such as occupation and hobbies (Stage 1: warm-up), followed by three tasks: picture description (Stage 2), role-playing (Stage 3), and storytelling (Stage 4). After each task, the interviewer asks task-related questions and concludes the interview with additional casual chat (Stage 5, wind-down). This test allows for structured but flexible conversations.\u0026nbsp;The\u0026nbsp;interviewers assess the examinees\u0026rsquo; level from Novice to Advanced and select tasks and topics accordingly. They are trained in elicitation techniques for accurate ratings and maintain their skills through ongoing training, testing, and norming procedures.\u003c/p\u003e\n\u003ch3\u003eSemi-direct mode\u0026nbsp;\u003c/h3\u003e\n\u003cp\u003eWe employed the TSST for the semi-direct mode. Developed by ALC Press in 2004 as a practical alternative to the SST, the TSST is an automated speaking test administered over the telephone, allowing for more flexible scheduling. The test is available at any time and can be taken using a landline. It consists of ten pre-set monologic tasks randomly selected from a large pool of recorded tasks. Each task is designed to elicit performance appropriate to the speech functions, discourse type, content, and context of the target level. Six tasks are aimed at the intermediate level, and four at the advanced level. 
Examples of tasks include \u0026ldquo;Please describe (something)\u0026rdquo; and \u0026ldquo;Please talk about the last time you (did something).\u0026rdquo; Respondents have 45 seconds to complete their answers.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eWhile both tests aim to assess oral proficiency, their task types differ in nature and interactional demands: the SST involves dynamic, interpersonal communication, whereas the TSST elicits more structured, monologic responses. The main differences between the tests are summarized in Table 1.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 1 Summary of major features of the SST and TSST\u003c/strong\u003e\u003c/p\u003e\n\u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"609\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 87px;\"\u003e\n \u003cp\u003eTest\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 261px;\"\u003e\n \u003cp\u003eStandard Speaking Test (SST)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 261px;\"\u003e\n \u003cp\u003eTelephone Standard Speaking Test (TSST)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 87px;\"\u003e\n \u003cp\u003eMode\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 261px;\"\u003e\n \u003cp\u003eFace-to-face (online video)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 261px;\"\u003e\n \u003cp\u003eTelephone-based\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"3\" style=\"width: 87px;\"\u003e\n \u003cp\u003eTask-type\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 261px;\"\u003e\n \u003cp\u003ePicture description\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 261px;\"\u003e\n \u003cp\u003eAnswering questions on general topics\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 261px;\"\u003e\n \u003cp\u003eRole-playing\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd style=\"width: 261px;\"\u003e\n \u003cp\u003e\u0026nbsp; Questions 1\u0026ndash;6 : Intermediate\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 261px;\"\u003e\n \u003cp\u003eStory-telling\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 261px;\"\u003e\n \u003cp\u003e\u0026nbsp; Questions 7\u0026ndash;10 : Advanced\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 87px;\"\u003e\n \u003cp\u003eTask delivery\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 261px;\"\u003e\n \u003cp\u003eAdaptive\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 261px;\"\u003e\n \u003cp\u003eNon-adaptive\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 87px;\"\u003e\n \u003cp\u003eTest time\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 261px;\"\u003e\n \u003cp\u003eApproximately 15 minutes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 261px;\"\u003e\n \u003cp\u003eTen 45-seconds questions\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"4\" style=\"width: 87px;\"\u003e\n \u003cp\u003eRating\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"2\" style=\"width: 523px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; Nine-level holistic scale\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"2\" style=\"width: 523px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; Based on five categories of assessment criteria:\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"2\" style=\"width: 523px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;global tasks or functions, text type ability, accuracy, contexts 
and\u0026nbsp;\u003cbr\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; content areas\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"2\" style=\"width: 523px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; Two or three trained raters\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u0026nbsp;\u003c/p\u003e\n\u003ch2\u003eData collection\u003c/h2\u003e\n\u003cp\u003eThe 38 participants completed both the SST and TSST on the same day. Half of the participants were randomly assigned to complete the SST first, and the other half to complete the TSST first, to counterbalance the order effects. Technical difficulties prevented some participants from completing the test or following the intended schedule, and they were excluded from the analysis. Thus, 33 participants (28 girls, 5 boys) were included in the analysis.\u0026nbsp;The final sample included 18 participants who completed the SST first, and 15 who completed the TSST first.\u0026nbsp;After completing both tests, participants answered questionnaires via Google Forms to collect data on psychological factors and washback effects.\u003c/p\u003e\n\u003ch2\u003eHolistic ratings of the SST\u0026nbsp;\u003c/h2\u003e\n\u003cp\u003eThe SST records all interviews and rates them using a nine-point holistic rating scale with two raters. Scores are assigned for Stages 1\u0026ndash;4 as well as for overall performance. If the two raters disagreed on the overall score, a third rater\u0026rsquo;s opinion was sought. 
The final rating was the one agreed upon by two of the three raters.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe SST scale evaluates English language proficiency on the basis of five categories of assessment criteria (ACTFL-ALC Press, 2000): (a) global tasks or functions (asking and answering simple questions, narrating, describing in major timeframes), (b) text type ability (from words and sentences to complex sentences, paragraphs, and extended discourse), (c) accuracy (the combination of grammar, vocabulary, pronunciation, fluency), (d) contexts (from common everyday situations to more complex social situations), and (e) content areas (from personal topics to a wide range of general interest topics). A holistic rating was assigned to each stage, and the overall level was determined by considering all assessment categories.\u0026nbsp;\u003c/p\u003e\n\u003ch3\u003eHolistic ratings of the TSST\u0026nbsp;\u003c/h3\u003e\n\u003cp\u003eThree raters evaluated each examinee\u0026rsquo;s responses to the TSST tasks using criteria identical to those of the SST evaluation. Each rater scored responses to five of the ten tasks, independently assigning a holistic level based on the examinee\u0026rsquo;s performance. The first and second raters scored odd and even numbered tasks, respectively, whereas the third rater scored both odd and even numbered items. No weighting or other adjustments were applied among each task. Then the raters assigned a level that represented the examinees\u0026rsquo; overall performance on five tasks. The final rating of the TSST was based on the ratings given by the raters.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eLevel Descriptions\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eEach level of the SST and TSST shares the same criteria: Levels 1 and 2 are equivalent to CEFR A1; levels 3 and 4 to A2; levels 5 and 6 to B1; levels 7 and 8 to B2; and level 9 to C1 or higher. 
For example, \u0026ldquo;a Level-4 speaker can maintain simple communication by talking about familiar topics and asking simple questions. A speaker at this level can connect simple short sentences to convey his/her thoughts, but fluency is disturbed doing so. With effort, the speaker can manage to respond to what has been asked, but he/she still cannot actively interact. The speaker\u0026rsquo;s pronunciation and word choices may still be influenced by his/her native language, but the impact is insignificant and listeners used to non-native English speakers would not have trouble understanding him/her\u0026rdquo; (ALC EDUCATION INC, 2025).\u003c/p\u003e\n\u003ch3\u003eCAF measures\u0026nbsp;\u003c/h3\u003e\n\u003cp\u003eWe transcribed the recorded SST and TSST protocols based on the \u003cem\u003eKakiokoshi Kihondanwa Tag Fuyo Guideline Version 2.1.3\u003c/em\u003e [Guidelines for Transcription and Tag Attachment Version 2.1.3] (Isahara et al., 2004). Fillers, repeated utterances, and other spoken sounds that did not appear to be directly related to the spoken content were transcribed. We aimed to measure complexity, accuracy, and fluency (CAF), the three major indices of second language proficiency (Housen \u0026amp; Kuiken, 2009). More specifically, following Koizumi and In\u0026rsquo;nami (2014), who investigated the factor structure of Japanese EFL learners\u0026rsquo; CAF using structural equation modelling, we calculated several measures. 
These included the number of clauses per AS-unit\u003ca href=\"#_ftn1\" name=\"_ftnref1\" title=\"\"\u003e\u003c/a\u003e\u003csup\u003e1\u003c/sup\u003e for syntactic complexity (SC), the number of error-free clauses per clause for accuracy (A), the number of words per minute for speed fluency (F1), and the number of disfluency markers (i.e., filler words, repeated utterances, self-corrections, and cut-off utterances) per minute for repair fluency (F2).\u003c/p\u003e\n\u003cp\u003eTo ensure a fair comparison between the SST and TSST, test-takers\u0026rsquo; responses to the prelude questions in the SST, such as \u0026ldquo;Where do you live?\u0026rdquo;, were excluded from the analysis. The prelude questions were designed to elicit one-word or short-phrase responses, such as \u0026ldquo;Tokyo,\u0026rdquo; giving the interviewer sufficient context to ask more in-depth, targeted follow-up questions, such as \u0026ldquo;Can you talk about your neighborhood in Tokyo?\u0026rdquo; These follow-up questions were designed to elicit more complex and detailed utterances. Excluding responses to the prelude questions made the SST speech data more comparable to those of the TSST, which does not include prelude questions.\u003c/p\u003e\n\u003ch3\u003ePsychological factors\u003c/h3\u003e\n\u003cp\u003eWe used four scales: test-taking anxiety, liking for the test, perceptions of test difficulty, and perceptions of test validity. All scales were adapted from Zhou (2012) to allow results to be compared. The \u0026ldquo;test-taking anxiety\u0026rdquo; scale comprises three items and measures the degree to which learners felt nervous before and during the test. The \u0026ldquo;liking for the test\u0026rdquo; scale consists of four items and measures the extent to which learners liked the test. 
The \u0026ldquo;perceptions of test difficulty\u0026rdquo; scale includes five items and measures how difficult learners perceived the test to be and how confident they felt. The \u0026ldquo;perceptions of test validity\u0026rdquo; scale comprises three items and measures the extent to which learners believe the test assesses their speaking ability accurately and fairly. All items were rated on a five-point scale ranging from 1 (\u0026ldquo;strongly disagree\u0026rdquo;) to 5 (\u0026ldquo;strongly agree\u0026rdquo;). Participants responded to these items for both the SST and TSST.\u003c/p\u003e\n\u003ch3\u003eWashback effects\u003c/h3\u003e\n\u003cp\u003eParticipants responded to the following open-ended question: \u0026ldquo;If a test similar to the SST were introduced as part of university entrance English exams, do you think it would influence test-takers\u0026rsquo; attitudes towards learning English, their learning behavior, and their motivation? Please describe your experience of taking the SST.\u0026rdquo; They then answered the same question regarding the TSST. These questions were displayed and answered in Japanese.\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eAnalyses\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003eWe conducted paired \u003cem\u003et\u003c/em\u003e tests to examine differences in holistic rating scores, CAF measures, and psychological factors between the two modes. Additionally, we employed correlation and multiple regression analyses to investigate the relationships between holistic rating scores and CAF measures. 
Furthermore, we analyzed participants\u0026rsquo; written responses to the open-ended questions to explore their opinions on how these tests might influence test-takers\u0026rsquo; attitudes toward learning English, their learning behavior, and their motivation.\u003c/p\u003e\n\u003cdiv id=\"ftn1\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n\u003c/div\u003e"},{"header":"Results","content":"\u003ch2\u003eHolistic rating scores of the two speaking test modes\u003c/h2\u003e\n\u003cp\u003eTables 2 and 3 summarize the key findings, including descriptive statistics and the results of the paired \u003cem\u003et\u003c/em\u003e tests and correlation analysis.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 2 Cross-tabulation of SST and TSST ratings\u003c/strong\u003e\u003c/p\u003e\n\u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"534\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 21px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 208px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"7\" style=\"width: 305px;\"\u003e\n \u003cp\u003eTSST holistic rating scores\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"2\" style=\"width: 229px;\"\u003e\n \u003cp\u003eSST holistic rating scores\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 40px;\"\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 40px;\"\u003e\n \u003cp\u003e5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e8\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 136px;\"\u003e\n \u003cp\u003eRow total\u003c/p\u003e\n 
\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 21px;\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 208px;\"\u003e\n \u003cp\u003eNovice high\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 40px;\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 40px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 136px;\"\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 21px;\"\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 208px;\"\u003e\n \u003cp\u003eIntermediate low\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 40px;\"\u003e\n \u003cp\u003e13\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 40px;\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 136px;\"\u003e\n \u003cp\u003e15\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 21px;\"\u003e\n \u003cp\u003e5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 208px;\"\u003e\n \u003cp\u003eIntermediate low plus\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n 
\u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 40px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 40px;\"\u003e\n \u003cp\u003e7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 136px;\"\u003e\n \u003cp\u003e8\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 21px;\"\u003e\n \u003cp\u003e6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 208px;\"\u003e\n \u003cp\u003eIntermediate middle\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 40px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 40px;\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 136px;\"\u003e\n \u003cp\u003e6\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 21px;\"\u003e\n \u003cp\u003e7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 208px;\"\u003e\n \u003cp\u003eIntermediate middle plus\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 40px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 40px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n 
\u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 136px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 21px;\"\u003e\n \u003cp\u003e8\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 208px;\"\u003e\n \u003cp\u003eIntermediate high\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 40px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 40px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 136px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"2\" style=\"width: 229px;\"\u003e\n \u003cp\u003eColumn total\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 40px;\"\u003e\n \u003cp\u003e16\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 40px;\"\u003e\n \u003cp\u003e10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 22px;\"\u003e\n \u003cp\u003e0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 136px;\"\u003e\n \u003cp\u003e33\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n 
\u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003cem\u003e\u0026nbsp;\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 3 Holistic rating score agreement\u003c/strong\u003e\u003c/p\u003e\n\u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"347\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 176px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 85px;\"\u003e\n \u003cp\u003eSST\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 85px;\"\u003e\n \u003cp\u003eTSST\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 176px;\"\u003e\n \u003cp\u003eRating score mean (\u003cem\u003eSD\u003c/em\u003e)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 85px;\"\u003e\n \u003cp\u003e4.48 (0.94)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 85px;\"\u003e\n \u003cp\u003e4.67 (0.89)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 176px;\"\u003e\n \u003cp\u003e\u003cem\u003en\u003c/em\u003e (%) of SST \u0026gt; TSST\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 85px;\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 85px;\"\u003e\n \u003cp\u003e\u0026nbsp; (3.0%)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 176px;\"\u003e\n \u003cp\u003e\u003cem\u003en\u0026nbsp;\u003c/em\u003e(%) of SST = TSST\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 85px;\"\u003e\n \u003cp\u003e25\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 85px;\"\u003e\n \u003cp\u003e(75.8%)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 176px;\"\u003e\n \u003cp\u003e\u003cem\u003en\u003c/em\u003e (%) of SST \u0026lt; TSST\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 85px;\"\u003e\n \u003cp\u003e7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
style=\"width: 85px;\"\u003e\n \u003cp\u003e(21.2%)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 176px;\"\u003e\n \u003cp\u003e\u003cem\u003et\u003c/em\u003e value (paired)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"2\" style=\"width: 171px;\"\u003e\n \u003cp\u003e2.248\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 176px;\"\u003e\n \u003cp\u003e\u003cem\u003ep\u003c/em\u003e value\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"2\" style=\"width: 171px;\"\u003e\n \u003cp\u003e.032\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 176px;\"\u003e\n \u003cp\u003eEffect size (\u0026Delta;)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"2\" style=\"width: 171px;\"\u003e\n \u003cp\u003e0.20 (small effect)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 176px;\"\u003e\n \u003cp\u003eCorrelation coefficient (\u003cem\u003er\u003c/em\u003e)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"2\" style=\"width: 171px;\"\u003e\n \u003cp\u003e0.872\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eThe mean holistic rating score was significantly higher for the TSST (\u003cem\u003eM\u003c/em\u003e = 4.67, \u003cem\u003eSD\u003c/em\u003e = 0.89) than for the SST (\u003cem\u003eM\u003c/em\u003e = 4.48, \u003cem\u003eSD\u003c/em\u003e = 0.94), although the effect was small (\u003cem\u003et\u003c/em\u003e = 2.248, \u003cem\u003ep\u003c/em\u003e = .032, \u0026Delta; = 0.20). Ratings in the two modes were identical for 25 of the 33 participants (75.8%) and were strongly correlated (\u003cem\u003er\u003c/em\u003e = .872). 
Thus, the semi-direct mode received slightly higher evaluations than the videoconferencing mode when the same holistic criteria were used.\u003c/p\u003e\n\u003ch2\u003eCAF measures of the two speaking test modes\u003c/h2\u003e\n\u003cp\u003eTable 4 displays the descriptive statistics and the results of the paired \u003cem\u003et\u003c/em\u003e tests for the measures of syntactic complexity (SC), accuracy (A), speed fluency (F1), and repair fluency (F2) for the SST and TSST. 
For subsequent analyses, responses to the prelude questions were excluded, as previously discussed.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 4 Descriptive statistics for CAF measures in SST and TSST\u003c/strong\u003e\u003c/p\u003e\n\u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"614\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 40px;\"\u003e\n \u003cp\u003eCAF\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003eTest\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 159px;\"\u003e\n \u003cp\u003eAnalytical scope\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003eMean\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003eSD\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003eMin.\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003eMax.\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 66px;\"\u003e\n \u003cp\u003e\u003cem\u003et\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 51px;\"\u003e\n \u003cp\u003e\u003cem\u003ep\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e\u003cem\u003ed\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"3\" style=\"width: 40px;\"\u003e\n \u003cp\u003eSC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 47px;\"\u003e\n \u003cp\u003eSST\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 159px;\"\u003e\n \u003cp\u003eResponses to prelude questions INCLUDED\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e1.13\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e0.09\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n 
\u003cp\u003e1.02\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e1.35\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 66px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 51px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 159px;\"\u003e\n \u003cp\u003eResponses to prelude questions EXCLUDED\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e1.19\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e0.13\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e1.03\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e1.57\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 66px;\"\u003e\n \u003cp\u003e\u0026minus;10.158\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 51px;\"\u003e\n \u003cp\u003e\u0026lt; .001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 53px;\"\u003e\n \u003cp\u003e\u0026minus;1.768\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003eTSST\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 159px;\"\u003e\n \u003cp\u003eAll Responses\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e1.49\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e0.22\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e1.11\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e1.98\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"3\" style=\"width: 40px;\"\u003e\n 
\u003cp\u003eA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 47px;\"\u003e\n \u003cp\u003eSST\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 159px;\"\u003e\n \u003cp\u003eResponses to prelude questions INCLUDED\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e0.77\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e0.07\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e0.62\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e0.90\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 66px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 51px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 159px;\"\u003e\n \u003cp\u003eResponses to prelude questions EXCLUDED\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e0.68\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e0.11\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e0.46\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e0.85\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 66px;\"\u003e\n \u003cp\u003e3.456\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 51px;\"\u003e\n \u003cp\u003e.002\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 53px;\"\u003e\n \u003cp\u003e0.602\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003eTSST\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 159px;\"\u003e\n \u003cp\u003eAll Responses\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e0.62\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e0.11\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e0.42\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e0.85\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"3\" style=\"width: 40px;\"\u003e\n \u003cp\u003eF1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 47px;\"\u003e\n \u003cp\u003eSST\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 159px;\"\u003e\n \u003cp\u003eResponses to prelude questions INCLUDED\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e83.67\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e15.36\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e61.41\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e129.20\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 66px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 51px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 159px;\"\u003e\n \u003cp\u003eResponses to prelude questions EXCLUDED\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e83.58\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e17.18\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e59.67\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e133.93\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 66px;\"\u003e\n 
\u003cp\u003e17.537\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 51px;\"\u003e\n \u003cp\u003e\u0026lt; .001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 53px;\"\u003e\n \u003cp\u003e3.053\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003eTSST\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 159px;\"\u003e\n \u003cp\u003e\u0026nbsp;All Responses\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e60.93\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e18.04\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e38.40\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e113.20\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"3\" style=\"width: 40px;\"\u003e\n \u003cp\u003eF2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 47px;\"\u003e\n \u003cp\u003eSST\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 159px;\"\u003e\n \u003cp\u003eResponses to prelude questions INCLUDED\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e20.27\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e7.63\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e7.60\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e40.51\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 66px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 51px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 159px;\"\u003e\n \u003cp\u003eResponses 
to prelude questions EXCLUDED\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e21.25\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e8.10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e8.02\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e43.37\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 66px;\"\u003e\n \u003cp\u003e9.623\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 51px;\"\u003e\n \u003cp\u003e\u0026lt; .001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 53px;\"\u003e\n \u003cp\u003e1.675\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003eTSST\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 159px;\"\u003e\n \u003cp\u003eAll Responses\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e13.04\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e5.50\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e4.27\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e28.00\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003ch2\u003eRelationship between holistic rating scores and CAF measures\u003c/h2\u003e\n\u003cp\u003eTo examine the relationship between the holistic rating scores and each CAF measure, we conducted correlation analyses for both the SST and TSST. We then conducted multiple regression analyses using the holistic rating score as the dependent variable and the CAF measures as the predictor variables. Based on the correlation analyses, repair fluency was excluded because of the lack of significant positive correlations with the holistic rating score. 
The results are presented in Tables 5 and 6.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 5 Correlations between holistic rating scores and CAF measures in SST and TSST\u003c/strong\u003e\u003c/p\u003e\n\u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"322\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"5\" style=\"width: 53px;\"\u003e\n \u003cp\u003eSST\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 51px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e[SC]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e[A]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e[F1]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e[F2]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 51px;\"\u003e\n \u003cp\u003eLevel\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e.578***\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e.550***\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e.740***\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e\u0026minus;.279\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 51px;\"\u003e\n \u003cp\u003e[SC]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\u003cbr\u003e\u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e.343\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e.468**\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e\u0026nbsp; .061\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 51px;\"\u003e\n 
\u003cp\u003e[A]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\u003cbr\u003e\u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\u003cbr\u003e\u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e.281\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e\u0026minus;.197\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 51px;\"\u003e\n \u003cp\u003e[F1]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e\u0026minus;.324\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"5\" style=\"width: 53px;\"\u003e\n \u003cp\u003eTSST\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 51px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e[SC]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e[A]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e[F1]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e[F2]\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 51px;\"\u003e\n \u003cp\u003eLevel\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e.606***\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e.610***\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e.716***\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e\u0026minus;.065\u003c/p\u003e\n \u003c/td\u003e\n 
\u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 51px;\"\u003e\n \u003cp\u003e[SC]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\u003cbr\u003e\u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e.667***\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e.427*\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e\u0026nbsp; .110\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 51px;\"\u003e\n \u003cp\u003e[A]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\u003cbr\u003e\u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\u003cbr\u003e\u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e.369*\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e\u0026minus;.025\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 51px;\"\u003e\n \u003cp\u003e[F1]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 55px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e\u0026minus;.155\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003cem\u003eNote\u003c/em\u003e: *** \u003cem\u003ep\u003c/em\u003e \u0026lt; .001, ** \u003cem\u003ep\u003c/em\u003e \u0026lt; .01, * \u003cem\u003ep\u003c/em\u003e \u0026lt; .05\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 6 Multiple regression analyses for SST and TSST\u003c/strong\u003e\u003c/p\u003e\n\u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"358\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n 
\u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 39px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 64px;\"\u003e\n \u003cp\u003e\u003cem\u003eB\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 53px;\"\u003e\n \u003cp\u003e\u003cem\u003eSE\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"2\" style=\"width: 107px;\"\u003e\n \u003cp\u003e95%CI\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 48px;\"\u003e\n \u003cp\u003e\u0026beta;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 47px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 39px;\"\u003e\n \u003cp\u003e \u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003eLower\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 50px;\"\u003e\n \u003cp\u003eUpper\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"5\" style=\"width: 47px;\"\u003e\n \u003cp\u003eSST\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 39px;\"\u003e\n \u003cp\u003e[SC]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 64px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp;1.515+\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e0.852\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e\u0026minus;0.228\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 50px;\"\u003e\n \u003cp\u003e3.257\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 48px;\"\u003e\n \u003cp\u003e.209\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 39px;\"\u003e\n \u003cp\u003e[A]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 64px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; 
\u0026nbsp;2.707**\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e0.904\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e0.858\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 50px;\"\u003e\n \u003cp\u003e4.557\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 48px;\"\u003e\n \u003cp\u003e.324\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 39px;\"\u003e\n \u003cp\u003e[F1]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 64px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp;0.030***\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e0.006\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e0.017\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 50px;\"\u003e\n \u003cp\u003e0.043\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 48px;\"\u003e\n \u003cp\u003e.551\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"6\" style=\"width: 311px;\"\u003e\n \u003cp\u003e \u003cem\u003eR\u003c/em\u003e\u003csup\u003e2\u003c/sup\u003e = .706\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"6\" style=\"width: 311px;\"\u003e\n \u003cp\u003e Adj \u003cem\u003eR\u003c/em\u003e\u003csup\u003e2\u003c/sup\u003e = .676\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"5\" style=\"width: 47px;\"\u003e\n \u003cp\u003eTSST\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 39px;\"\u003e\n \u003cp\u003e[SC]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 64px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp;0.740\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e0.591\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n 
\u003cp\u003e\u0026minus;0.469\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 50px;\"\u003e\n \u003cp\u003e1.949\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 48px;\"\u003e\n \u003cp\u003e.186\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 39px;\"\u003e\n \u003cp\u003e[A]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 64px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp;2.241+\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e1.120\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e\u0026minus;0.050\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 50px;\"\u003e\n \u003cp\u003e4.532\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 48px;\"\u003e\n \u003cp\u003e.290\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 39px;\"\u003e\n \u003cp\u003e[F1]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 64px;\"\u003e\n \u003cp\u003e\u0026nbsp; \u0026nbsp; \u0026nbsp;0.026***\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 53px;\"\u003e\n \u003cp\u003e0.006\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 57px;\"\u003e\n \u003cp\u003e0.014\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 50px;\"\u003e\n \u003cp\u003e0.038\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 48px;\"\u003e\n \u003cp\u003e.530\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"6\" style=\"width: 311px;\"\u003e\n \u003cp\u003e \u003cem\u003eR\u003c/em\u003e\u003csup\u003e2\u003c/sup\u003e = .669\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"6\" style=\"width: 311px;\"\u003e\n \u003cp\u003e Adj \u003cem\u003eR\u003c/em\u003e\u003csup\u003e2\u003c/sup\u003e = .634\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n 
\u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003e\u003cem\u003eNote\u003c/em\u003e: *** \u003cem\u003ep\u003c/em\u003e \u0026lt; .001, ** \u003cem\u003ep\u003c/em\u003e \u0026lt; .01, + \u003cem\u003ep\u003c/em\u003e \u0026lt; .10\u003c/p\u003e\n\u003ch2\u003ePsychological factors\u003c/h2\u003e\n\u003cp\u003eTo check internal consistency, we calculated Cronbach\u0026rsquo;s alpha coefficients for each of the four scales for both the SST and TSST. The coefficients ranged from 0.345 to 0.719, lower than those reported by Zhou (2012) and, at the lower end, too low to support combining items into scale scores. We therefore analyzed each item separately, calculating item means for both test modes and comparing them with paired \u003cem\u003et\u003c/em\u003e tests. Table 7 presents descriptive statistics and the results of the \u003cem\u003et\u003c/em\u003e tests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 7 Psychological factors associated with SST and TSST\u003c/strong\u003e\u003c/p\u003e\n\u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"585\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"2\" style=\"width: 77px;\"\u003e\n \u003cp\u003eScale\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"2\" rowspan=\"2\" style=\"width: 221px;\"\u003e\n \u003cp\u003eItem\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"2\" style=\"width: 83px;\"\u003e\n \u003cp\u003eSST\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd colspan=\"2\" style=\"width: 83px;\"\u003e\n \u003cp\u003eTSST\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 41px;\"\u003e\n \u003cp\u003e\u003cem\u003et\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 41px;\"\u003e\n \u003cp\u003e\u003cem\u003ep\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd rowspan=\"2\" style=\"width: 39px;\"\u003e\n \u003cp\u003e\u003cem\u003ed\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n 
\u003cp\u003e\u003cem\u003eM\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e\u003cem\u003eSD\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e\u003cem\u003eM\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e\u003cem\u003eSD\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"3\" style=\"width: 77px;\"\u003e\n \u003cp\u003eTest-taking anxiety\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 197px;\"\u003e\n \u003cp\u003eI felt nervous before the SST/TSST.\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e4.12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e0.86\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e3.85\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.06\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.47\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e.152\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 39px;\"\u003e\n \u003cp\u003e0.26\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 197px;\"\u003e\n \u003cp\u003eI felt nervous when I was taking the SST/TSST.\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e4.03\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.19\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e3.67\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
style=\"width: 41px;\"\u003e\n \u003cp\u003e1.27\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.40\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e.172\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 39px;\"\u003e\n \u003cp\u003e0.24\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 197px;\"\u003e\n \u003cp\u003eI would have performed better if I had not got nervous.\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e2.91\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.33\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e2.94\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.22\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e\u0026minus;0.14\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e.889\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 39px;\"\u003e\n \u003cp\u003e\u0026minus;0.02\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"4\" style=\"width: 77px;\"\u003e\n \u003cp\u003eLiking for the test\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 197px;\"\u003e\n \u003cp\u003eI like the format of the SST/TSST.\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e4.30\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e0.77\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e2.64\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n 
\u003cp\u003e1.19\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e7.42\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e\u0026lt;.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 39px;\"\u003e\n \u003cp\u003e1.29\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003e5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 197px;\"\u003e\n \u003cp\u003eThe SST/TSST was interesting.\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e4.45\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e0.56\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e3.39\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.22\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e4.53\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e\u0026lt;.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 39px;\"\u003e\n \u003cp\u003e0.79\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003e6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 197px;\"\u003e\n \u003cp\u003eTaking the SST/TSST was not a pleasant experience. 
(R)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.76\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e0.90\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e2.64\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.11\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e\u0026minus;4.54\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e\u0026lt;.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 39px;\"\u003e\n \u003cp\u003e\u0026minus;0.79\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003e7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 197px;\"\u003e\n \u003cp\u003eI am used to the format of the SST/TSST.\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e2.76\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.32\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.76\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e0.71\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e4.20\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e\u0026lt;.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 39px;\"\u003e\n \u003cp\u003e0.73\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"5\" style=\"width: 77px;\"\u003e\n \u003cp\u003ePerceptions of test difficulty\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003e8\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 197px;\"\u003e\n \u003cp\u003eI believe I did well on the tasks. 
(R)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e2.55\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.20\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.82\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e0.77\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e3.81\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e\u0026lt;.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 39px;\"\u003e\n \u003cp\u003e0.66\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003e9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 197px;\"\u003e\n \u003cp\u003eI felt confident when I did the SST/TSST. (R)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e2.21\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.02\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.97\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e0.73\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.35\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e.187\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 39px;\"\u003e\n \u003cp\u003e0.24\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003e10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 197px;\"\u003e\n \u003cp\u003eI felt the SST/TSST was difficult.\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e3.85\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n 
\u003cp\u003e0.80\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e4.00\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e0.66\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e\u0026minus;0.9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e.377\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 39px;\"\u003e\n \u003cp\u003e\u0026minus;0.16\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003e11\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 197px;\"\u003e\n \u003cp\u003eI would have performed better if the format of the SST/TSST had been different.\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e2.09\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e0.91\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e3.03\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.10\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e\u0026minus;4.84\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e\u0026lt;.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 39px;\"\u003e\n \u003cp\u003e\u0026minus;0.84\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003e12\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 197px;\"\u003e\n \u003cp\u003eI would have performed better if I had known the topics of the test tasks better.\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e3.42\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.15\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
style=\"width: 41px;\"\u003e\n \u003cp\u003e3.55\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.15\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e\u0026minus;0.61\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e.545\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 39px;\"\u003e\n \u003cp\u003e\u0026minus;0.11\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd rowspan=\"3\" style=\"width: 77px;\"\u003e\n \u003cp\u003ePerceptions of test validity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003e13\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 197px;\"\u003e\n \u003cp\u003eI believe the format and the content of the SST/TSST was fair.\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e4.18\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e0.92\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e3.85\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.03\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.46\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e.155\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 39px;\"\u003e\n \u003cp\u003e0.25\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003e14\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 197px;\"\u003e\n \u003cp\u003eI believe I had enough opportunity to show my ability to speak English.\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e4.06\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e0.93\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e3.58\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.15\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e2.48\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e.018\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 39px;\"\u003e\n \u003cp\u003e0.43\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd style=\"width: 24px;\"\u003e\n \u003cp\u003e15\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 197px;\"\u003e\n \u003cp\u003eThe SST/TSST reflects accurately how well I speak English.\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e4.18\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.01\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e3.73\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.13\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e1.90\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 41px;\"\u003e\n \u003cp\u003e.066\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd style=\"width: 39px;\"\u003e\n \u003cp\u003e0.33\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003ch2\u003eWashback effects\u003c/h2\u003e\n\u003cp\u003eResponses to the open-ended question totaled 2,407 words for the SST and 1,880 words for the TSST. 
The experience of taking the SST and TSST seems to have led to different opinions about how these tests would influence test-takers\u0026rsquo; attitudes towards learning English, their learning behavior, and their motivation.\u0026nbsp;Below are sample comments for the SST translated into English:\u003c/p\u003e\n\u003col\u003e\n \u003cli\u003eIf a test with such a format were introduced into examinations, I believe that school education would shift towards a focus on conversation, truly providing students with the opportunity to learn English.\u003c/li\u003e\n \u003cli\u003eI think that\u0026nbsp;when people realize that English is just a language and\u0026nbsp;that its fundamental nature is to communicate with others,\u0026nbsp;they will focus\u0026nbsp;more on\u0026nbsp;learning it for that purpose. This shift will not only help them pass university entrance examinations but also enable them to acquire English skills that are useful in everyday life.\u003c/li\u003e\n \u003cli\u003eI think knowing that their English is understood can be motivating for those who are not very good at English.\u003c/li\u003e\n \u003cli\u003eWhen I actually tried it, I became flustered and mixed up words and tenses that I usually understood, so I think if it were introduced, it would change the way I focused on learning. Not only that, but if we were to prepare for something like SST, I think we would develop more practical skills that are different from the grammar, vocabulary, listening, and reading we are currently learning. So, I think it should be implemented.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eThe following are examples of feedback on the TSST, translated into English:\u003c/p\u003e\n\u003col\u003e\n \u003cli\u003eIn the case of the TSST, because it is not face-to-face, examinees tend to focus more on accuracy than on conversations with facial expressions. 
While I felt that emotional expressions were easier to convey on the SST, I was less nervous on the TSST.\u003c/li\u003e\n \u003cli\u003eSimilar to the SST, it is necessary to focus on learning speaking skills and developing the ability to organize thoughts. I believe that the TSST imposes a greater burden on test-takers, leading them to spend more time studying English.\u003c/li\u003e\n \u003cli\u003eI don\u0026rsquo;t think that this has a particular impact. With computer-based processing, even if you have more to say, your answer forcibly ends when your time is up. Because you cannot see the other person\u0026rsquo;s expressions and do not know whether your message has been conveyed, it is difficult to know specifically what you need to improve.\u003c/li\u003e\n \u003cli\u003eUnlike SST, TSST has predetermined questions, and the conversation does not delve deeper into what I am saying. Therefore, I think it\u0026rsquo;s becoming common to believe that preparing a rough template in advance is sufficient to complete the test.\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eDistinct trends emerged when we compared test-takers\u0026rsquo; attitudes towards learning English, their learning behavior, and their motivation between the videoconferencing SST and semi-direct TSST modes. The SST was perceived as enhancing practical speaking abilities and real-time conversational skills, leading to increased student engagement and motivation. Participants expected that this mode would motivate them to actively seek out opportunities for conversation, thereby enhancing \u0026ldquo;usable\u0026rdquo; English proficiency beyond exam preparation.\u003c/p\u003e\n\u003cp\u003eConversely, the TSST, which is perceived as lacking interactive communication, prompts a strategic and accuracy-focused approach to speaking practice. 
It may reduce anxiety by eliminating the dynamics of human interaction, but it may also lead to a formulaic study approach, with less emphasis on responsiveness and adaptability. This mode is seen as less stressful and more straightforward for some learners, yet potentially less effective in cultivating comprehensive communication skills.\u003c/p\u003e"},{"header":"Discussion","content":"\u003ch2\u003eHolistic ratings and analytical measures (RQ1)\u003c/h2\u003e\n\u003cp\u003eOur analysis revealed a significant difference in holistic rating scores between the SST and the TSST. Previous research has suggested equivalence between face-to-face and semi-direct modes (Luoma, 1997; Kenyon \u0026amp; Tschirner, 2000; Shohamy, 1994; Stansfield \u0026amp; Kenyon, 1992; Zhou, 2012) and between face-to-face and videoconferencing modes (Clark \u0026amp; Hooshmand, 1992; Kim \u0026amp; Craig, 2012; Nakatsuhara et al., 2017). We therefore expected similar scores across the videoconferencing and semi-direct modes in this study. However, the scores differed between the two modes, although the effect size was small to medium (\u003cem\u003ed\u003c/em\u003e = 0.391). Rated against the same criteria, participants received higher scores in the semi-direct mode than in the videoconferencing mode.\u003c/p\u003e\n\u003cp\u003eIn terms of analytical measures, syntactic complexity was superior in the semi-direct mode, whereas accuracy, speed fluency, and repair fluency were higher in the videoconferencing mode. These findings are consistent with Alzahrani (2020), who observed similar outcomes when comparing face-to-face and semi-direct modes. Although Alzahrani suggested that non-native speakers might avoid grammatical mistakes in front of \u0026ldquo;native English speakers,\u0026rdquo; the first language of the examiners in the videoconferencing SST mode in this study was Japanese. 
Therefore, the trade-off effect provides a more convincing explanation for these findings. Specifically, constructing more syntactically complex sentences requires additional time to consider sentence structure, or the formulator\u003ca href=\"#_ftn1\" name=\"_ftnref1\" title=\"\"\u003e\u003c/a\u003e\u003csup\u003e2\u003c/sup\u003e during the planning phase, potentially reducing fluency. Furthermore, the use of more complex sentences increases the likelihood of errors.\u003c/p\u003e\n\u003cp\u003eMultiple regression analyses indicated that fluency was the strongest predictor of holistic ratings in both the SST and TSST. This trend was particularly pronounced in the TSST, where syntactic complexity was not a significant predictor and accuracy was only marginally significant (\u003cem\u003ep\u003c/em\u003e \u0026lt; .10). This finding is noteworthy given that test-takers in the TSST generally produced more syntactically complex utterances, but these did not contribute significantly to holistic ratings. Given that examinees\u0026rsquo; fluency was generally lower in the TSST, one possible explanation is that fluency drives holistic ratings until it reaches a certain threshold. Other measures, such as syntactic complexity and accuracy, become meaningful only when fluency is sufficiently high, particularly because fluency determines how much content examinees can produce under time constraints. However, the number of participants in this study was not large enough for reliable correlational analysis; the confidence intervals were relatively wide, so firm conclusions cannot be drawn.\u003c/p\u003e\n\u003ch2\u003ePsychological factors and washback effects (RQ2)\u003c/h2\u003e\n\u003cp\u003eThe questionnaire results indicated that participants generally exhibited a strong preference for videoconferencing over the semi-direct mode. 
This aligns with most previous studies (e.g., James, 1988; Kiddle \u0026amp; Kormos, 2011; McNamara, 1987; Shohamy et al., 1993; Stansfield et al., 1990; Qian, 2009), which have consistently found that test-takers prefer to interact with their interlocutors, whether face-to-face or via videoconferencing (Du \u0026amp; Zhang, 2022). Several attempts have been made to explain this preference through anxiety, yielding mixed opinions. Some studies suggest that the presence of examiners alleviates examinees\u0026rsquo; anxiety (Glasson \u0026amp; Devine, 2023; Qian, 2009), while others argue the opposite (Song, 2014). Although some participants in this study agreed with Song\u0026rsquo;s view, the quantitative analysis did not reveal significantly different levels of anxiety between the two testing modes.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eIn terms of perceptions of test difficulty,\u0026nbsp;the results showed that test-takers felt they were better able to demonstrate their abilities in\u0026nbsp;the videoconferencing mode. This is reflected in the higher mean scores for item 8, \u0026ldquo;I believe I did well on the tasks,\u0026rdquo; for the SST, and item 11, \u0026ldquo;I would have performed better if the format of the SST/TSST had been different\u0026rdquo; for the TSST. Additionally, participants felt that the videoconferencing mode tested their speaking ability more fairly, as evidenced by a higher mean score on item 14, \u0026ldquo;I believe I had enough opportunity to show my ability to speak English.\u0026rdquo;\u003c/p\u003e\n\u003cp\u003eParticipants\u0026rsquo; comments provide further insight into these results. They generally felt that videoconferencing facilitated interaction between examinees and examiners, creating a natural conversational environment that reflected real-life communication. 
They also mentioned examiner reactions such as nodding, facial expressions, and short responses, which provided feedback on whether their utterances made sense or successfully conveyed their intended meaning. In contrast, such reactions were absent in the TSST semi-direct mode, leading participants to feel that structuring their output required careful planning and more complex sentence structures. They also noted time constraints that seemed to necessitate well-organized speech. These comments align with the results showing that syntactic complexity in speech was better in the semi-direct mode, whereas fluency and accuracy were better in the videoconferencing mode. Interestingly, the holistic rating score was higher in the semi-direct mode, which contradicts Zhou\u0026rsquo;s (2012) observation that examinees who perceived the test as less difficult tended to achieve higher scores. These findings indicate that, while examinees generally prefer and feel more confident in the direct mode, believing it to be more valid than the semi-direct mode, they tend to produce more complex outputs in the latter. This emphasis on complexity, at the expense of fluency and accuracy, resulted in higher holistic rating scores in the semi-direct mode. Even more intriguing, although fluency was the primary and significant predictor of holistic rating scores in the semi-direct mode, fluency levels were generally lower in this mode.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eSeveral factors may have contributed to these results. Zhou (2015a) suggested that raters might assign higher scores to candidates\u0026nbsp;who are perceived as disadvantaged by needing to speak to a machine. This hypothesis warrants further investigation into rater behavior and the psychological factors that influence rater assessments in semi-direct testing environments. 
Moreover, the comparison between face-to-face and semi-direct modes differs significantly from that between videoconferencing and semi-direct modes. Although many studies (e.g., Clark \u0026amp; Hooshmand, 1992; Kim \u0026amp; Craig, 2012; Lee, 2023; Nakatsuhara et al., 2017)\u0026nbsp;have reported equivalent ratings between face-to-face and videoconferencing modes, Mullooly and Glasson (2023) and Nakatsuhara et al. (2017) observed notable differences in\u0026nbsp;the functional outputs of test-takers. Therefore, simultaneous comparative analyses across face-to-face, videoconferencing, and semi-direct testing modes must be conducted to gain a more precise and comprehensive understanding.\u003c/p\u003e\n\u003cp\u003eWashback effects show distinct differences between videoconferencing and semi-direct modes. Examinees\u0026rsquo; comments suggest that while both testing modes generally encourage learners to improve their speaking skills, the videoconferencing mode is particularly favored for promoting conversational practice, as test-takers perceive it as a more natural communicative setting. This aligns with Brooks and Swain (2015), Fan (2014), Kiddle and Kormos (2011), and Qian (2009), who found that a lack of interaction in computer-delivered tests contributes to negative perceptions. The semi-direct mode, which is perceived as structured and less interactive, fosters a strategic and accuracy-focused speaking approach. This is likely to lead learners to focus on practicing specific topics within set time limits, which some find challenging.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eWhat kind of learning do learners think each mode promotes? Specifically, what types of beliefs would each mode generate\u0026nbsp;and how would these beliefs shape learning strategies? 
Responses on the questionnaire suggest that if videoconferencing testing were introduced, learners would emphasize the importance of real-life use, fostering a desire to seek more opportunities for conversation. Conversely, if semi-direct testing were introduced, learners would prioritize accuracy and strategic learning, such as organizing thoughts in a short time and memorizing rough templates for\u0026nbsp;expected questions in advance. Although our discussion generally suggests that the direct videoconferencing mode is more encouraging, the semi-direct mode appears to foster more specific learning behaviors.\u003c/p\u003e\n\u003cp\u003eThe semi-direct mode may provide clearer instructions about what learners should do. However, as Murayama (2006) cautions, \u0026ldquo;test formats\u0026rdquo; can lead learners to favor certain learning strategies regardless of their true effectiveness. He demonstrated that some learners believe in the effectiveness of rote learning, even\u0026nbsp;when presented with multiple-choice questions requiring deep understanding. Therefore, in the context of preparing for a speaking test, the formulaic learning strategies that students adopt because they believe them suited to the semi-direct mode might not be as effective as they assume, or might be equally useful for the direct videoconferencing mode.\u003c/p\u003e\n\u003cp\u003eThe types of beliefs formed and learning strategies fostered by these different testing modes should be examined to determine whether learners continue to believe that these strategies truly fit each mode. If learners find that their strategies do not fit the mode, their beliefs and learning strategies may change. Therefore, long-term observations are necessary.\u003c/p\u003e"},{"header":"Conclusion and implications","content":"\u003cp\u003eOur results indicate that examinees received better holistic rating scores in the semi-direct mode than in the videoconferencing mode. 
Analytical measures revealed that the semi-direct mode\u0026nbsp;exhibits superior syntactic complexity at the expense of accuracy and fluency. However, fluency was the most significant predictor of holistic rating scores, a tendency that was particularly pronounced in the semi-direct mode.\u003c/p\u003e\n\u003cp\u003eRegarding psychological factors, despite better performance evaluations in the semi-direct mode, examinees strongly preferred the direct videoconferencing mode. They exhibited\u0026nbsp;greater confidence in their performance and a sense of demonstrating\u0026nbsp;their full abilities. A key factor appears to be the presence of interaction with an interlocutor, which examinees perceived as more engaging and responsive. They noted that real-time feedback\u0026mdash;such as backchannels, nods, and facial expressions\u0026mdash;helped them feel understood. Additionally, they believed that the videoconferencing mode fairly and accurately tested their speaking ability.\u003c/p\u003e\n\u003cp\u003eRegarding washback effects, participants felt that the videoconferencing mode fostered better learning behaviors by encouraging active engagement and spontaneous speech. In contrast, they associated the semi-direct mode with more concrete and preparatory strategies, such as focusing on grammatical accuracy and mentally preparing rough templates before responding. These differing tendencies suggest that each test mode may promote distinct learning orientations: the direct mode supports real-time communication skills, while the semi-direct mode encourages planned production. This distinction has implications for language instruction and assessment design. Educators and test developers should consider how task design influences learner behavior and ensure that assessments elicit the kinds of language use they aim to measure.\u003c/p\u003e\n\u003cp\u003eThis study had several methodological limitations. First, our sample was skewed, as most participants were female. 
Therefore, caution is needed when generalizing the results. Second, the sample size was not sufficiently large for correlation-based analyses, resulting in wide confidence intervals in the multiple regression analyses. Third, the scales for the psychological factors used in this study neither showed sufficient internal consistency nor replicated Zhou\u0026rsquo;s (2012) factor structures. Future research should closely examine and improve the constructs of each scale using larger and more diverse participant samples. Fourth, the analytical measures did not show the same relationships as in previous studies (e.g., Koizumi \u0026amp; In\u0026rsquo;nami, 2014); in particular, speed fluency did not correlate positively with repair fluency. Future studies should further investigate the construct of CAF measures. Finally, the results on washback effects were based on participants\u0026rsquo; feedback comments collected immediately after taking both tests, reflecting a single point in time. For more precise and detailed insights, longitudinal studies are required to understand how learners\u0026rsquo; perceptions of the tests result in the formation of beliefs and learning strategies over time. 
Changes in beliefs and learning strategies over time and repeated test experiences should be considered.\u0026nbsp;\u003c/p\u003e"},{"header":"Abbreviations","content":"\u003cp\u003eCAF Complexity, accuracy, and fluency\u003c/p\u003e\n\u003cp\u003eACTFL American Council on the Teaching of Foreign Languages\u003c/p\u003e\n\u003cp\u003eIELTS International English Language Testing System\u003c/p\u003e\n\u003cp\u003eOPI Oral Proficiency Interview\u003c/p\u003e\n\u003cp\u003eSOPI Simulated Oral Proficiency Interview\u003c/p\u003e\n\u003cp\u003eTOEFL iBT Test of English as a Foreign Language Internet-based Test\u003c/p\u003e\n\u003cp\u003eSST Standard Speaking Test\u003c/p\u003e\n\u003cp\u003eTSST Telephone Standard Speaking Test\u003c/p\u003e\n\u003cp\u003eCEFR Common European Framework of Reference for Languages\u003c/p\u003e\n\u003cp\u003eRQ1 Research question 1\u003c/p\u003e\n\u003cp\u003eRQ2 Research question 2\u003c/p\u003e\n\u003cp\u003eA Accuracy\u003c/p\u003e\n\u003cp\u003eF1 Speed fluency\u003c/p\u003e\n\u003cp\u003eF2 Repair fluency\u003c/p\u003e\n\u003cp\u003eAS-unit Analysis of speech unit\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eEthics approval and consent to participate\u003c/h2\u003e\n\u003cp\u003eAll study participants provided informed consent, and the study design was approved by the Hiroshima Jogakuin University Research Ethics Review Committee (approval number: 2019-14).\u003c/p\u003e\n\u003ch2\u003eConsent for publication\u003c/h2\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003ch2\u003eFunding\u003c/h2\u003e\n\u003cp\u003eThis work was supported by the ALC Language Education Research Support Program of ALC Press Inc., awarded to the first and second authors. ALC Press Inc. 
also granted permission to compare the two tests, both of which were developed and administered by the company.\u003c/p\u003e\n\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\n\u003cp\u003eKS: conceptualization; data curation; formal analysis; funding acquisition; investigation; methodology; project administration; resources; software; validation; visualization; writing, original draft; writing, review and editing. RM: funding acquisition; project administration; resources; software; writing, review and editing.\u003c/p\u003e\n\u003ch2\u003eAcknowledgement\u003c/h2\u003e\n\u003cp\u003eWe are grateful to all the participants for their contributions to this study. We would also like to offer special thanks to Saki Onchi, Kayo Shimizu, Nene Nagashima, and Asuna Ueno for helping us with data entry.\u003c/p\u003e\n\u003ch2\u003eData Availability\u003c/h2\u003e\n\u003cp\u003eThe datasets generated and analyzed during the current study are not publicly available due to the test provider\u0026rsquo;s operational policy and the protection of examinees\u0026rsquo; and examiners\u0026rsquo; personal information, but are available from the corresponding author on reasonable request.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n \u003cli\u003eACTFL-ALC Press. (2000). \u003cem\u003eSST standard speaking test\u0026nbsp;\u003c/em\u003e\u003cem\u003emanual\u003c/em\u003e.\u0026nbsp;ACTFL-ALC Press.\u003c/li\u003e\n \u003cli\u003eALC EDUCATION INC. (2025). Level descriptions and professional applications\u003cem\u003e.\u003c/em\u003e https://tsst.alc.co.jp/biz/en/level/. Accessed 4 June 2025.\u003c/li\u003e\n \u003cli\u003eAlzahrani, N. A. (2020). \u003cem\u003eA comparative study of oral proficiency in direct (OPI) and semi-direct (VOCI) testing modes: Measures of complexity, accuracy, and fluency\u0026nbsp;\u003c/em\u003e(Publication No. 27834926) [Doctoral dissertation, Oklahoma State University]. ProQuest Dissertations Publishing. 
https://hdl.handle.net/11244/325439\u003c/li\u003e\n \u003cli\u003eBaba, S. (2019). How to produce beneficial washback effect by using high-stakes testing? Proposal from educational psychology. \u003cem\u003eJLTA Journal, 22\u003c/em\u003e, 44\u0026ndash;64. https://doi.org/10.20622/jltajournal.22.0_44\u003c/li\u003e\n \u003cli\u003eBrooks, L., \u0026amp; Swain, M. (2015). Students\u0026rsquo; voices: The challenge of measuring speaking for academic contexts. In B. Spolsky, O. Inbar, \u0026amp; M. Tannenbaum (Eds.), \u003cem\u003eChallenges for language education and policy: Making space for people\u003c/em\u003e (pp. 65-80). Routledge.\u003c/li\u003e\n \u003cli\u003eBrown, A. (1993). The role of test-taker feedback in the development process: Test takers\u0026rsquo; reactions to a tape-mediated test of proficiency in spoken Japanese. \u003cem\u003eLanguage Testing\u003c/em\u003e, \u003cem\u003e10\u003c/em\u003e, 277\u0026ndash;304.\u003c/li\u003e\n \u003cli\u003eClark, J. L. D. (1979). Direct versus semi-direct tests of speaking proficiency. In E. J. Briere \u0026amp; F. B. Hinofotis (Eds.), \u003cem\u003eConcepts in language testing: Some recent studies\u003c/em\u003e (pp. 35\u0026ndash;49). TESOL.\u003c/li\u003e\n \u003cli\u003eClark, J. L. D., \u0026amp; Hooshmand, D. (1992). \u0026ldquo;Screen-to-screen\u0026rdquo; testing: An exploratory study of oral proficiency interviewing using video conferencing. \u003cem\u003eSystem\u003c/em\u003e, \u003cem\u003e20\u003c/em\u003e(3), 293\u0026ndash;304. https://doi.org/10.1016/0346-251X(92)90041-Z\u003c/li\u003e\n \u003cli\u003eDu, Y., \u0026amp; Zhang, F. (2022). Examinees\u0026rsquo; affective preference for online speaking assessment: Synchronous vs asynchronous. \u003cem\u003eChinese Language Teaching Methodology and Technology\u003c/em\u003e, \u003cem\u003e5\u003c/em\u003e(1), 29\u0026ndash;46. 
https://engagedscholarship.csuohio.edu/cltmt/vol5/iss1/3\u003c/li\u003e\n \u003cli\u003eElder, C., Iwashita, N., \u0026amp; McNamara, T. (2002). Estimating the difficulty of oral proficiency tasks: What does the test-taker have to offer? \u003cem\u003eLanguage Testing, 19\u003c/em\u003e(4), 347\u0026ndash;368. https://doi.org/10.1191/0265532202lt235oa\u003c/li\u003e\n \u003cli\u003eFan, J. (2014). Chinese test takers\u0026rsquo; attitudes towards the Versant English test: A mixed-methods approach. \u003cem\u003eLanguage Testing in Asia, 4\u003c/em\u003e(1), 1\u0026ndash;17. https://doi.org/10.1186/s40468-014-0006-9\u003c/li\u003e\n \u003cli\u003eFan, J., \u0026amp; Ji, P. (2014). Test candidates\u0026rsquo; attitudes and their test performance: The case of the Fudan English test. \u003cem\u003eUniversity of Sydney Papers in TESOL\u003c/em\u003e,\u003cem\u003e\u0026nbsp;9\u003c/em\u003e, 1\u0026ndash;35. http://faculty.edfac.usyd.edu.au/projects/usp_in_tesol/pdf/volume09/Article01.pdf\u003c/li\u003e\n \u003cli\u003eFoster, P., Tonkyn, A., \u0026amp; Wigglesworth, G. (2000). Measuring spoken language: A unit for all reasons. \u003cem\u003eApplied Linguistics, 21\u003c/em\u003e, 354\u0026ndash;375. https://doi.org/10.1093/applin/21.3.354\u003c/li\u003e\n \u003cli\u003eGalaczi, E. D., \u0026amp; Taylor, L. (2018). Interactional competence: Conceptualisations, operationalisations, and outstanding questions. \u003cem\u003eLanguage Assessment Quarterly, 15\u003c/em\u003e(3), 219\u0026ndash;236. https://doi.org/10.1080/15434303.2018.1453816\u003c/li\u003e\n \u003cli\u003eGlasson, N. (2022). Is the devil you know better? Testwiseness and eliciting evidence of interactional competence in familiar versus unfamiliar triadic speaking tasks. \u003cem\u003eStudies in Language Assessment, 11\u003c/em\u003e(2), 58\u0026ndash;97. https://doi.org/10.58379/ttfe6660\u0026nbsp; \u003c/li\u003e\n \u003cli\u003eGlasson, N., \u0026amp; Devine, A. (2023). 
The eye of the stakeholder: Perceptions of remote speaking. In J. Savage, E. Galaczi \u0026amp; H.-W. Lee (Eds.)\u0026nbsp;\u003cem\u003eResearch Notes\u003c/em\u003e \u003cem\u003eIssue 86\u003c/em\u003e (pp. 53\u0026ndash;69). Cambridge University Press \u0026amp; Assessment. https://www.cambridgeenglish.org/english-research-group/published-research/research-notes/\u003c/li\u003e\n \u003cli\u003eHousen, A., \u0026amp; Kuiken, F. (2009). Complexity, accuracy, and fluency in second language acquisition. \u003cem\u003eApplied Linguistics\u003c/em\u003e, \u003cem\u003e30\u003c/em\u003e, 461\u0026ndash;473. https://doi.org/10.1093/applin/amp048\u003c/li\u003e\n \u003cli\u003eHughes, A. (2003). \u003cem\u003eTesting for language teachers\u003c/em\u003e\u003cem\u003e\u0026nbsp;\u003c/em\u003e(2nd ed.). Cambridge University Press.\u003c/li\u003e\n \u003cli\u003eIsahara, H., Uchimoto, K., \u0026amp; Izumi, E. (2004). \u003cem\u003eNihon jin 1200 nin no eigo speaking corpus\u0026nbsp;\u003c/em\u003e[English speaking corpus of 1200 Japanese]. ALC Press.\u003c/li\u003e\n \u003cli\u003eJames, G. (1988). Development of an oral proficiency component in a test of English for academic purposes. In A. Hughes (Ed.), \u003cem\u003eTesting English for university study\u003c/em\u003e (ELT Documents 127) (pp. 111\u0026ndash;133). Modern English Publications and the British Council.\u003c/li\u003e\n \u003cli\u003eKenyon, D. M., \u0026amp; Tschirner, E. (2000). The rating of direct and semi-direct Oral Proficiency Interviews: Comparing performance at lower proficiency levels. \u003cem\u003eThe Modern Language Journal\u003c/em\u003e, \u003cem\u003e84\u003c/em\u003e(1), 85\u0026ndash;101. https://doi.org/10.1111/0026-7902.00054\u003c/li\u003e\n \u003cli\u003eKiddle, T., \u0026amp; Kormos, J. (2011). The effect of mode of response on a semi-direct test of oral proficiency. \u003cem\u003eLanguage Assessment Quarterly\u003c/em\u003e, \u003cem\u003e8\u003c/em\u003e(4), 342\u0026ndash;360. 
https://doi.org/10.1080/15434303.2011.613503\u003c/li\u003e\n \u003cli\u003eKim, J., \u0026amp; Craig, D. A. (2012). Validation of a videoconferenced speaking test. \u003cem\u003eComputer Assisted Language Learning\u003c/em\u003e, \u003cem\u003e25\u003c/em\u003e(3), 257\u0026ndash;275. https://doi.org/10.1080/09588221.2011.649482\u003c/li\u003e\n \u003cli\u003eKoizumi, R., \u0026amp; In\u0026rsquo;nami, Y. (2014). Modeling complexity, accuracy, and fluency of Japanese learners of English: A structural equation modeling approach. \u003cem\u003eJALT Journal\u003c/em\u003e, \u003cem\u003e36\u003c/em\u003e(1), 25\u0026ndash;42. https://doi.org/10.37546/JALTJJ36.1-2\u003c/li\u003e\n \u003cli\u003eKhabbazbashi, N., \u0026amp; Galaczi, E. D. (2020). A comparison of holistic, analytic, and part marking models in speaking assessment. \u003cem\u003eLanguage Testing, 37\u003c/em\u003e(3), 333\u0026ndash;360. https://doi.org/10.1177/0265532219881360\u003c/li\u003e\n \u003cli\u003eKrashen, S. (1985). \u003cem\u003eThe input hypothesis: Issues and implications\u003c/em\u003e. Longman.\u003c/li\u003e\n \u003cli\u003eLado, R. (1961). \u003cem\u003eLanguage testing\u003c/em\u003e. Longman.\u003c/li\u003e\n \u003cli\u003eLam, D. M. K. (2015). Contriving authentic interaction: Task implementation and engagement in school-based speaking assessment in Hong Kong. In G. Yu \u0026amp; Y. Jin (Eds.), \u003cem\u003eAssessing Chinese learners of English: Language constructs, consequences and conundrums\u003c/em\u003e (pp. 38\u0026ndash;60). Palgrave Macmillan.\u003c/li\u003e\n \u003cli\u003eLee, H. (2023). Looking into an innovative test mode in paired speaking from the perspective of scores. In J. Savage, E. Galaczi, \u0026amp; H.-W. Lee (Eds.) \u003cem\u003eResearch Notes\u003c/em\u003e \u003cem\u003eIssue 86\u003c/em\u003e (pp. 13\u0026ndash;32). Cambridge University Press \u0026amp; Assessment. 
https://www.cambridgeenglish.org/english-research-group/published-research/research-notes/\u003c/li\u003e\n \u003cli\u003eLevelt, W. J. M. (1989). \u003cem\u003eSpeaking: From intention to articulation\u003c/em\u003e. MIT Press.\u003c/li\u003e\n \u003cli\u003eLuk, J. (2010). Talking to score: Impression management in L2 oral assessment and the co-construction of a test discourse genre. \u003cem\u003eLanguage Assessment Quarterly\u003c/em\u003e, \u003cem\u003e7\u003c/em\u003e(1), 25\u0026ndash;53. https://doi.org/10.1080/15434300903473997\u003c/li\u003e\n \u003cli\u003eLuoma, S. (1997). \u003cem\u003eComparability of a tape-mediated and a face-to-face test of speaking: A triangulation study\u003c/em\u003e [Unpublished Licentiate theses]. University of Jyvaskyla, Finland. https://jyx.jyu.fi/handle/123456789/11733\u003c/li\u003e\n \u003cli\u003eLuoma, S. (2004). \u003cem\u003eAssessing speaking\u003c/em\u003e. Cambridge University Press.\u003c/li\u003e\n \u003cli\u003eMcNamara, T. F. (1987). \u003cem\u003eAssessing the language proficiency of health professionals: Recommendations for the reform of the Occupational English Test\u003c/em\u003e. A report submitted to the Council on Overseas Professional Qualifications. University of Melbourne, Department of Russian and Language Studies.\u003c/li\u003e\n \u003cli\u003eMullooly, J., \u0026amp; Glasson, N. (2023). Functional differences across modes in speaking performance. \u003cem\u003eCambridge English Research Notes, 86\u003c/em\u003e, 45\u0026ndash;54. https://www.cambridgeenglish.org/Images/702820-research-notes-86.pdf\u003c/li\u003e\n \u003cli\u003eMurayama, K. (2006). Adaptation to the test: A review of problems and perspectives. \u003cem\u003eThe Japanese Journal of Educational Psychology, 54\u003c/em\u003e(2), 265\u0026ndash;279. https://doi.org/10.5926/jjep1953.54.2_265\u003c/li\u003e\n \u003cli\u003eNakatsuhara, F., Inoue, C., Berry, V., \u0026amp; Galaczi, E. (2017). 
Exploring the use of videoconferencing technology in the assessment of spoken language: A mixed-methods study. \u003cem\u003eLanguage Assessment Quarterly\u003c/em\u003e, \u003cem\u003e14\u003c/em\u003e(1), 1\u0026ndash;18. https://doi.org/10.1080/15434303.2016.1263637\u003c/li\u003e\n \u003cli\u003eNakatsuhara, F., Inoue, C., Berry, V., \u0026amp; Galaczi, E. (2021). Video-conferencing speaking tests: Do they measure the same construct as face-to-face tests? \u003cem\u003eAssessment in Education: Principles, Policy \u0026amp; Practice\u003c/em\u003e, \u003cem\u003e28\u003c/em\u003e(4), 369\u0026ndash;388. https://doi.org/10.1080/0969594X.2021.1951163\u003c/li\u003e\n \u003cli\u003eO\u0026rsquo;Loughlin, K. J. (1995). Lexical density in candidate output on direct and semi-direct versions of an oral proficiency test. \u003cem\u003eLanguage Testing\u003c/em\u003e, \u003cem\u003e12\u003c/em\u003e, 217\u0026ndash;237. https://doi.org/10.1177/026553229501200205\u003c/li\u003e\n \u003cli\u003eO\u0026rsquo;Loughlin, K. J. (1997). \u003cem\u003eDirect and semi-direct tests of spoken language\u003c/em\u003e (Unpublished doctoral thesis). University of Melbourne.\u003c/li\u003e\n \u003cli\u003eO\u0026rsquo;Loughlin, K. J. (2001). \u003cem\u003eThe equivalence of direct and semi-direct speaking tests\u003c/em\u003e. Cambridge University Press.\u003c/li\u003e\n \u003cli\u003eQian, D. D. (2009). Comparing direct and semi-direct modes for speaking assessment: Affective effects on test takers. \u003cem\u003eLanguage Assessment Quarterly\u003c/em\u003e, \u003cem\u003e6\u003c/em\u003e(2), 113\u0026ndash;125. https://doi.org/10.1080/15434300902800059\u003c/li\u003e\n \u003cli\u003eRoever, C., \u0026amp; Ikeda, N. (2022). What scores from monologic speaking tests can(not) tell us about interactional competence. \u003cem\u003eLanguage Testing\u003c/em\u003e, \u003cem\u003e39\u003c/em\u003e(1), 7\u0026ndash;29. 
https://doi.org/10.1177/02655322211003332\u003c/li\u003e\n \u003cli\u003eRoever, C., \u0026amp; Kasper, G. (2018). Speaking in turns and sequences: Interactional competence as a target construct in testing speaking. \u003cem\u003eLanguage Testing, 35\u003c/em\u003e(3), 331\u0026ndash;355. https://doi.org/10.1177/0265532218758128\u003c/li\u003e\n \u003cli\u003eSeuren, L. M., Wherton, J., Greenhalgh, T., \u0026amp; Shaw, S. E. (2021). Whose turn is it anyway? Latency and the organization of turn-taking in video-mediated interaction. \u003cem\u003eJournal of Pragmatics, 172\u003c/em\u003e, 63\u0026ndash;78. https://doi.org/10.1016/j.pragma.2020.11.005\u003c/li\u003e\n \u003cli\u003eShohamy, E. (1994). The validity of direct versus semi-direct oral tests. \u003cem\u003eLanguage Testing\u003c/em\u003e, \u003cem\u003e11\u003c/em\u003e(2), 99\u0026ndash;123. https://doi.org/10.1177/026553229401100202\u003c/li\u003e\n \u003cli\u003eShohamy, E. (1998). Alternative assessment in language testing: Applying a multiplism approach. In E. Li \u0026amp; G. James (Eds.), \u003cem\u003eTesting and evaluation in second language education\u003c/em\u003e (pp. 99-114). HKUST Language Centre. http://lc.ust.hk/~center/workingpaper.html\u003c/li\u003e\n \u003cli\u003eShohamy, E., Donitze-Schmidt, S., \u0026amp; Waizer, R. (1993). The effect of the elicitation method on the language samples obtained on oral tests. Paper presented at the Annual Language Testing Colloquium, Cambridge, UK.\u003c/li\u003e\n \u003cli\u003eStansfield, C. W., \u0026amp; Kenyon, D. M. (1992). Research on the comparability of the oral proficiency interview and the simulated oral proficiency interview. \u003cem\u003eSystem\u003c/em\u003e, \u003cem\u003e20\u003c/em\u003e(3), 347\u0026ndash;364. https://doi.org/10.1016/0346-251X(92)90045-5\u003c/li\u003e\n \u003cli\u003eStansfield, C. W., Kenyon, D. M., Paiva, R., Doyle, F., Ulsh, I., \u0026amp; Cowles, M. A. (1990). 
The development and validation of the Portuguese speaking test. \u003cem\u003eHispania\u003c/em\u003e, \u003cem\u003e73\u003c/em\u003e(3), 641\u0026ndash;651. https://doi.org/10.2307/343942\u003c/li\u003e\n \u003cli\u003eSong, J. (2014). \u003cem\u003eA study of ESL students\u0026rsquo; performance and perceptions in face-to-face and virtual-world group oral tests\u003c/em\u003e [Unpublished doctoral thesis]. The University of Texas at Austin.\u003c/li\u003e\n \u003cli\u003eVan Moere, A. (2012). A psycholinguistic approach to oral language assessment. \u003cem\u003eLanguage Testing, 29\u003c/em\u003e(3), 325\u0026ndash;344. https://doi.org/10.1177/0265532211424478\u003c/li\u003e\n \u003cli\u003eYan, X., \u0026amp; Staples, S. (2023). \u003cem\u003eInvestigating the cognitive and social aspects of IELTS speaking performances across proficiency levels: Comparing the CAF-based and register-linguistic analyses\u003c/em\u003e (IELTS Research Reports Online Series, No. 2023/3). IELTS Partners. https://ielts.org/researchers/our-research/research-reports/investigating-the-cognitive-and-social-aspects-of-ielts-speaking-performances-across-proficiency-levels-comparing-the-caf-based-and-register-linguistic-analyses\u003c/li\u003e\n \u003cli\u003eZhang, L., \u0026amp; Jin, Y. (2021). Assessing interactional competence in the computer-based CET-SET: An investigation of the use of communication strategies. \u003cem\u003eAssessment in Education: Principles, Policy \u0026amp; Practice, 28\u003c/em\u003e(1), 1\u0026ndash;22. https://doi.org/10.1080/0969594X.2021.1976107\u003c/li\u003e\n \u003cli\u003eZhou, Y. (2012). Test-takers\u0026rsquo; affective reactions to a computer-delivered speaking test and their test performance. In M. Minegishi, O. Hieda, E. Hayatsu, \u0026amp; Y. Kawaguchi (Eds.), \u003cem\u003eWorking papers in corpus-based linguistics and language education, No. 9\u003c/em\u003e (pp. 295\u0026ndash;310). 
Tokyo University of Foreign Studies. http://cblle.tufs.ac.jp/assets/files/publications/working_papers_09/section/295-310.pdf\u003c/li\u003e\n \u003cli\u003eZhou, Y. (2015a). Comparing ratings of a face-to-face and telephone-mediated speaking test. \u003cem\u003eJACET Journal\u003c/em\u003e, \u003cem\u003e59\u003c/em\u003e, 33\u0026ndash;52. https://dl.ndl.go.jp/pid/10501826/1/1\u003c/li\u003e\n \u003cli\u003eZhou, Y. (2015b). Computer-delivered or face-to-face: Effects of delivery mode on the testing of second language speaking. \u003cem\u003eLanguage Testing in Asia\u003c/em\u003e, \u003cem\u003e5\u003c/em\u003e(2), 1\u0026ndash;16. https://doi.org/10.1186/s40468-014-0012-y\u003c/li\u003e\n \u003cli\u003eZhou, Y., \u0026amp; Yoshitomi, A. (2019). Test-taker perception of and test performance on computer-delivered speaking tests: The mediational role of test-taking motivation. \u003cem\u003eLanguage Testing in Asia\u003c/em\u003e, \u003cem\u003e9\u003c/em\u003e(10), 1\u0026ndash;19. https://doi.org/10.1186/s40468-019-0086-7\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Footnotes","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eThe AS-unit (Analysis of speech unit) is an augmented version of the T-unit, an independent clause, or a dependent clause connected to or embedded in an independent clause (Foster et al., \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2000\u003c/span\u003e). 
Examples of one T-unit are: (a) \u0026ldquo;I like birds,\u0026rdquo; (b) \u0026ldquo;I liked the movie we saw yesterday,\u0026rdquo; and (c) \u0026ldquo;If it rains tomorrow, I will go to see a movie.\u0026rdquo; The AS-unit builds on the T-unit, including independent phrases that do not contain verbs, such as (d) \u0026ldquo;At the museum.\u0026rdquo; Examples (a) to (d) all contain one AS-unit, whereas (e) \u0026ldquo;I have a bird and its name is Pupu,\u0026rdquo; contains two AS-units because it has two independent clauses connected by the coordinating conjunction \u0026ldquo;and.\u0026rdquo;\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003e According to Levelt (\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e1989\u003c/span\u003e), the \u003cem\u003econceptualizer\u003c/em\u003e is responsible for generating the communicative intention and encoding it into some kind of coherent conceptual plan. In the next component, the \u003cem\u003eformulator\u003c/em\u003e, the organized preverbal plan activates the items in the lexicon that best correspond to the different chunks of the intended message, which will, in turn, be responsible for transforming it into a linguistic structure.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"