A Chi-square Statistic for Testing the Equality of Distracters’ Plausibility in Multiple-Choice Test Items

doi:10.21203/rs.3.rs-4441034/v1

2024 · doi:10.21203/rs.3.rs-4441034/v1

preprint OA: closed

Research Article (Research Square preprint)

Sherwin E. Balbuena
Dr Emilio B Espinosa Sr Memorial State College of Agriculture and Technology

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4441034/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review (Version 1)

Abstract

This study introduces a new chi-square test statistic for testing the equality of response frequencies among distracters in multiple-choice tests. The formula uses the numbers of correct and wrong answers as the basis for calculating the expected response frequency per distracter. The method was applied to response data from a statistics test and was found to detect unequally plausible distracters effectively.
Furthermore, the statistic had a quadratic relationship with item difficulty, indicating that within a certain range of plausibility values there is an optimal item difficulty.

Keywords: distracter analysis, chi-square test, item analysis, plausibility

Figure 1. Scatterplot of M and DIFF showing the curvilinear relationship.

Introduction

Multiple-choice (MC) items are the most commonly used test type in schools due to their efficiency, objectivity, and ease of scoring [1][2]. These tests allow educators to assess a broad range of knowledge and skills quickly, making them ideal for large classrooms where individualized assessment can be time-consuming and impractical. Additionally, MC questions minimize the potential for scorer bias, providing a consistent and fair measure of student performance [3]. Their flexibility in testing various cognitive levels, from basic recall to application and analysis, further enhances their utility in diverse educational settings [4][5].

Item analysis is a process of evaluating the quality of MC items. It involves statistical measures that provide information about their psychometric properties. The key item properties that are assessed include difficulty (DIFF), discrimination (DISC), and distracter efficiency (DE) [6][7]. DIFF refers to the proportion of test-takers who answered an item correctly. DISC indicates how well an item differentiates between high performers and low performers on the overall test, with higher DISC values indicating that more able examinees answered the item correctly. DE evaluates the functionality of incorrect options (distracters), ensuring that they plausibly attract those who do not know the correct answer while not misleading those who do. Together, these properties help refine multiple-choice items, enhancing their reliability and validity in measuring learners' knowledge and skills.

Distracter efficiency (DE) is an important metric in the research on multiple-choice item quality, focusing on the performance of incorrect answer options.
A distracter is considered functional if it is selected by at least 5% of examinees, indicating that it is effective in attracting those who do not know the correct answer. This threshold ensures that each distracter contributes to the item's overall discriminative power [8][9]. Some studies confirmed a strong relationship between DE and both DIFF and DISC [10][8][11], while others did not find a correlation with these item quality metrics [12]. However, this distracter-level parameter is estimated individually, as the proportion of examinees who chose each distracter. There is a distinct lack of studies analyzing the collective efficiency of multiple distracters. Furthermore, there are no established methods to evaluate whether all distracters are equally plausible, although this property is recommended. As a result, even when every distracter individually meets the 5% criterion, the quality of the item could still be compromised if the distracters do not function well together to attract less-able examinees.

This study aimed to develop a new method of assessing distracter plausibility relative to the response frequencies for the correct option and the other distracters. The new statistic is also tested for correlations with other item parameters, such as difficulty and discrimination, to determine whether they independently or collectively measure item quality.

Methods

Research Design

This study employs an exploratory research design, which focuses on the development of a new statistical method for item analysis. It also explores whether the statistical measure can be used as a supplementary tool to assess the quality of multiple-choice test items. The new statistic is correlated with other item analysis metrics to assess its validity and its possible complementary role in item quality checks.
Study Context

The method used in this study was applied to the analysis of items in a statistics test, which measures the student’s ability to choose appropriate parametric and nonparametric techniques given the data characteristics or assumptions met. This test was administered in a graduate-level statistics course at one state college in the Bicol region, the Philippines, covering the school years 2021–2022 to 2022–2023. The 6-item test comprises only a portion of the final exams after excluding extremely easy and extremely difficult items and other non-MC items.

Calculation of Item Parameters

DIFF, DISC, and the new statistic, referred to in this study as M, were computed using different approaches. DIFF and DISC were estimated using Rasch model analysis in the eRm package in R [13][14]. The M statistic is given by the formula below, which is based on the calculation of the chi-square statistic using an expected value obtained by dividing the number of incorrect responses to an item by the number of distracters.

$$M=\sum_{j=1}^{d} z_{j}=\sum_{j=1}^{d}\frac{(n_{j}-e_{j})^{2}}{e_{j}}$$

where
$n_{j}$ = observed frequency of responses for distracter j
$e_{j}=\frac{w_{i}}{d}=\frac{N_{i}-c_{i}}{d}$ = expected value for distracter j
$w_{i}$ = number of wrong responses for item i
$N_{i}$ = number of test takers for item i
$c_{i}$ = number of correct answers for item i
$d$ = number of distracters

The degrees of freedom are taken to be d, the number of distracters.

Data Analysis

The obtained values of DIFF, DISC, and M were correlated using the Pearson product-moment correlation to determine whether a linear relationship existed. Further modeling using polynomial regression was conducted to obtain a better fit of the model to the data. The analyses were all performed in R and its IDE RStudio. All levels of significance were set at 5%.
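The formula for M can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's R code: the function names `m_statistic` and `chi2_sf` are hypothetical, and the p-value helper is a standard closed form for integer degrees of freedom. The degrees of freedom default to d, as stated in the text; the `df` argument is exposed because a conventional goodness-of-fit test with the total number of wrong responses fixed would use d − 1.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        return math.exp(-half) * sum(half ** i / math.factorial(i)
                                     for i in range(df // 2))
    # Odd df: erfc term plus a finite correction sum.
    tail = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return tail


def m_statistic(distracter_counts, df=None):
    """Return (M, expected, p) for one item's distracter response counts.

    distracter_counts: observed frequencies n_j of the d distracters only
    (the correct option is excluded), so their sum is w = N - c.
    df: degrees of freedom; defaults to d, following the text (assumption:
    callers who prefer the conventional d - 1 pass it explicitly).
    """
    d = len(distracter_counts)
    wrong = sum(distracter_counts)      # w = N - c
    if d == 0 or wrong == 0:
        raise ValueError("need at least one distracter and one wrong response")
    expected = wrong / d                # e_j = w / d, identical for every j
    m = sum((n - expected) ** 2 / expected for n in distracter_counts)
    p = chi2_sf(m, d if df is None else df)
    return m, expected, p


# Hypothetical item: N = 100, c = 25, distracters chosen 35, 15 and 25 times.
m, e, p = m_statistic([35, 15, 25])
print(f"M = {m:.4f}, expected = {e:.2f}, p = {p:.3f}")
```

Passing only the distracter counts keeps the function independent of N and c, since the expected value depends on them only through w = N − c.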
Results and Discussion

The expected value used in the M statistic is obtained by dividing the frequency of incorrect responses by the number of distracters. Table 1 presents sample computations of the expected values. For example, with d = 3 distracters, N = 100 examinees, and c = 70 correct responses, the expected frequency per distracter is 10.0. The difference between an observed frequency and its expected value may be negative or positive. Squaring each difference and dividing by the expected value gives a chi-square term with 1 degree of freedom. Summing these terms across the d distracters gives a chi-square statistic with d degrees of freedom.

Table 1. Sample expected values of response frequencies for items with d = 3 and N = 100

No. of correct responses (c)    Expected responses per distracter, (N − c)/3
70                              10.0000
71                              9.6667
72                              9.3333
73                              9.0000
74                              8.6667
75                              8.3333
76                              8.0000
77                              7.6667
78                              7.3333
79                              7.0000
80                              6.6667
81                              6.3333
82                              6.0000
83                              5.6667
84                              5.3333
85                              5.0000

Table 2 illustrates how to decide whether the distracters of hypothetical items are equally plausible based on their frequencies. For instance, with 25 correct responses (c), the number of incorrect responses totals N − c = 100 − 25 = 75, so the expected value for each distracter is 75/3 = 25. If Distracters 1, 2, and 3 receive 35, 15, and 25 responses, respectively, their z values are 4, 4, and 0, summing to 8. The corresponding p-value for this statistic with df = 3 is 0.046, indicating that the observed frequencies differ significantly from the expected frequencies. Therefore, the distracters are not equally plausible.
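Written out term by term, the arithmetic for this example (N = 100, c = 25, d = 3, observed frequencies 35, 15, and 25) is as follows; the symbols are those of the formula for M.

```latex
\begin{aligned}
e_j &= \frac{N - c}{d} = \frac{100 - 25}{3} = 25,\\
z_1 &= \frac{(35 - 25)^2}{25} = 4,\qquad
z_2 = \frac{(15 - 25)^2}{25} = 4,\qquad
z_3 = \frac{(25 - 25)^2}{25} = 0,\\
M &= z_1 + z_2 + z_3 = 8.
\end{aligned}
```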
Table 2. An illustrative example of applying the statistic (d = 3, N = 100)

c    n1   n2   n3   z1        z2        z3        chisq     p value   Equally plausible?
25   35   15   25   4.0000    4.0000    0.0000    8.0000    0.046     No
25   46   18   11   27.3282   0.6205    5.2513    33.2000   0.000     No
25   51   7    17   41.2552   9.6302    0.8802    51.7656   0.000     No
26   20   20   34   0.8829    0.8829    3.5315    5.2973    0.151     Yes
28   4    50   18   16.6667   28.1667   1.5000    46.3333   0.000     No
32   15   15   38   2.5931    2.5931    10.3725   15.5588   0.001     No
34   20   23   23   0.1818    0.0455    0.0455    0.2727    0.965     Yes

Application to a Dataset

We analyzed a dataset consisting of test responses from a graduate-level statistics course with 198 participants. For six items, we recorded the frequency of responses for each option, marking the frequency of correct responses with an asterisk (*). To compute the M statistic, we first obtained the expected values, which are displayed in Table 3. For instance, in Item 1, 104 out of 198 examinees selected the correct answer, while 94 chose incorrect answers (distracters). Based on these data, the expected value was approximately 31.33.

Table 3. Frequencies (percentages) of responses to all options for the 6 test items and the corresponding expected frequency per distracter

Item   A            B            C            D            N − c   Expected value
1      104* (53%)   37 (19%)     41 (21%)     16 (8%)      94      31.33
2      37 (19%)     103* (52%)   19 (10%)     39 (20%)     95      31.67
3      20 (10%)     16 (8%)      142* (72%)   20 (10%)     56      18.67
4      35 (18%)     40 (20%)     20 (10%)     103* (52%)   95      31.67
5      11 (6%)      11 (6%)      135* (68%)   41 (21%)     63      21.00
6      11 (6%)      6 (3%)       36 (18%)     145* (73%)   53      17.67

* frequency of correct responses (c)

The DISC and DIFF parameters were estimated using the dichotomous Rasch model to evaluate the psychometric properties of the items. As shown in Table 4, two items (Items 2 and 5) had negative discrimination values, indicating poor quality. There were three easy items (Items 3, 5, and 6 with negative logits) and three difficult items (Items 1, 2, and 4 with positive logits).
Further analysis using M revealed that three items had equally plausible distracters (Items 2, 3, and 4), while three items had distracters of unequal plausibility (Items 1, 5, and 6). Item 5 was flagged as poor quality due to both its nondiscriminative nature and its unequally plausible distracters. Overall, only Items 3 and 4 demonstrated good quality based on the assessed properties. In this new approach, the detection of implausible distracters differs from the traditional approach popularized by Haladyna and Downing (1993) [8]. Every distracter frequency met the 5% criterion for functional distracters except Distracter B of Item 6, which drew only 3% of responses, yet Items 1, 5, and 6 were still flagged for being collectively ineffective in attracting less-able test takers. Hence, the new method can complement the existing methodologies in distracter analysis to identify items with dysfunctional distracters for further investigation.

Table 4. Rasch-based item DISC and DIFF estimates and results of the tests of equality of distracters’ plausibility

Item   DISC     DIFF     M          p-value   Equally plausible?
1      0.244    0.519    11.51064   0.009     No
2      -0.062   0.546    7.663158   0.053     Yes
3      0.364    -0.577   0.571429   0.903     Yes
4      0.150    0.546    6.842105   0.077     Yes
5      -0.103   -0.358   28.57143   0.000     No
6      0.475    -0.676   29.24528   0.000     No

The flagged items considered for revision are shown in Table 5, which gives the content of the three problematic items together with the recommended revisions at the distracter level. Item 1 was found to have unequally plausible distracters, as shown in Table 4; hence, Distracter D, which had the lowest frequency, was revised from “Wilcoxon signed-rank test” to “Welch t test”. The original option D was not attractive, possibly due to its association with paired data analysis. Replacing it with the Welch t test may be more effective, since that tool is used as an alternative when the assumption of homogeneity of variances is violated.
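As a check on Table 4, the M values can be recomputed from the distracter frequencies in Table 3. The sketch below is illustrative only: the dictionary layout and the name `m_statistic` are assumptions, and the decision rule compares M with 7.815, the upper 5% critical value of a chi-square distribution with 3 degrees of freedom (df = d = 3, as in the text).

```python
# Distracter frequencies from Table 3 (the correct option is omitted per item).
table3_distracters = {
    1: [37, 41, 16],
    2: [37, 19, 39],
    3: [20, 16, 20],
    4: [35, 40, 20],
    5: [11, 11, 41],
    6: [11, 6, 36],
}

# Upper 5% critical value of chi-square with 3 degrees of freedom.
CRITICAL_DF3 = 7.815


def m_statistic(counts):
    """Sum of squared deviations from the equal share w / d, scaled by it."""
    expected = sum(counts) / len(counts)
    return sum((n - expected) ** 2 / expected for n in counts)


results = {item: m_statistic(counts)
           for item, counts in table3_distracters.items()}

for item, m in results.items():
    verdict = "Yes" if m < CRITICAL_DF3 else "No"
    print(f"Item {item}: M = {m:.4f}, equally plausible? {verdict}")
```

Comparing against a fixed critical value avoids needing a chi-square distribution function for this quick check; the reported p-values would require one.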
Table 5. Item content and options of the three items flagged for unequal plausibility, with recommended revisions and justifications

Item 1. The following are the characteristics of data: (1) The dependent variable is measured at interval/ratio level; (2) There are two independent categories for the nominal independent variable; (3) The distribution in each group is normal; (4) The variances of the two groups are equal; (5) There are no observed outliers. What statistical tool is the most appropriate to compare the two groups?
A. Independent groups t test*
B. Mann-Whitney U test
C. Paired t test
D. Wilcoxon signed-rank test (Replace with Welch t test)
Reason for revision: Unequal plausibility of distracters. Distracter D had the lowest frequency. The replacement is assumed to distract more effectively because the tool is used as an alternative when t test Assumption (4) is not met.

Item 5. The following are the characteristics of data: (1) Variables X and Y are measured at interval/ratio level; (2) X and Y are paired; (3) The distribution of the paired data is bivariate normal; (4) There are no observed outliers; (5) There is a linear relationship between X and Y. What statistical tool is the most appropriate to test the hypothesis that there is no linear correlation between X and Y?
A. Analysis of variance (Replace with Chi-square test of independence)
B. Paired t test (Replace with Point-biserial correlation)
C. Pearson product-moment correlation*
D. Spearman rank correlation
Reason for revision: Unequal plausibility of distracters. Distracters A and B had the lowest frequencies, possibly because they are tests of comparison. The replacements are assumed to distract more effectively because they are alternative tools for testing relationships between variables.

Item 6.
The following are the characteristics of data: (1) Variables X and Y are measured at interval/ratio level; (2) X and Y are paired; (3) The distribution of the paired data is not normal; (4) There are observed outliers; (5) There is a monotonic relationship between X and Y. What statistical tool is the most appropriate to test the hypothesis that there is no correlation between X and Y?
A. Analysis of variance (Replace with Chi-square test of independence)
B. Paired t test (Replace with Point-biserial correlation)
C. Pearson product-moment correlation
D. Spearman rank correlation*
Reason for revision: Unequal plausibility of distracters. Distracters A and B had the lowest frequencies, possibly because they are tests of comparison. The replacements are assumed to distract more effectively because they are alternative tools for testing relationships between variables.

* correct option

Items 5 and 6, which involve the assumptions of tests of relationships between variables, had unequally plausible distracters. Two of the three distracters in each item were comparison tests (ANOVA and the paired t test); therefore, they were collectively less appealing and less effective as distracters. This is likely because these types of tests are not as relevant or plausible within the context of the question (e.g., assumptions of a correlation test), making them less likely to be chosen by examinees who do not know the correct answer. Consequently, these distracters fail to challenge test takers effectively and are not as efficient at diverting them from the correct answer. Replacing them with other tests of relationships (e.g., the chi-square test of independence and the point-biserial correlation) may address this unequal plausibility.

Correlation of M with DIFF and DISC

To investigate potential relationships between DIFF and M, as well as between DISC and M, we conducted correlation and regression analyses.
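The pairwise Pearson correlations can be sketched from the item-level values reported in Table 4. The analysis in the study was run in R; the plain-Python helper `pearson_r` below is a hypothetical stand-in, and because it uses the rounded table values its results may differ slightly in the last digit from the reported coefficients.

```python
import math

# Item-level values as reported in Table 4 (Items 1-6).
M = [11.51064, 7.663158, 0.571429, 6.842105, 28.57143, 29.24528]
DIFF = [0.519, 0.546, -0.577, 0.546, -0.358, -0.676]
DISC = [0.244, -0.062, 0.364, 0.150, -0.103, 0.475]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r_m_diff = pearson_r(M, DIFF)
r_m_disc = pearson_r(M, DISC)
r_diff_disc = pearson_r(DIFF, DISC)
print(f"r(M, DIFF) = {r_m_diff:.3f}")
print(f"r(M, DISC) = {r_m_disc:.3f}")
print(f"r(DIFF, DISC) = {r_diff_disc:.3f}")
```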
The results indicated a moderate negative linear correlation between M and DIFF (r = -0.458), although this relationship was not statistically significant (p > 0.05). This suggests that as item difficulty increases, M tends to decrease, that is, the distracters' response frequencies tend to be closer to equal, but the relationship is not strong enough to be conclusive. Additionally, no significant correlation was found between M and DISC (r = -0.037, p > 0.05), indicating that the discriminative power of an item is not linearly related to the plausibility of its distracters. Finally, there was a nonsignificant negative correlation between DIFF and DISC (r = -0.465, p > 0.05), implying that while there may be a tendency for more difficult items to be less discriminative, this trend is not statistically significant. Overall, the analyses found no statistically significant linear association among these metrics.

Despite the lack of significant linear correlations, we noted a polynomial trend in the relationship between M and DIFF. The scatterplot in Fig. 1 shows a curvilinear trend: as M increases, DIFF follows an inverted-U pattern with a maximum near M = 10. Therefore, we conducted a polynomial regression to determine whether a polynomial function fits the empirical data.

Table 6. Results of polynomial regression showing significant coefficients for linear and quadratic trends

Coefficients:
              Estimate    Std. error   t value   p value
(Intercept)   -0.610253   0.187745     -3.250    0.04747 *
M             0.189268    0.033788     5.602     0.01124 *
I(M^2)        -0.006447   0.001007     -6.400    0.00773 **

Residual standard error: 0.1791 on 3 degrees of freedom
Multiple R-squared: 0.9461, Adjusted R-squared: 0.9101
F-statistic: 26.31 on 2 and 3 DF, p value: 0.01253

The polynomial regression in Table 6 yielded a multiple R-squared of 0.9461, indicating that approximately 94.61% of the variance in DIFF is explained by the model, a high degree of fit.
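The quadratic fit in Table 6 can be approximated from the rounded values in Table 4. The sketch below is not the author's R code: it swaps in an ordinary least-squares fit solved through the normal equations in plain Python, and the names `fit_quadratic` and `turning_point` are hypothetical. Because the inputs are rounded, the coefficients may differ slightly from those in Table 6.

```python
# Item-level values as reported (rounded) in Table 4, Items 1-6.
M = [11.51064, 7.663158, 0.571429, 6.842105, 28.57143, 29.24528]
DIFF = [0.519, 0.546, -0.577, 0.546, -0.358, -0.676]


def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    rows = [[1.0, xi, xi * xi] for xi in x]
    # Augmented 3x4 matrix for the system (X'X) b = X'y.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(col + 1, 3):
            factor = a[k][col] / a[col][col]
            for c in range(col, 4):
                a[k][c] -= factor * a[col][c]
    # Back substitution.
    b = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        b[i] = (a[i][3] - sum(a[i][j] * b[j] for j in range(i + 1, 3))) / a[i][i]
    return b


b0, b1, b2 = fit_quadratic(M, DIFF)
fitted = [b0 + b1 * m + b2 * m * m for m in M]
mean_diff = sum(DIFF) / len(DIFF)
ss_res = sum((y - f) ** 2 for y, f in zip(DIFF, fitted))
ss_tot = sum((y - mean_diff) ** 2 for y in DIFF)
r_squared = 1.0 - ss_res / ss_tot
# Value of M at which the fitted parabola peaks (meaningful when b2 < 0).
turning_point = -b1 / (2.0 * b2)

print(f"DIFF = {b0:.4f} + {b1:.4f}*M + {b2:.6f}*M^2")
print(f"R-squared = {r_squared:.4f}, turning point at M = {turning_point:.2f}")
```

The normal equations are adequate here because there are only three coefficients and six observations; for larger or badly scaled problems a QR-based solver would be the safer choice.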
The adjusted R-squared, which accounts for the number of predictors in the model, is 0.9101, still indicating a strong fit. The overall model significance test suggested that the model significantly predicted DIFF [F(2,3) = 26.31, p = 0.01253].

The polynomial regression results reveal a significant quadratic relationship between DIFF and M. The significant negative coefficient for M² suggests a parabolic curve in which DIFF initially increases with M but starts to decrease as M continues to increase. The high R-squared and adjusted R-squared values indicate that the model explains a large portion of the variability in DIFF. Given the significant p values for all the coefficients, we can infer that the equal plausibility metric (M) has a meaningful, nonlinear relationship with item difficulty (DIFF). This model can be useful for understanding how DIFF varies with M and can inform the design and evaluation of test items to achieve desired levels of difficulty.

Conclusion and Recommendations

This study introduced a chi-square statistic, M, designed to detect significant deviations of distracter response frequencies from the values expected when the distracters are equally plausible. This novel item analysis metric fills a gap by evaluating the collective functionality of distracters and serves as a basis for identifying items with dysfunctional distracters. The statistic was empirically tested using response data from a statistics test and was found to detect items with implausible distracters effectively. Furthermore, the new metric showed a quadratic relationship with item difficulty, suggesting an optimal difficulty level within a specific range of M values.

Several limitations were noted in this study. First, the dataset included only 6 items, which is a very small sample size for item analysis, potentially affecting the observed relationships.
Second, DIFF and DISC estimates were obtained using the Rasch model, which differs from classical test theory estimates. No assumption tests were conducted to confirm whether the items met the Rasch model's expectations, potentially invalidating the derived estimates. Future research is encouraged to verify these results and address the limitations identified in this study.

Declarations

Funding
This study did not receive funding from any granting agencies.

Human Ethics and Consent to Participate
This study utilized a test response data set to evaluate the applicability of the developed statistical method. No personal or identifiable information was used, ensuring that ethical standards were maintained. As such, formal consent from participants was not required for this analysis.

Consent for Publication
Not applicable.

Competing Interest
The author declares no competing interests.

Data Availability Statement
The data used in this study will be available upon request.

Author Contribution
The author solely conceptualized the study, conducted the data analysis, and authored the manuscript. All aspects of the research, from the initial idea through to the final write-up, were independently carried out by the author.

References

[1] Abdulghani, H. M., Irshad, M., Haque, S., Ahmad, T., Sattar, K., & Khalil, M. S. (2017). Effectiveness of longitudinal faculty development programs on MCQs items writing skills: A follow-up study. PLoS ONE, 12(10), e0185895.
[2] Wood, E., Klausz, N., & MacNeil, S. (2022). Examining the influence of multiple-choice test formats on student performance. Innovative Higher Education, 47, 515–531. https://doi.org/10.1007/s10755-021-09581-7
[3] Xu, X., Kauer, S., & Tupy, S. (2016). Multiple-choice questions: Tips for optimizing assessment in-seat and online. Scholarship of Teaching and Learning in Psychology, 2(2), 147.
[4] Palmer, E. J., & Devitt, P. G. (2007). Assessment of higher order cognitive skills in undergraduate education: modified essay or multiple-choice questions? Research paper. BMC Medical Education, 7, 1-7.
[5] Mitra, A. K. (2022). The Art of Designing a Quality Multiple Choice Question in Chemistry. Resonance, 27(6), 1017-1031.
[6] Elgadal, A. H., & Mariod, A. A. (2021). Item analysis of multiple-choice questions (MCQs): assessment tool for quality assurance measures. Sudan Journal of Medical Sciences, 16(3), 334-346.
[7] DiBattista, D., & Kurzawa, L. (2011). Examination of the quality of multiple-choice items on classroom tests. Canadian Journal for the Scholarship of Teaching and Learning, 2(2), 4.
[8] Haladyna, T. M., & Downing, S. M. (1993). How many options is enough for a multiple-choice test item? Educational and Psychological Measurement, 53(4), 999-1010.
[9] Gierl, M. J., Bulut, O., Guo, Q., & Zhang, X. (2017). Developing, analyzing, and using distracters for multiple-choice tests in education: A comprehensive review. Review of Educational Research, 87(6), 1082-1116.
[10] Rezigalla, A. A., Eleragi, A. M. E. S. A., Elhussein, A. B., Alfaifi, J., ALGhamdi, M. A., Al Ameer, A. Y., ... & Adam, M. I. E. (2024). Item analysis: the impact of distracter efficiency on the difficulty index and discrimination power of multiple-choice items. BMC Medical Education, 24(1), 445.
[11] Testa, S., Toscano, A., & Rosato, R. (2018). Distracter efficiency in an item pool for a statistics classroom exam: Assessing its relation with item cognitive level classified according to Bloom’s taxonomy. Frontiers in Psychology, 9, 357601.
[12] Puthiaparampil, T., & Rahman, M. (2021). How important is distracter efficiency for grading Best Answer Questions? BMC Medical Education, 21, 1-6.
[13] Mair, P., & Hatzinger, R. (2007). Extended Rasch modeling: The eRm package for the application of IRT models in R. Journal of Statistical Software, 20, 1-20.
[14] R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

Additional Declarations
No competing interests reported.

Version history (Research Square, ISSN 2693-5015, online)
Status: Under Review, Version 1
Editor assigned by journal: 07 Jun, 2024
Submission checks completed at journal: 27 May, 2024
First submitted to journal: 18 May, 2024
Balbuena","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABTklEQVRIie3QsWqDQBjA8U8O7GJwtQjxFU6EiJSSV1EOmkVIp9ChUKGgS0rXKwnkFQKFzIYDu0iyCjokCE4ZAoGSwZYqISW2hTRbKf4n9fzx3R1AXd0fDANyd0/S7hVEMPmDH/j90gHhvpBz55Og35Dyi3+E6GdTNxVuEkUZ3Acb4Tppai8kWKzzSVv05j6sewx02TkkRt/yNCHM1HESkIGAM60VZh31yY0tGhLg6IyBMfQrG/MtV264jBtLtoZszKxJZLfkhhObGAigYglwZFbIfFmQd9Ye0e6mJHfPtPsq53ncxmIK6O0HEpVTHGY5kY1KYmLJ5mXg42JuMYX7Tgy69NRhwMg4utJQjplKw0xXH8qzRCme9mcdwaAVoovF/axu2eWIkpSjOVNEj2SLbbEx8dFaLra9i6YuVciRyqsS8Algf97TSV1dXd2/6gPlEoTm7D44gwAAAABJRU5ErkJggg==","orcid":"","institution":"Dr Emilio B Espinosa Sr Memorial State College of Agriculture and Technology","correspondingAuthor":true,"prefix":"","firstName":"Sherwin","middleName":"E.","lastName":"Balbuena","suffix":""}],"badges":[],"createdAt":"2024-05-18 12:29:04","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-4441034/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4441034/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":57889340,"identity":"85d5c9b7-a657-4bb8-9724-ed7cce723b8d","added_by":"auto","created_at":"2024-06-07 05:59:24","extension":"jpg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":20408,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eScatterplot of M and DIFF showing the curvilinear relationship\u003c/em\u003e\u003c/p\u003e","description":"","filename":"1.jpg","url":"https://assets-eu.researchsquare.com/files/rs-4441034/v1/fe1051d82bd015549d28943a.jpg"},{"id":57889667,"identity":"5802ec53-dde4-47e9-aaf9-b5b9024c4908","added_by":"auto","created_at":"2024-06-07 
06:07:24","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":560421,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4441034/v1/fbb91b2f-1bc9-4a9f-b020-27d447f08cbb.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"A Chi-square Statistic for Testing the Equality of Distracters’ Plausibility in Multiple-Choice Test Items","fulltext":[{"header":"Introduction","content":"\u003cp\u003eMultiple-choice (MC) items are the most commonly used test type in schools due to their efficiency, objectivity, and ease of scoring [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e][\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. These tests allow educators to assess a broad range of knowledge and skills quickly, making them ideal for large classrooms where individualized assessment can be time-consuming and impractical. Additionally, MC questions minimize the potential for scorer bias, providing a consistent and fair measure of student performance [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. Their flexibility in testing various cognitive levels\u0026mdash;from basic recall to application and analysis\u0026mdash;further enhances their utility in diverse educational settings [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e] [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eItem analysis is a process of evaluating the quality of MC items. It involves statistical measures that provide information about their psychometric properties. 
The key item properties that are assessed include difficulty (DIFF), discrimination (DISC), and distracter efficiency (DE) [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e] [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. DIFF refers to the proportion of test-takers who answered an item correctly. The DISC indicates how well an item differentiates between high performers and low performers on the overall test, with higher DISC values indicating that more able examinees answered the items correctly. DE evaluates the functionality of incorrect options (distracters), ensuring that they plausibly attract those who do not know the correct answer while not misleading those who do. Together, these properties help refine multiple-choice items, enhancing their reliability and validity in measuring learners' knowledge and skills.\u003c/p\u003e \u003cp\u003eDistracter efficiency (DE) is an important metric in the research on multiple-choice item quality, focusing on the performance of incorrect answer options. A distracter is considered functional if it is selected by at least 5% of examinees, indicating that it is effective in attracting those who do not know the correct answer. This threshold ensures that each distracter contributes to the item's overall discriminative power [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e] [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. Some studies confirmed its strong relationship with DIFF and DISC [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e] [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e] [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e], while others did not find a correlation with these item quality metrics [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. 
However, this distracter-level parameter is estimated individually through the computation of the proportion of examinees who chose the distracter. There is a distinct lack of studies analyzing the collective efficiency of multiple distracters. Furthermore, there are no established methods to evaluate whether all distracters are equally plausible, although this property is recommended. This lack of equal plausibility metrics means that while individual distracters might meet the \u0026gt;\u0026thinsp;5% criterion, the quality of the corresponding item could still be compromised if the distracters do not function well together to attract less-able examinees.\u003c/p\u003e \u003cp\u003eThis study aimed to develop a new method of assessing distracter plausibility relative to the response frequencies for the correct option and the other distracters. The new method will be tested for correlations with other item parameters, such as difficulty and discrimination, to determine whether they independently or collectively measure item quality.\u003c/p\u003e"},{"header":"Methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eResearch Design\u003c/h2\u003e \u003cp\u003eThis study employs an exploratory research design, which focuses on the development of a new statistical method for item analysis. It also aims to explore whether the statistical measure can be used as a supplementary assessment tool to assess the quality of multiple-choice test items. 
Correlations of the new statistic with other item analysis metrics will be computed to assess its validity and its possible complementary role in item quality checks.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003eStudy Context\u003c/h2\u003e \u003cp\u003eThe method used in this study was applied to the analysis of items in a statistics test, which measures students\u0026rsquo; ability to choose appropriate parametric and nonparametric techniques given the data characteristics or the assumptions met. This test was administered in a graduate-level statistics course at a state college in the Bicol region of the Philippines, covering the school years 2021\u0026ndash;2022 to 2022\u0026ndash;2023. The 6-item test is only a portion of the final exams, retained after excluding extremely easy and extremely difficult items and other non-MC items.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003eCalculation of Item Parameters\u003c/h2\u003e \u003cp\u003eDIFF, DISC, and the new statistic, referred to in this study as \u003cem\u003eM\u003c/em\u003e, were computed using different approaches. DIFF and DISC were estimated using Rasch model analysis in the eRm package in R [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e] [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. 
The \u003cem\u003eM\u003c/em\u003e statistic is given by the formula below, which is based on the calculation of the chi-square statistic using an expected value obtained by dividing the number of incorrect responses to an item by the number of distracters.\u003cdiv id=\"Equa\" class=\"Equation\"\u003e\u003cdiv format=\"TEX\" class=\"mathdisplay\" id=\"FileID_Equa\" name=\"EquationSource\"\u003e\n$$M=\\sum _{j=1}^{d}{z}_{j}=\\sum _{j=1}^{d}\\frac{{({n}_{j}-{e}_{j})}^{2}}{{e}_{j}}$$\u003c/div\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere\u003c/p\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\${n}_{j}=\$\u003c/span\u003e \u003c/span\u003e observed frequency of responses for distracter \u003cem\u003ej\u003c/em\u003e\u003c/p\u003e \u003cp\u003e \u003cspan class=\"InlineEquation\"\u003e \u003cspan class=\"mathinline\"\u003e\${e}_{j}=\\frac{{w}_{i}}{d}=\\frac{{N}_{i}-{c}_{i}}{d}\$\u003c/span\u003e \u003c/span\u003e = expected value for distracter \u003cem\u003ej\u003c/em\u003e\u003c/p\u003e \u003cp\u003e \u003cem\u003ew\u003c/em\u003e \u003csub\u003e \u003cem\u003ei\u003c/em\u003e \u003c/sub\u003e = number of wrong responses for item \u003cem\u003ei\u003c/em\u003e\u003c/p\u003e \u003cp\u003e \u003cem\u003eN\u003c/em\u003e \u003csub\u003e \u003cem\u003ei\u003c/em\u003e \u003c/sub\u003e = number of test takers for item \u003cem\u003ei\u003c/em\u003e\u003c/p\u003e \u003cp\u003e \u003cem\u003ec\u003c/em\u003e \u003csub\u003e \u003cem\u003ei\u003c/em\u003e \u003c/sub\u003e = number of correct answers for item \u003cem\u003ei\u003c/em\u003e\u003c/p\u003e \u003cp\u003eThe degrees of freedom are \u003cem\u003ed\u003c/em\u003e or the number of distracters.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003eData Analysis\u003c/h2\u003e \u003cp\u003eThe obtained values of DIFF, DISC, and M were correlated using Pearson product-moment correlation to determine whether a linear 
relationship existed. Polynomial regression was then used to obtain a better fit of the model to the data. All analyses were performed in R using the RStudio IDE. The level of significance was set at 5% throughout.\u003c/p\u003e \u003c/div\u003e"},{"header":"Results and Discussion","content":"\u003cp\u003eThe expected value used in the M statistic is obtained by subtracting the frequency of correct responses from the number of examinees and dividing the resulting frequency of incorrect responses by the number of distracters. Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e presents sample computations of the expected values. For example, with d = 3 distracters, N = 100 examinees, and c = 70 correct responses, the expected frequency per distracter is (100 \u0026minus; 70)/3 = 10.0. For each distracter, the difference between the observed frequency and the expected value, which may be negative or positive, is then computed. Squaring each difference and dividing by the expected value yields a chi-square term with 1 degree of freedom. 
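\u003c/p\u003e \u003cp\u003eThe computation of the expected value and of the per-distracter terms can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study (the analyses were run in R): the function names are hypothetical, and the counts 35, 15, and 25 are taken from the first row of Table\u0026nbsp;2.\u003c/p\u003e

```python
def expected_per_distracter(n_examinees, n_correct, n_distracters):
    # e_j = (N - c) / d: wrong responses spread evenly over the distracters
    return (n_examinees - n_correct) / n_distracters


def m_statistic(observed, n_examinees, n_correct):
    # observed: response counts for the distracters only (correct option excluded)
    # Not guarded against N == c, where the expected value is zero.
    e = expected_per_distracter(n_examinees, n_correct, len(observed))
    terms = [(n - e) ** 2 / e for n in observed]  # per-distracter z_j terms
    return terms, sum(terms)


# Table 1 case: N = 100, c = 70, d = 3
print(expected_per_distracter(100, 70, 3))  # 10.0

# First row of Table 2: c = 25, distracters chosen 35, 15, and 25 times
terms, m = m_statistic([35, 15, 25], 100, 25)
print(terms, m)  # [4.0, 4.0, 0.0] 8.0
```

\u003cp\u003e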
Summing these ratios of squared differences and expectations across the number of distracters \u003cem\u003ed\u003c/em\u003e results in a chi-square test with \u003cem\u003ed\u003c/em\u003e degrees of freedom.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\u003cdiv class=\"gridtable\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003e\u003cem\u003eSample expected values of response frequencies for items with d = 3 and N = 100\u003c/em\u003e\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e\u003ccolgroup cols=\"4\"\u003e\u003c/colgroup\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNo. of correct responses (c)\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eExpected value of number of responses for a distracter\u003c/p\u003e \u003cp\u003e[ (N-c)/3 ]\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNo. 
of correct responses (c)\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eExpected value of number of responses for a distracter\u003c/p\u003e \u003cp\u003e[ (N-c)/3 ]\u003c/p\u003e \u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e70\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e10.0000\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e78\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e7.3333\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e71\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e9.6667\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e79\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e7.0000\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e72\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e9.3333\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e80\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e6.6667\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e73\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e9.0000\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e81\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e6.3333\u003c/p\u003e 
\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e74\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e8.6667\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e82\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e6.0000\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e75\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e8.3333\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e83\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e5.6667\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e76\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e8.0000\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e84\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e5.3333\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e77\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e7.6667\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e85\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e5.0000\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/table\u003e\u003c/div\u003e \u003cp\u003e\u003c/p\u003e \u003cp\u003eAn example of whether the hypothetical item distracters are equally plausible based on their frequencies is provided in Table\u0026nbsp;\u003cspan 
refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. For instance, with c = 25 correct responses, the number of incorrect responses is N \u0026minus; c = 100 \u0026minus; 25 = 75, so the expected value for each distracter is 75/3 = 25. If Distracters 1, 2, and 3 receive 35, 15, and 25 responses, respectively, their z values are (35 \u0026minus; 25)\u003csup\u003e2\u003c/sup\u003e/25 = 4, (15 \u0026minus; 25)\u003csup\u003e2\u003c/sup\u003e/25 = 4, and (25 \u0026minus; 25)\u003csup\u003e2\u003c/sup\u003e/25 = 0, which sum to M = 8. The corresponding p-value for this statistic with df = 3 is 0.046, which is below the 5% significance level and indicates that the observed frequencies differ significantly from the expected frequencies. Therefore, the distracters are not equally plausible.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\u003cdiv class=\"gridtable\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003e\u003cem\u003eAn illustrative example of applying the statistic (d = 3, N = 100)\u003c/em\u003e\u003c/p\u003e 
\u003c/div\u003e \u003c/caption\u003e\u003ccolgroup cols=\"10\"\u003e\u003c/colgroup\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003ec\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003en1\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003en2\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003en3\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003ez1\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003ez2\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003ez3\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003echisq\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003ep value\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c10\"\u003e \u003cp\u003eEqually plausible?\u003c/p\u003e \u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e25\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e35\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e15\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e25\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e4.0000\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e4.0000\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.0000\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e8.0000\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e 
\u003cp\u003e0.046\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e25\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e46\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e18\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e11\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e27.3282\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.6205\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e5.2513\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e33.2000\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e0.000\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e25\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e51\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e7\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e17\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e41.2552\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e9.6302\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.8802\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e 
\u003cp\u003e51.7656\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e0.000\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e26\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e20\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e20\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e34\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.8829\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.8829\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e3.5315\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e5.2973\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e0.151\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e28\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e50\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e18\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e16.6667\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e28.1667\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e 
\u003cp\u003e1.5000\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e46.3333\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e0.000\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e32\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e15\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e15\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e38\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e2.5931\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e2.5931\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e10.3725\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e15.5588\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e34\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e20\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e23\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e23\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.1818\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e 
\u003cp\u003e0.0455\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.0455\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.2727\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e0.965\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/table\u003e\u003c/div\u003e \u003cp\u003e\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section3\"\u003e \u003cdiv class=\"Heading\"\u003eApplication to a Dataset\u003c/div\u003e \u003cp\u003eWe analyzed a dataset consisting of test responses from a graduate-level statistics course with 198 participants. For six items, we recorded the frequency of responses for each option, marking the frequency of correct responses with an asterisk \"*\". For the \u003cem\u003eM\u003c/em\u003e statistic, we computed the expected frequency per distracter for each item; these values are displayed in Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e. For instance, in Item 1, 104 out of 198 examinees selected the correct answer, while 94 chose incorrect answers (distracters). 
Based on these data, the expected value was approximately 31.33.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\u003cdiv class=\"gridtable\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003e\u003cem\u003eFrequencies (percentages) of response to all options for the 6 test items and the corresponding expected frequency per distracter\u003c/em\u003e\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e\u003ccolgroup cols=\"9\"\u003e\u003c/colgroup\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eItem\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colspan=\"4\" nameend=\"c6\" namest=\"c2\"\u003e \u003cp\u003eOptions\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colspan=\"2\" nameend=\"c8\" namest=\"c7\"\u003e \u003cp\u003eN-c\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colspan=\"2\" colname=\"c9\"\u003e \u003cp\u003eExpected value\u003c/p\u003e \u003c/th\u003e\u003c/tr\u003e\u003ctr\u003e\u003cth 
align=\"left\" colname=\"c2\"\u003e \u003cp\u003eA\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eB\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eC\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eD\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colspan=\"2\" nameend=\"c7\" namest=\"c6\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colspan=\"2\" nameend=\"c9\" namest=\"c8\"\u003e\u0026nbsp;\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e104*(53%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e37(19%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e41(21%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e16(8%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c7\" namest=\"c6\"\u003e \u003cp\u003e94\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c9\" namest=\"c8\"\u003e \u003cp\u003e31.33\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e37(19%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e103*(52%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e19(10%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e39(20%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c7\" namest=\"c6\"\u003e \u003cp\u003e95\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c9\" 
namest=\"c8\"\u003e \u003cp\u003e31.67\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e20(10%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e16(8%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e142*(72%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e20(10%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c7\" namest=\"c6\"\u003e \u003cp\u003e56\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c9\" namest=\"c8\"\u003e \u003cp\u003e18.67\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e35(18%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e40(20%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e20(10%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e103*(52%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c7\" namest=\"c6\"\u003e \u003cp\u003e95\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c9\" namest=\"c8\"\u003e \u003cp\u003e31.67\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e11(6%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e11(6%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e135*(68%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e 
\u003cp\u003e41(21%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c7\" namest=\"c6\"\u003e \u003cp\u003e63\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c9\" namest=\"c8\"\u003e \u003cp\u003e21.00\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e6\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e11(6%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e6(3%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e36(18%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e145*(73%)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c7\" namest=\"c6\"\u003e \u003cp\u003e53\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colspan=\"2\" nameend=\"c9\" namest=\"c8\"\u003e \u003cp\u003e17.67\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003ctfoot\u003e\u003ctr\u003e\u003ctd colspan=\"9\"\u003e* frequency of correct responses (c)\u003c/td\u003e\u003c/tr\u003e\u003c/tfoot\u003e\u003c/table\u003e\u003c/div\u003e \u003cp\u003e\u003c/p\u003e \u003cp\u003eThe DISC and DIFF parameters were estimated using the dichotomous Rasch model to evaluate the psychometric properties of the items. As shown in Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e, two items (Items 2 and 5) had negative discrimination values, indicating poor quality. There were three easy items (Items 3, 5, and 6 with negative logits) and three difficult items (Items 1, 2, and 4 with positive logits). Further analysis using M revealed that three items had equally plausible distracters (Items 2, 3, 4), while three items had distracters of unequal plausibility (Items 1, 5, 6). 
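\u003c/p\u003e \u003cp\u003eThe M values and decisions reported for the six items can be reproduced approximately from the distracter frequencies in Table\u0026nbsp;3. The sketch below is an illustration, not the code used in the study: the names chi2_sf and m_test are hypothetical, the p-value uses a closed-form chi-square tail probability for integer degrees of freedom in place of a statistics library, df is set to the number of distracters as stated in the Methods, and the expected value is taken as the sum of the distracter counts divided by d, which assumes no omitted responses.\u003c/p\u003e

```python
import math


def chi2_sf(x, df):
    # Upper-tail probability of a chi-square variable with integer df,
    # using the finite-sum closed forms for even and odd df.
    h = x / 2.0
    if df % 2 == 0:
        total = sum(h ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-h) * total
    total = sum(h ** (i - 0.5) / math.gamma(i + 0.5)
                for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * total


def m_test(distracter_counts, df=None, alpha=0.05):
    d = len(distracter_counts)
    e = sum(distracter_counts) / d  # (N - c) / d, assuming no omitted responses
    m = sum((n - e) ** 2 / e for n in distracter_counts)
    p = chi2_sf(m, d if df is None else df)  # df = d follows the Methods section
    return m, p, p >= alpha  # True means equal plausibility is not rejected


# Distracter frequencies from Table 3 (correct option excluded)
table3 = {
    1: [37, 41, 16],
    2: [37, 19, 39],
    3: [20, 16, 20],
    4: [35, 40, 20],
    5: [11, 11, 41],
    6: [11, 6, 36],
}

results = {item: m_test(counts) for item, counts in table3.items()}
for item, (m, p, equal) in results.items():
    print(item, round(m, 4), round(p, 3), 'yes' if equal else 'no')
```

\u003cp\u003e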
Item 5 was flagged as poor quality on both counts: negative discrimination and unequally plausible distracters. Overall, only Items 3 and 4 demonstrated good quality based on the assessed properties.\u003c/p\u003e \u003cp\u003eIn this new approach, the detection of implausible distracters differs from the traditional approach popularized by Haladyna and Downing (1993) [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. Although every distracter except Distracter B of Item 6 (3%) met the 5% criterion for functional distracters, Items 1, 5, and 6 were still flagged because their distracters were collectively ineffective in attracting less-able test takers. Hence, the new method can complement the existing methodologies in distracter analysis to identify items with dysfunctional distracters for further investigation.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\u003cdiv class=\"gridtable\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003e\u003cem\u003eRasch-based Item DISC and DIFF estimates and results of equality of distracters’ plausibility tests\u003c/em\u003e\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e\u003ccolgroup cols=\"6\"\u003e\u003c/colgroup\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003eItem\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDISC\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDIFF\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eM\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003ep-value\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eEqually plausible?\u003c/p\u003e \u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.244\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.519\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e11.51064\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.009\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-0.062\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.546\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e7.663158\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.053\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.364\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c3\"\u003e \u003cp\u003e-0.577\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.571429\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.903\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.150\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.546\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e6.842105\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.077\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eYes\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-0.103\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-0.358\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e28.57143\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.000\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e6\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.475\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-0.676\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e29.24528\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.000\u003c/p\u003e 
\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eNo\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/table\u003e\u003c/div\u003e \u003cp\u003e\u003c/p\u003e \u003cp\u003eThe three flagged items are shown in Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e, together with the revisions recommended at the distracter level. Item 1 was found to have unequally plausible distracters, as shown in Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e; hence, Distracter D, which had the lowest frequency, was revised from “Wilcoxon signed-rank test” to “Welch t test”. The original option D was not attractive, possibly because of its association with paired data analysis. The “Welch t test” may be a more effective distracter because it is used as an alternative when the assumption of homogeneity of variances is violated.\u003c/p\u003e \n\u003cp\u003e\u003cstrong\u003eTable 5\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eItem content and options of the three items flagged for unequal plausibility and recommended revisions with justifications\u003c/em\u003e\u003c/p\u003e\n\u003ctable style=\"border-collapse: collapse; border: none;\"\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd style=\"width: 333.7pt; border-top: 1pt solid windowtext; border-left: none; border-bottom: 1pt solid windowtext; border-right: none; padding: 0in 5.4pt; vertical-align: top;\"\u003e\n \u003cp style=\"margin: 0in; font-size: 15px; font-family: Calibri, sans-serif;\"\u003e\u003cspan style=\"font-family: \u0026quot;Times New Roman\u0026quot;, serif;\"\u003eItem Content and Options (\u003cspan style=\"color: rgb(192, 0, 0);\"\u003ewith recommended revisions\u003c/span\u003e)\u003c/span\u003e\u003c/p\u003e\n \u003c/td\u003e\u003ctd style=\"width: 134.3pt; border-top: 1pt solid windowtext; border-left: none; 
border-bottom: 1pt solid windowtext; border-right: none; padding: 0in 5.4pt; vertical-align: top;\"\u003e\n \u003cp style=\"margin: 0in; font-size: 15px; font-family: Calibri, sans-serif;\"\u003e\u003cspan style=\"font-family: \u0026quot;Times New Roman\u0026quot;, serif;\"\u003eReason/s for revision\u003c/span\u003e\u003c/p\u003e\n \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd style=\"width: 333.7pt; border: none; padding: 0in 5.4pt; vertical-align: top;\"\u003e\n \u003cp style=\"margin: 0in; font-size: 15px; font-family: Calibri, sans-serif;\"\u003e\u003cspan style=\"font-family: \u0026quot;Times New Roman\u0026quot;, serif;\"\u003eItem 1. The following are the characteristics of data: (1) dependent variable is measured at interval/ratio level; (2) There are two independent categories for the nominal independent variable; (3) The distribution in each group is normal; (4) The variances of the two groups are equal; (5) There are no observed outliers. What statistical tool is the most appropriate to compare the two groups?\u003c/span\u003e\u003c/p\u003e\n \u003cdiv style=\"margin: 0in 0in 8pt; font-size: 11pt; font-family: Calibri, sans-serif;\"\u003e\n \u003col start=\"1\" style=\"margin-bottom: 0in; list-style-type: upper-alpha; margin-left: 0.5in;\"\u003e\n \u003cli style=\"margin: 0in 0in 8pt; font-size: 11pt; font-family: Calibri, sans-serif;\"\u003e\u003cspan style=\"font-family: \u0026quot;Times New Roman\u0026quot;, serif;\"\u003eIndependent groups t test*\u003c/span\u003e\u003c/li\u003e\n \u003cli style=\"margin: 0in 0in 8pt; font-size: 11pt; font-family: Calibri, sans-serif;\"\u003e\u003cspan style=\"font-family: \u0026quot;Times New Roman\u0026quot;, serif;\"\u003eMann‒Whitney U test\u003c/span\u003e\u003c/li\u003e\n \u003cli style=\"margin: 0in 0in 8pt; font-size: 11pt; font-family: Calibri, sans-serif;\"\u003e\u003cspan style=\"font-family: \u0026quot;Times New Roman\u0026quot;, serif;\"\u003ePaired t test\u003c/span\u003e\u003c/li\u003e\n 
\u003cli style=\"margin: 0in 0in 8pt; font-size: 11pt; font-family: Calibri, sans-serif;\"\u003e\u003cspan style=\"font-family: \u0026quot;Times New Roman\u0026quot;, serif; color: rgb(192, 0, 0);\"\u003eWilcoxon signed-rank test (Replace with Welch t test)\u003c/span\u003e\u003c/li\u003e\n \u003c/ol\u003e\n \u003c/div\u003e\n \u003cp style=\"margin: 0in; font-size: 15px; font-family: Calibri, sans-serif;\"\u003e\u003cspan style=\"font-family: \u0026quot;Times New Roman\u0026quot;, serif;\"\u003e\u0026nbsp;\u003c/span\u003e\u003c/p\u003e\n \u003c/td\u003e\u003ctd style=\"width: 134.3pt; border: none; padding: 0in 5.4pt; vertical-align: top;\"\u003e\n \u003cp style=\"margin: 0in; font-size: 15px; font-family: Calibri, sans-serif;\"\u003e\u003cspan style=\"font-family: \u0026quot;Times New Roman\u0026quot;, serif;\"\u003eUnequal plausibility of distracters. Distracter D had the lowest frequency. The replacement is assumed to distract more effectively because the tool is used alternatively when t test Assumption (4) is not met.\u003c/span\u003e\u003c/p\u003e\n \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd style=\"width: 333.7pt; padding: 0in 5.4pt; vertical-align: top;\"\u003e\n \u003cp style=\"margin: 0in; font-size: 15px; font-family: Calibri, sans-serif;\"\u003e\u003cspan style=\"font-family: \u0026quot;Times New Roman\u0026quot;, serif;\"\u003eItem 5. The following are the characteristics of data: (1) Variables X and Y are measured at interval/ratio level; (2) X and Y are paired; (3) The distribution of the paired data are bivariate normal; (4) There are no observed outliers; (5) There is a linear relationship between X and Y. 
What statistical tool is the most appropriate to test the hypothesis that there is no linear correlation between X and Y?\u003c/span\u003e\u003c/p\u003e\n \u003cdiv style=\"margin: 0in 0in 8pt; font-size: 11pt; font-family: Calibri, sans-serif;\"\u003e\n \u003col start=\"1\" style=\"margin-bottom: 0in; list-style-type: upper-alpha; margin-left: 0.5in;\"\u003e\n \u003cli style=\"margin: 0in 0in 8pt; font-size: 11pt; font-family: Calibri, sans-serif;\"\u003e\u003cspan style=\"font-family: \u0026quot;Times New Roman\u0026quot;, serif; color: rgb(192, 0, 0);\"\u003eAnalysis of variance (Replace with Chi-square test of independence)\u003c/span\u003e\u003c/li\u003e\n \u003cli style=\"margin: 0in 0in 8pt; font-size: 11pt; font-family: Calibri, sans-serif;\"\u003e\u003cspan style=\"font-family: \u0026quot;Times New Roman\u0026quot;, serif; color: rgb(192, 0, 0);\"\u003ePaired t test (Replace with Point-biserial correlation)\u003c/span\u003e\u003c/li\u003e\n \u003cli style=\"margin: 0in 0in 8pt; font-size: 11pt; font-family: Calibri, sans-serif;\"\u003e\u003cspan style=\"font-family: \u0026quot;Times New Roman\u0026quot;, serif;\"\u003ePearson product-moment correlation*\u003c/span\u003e\u003c/li\u003e\n \u003cli style=\"margin: 0in 0in 8pt; font-size: 11pt; font-family: Calibri, sans-serif;\"\u003e\u003cspan style=\"font-family: \u0026quot;Times New Roman\u0026quot;, serif;\"\u003eSpearman rank correlation\u003c/span\u003e\u003c/li\u003e\n \u003c/ol\u003e\n \u003c/div\u003e\n \u003cp style=\"margin: 0in; font-size: 15px; font-family: Calibri, sans-serif;\"\u003e\u003cspan style=\"font-family: \u0026quot;Times New Roman\u0026quot;, serif;\"\u003e\u0026nbsp;\u003c/span\u003e\u003c/p\u003e\n \u003c/td\u003e\u003ctd style=\"width: 134.3pt; padding: 0in 5.4pt; vertical-align: top;\"\u003e\n \u003cp style=\"margin: 0in; font-size: 15px; font-family: Calibri, sans-serif;\"\u003e\u003cspan style=\"font-family: \u0026quot;Times New Roman\u0026quot;, serif;\"\u003eUnequal 
plausibility of distracters. Distracters A and B had the lowest frequencies, possibly because both are tests of comparison. The replacements are assumed to distract more effectively because they are alternative tools for testing relationships between variables.\u003c/span\u003e\u003c/p\u003e\n \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd style=\"width: 333.7pt; border-top: none; border-right: none; border-left: none; border-image: initial; border-bottom: 1pt solid windowtext; padding: 0in 5.4pt; vertical-align: top;\"\u003e\n \u003cp style=\"margin: 0in; font-size: 15px; font-family: Calibri, sans-serif;\"\u003e\u003cspan style=\"font-family: \u0026quot;Times New Roman\u0026quot;, serif;\"\u003eItem 6. The following are the characteristics of data: (1) Variables X and Y are measured at interval/ratio level; (2) X and Y are paired; (3) The distribution of the paired data are not normal; (4) There are observed outliers; (5) There is a monotonic relationship between X and Y. What statistical tool is the most appropriate to test the hypothesis that there is no correlation between X and Y?\u003c/span\u003e\u003c/p\u003e\n \u003cdiv style=\"margin: 0in 0in 8pt; font-size: 11pt; font-family: Calibri, sans-serif;\"\u003e\n \u003col start=\"1\" style=\"margin-bottom: 0in; list-style-type: upper-alpha; margin-left: 0.5in;\"\u003e\n \u003cli style=\"margin: 0in 0in 8pt; font-size: 11pt; font-family: Calibri, sans-serif;\"\u003e\u003cspan style=\"font-family: \u0026quot;Times New Roman\u0026quot;, serif; color: rgb(192, 0, 0);\"\u003eAnalysis of variance (Replace with Chi-square test of independence)\u003c/span\u003e\u003c/li\u003e\n \u003cli style=\"margin: 0in 0in 8pt; font-size: 11pt; font-family: Calibri, sans-serif;\"\u003e\u003cspan style=\"font-family: \u0026quot;Times New Roman\u0026quot;, serif; color: rgb(192, 0, 0);\"\u003ePaired t test (Replace with Point-biserial correlation)\u003c/span\u003e\u003c/li\u003e\n \u003cli style=\"margin: 0in 0in 8pt; 
font-size: 11pt; font-family: Calibri, sans-serif;\"\u003e\u003cspan style=\"font-family: \u0026quot;Times New Roman\u0026quot;, serif;\"\u003ePearson product-moment correlation\u003c/span\u003e\u003c/li\u003e\n \u003cli style=\"margin: 0in 0in 8pt; font-size: 11pt; font-family: Calibri, sans-serif;\"\u003e\u003cspan style=\"font-family: \u0026quot;Times New Roman\u0026quot;, serif;\"\u003eSpearman rank correlation*\u003c/span\u003e\u003c/li\u003e\n \u003c/ol\u003e\n \u003c/div\u003e\n \u003cp style=\"margin: 0in; font-size: 15px; font-family: Calibri, sans-serif;\"\u003e\u003cspan style=\"font-family: \u0026quot;Times New Roman\u0026quot;, serif;\"\u003e\u0026nbsp;\u003c/span\u003e\u003c/p\u003e\n \u003c/td\u003e\u003ctd style=\"width: 134.3pt; border-top: none; border-right: none; border-left: none; border-image: initial; border-bottom: 1pt solid windowtext; padding: 0in 5.4pt; vertical-align: top;\"\u003e\n \u003cp style=\"margin: 0in; font-size: 15px; font-family: Calibri, sans-serif;\"\u003e\u003cspan style=\"font-family: \u0026quot;Times New Roman\u0026quot;, serif;\"\u003eUnequal plausibility of distracters. Distracters A and B had the lowest frequencies, possibly because both are tests of comparison. The replacements are assumed to distract more effectively because they are alternative tools for testing relationships between variables.\u003c/span\u003e\u003c/p\u003e\n \u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/table\u003e \u003cp\u003eItems 5 and 6, which concern the assumptions of tests of relationships between variables, both had unequally plausible distracters. In each item, two of the three distracters were comparison tests (ANOVA and the paired t test) and were therefore collectively less appealing and less effective as distracters. 
This is likely because these types of tests may not be as relevant or plausible within the context of the question (e.g., the assumptions of a correlation test), making them less likely to be chosen by examinees who do not know the correct answer. Consequently, these distracters fail to challenge test takers effectively and are less efficient at diverting them from the correct answer. Replacing these with other tests of relationships (e.g., chi-square test of independence and point-biserial correlation) may address this unequal plausibility.\u003c/p\u003e \u003cp\u003e \u003cb\u003eCorrelation of\u003c/b\u003e \u003cb\u003eM\u003c/b\u003e \u003cb\u003ewith DIFF and DISC\u003c/b\u003e\u003c/p\u003e \u003cp\u003eTo investigate potential relationships between DIFF and M, as well as between item DISC and M, we conducted correlation and regression analyses. The results indicated a moderate negative linear correlation between M and DIFF (r = -0.458), although this relationship was not statistically significant (p \u0026gt; 0.05). Because larger values of M indicate a greater departure from equal plausibility, the negative sign suggests that M tends to decrease as item difficulty increases, but the relationship is not strong enough to be conclusive. Additionally, no significant correlation was found between M and DISC (r = -0.037, p \u0026gt; 0.05), indicating that the discriminative power of an item is not linearly related to the plausibility of its distracters. Finally, there was a nonsignificant negative correlation between DIFF and DISC (r = -0.465, p \u0026gt; 0.05), implying that while there may be a tendency for more difficult items to be less discriminative, this trend is not statistically significant. Overall, the analyses found no significant linear association among these metrics.\u003c/p\u003e \u003cp\u003eDespite the lack of significant linear correlations, we noted a polynomial trend in the relationship between \u003cem\u003eM\u003c/em\u003e and DIFF. 
The scatterplot in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e shows a curvilinear trend: as the value of the M statistic increases, DIFF follows an inverted-U pattern with a maximum value near M = 10. Therefore, we conducted a polynomial regression to determine whether a polynomial function fits the empirical data.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\u003cdiv class=\"gridtable\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003ctable float=\"Yes\" id=\"Tab6\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 6\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003e\u003cem\u003eResults of polynomial regression showing significant coefficients for linear and quadratic trends\u003c/em\u003e\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e\u003ccolgroup cols=\"5\"\u003e\u003c/colgroup\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCoefficients:\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEstimate\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eStd. 
error\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003et value\u003c/p\u003e \u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003ep value\u003c/p\u003e \u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e(Intercept)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-0.610253\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.187745\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-3.250\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.04747 *\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eM\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.189268\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.033788\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e5.602\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.01124 *\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eI(M^2)\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e-0.006447\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.001007\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-6.400\u003c/p\u003e \u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.00773 **\u003c/p\u003e \u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/table\u003e\u003c/div\u003e \u003cp\u003e\u003c/p\u003e \u003cp\u003eResidual standard error: 0.1791 for 3 degrees of freedom\u003c/p\u003e \u003cp\u003eMultiple R-squared: 
0.9461, Adjusted R-squared: 0.9101\u003c/p\u003e \u003cp\u003eF-statistic: 26.31 on 2 and 3 DF, p value: 0.01253\u003c/p\u003e \u003cp\u003eAs shown in Table\u0026nbsp;\u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e, the polynomial regression yielded a multiple R-squared of 0.9461, indicating that approximately 94.61% of the variance in DIFF is explained by the model. The adjusted R-squared, which accounts for the number of predictors in the model, was 0.9101, still indicating a strong fit. The overall model significance test indicated that the model significantly predicted DIFF [F(2,3) = 26.31, p = 0.01253].\u003c/p\u003e \u003cp\u003eThe polynomial regression results reveal a significant quadratic relationship between DIFF and M. The significant negative coefficient for \u003cem\u003eM\u003c/em\u003e\u003csup\u003e2\u003c/sup\u003e suggests a parabolic curve where DIFF initially increases with M but starts to decrease as M continues to increase. The high R-squared and adjusted R-squared values indicate that the model explains a large portion of the variability in DIFF. Given the significant p values for all the coefficients, the results suggest that the equal plausibility metric (M) has a meaningful, nonlinear association with item difficulty (DIFF). This model can be useful for understanding how DIFF varies with M and can inform the design and evaluation of test items to achieve desired levels of difficulty.\u003c/p\u003e \u003c/div\u003e "},{"header":"Conclusion and Recommendations","content":"\u003cp\u003eThis study introduced a chi-square statistic, M, designed to detect significant deviations from the response frequencies expected under equal plausibility of distracters. This novel item analysis metric fills a gap by evaluating the collective functionality of distracters and serves as a basis for identifying items with dysfunctional distracters. 
The statistic was empirically tested using response data from a statistics test and was found to effectively detect items with implausible distracters. Furthermore, the new metric showed a quadratic relationship with item difficulty, suggesting an optimal difficulty level within a specific range of M values.\u003c/p\u003e\u003cp\u003eSeveral limitations were noted in this study. First, the dataset included only 6 items, which is a very small sample size for item analysis, potentially affecting the observed relationships. Second, DIFF and DISC estimates were obtained using the Rasch model, which differs from classical test theory estimates. No assumption tests were conducted to confirm whether the items met the Rasch model's expectations, potentially invalidating the derived estimates. Future research is encouraged to verify these results and address the limitations identified in this study.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study did not receive funding from any granting agencies.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eHuman Ethics and Consent to Participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study utilized a test response data set to evaluate the applicability of the developed statistical method. No personal or identifiable information was used, ensuring that ethical standards were maintained. 
As such, formal consent from participants was not required for this analysis.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for Publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting Interest\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe author declares no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData Availability Statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe data used in this study will be available upon request.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contribution\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe author solely conceptualized the study, conducted the data analysis, and authored the manuscript. All aspects of the research, from the initial idea through to the final write-up, were independently carried out by the author.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAbdulghani, H. M., Irshad, M., Haque, S., Ahmad, T., Sattar, K., \u0026amp; Khalil, M. S. (2017). Effectiveness of longitudinal faculty development programs on MCQs items writing skills: A follow-up study. \u003cem\u003ePloS One, 12\u003c/em\u003e(10), e0185895.\u003c/li\u003e\n\u003cli\u003eWood, E., Klausz, N., \u0026amp; MacNeil, S. (2022). Examining the influence of multiple-choice test formats on student performance. \u003cem\u003eInnovative Higher Education, 47\u003c/em\u003e, 515\u0026ndash;531. https://doi.org/10.1007/s10755-021-09581-7\u003c/li\u003e\n\u003cli\u003eXu, X., Kauer, S., \u0026amp; Tupy, S. (2016). Multiple-choice questions: Tips for optimizing assessment in-seat and online. \u003cem\u003eScholarship of Teaching and Learning in Psychology\u003c/em\u003e, \u003cem\u003e2\u003c/em\u003e(2), 147.\u003c/li\u003e\n\u003cli\u003ePalmer, E. J., \u0026amp; Devitt, P. G. (2007). Assessment of higher order cognitive skills in undergraduate education: modified essay or multiple-choice questions? 
Research paper. \u003cem\u003eBMC Medical Education, 7\u003c/em\u003e, 1-7.\u003c/li\u003e\n\u003cli\u003eMitra, A. K. (2022). The Art of Designing a Quality Multiple Choice Question in Chemistry. \u003cem\u003eResonance\u003c/em\u003e, \u003cem\u003e27\u003c/em\u003e(6), 1017-1031.\u003c/li\u003e\n\u003cli\u003eElgadal, A. H., \u0026amp; Mariod, A. A. (2021). Item analysis of multiple-choice questions (MCQs): assessment tool for quality assurance measures. \u003cem\u003eSudan Journal of Medical Sciences, 16\u003c/em\u003e(3), 334-346.\u003c/li\u003e\n\u003cli\u003eDiBattista, D., \u0026amp; Kurzawa, L. (2011). Examination of the quality of multiple-choice items on classroom tests. \u003cem\u003eCanadian Journal for the Scholarship of Teaching and Learning, 2\u003c/em\u003e(2), 4.\u003c/li\u003e\n\u003cli\u003eHaladyna, T. M., \u0026amp; Downing, S. M. (1993). How many options is enough for a multiple-choice test item?. \u003cem\u003eEducational and Psychological Measurement, 53\u003c/em\u003e(4), 999-1010.\u003c/li\u003e\n\u003cli\u003eGierl, M. J., Bulut, O., Guo, Q., \u0026amp; Zhang, X. (2017). Developing, analyzing, and using distracters for multiple-choice tests in education: A comprehensive review. \u003cem\u003eReview of Educational Research, 87\u003c/em\u003e(6), 1082-1116.\u003c/li\u003e\n\u003cli\u003eRezigalla, A. A., Eleragi, A. M. E. S. A., Elhussein, A. B., Alfaifi, J., ALGhamdi, M. A., Al Ameer, A. Y., ... \u0026amp; Adam, M. I. E. (2024). Item analysis: the impact of distracter efficiency on the difficulty index and discrimination power of multiple-choice items. \u003cem\u003eBMC Medical Education, 24\u003c/em\u003e(1), 445.\u003c/li\u003e\n\u003cli\u003eTesta, S., Toscano, A., \u0026amp; Rosato, R. (2018). Distracter efficiency in an item pool for a statistics classroom exam: Assessing its relation with item cognitive level classified according to Bloom\u0026rsquo;s taxonomy. 
\u003cem\u003eFrontiers in Psychology, 9\u003c/em\u003e, 357601.\u003c/li\u003e\n\u003cli\u003ePuthiaparampil, T., \u0026amp; Rahman, M. (2021). How important is distracter efficiency for grading Best Answer Questions?. \u003cem\u003eBMC Medical Education, 21\u003c/em\u003e, 1-6.\u003c/li\u003e\n\u003cli\u003eMair, P., \u0026amp; Hatzinger, R. (2007). Extended Rasch modeling: The eRm package for the application of IRT models in R. \u003cem\u003eJournal of Statistical Software\u003c/em\u003e, \u003cem\u003e20\u003c/em\u003e, 1-20.\u003c/li\u003e\n\u003cli\u003eR Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/ \u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"bmc-medical-education","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"meed","sideBox":"Learn more about [BMC Medical Education](http://bmcmededuc.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/meed/default.aspx","title":"BMC Medical Education","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"distracter analysis, chi-square test, item analysis, plausibility","lastPublishedDoi":"10.21203/rs.3.rs-4441034/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4441034/v1","license":{"name":"CC BY 
4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThis study introduces a new chi-square test statistic for testing the equality of response frequencies among distracters in multiple-choice tests. The formula uses the information from the number of correct answers and wrong answers, which becomes the basis of calculating the expected values of response frequencies per distracter. The method was applied to a statistics test response data and found to effectively detect unequally plausible distracters. Furthermore, the statistic had a quadratic relationship with item difficulty, indicating that at a certain range of plausibility values, there is an optimal item difficulty.\u003c/p\u003e","manuscriptTitle":"A Chi-square Statistic for Testing the Equality of Distracters’ Plausibility in Multiple-Choice Test Items","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-06-07 05:59:19","doi":"10.21203/rs.3.rs-4441034/v1","editorialEvents":[{"type":"communityComments","content":1},{"type":"editorAssigned","content":"","date":"2024-06-07T19:35:57+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2024-05-27T05:05:11+00:00","index":"","fulltext":""},{"type":"submitted","content":"BMC Medical Education","date":"2024-05-18T12:26:59+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"bmc-medical-education","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"meed","sideBox":"Learn more about [BMC Medical Education](http://bmcmededuc.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/meed/default.aspx","title":"BMC Medical Education","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC 
Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"271c90ed-2cca-4713-8ac2-7f811ec445db","owner":[],"postedDate":"June 7th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[],"tags":[],"updatedAt":"2024-06-07T05:59:19+00:00","versionOfRecord":[],"versionCreatedAt":"2024-06-07 05:59:19","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-4441034","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4441034","identity":"rs-4441034","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
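The M values in Table 4 can be checked against the response counts in Table 3. The sketch below assumes that M is the usual Pearson goodness-of-fit sum taken over the distracters only, with each expected frequency equal to the number of wrong answers divided by the number of distracters, which is how the expected value of 17.67 for Item 6 arises from 53 wrong answers and three distracters. The function name `m_statistic` is hypothetical, and the p-value is left out because the degrees of freedom are defined outside this section.

```python
def m_statistic(distracter_counts):
    """Pearson chi-square sum over distracters, assuming equal expected counts.

    distracter_counts: observed response frequencies for the wrong options only.
    """
    wrong = sum(distracter_counts)
    if wrong == 0:
        raise ValueError("no wrong responses; the statistic is undefined")
    # Under equal plausibility, every distracter is expected to attract
    # the same share of the wrong answers.
    expected = wrong / len(distracter_counts)
    return sum((observed - expected) ** 2 / expected
               for observed in distracter_counts)


# Item 6 in Table 3: distracters A, B, C chosen by 11, 6, and 36 examinees
# (53 wrong answers in total, expected 17.67 per distracter).
item6 = [11, 6, 36]
print(round(m_statistic(item6), 5))  # 29.24528, the M reported for Item 6 in Table 4
```

Keeping the correct option out of the input makes the equal-share assumption explicit; counts that are perfectly balanced across distracters give a statistic of zero.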
