A Method for Identifying Predatory Journals Driven by Large Language Models

preprint OA: closed
Full text JSON View at publisher
Full text 162,240 characters · extracted from preprint-html · click to expand
A Method for Identifying Predatory Journals Driven by Large Language Models | Research Square window.SnipcartSettings = { analytics: { enabled: false } }; (function() { var accessVector = localStorage.getItem('access_vector') || ''; window.dataLayer = window.dataLayer || []; if (accessVector) { window.dataLayer.push({ user: { profile: { profileInfo: { snid: accessVector } } } }); } })(); (function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src='https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);})(window,document,'script','dataLayer','GTM-K279D39R'); Browse Preprints In Review Journals COVID-19 Preprints AJE Video Bytes Research Tools Research Promotion AJE Professional Editing AJE Rubriq About Preprint Platform In Review Editorial Policies Our Team Advisory Board Help Center Sign In Submit a Preprint Cite Share Download PDF Research Article A Method for Identifying Predatory Journals Driven by Large Language Models Fanrui Zhang, Ming Chen This is a preprint; it has not been peer reviewed by a journal. https://doi.org/ 10.21203/rs.3.rs-9029371/v1 This work is licensed under a CC BY 4.0 License Status: Posted Version 1 posted You are reading this latest preprint version Abstract This study investigates whether the method of fine-tuning language models can be applied to the task of predatory journal identification and seeks to identify the optimal fine-tuning strategy feasible in similar practical environments. This study employs the Low-Rank Adaptation (LoRA) method to perform instruction-supervised fine-tuning on open-source distilled models (such as DeepSeek-R1-Distill-Qwen-1.5B) based on different strategies. Additionally, three machine learning algorithms and a general large language model API call solution were introduced to compare the performance differences between fine-tuned models, traditional classifiers, and non-fine-tuned large models. The results indicate that a 1.5B model fine-tuned with 398 structured samples surpassed the performance of non-fine-tuned general large models in the specific task, achieving an accuracy of 76%. A 7B model fine-tuned using the same strategy achieved an accuracy of 92%. The comparison revealed that fine-tuning can enhance the performance of distilled models in executing domain-specific tasks, and an increase in the parameter scale of the baseline model can significantly improve the performance of its fine-tuned version in the specific task. Predatory Journals Large Language Models Machine Learning Open Access Figures Figure 1 Figure 2 Figure 3 Figure 4 Figure 5 Figure 6 Introduction Open Access (OA) is a publishing model promoted by the international scholarly community since the late 1990s, aiming to address challenges in academic publishing arising from the rapid development of the Internet, to facilitate scholarly communication, and to enhance the societal impact of academic research (Leslie et al., 2002 ). A key outcome of the OA movement is the emergence of OA journals. These journals are typically characterized by author-funded publication models in which authors retain copyright, while published articles are made freely accessible for reading and downloading (Beall, 2013 ). However, some journals exploit the features of the OA model by disregarding academic standards in pursuit of unreasonable profits; such journals are commonly referred to as “predatory journals” (Sumayyia et al., 2023). With the continued expansion of the OA movement, predatory journals have become a global and interdisciplinary concern, attracting increasing attention from the academic community (Shamseer et al., 2017 ; Siler et al., 2021 ). Both academic and professional communities have explored various approaches to predatory journal identification, ranging from early conceptual discussions to feature-based analysis and quantitative identification methods. Nevertheless, even regularly updated predatory journal lists continue to face limitations, including delays in updates and the risk of false negatives. In recent years, generative artificial intelligence has emerged as a prominent research focus, with growing interest in its potential applications across scholarly activities. Wang et al. ( 2023 ) argued that AI-supported "AI for Science" has already influenced the knowledge production processes of many researchers. It is anticipated that generative AI will become closely integrated into the lifecycle of scientific research, particularly in processes such as the creation and dissemination of research outputs. Existing studies (van Dis et al., 2023 ; Hosseini et al., 2025 ) suggest that generative AI can be used to detect predatory publishing and predatory journals. However, these advancements are also accompanied by concerns about their potential to exacerbate the proliferation of low-quality content in academia. In this context, there is a pressing need to explore the application of generative AI in journal evaluation and academic quality assessment. In particular, investigating the feasibility of generative AI for predatory journal identification and developing AI-driven identification methods with improved effectiveness and efficiency have become increasingly important. This study will explore whether the AI -driven method can be applied to the task of identifying predatory journals. This study will apply the distillation model of generative large language models (LLMs)as a classifier to the task of predatory journal identification, perform instruction-supervised fine-tuning on the baseline distillation model based on LoRA, and test whether the fine-tuned model can be competent for the specific task. The subsequent research questions are formulated to direct the inquiry: RQ1: Can the method of fine-tuning language models be applied to the task of identifying predatory journals? RQ2: Which fine-tuning method is the optimal one? RQ3: How do fine-tuning sample size and prompt strategies influence fine-tuning performance? Review Generative Large Language Models and Their Applications In 2018, Google released GPT-1, a pre-trained generative model based on the Transformer architecture. Since then, generative large language models (Generative LLMs) have gained prominence across various academic fields (Radford et al., 2018). Researchers (Zhyar et al., 2023; Kang et al., 2024; Zhang et al., 2024 ; Huang et al., 2024 ) have widely employed generative LLMs to perform tasks related to natural language processing, exploring their practical effectiveness in areas such as text summarization, recommendation systems, classification tasks, and knowledge processing, thereby unlocking their potential for broader applications. Huang et al. ( 2024 ) combined chain-of-thought techniques with the large model GPT-4, using the annotated data generated by GPT-4 to pre-train a smaller model, BERT. They evaluated the performance of the model, which underwent two-stage training, on named entity recognition tasks and observed a significant improvement. Wang et al. (2024) discussed AI-generated content (AIGC) detection methods such as white-box detection, zero-shot detection, and fine-tuning. Qiu et al.(2024) developed a method for detecting obscene text in open-domain dialogue systems based on large language model knowledge distillation, and constructed the first dataset, CENSORCHAT, designed specifically for detecting obscene text in human-machine dialogue scenarios. Dorfner et al.(2025) constructed a specialized dataset to evaluate the performance of fine-tuned models in biomedical knowledge tasks such as case report generation and diagnostic reasoning. Their study indicated that models fine-tuned with domain-specific knowledge may not outperform general models. This conclusion challenges the assumption that fine-tuning inherently improves LLM performance on domain-specific tasks and highlights the issue of "catastrophic forgetting," where overfitting to fine-tuning data leads to errors in general knowledge comprehension. Anisuzzaman et al.(2025) provided practical examples of fine-tuned models in the medical field, detailing the fine-tuning methods used and summarizing the advantages and disadvantages of this approach. Recent research in the application of generative LLMs has primarily focused on investigating the effectiveness of fine-tuned models for specific domain tasks. While many studies have demonstrated that fine-tuning can enhance the performance of baseline models across various tasks, the approach remains a subject of debate. Fine-tuning large models requires performance validation to mitigate the risk of "catastrophic forgetting," and the costs associated with model deployment and computation also need to be considered. AI-driven Quantitative Identification of Predatory Journals With the deepening of research into the characteristics of predatory journals and the advancement of artificial intelligence technologies, academic research on the quantitative identification of predatory journals has seen numerous methodological innovations. The academic research on the quantitative identification of predatory journals has made more methodological innovations based on the design of quantitative indicators for identification and the construction of mathematical models for identification. Albana ( 2020 ) introduced a question-based evaluation algorithm for predatory journals, providing researchers with a new approach to identifying such journals. Dadkhah et al. ( 2022 ) proposed the concept of "Jourchain," leveraging blockchain technology to create a semi-private journal blockchain. By continuously expanding the blockchain records, this system improves the efficiency of identifying suspicious journals. Chen Li-Xian et al. (2023) utilized machine learning algorithms to analyze textual information from predatory journal websites, ultimately developing a practical system known as "AJPC." Al-Moghrabi et al. ( 2024 ) combined machine learning methods with ChatGPT models to generate evaluations based on journal names. The evaluations were then standardized into 20 positive and negative indicators, which were analyzed to identify the key factors affecting the model’s accuracy in distinguishing predatory journals. Their findings indicated that ChatGPT can accurately identify predatory journals based on the provided journal names. In summary, with advancements in quantitative research technologies and shifting academic priorities, the methods for identifying predatory journals have evolved, incorporating techniques such as basic mathematical models, machine learning algorithms, blockchain technology, and large language models, leading to promising results. However, research on AI-driven identification methods remains limited, and no studies have yet developed a classification system for predatory journal identification using model fine-tuning. Therefore, this study aims to employ a distilled generative LLM model as a classifier for the predatory journal identification task. We apply LoRA-based instruction-supervised fine-tuning to the baseline distilled model and assess whether the fine-tuned model can effectively handle the task. Additionally, to explore the most efficient fine-tuning strategy under the current research conditions, we designed multiple comparative experiments using different fine-tuning strategies and baseline models. Given that traditional classifiers such as machine learning models have already achieved good results in predatory journal classification tasks, we also conducted comparative experiments to evaluate the performance of the fine-tuned model against machine learning models and general-purpose large language models. The final fine-tuned model developed was designed to be easily deployable, providing an efficient solution for predatory journal identification and offering a new approach to the field. Methods As discussed above, existing study (Kang et al., 2024) have demonstrated the feasibility of using distilled models as classification systems. In addition, fine-tuning large language models such as GPT-3 for specific downstream tasks has been shown to significantly improve baseline model performance. Zhyar et al. (2023) demonstrated that fine-tuning pretrained transformers for text classification tasks resulted in substantial gains in accuracy, precision, recall, and F1-score, outperforming traditional machine learning models like SVM and CNN. Their studies revealed that fine-tuned models consistently achieved better classification results, especially when trained with larger datasets and optimized hyperparameters, thus highlighting the effectiveness of fine-tuning in enhancing task-specific model performance. Building on this evidence, this study employs distilled models for data-driven reasoning and examines whether their performance exceeds that of baseline models. Baseline Distilled Model Selection Generative large language models (LLMs) represent a major development in the field of artificial intelligence. Their core functionality lies in predicting and generating text sequences based on statistical regularities, thereby enabling the comprehension and generation of natural language (Lin et al., 2021 ). LLMs have been widely applied across a range of domains and tasks, including text generation and mathematical reasoning (Sakhi, 2023 ; Zhang et al., 2024 ; Huang et al., 2024 ). The generative models examined in this study are the DeepSeek-R1 model and its distilled variants. DeepSeek-R1 is trained using large-scale reinforcement learning, and its reasoning process incorporates iterative reflection and verification, endowing it with strong logical reasoning capabilities in tasks such as mathematical validation and code generation. The distilled versions of DeepSeek-R1 are small, dense models derived through knowledge distillation. These models are based on the open-source Qwen2.5 architecture and fine-tuned using samples generated by the DeepSeek-R1 model. Compared with the teacher model, the distilled models retain essential reasoning abilities while offering advantages in terms of reduced model size and lower computational cost (DEEPSEEK-AI, 2025 ). Considering the experimental environment and available hardware resources, this study selects DeepSeek-R1-Distill-Qwen-1.5B and DeepSeek-R1-Distill-Qwen-7B as baseline models. During the experimental phase, the performance of these fine-tuned distilled models is compared with that of the DeepSeek-R1 model in the predatory journal identification task. And Python is used to invoke the DeepSeek-R1 model via API for task execution. Fine-tuning Method Selection Hallucination is a well-recognized phenomenon in the application of generative large language models (LLMs). To reduce its occurrence, prior studies have proposed several effective approaches, including prompt optimization, knowledge augmentation, and model fine-tuning (Tonmoy et al., 2024 ). Existing research on large-model adaptation has explored various fine-tuning strategies, such as full-parameter fine-tuning, parameter-efficient fine-tuning, reinforcement learning, and contrastive learning. Anisuzzaman (2025) have shown that parameter-efficient fine-tuning methods, exemplified by Low-Rank Adaptation (LoRA), modify only a limited number of parameters while substantially reducing computational and storage costs, without compromising the core representational capacity of the base model. In addition, instruction-supervised fine-tuning has been proposed as an effective strategy to enhance a model’s ability to interpret and follow task-specific instructions (Aviv et al., 2025). In light of these findings, this study adopts an instruction-supervised fine-tuning paradigm implemented through a parameter-efficient approach. Specifically, LoRA (Hu et al., 2021 ) is employed to introduce low-rank adaptation matrices into the baseline model. Supervised fine-tuning is then conducted using carefully constructed instruction-based structured data, during which only the parameters associated with the low-rank matrices are updated. This approach enables the model to adapt to domain-specific classification tasks with efficiency comparable to full-parameter fine-tuning, while maintaining relatively low computational and deployment costs. Experimental Design First, an automated program was employed to collect bibliometric indicators for known predatory journals and legitimate open access journals, thereby constructing an initial dataset. Second, a binary label, Predatory, was defined, with predatory journals assigned a value of “1” and legitimate OA journals assigned a value of “0.” Based on this label, equal numbers of journals from each category were randomly sampled and divided into training and test sets. These datasets were used for fine-tuning dataset construction, fine-tuned model evaluation, and the training and evaluation of machine learning models. During instruction-supervised fine-tuning, the outputs of the distilled models were constrained to either “0” or “1” to achieve a classification objective. Third, model predictions on the test set were compared with the original Predatory labels to compute accuracy, recall, precision, and F1-score. To reduce randomness in the experimental results, each evaluation was repeated three times, and the mean values of all metrics were reported. Finally, performance differences among models fine-tuned under different strategies were analyzed, and comparisons were conducted between traditional machine learning models, baseline models, and fine-tuned models to identify the optimal configuration and fine-tuning strategy for predatory journal classification. All experiments were conducted in a Python 3.11 environment with CUDA 12.9 and PyTorch 2.6, running on a Windows-based server equipped with an Intel Core i7-12800HX CPU and an NVIDIA GeForce RTX 4070 GPU. Given the available hardware resources, this study selected the knowledge-distilled small dense models DeepSeek-R1-Distill-Qwen-1.5B (denoted as L-1) and DeepSeek-R1-Distill-Qwen-7B (denoted as L-7B) as baseline models for empirical evaluation. The Llama-Factory framework (Zheng et al., 2024 ) was used for fine-tuning and performance assessment. To ensure reproducibility and operational efficiency, Python scripts were developed to automate data collection, pre-processing, dataset construction, and model evaluation. Data Collection and Preprocessing Predatory journals typically lack rigorous peer-review processes, focusing more on economic gain than the academic quality of papers (Shamseer et al., 2017 ). Bibliometric indicators that measure a journal's academic influence can quantify its academic level. Therefore, in existing predatory journal identification research, bibliometric indicators are often used as the subject of study. Considering the hardware requirements for instruction fine-tuning and the quality of the fine-tuning dataset, this study chose to collect five quantitative indicators for journals: Impact Factor Without Self-Citations, Cited Half-Life, Article Influence Score, Immediacy Index, and CiteScore. The performance of predatory journals in terms of academic influence measured by these indicators is believed to differ from that of legitimate open access journals (Wu et al., 2024 ). Therefore, this study uses these indicators as the basic basis for identifying predatory journals. The predatory journals in this study are sourced from Beall's list of predatory journals (Butler, 2013). Legitimate open access journals were randomly selected from the Directory of Open Access Journals (DOAJ) database (DOAJ, 2025 ). Indicators for both types of journals were collected from the Journal Citation Reports (JCR) database, Web of Science (WoS), and Scopus(Wu et al., 2024 ). After data collection and cleaning, this study ultimately constructed a total dataset containing 498 journal data samples, with 249 samples each from predatory journals and legitimate OA journals. The "Predatory" label for predatory journal samples was assigned “1”, while legitimate OA journals were assigned “0”. Prompt Engineering and Fine-tuning Dataset Research by Luo et al. ( 2025 ) demonstrated that prompts have a significant impact on a large language model’s understanding of a given task. In addition, the Llama-Factory framework requires instruction-supervised fine-tuning datasets to be provided as structured data in JSON format, with each data instance adhering to a uniform key–value schema, namely a JSON triple consisting of ”instruction”, “input”, and “output”. Accordingly, this study designed a dedicated instruction to serve as the basic prompt for instruction-based fine-tuning. An example of the JSON-formatted data array is shown in Fig. 1 . To enable the model to acquire the necessary background knowledge about predatory journals, this study calculated the average values of each indicator for both predatory journal samples and legitimate OA journal samples, and incorporated these averages into the instruction as a reference for the model to assess the relative magnitude of indicator values. Existing research (Feroze et al., 2024) noted that models with fewer parameters may lack the fundamental cognitive capabilities possessed by large-parameter models. Accordingly, to examine the impact of background knowledge on model performance in classification tasks, this study designed an alternative instruction that excluded the average values of each indicator; an example is shown in Fig. 2 . In addition, to investigate the effect of including example question–answer pairs in the instruction on classification performance, a 1-shot prompt was designed, as illustrated in Fig. 3 . Following the Llama-Factory framework specifications, this study used the training set to construct the instruction fine-tuning dataset. From the 498 samples, 100 were randomly selected to form the test set, while the remaining 398 formed the training set. To explore the impact of different sample sizes on the classification effectiveness of fine-tuned models, this study conducted random sampling in equal proportions from predatory journal samples and legitimate OA journal samples and constructed structured datasets containing 100 samples (50 samples per journal type) and 200 samples (proportions analogous to the 100-sample set). Based on different instruction conditions, this study constructed five JSON format fine-tuning data training sets, namely: ① Dataset A , consisting of 398 samples containing the basic prompt; ② Dataset B , consisting of 200 samples containing the basic prompt; ③ Dataset C , consisting of 100 samples containing the basic prompt; ④ Dataset D , consisting of 398 samples with background knowledge prompts removed; ⑤ Dataset E , consisting of 398 samples containing the 1-shot prompt. Based on the above data and models, this study designed six experiments: ① Fine-tune baseline model L-1 using dataset A to obtain model L-2, and compare their performances; ② Fine-tune baseline model L-7B using dataset A to obtain L-3 and test its performance; ③ Fine-tune model L-1 using datasets B and C respectively to obtain models L-4 and L-5, and compare their performances; ④ Fine-tune model L-7B using datasets D and E respectively to obtain models L-6 and L-7, and compare their performance differences with model L-3; ⑤ Train three machine learning models using the 398-sample training set and observe their classification effectiveness using the test set; ⑥ Use Python to implement API calls to the general large model DeepSeek-R1, perform classification tasks on the test set, and compare its effectiveness with fine-tuned models. The process for constructing the generative large language model fine-tuning dataset and model fine-tuning in this study is shown in Fig. 4 . Results Impact of Instruction Fine-tuning on Classification Effectiveness Fine-tuning baseline models using high-quality datasets can significantly optimize the performance of large language models on specific tasks (Zhang et al., 2024 ). In this study, the prompts and input data from the test set were sequentially provided to the baseline model L-1. The model's output, either "1" or "0," was then recorded. Precision, recall, and F1-score were calculated based on the original "Predatory" label attribute ("1" or "0") for each journal, allowing for an evaluation of L-1's classification effectiveness. Simultaneously, LoRA technology was employed for instruction-supervised fine-tuning of L-1. The fine-tuning dataset, referred to as dataset A in Section 2.3.2, was used with the following relevant parameters for instruction-supervised fine-tuning: quantization level 4, quantization method (bitsandbytes), learning rate \(\text{5e-5}\) , maximum gradient norm 1.0, and 3 training epochs. After fine-tuning, the process of inputting the test set data and calculating corresponding metrics was repeated. The parameters for evaluating the performance of the fine-tuned model included: a temperature coefficient of 0.1 and a Top-p sampling value of 0.9. The results from the identification task on the test set are summarized in Table 1 . It is evident that model L-2, after instruction-supervised fine-tuning, achieved an accuracy of 76% in the predatory journal identification task, demonstrating a marked improvement over the non-fine-tuned baseline model. These experimental results indicate that distilled models fine-tuned through instruction-supervised fine-tuning, using a sufficiently large and high-quality dataset, outperform baseline models of the same parameter size in specific tasks. Even smaller-parameter dense models, after acquiring the necessary background knowledge, exhibit logical reasoning capabilities adequate to support binary classification tasks with fewer variables. Table 1 Performance Comparison of Large Language Models Based on Instruction Supervised Fine-tuning Model type A P R F L-1 * 0 61% 0.6078 0.6200 0.6139 1 0.6122 0.6000 0.6061 Avg 0.6100 0.6100 0.6100 L-2 * 0 76% 0.6912 0.9400 0.7966 1 0.9062 0.5800 0.7073 Avg 0.7987 0.7600 0.7520 Note: L-1: DeepSeek-R1-Distill-Qwen-1.5B; L-2: DeepSeek-R1-Distill-Qwen-1.5B model fine-tuned with 398 samples. Impact of Parameter Scale on Classification Effectiveness To explore the impact of model parameter scale on the performance of fine-tuned versions in classification tasks, this study conducted a comparative experiment between the fine-tuned DeepSeek-R1-Distill-Qwen-1.5B model and the fine-tuned DeepSeek-R1-Distill-Qwen-7B model. The baseline model L-7B was fine-tuned using dataset A through instruction-supervised fine-tuning to obtain the fine-tuned model L-3. Its classification effectiveness was then tested using the same test set. The fine-tuning parameters for this process were consistent with those in the previous study, and the experimental results are shown in Table 2 . Model L-3, which has a larger parameter scale than L-2, outperformed it in the same task, achieving an accuracy of 92%. In the test set classification task, all legitimate OA journals were correctly identified (no false negatives), while predatory journals still had some instances of missed detection (false negatives). This may be due to the characteristics of the test set data, as not all predatory journals perform below average on all five bibliometric indicators, and some individual journals may show significantly higher performance on certain indicators compared to others of their type. Alternatively, it may be due to the inherent limitations in the mathematical and logical reasoning capabilities of distilled models for specific tasks. Table 2 Performance Comparison of Models Based on Different Parameter Scales Model type A P R F L-3* 0 92% 0.8621 1.0000 0.9259 1 1.0000 0.8400 0.9310 Avg 0.9310 0.9259 0.9195 Note: L-3: DeepSeek-R1-Distill-Qwen-7B model fine-tuned with 398 samples. Impact of Sample Size on Classification Effectiveness Existing research (Majdik et al., 2024 ) has demonstrated that instruction supervised fine-tuning based on datasets with different sample sizes may yield varying optimization effects on baseline model performance. Based on this, this study designed a performance comparison study for fine-tuned models based on the same baseline model but fine-tuned with datasets of different sample sizes. First, baseline model L-1 was fine-tuned using instruction supervised fine-tuning with dataset C and dataset D (constructed in section 2.3.2) to obtain models L-4 and L-5, respectively. Then, the performance of these two fine-tuned models on the test set was examined. Fine-tuning parameters for this part were the same as above; specific research results are shown in Table 3 . The study found that after fine-tuning L-1 with dataset C , its performance in the classification task was even worse than L-1, with its identification of predatory journals showing characteristics of random events. This might be due to overfitting caused by fine-tuning with very few samples, exacerbating "hallucination" in the small dense model due to increased noise in the fine-tuning data. (Dorfner et al., 2025 ) also found in their research that small models have inherently limited pre-trained knowledge. If fine-tuning forcibly adjusts parameters with a small number of samples, it may disrupt existing correct knowledge, i.e., cause "catastrophic forgetting." Therefore, performance degradation of the model after fine-tuning with few samples can occur. Table 3 Performance Comparison of Models Based on Different Fine-tuning Sample Sizes Model type A P R F L-4* 0 49% 0.4909 0.5400 0.5143 1 0.4889 0.4400 0.4632 Avg 0.4899 0.4900 0.4887 L-5* 0 45% 0.4706 0.8000 0.5926 1 0.3333 0.1000 0.1538 Avg 0.4020 0.4500 0.3732 Note: L-4: DeepSeek-R1-Distill-Qwen-1.5B model fine-tuned with 100 samples; L-5: DeepSeek-R1-Distill-Qwen-1.5B model fine-tuned with 200 samples. Similarly, the model obtained after fine-tuning with dataset B also showed unstable performance in the same task. Furthermore, when model L-4 executed the test set data, its output did not follow the instruction "do not output the reasoning process, answer directly" for 100% of the total samples. For model L-5, this proportion was only 6% during task execution, while for model L-2 it was 0%. This indicates that with extremely small sample sizes, increasing the fine-tuning data sample size through instruction supervised fine-tuning may not effectively improve the model's overall performance, but can optimize its output format based on structured fine-tuning data. This part of the experiment also shows that the 398-sample fine-tuning dataset A used in this study could significantly improve the baseline model's performance on the specific task. Impact of Prompts on Classification Effectiveness Existing research has demonstrated that prompt engineering can significantly reduce the likelihood of generative LLMs producing erroneous information (hallucination)(Pagnoni et al., 2021 ). To explore the impact of prompts during instruction-supervised fine-tuning, this study was designed to compare the fine-tuning effects using different prompts, while keeping the baseline model and dataset consistent. Two different prompt instructions were designed, as described in Section 2.3.2. One was a 0-shot prompt with the related background knowledge removed, while the other was a 1-shot prompt with an added question-answer example to the original prompt. Models L-6 and L-7 were obtained by fine-tuning model L-7B with dataset D (instructions without background knowledge) and dataset E (instructions with one example), respectively, and their performance was tested. The baseline model L-7B used in this part of the study was DeepSeek-R1-Distill-Qwen-7B. The fine-tuning parameters were consistent with those in previous experiments, and the research results are presented in Table 4 . Compared to the original fine-tuning dataset (dataset A ), the performance of the model fine-tuned with dataset D (without background knowledge) decreased significantly, with an accuracy of only 63%, which was lower than the performance of model L-2 under the original prompt condition. Other metrics also showed varying degrees of decline. Under the 1-shot prompt condition, the fine-tuned model’s performance on the predatory journal identification task did not show significant improvement compared to the original prompt condition, achieving an accuracy of 88%, with other metrics slightly worse than the model's performance under the original prompt condition. By analyzing the characteristics of the predatory journal identification task and the impact of prompt engineering on generative large language model outputs, this study suggests that for generative LLMs, the predatory journal identification task is akin to mathematical logical reasoning based on background knowledge. A single example in the prompt condition does not provide sufficient background knowledge or enhance the model's "understanding" of the task. Moreover, the effectiveness of prompts is inherently limited by the capabilities of the large models themselves. While distilled models with smaller parameters possess basic logical reasoning abilities, they still cannot match the performance of commercial large models. Based on the results of this section, this study concludes that in classification tasks involving mathematical logical reasoning, distilled models require background knowledge to reduce "hallucinations." Additionally, structured output examples may be more effective in standardizing the model’s output format. Table 4 Performance Comparison of Large Language Models Based on Different Prompt Conditions Model type A P R F L-6* 0 63% 0.6032 0.7600 0.6726 1 0.6757 0.5000 0.5747 Avg 0.6394 0.6300 0.6236 L-7* 0 88% 0.8167 0.9800 0.8909 1 0.9750 0.7800 0.8667 Avg 0.8958 0.8800 0.8788 Note: L-6: DeepSeek-R1-Distill-Qwen-7B model fine-tuned with samples without background knowledge; L-7: DeepSeek-R1-Distill-Qwen-7B model fine-tuned with 1-shot prompt samples. Fine-tuned Models Compared to Traditional Classifiers Existing research has employed machine learning algorithms with quantitative bibliometric indicators to identify predatory journals, achieving good performance. (Albana, 2020 ; Wu, 2023) To determine whether fine-tuned models outperform traditional classifiers on the same task, this study trained and evaluated three classic machine learning models—Gaussian Naïve Bayes (GaussianNB), Random Forest (RandomForestClassifier), and Support Vector Machine (SVM)—using the training and test set data. Among these, the Gaussian Naïve Bayes model is a generative model based on Bayes' theorem, with hyperparameters set to their default values. The Random Forest algorithm, a classic ensemble learning algorithm, was configured with 400 decision trees to balance model performance and computational cost, considering the sample size of the training set in this study. The Support Vector Machine algorithm, a standard supervised learning model, used default hyperparameters for classification tasks, with the Radial Basis Function (RBF) as the kernel function and regularization strength set to the default value of 1.0. The accuracy and other performance metrics of these three machine learning models in the predatory journal identification task are presented in Table 5 , and the ROC curve is shown in Fig. 5 . As shown, the accuracy rates of the three machine learning models were 69%, 79%, and 77%, respectively. Compared to model L-2, both the Random Forest and Support Vector Machine algorithms achieved slightly higher accuracy, but the difference was minimal. The Gaussian Naïve Bayes algorithm correctly identified most predatory journals, but it frequently misclassified legitimate OA journals as predatory. In comparison to model L-3, the performance of all three machine learning models was inferior, with a significant performance gap. This suggests that, under the same training set conditions, the 7B parameter baseline model used in this study, after instruction-supervised fine-tuning with 398 structured samples as the fine-tuning dataset, outperformed traditional machine learning models in the predatory journal identification task. Table 5 Performance Comparison of Three Machine Learning Models Model type A P R F GaussianNB 0 69% 0.9524 0.4000 0.5634 1 0.6203 0.9800 0.7597 Avg 0.7863 0.6900 0.6615 RandomForest 0 79% 0.7736 0.8200 0.7961 1 0.8085 0.7600 0.7835 Avg 0.7910 0.7900 0.7898 SVM 0 77% 0.7647 0.7800 0.7723 1 0.7755 0.7600 0.7677 Avg 0.7701 0.7700 0.7700 Fine-tuned Models Compared to General Large Models This study utilized Python to make API calls to DeepSeek-R1 and, using JSON-formatted test set data, assessed its performance in predatory journal identification. The model was provided only with journal indicator values and background knowledge, without being informed of the journal names. The parameters for the API call included a temperature coefficient of 0.1 and a Top-p sampling value of 0.9. The research results are summarized in Table 6 . The large language model, which was not fine-tuned with domain-specific structured data, performed the predatory journal identification task with only marginally better results than the non-fine-tuned 1.5B parameter distilled model (L-1). However, the performance of model L-2 slightly exceeded that of the full-parameter general large model, further demonstrating that instruction-supervised fine-tuning based on LoRA technology can enable distilled models to achieve sufficient performance for domain-specific tasks. Table 6 DeepSeek-R1 Specific Task Performance Model Type A P R F DeepSeek-R1 0 68% 0.6500 0.7800 0.7091 1 0.7250 0.5800 0.6444 Avg 0.6394 0.6800 0.6768 Discussion Through instruction supervised fine-tuning using different strategies, this study obtained six fine-tuned models: the DeepSeek-R1-Distill-Qwen-1.5B model fine-tuned with 398 samples (L-2), the DeepSeek-R1-Distill-Qwen-7B model fine-tuned with 398 samples (L-3), the DeepSeek-R1-Distill-Qwen-1.5B model fine-tuned with 100 samples (L-4), the DeepSeek-R1-Distill-Qwen-1.5B model fine-tuned with 200 samples (L-5), the DeepSeek-R1-Distill-Qwen-7B model fine-tuned with samples without background knowledge (L-6), and the DeepSeek-R1-Distill-Qwen-7B model fine-tuned with samples using the 1-Shot prompt strategy (L-7). The research results are summarized in Fig. 6 . Through performance testing on the same task, this study explored the impact of conditions such as whether to fine-tune, different model parameter sizes, different fine-tuning sample quantities, and different prompt strategies on the performance of distilled models on specific tasks. Additionally, machine learning model training and evaluation experiments were conducted using the training set from the fine-tuning experiments to compare the performance advantages and disadvantages between fine-tuned models and traditional machine learning models. API calls were also made to DeepSeek-R1 to evaluate its performance on the specific task. The results indicate that model L-3, achieved by using DeepSeek-R1-Distill-Qwen-7B as the baseline and performing instruction-supervised fine-tuning with 398 structured samples, delivered the best performance on the predatory journal identification task. Under the same fine-tuning conditions, the fine-tuned 1.5B model from the same series also performed similarly to traditional machine learning models and slightly outperformed the general large model without knowledge enhancement or fine-tuning. As shown in the experimental results, instruction-supervised fine-tuning indeed improves the baseline model's performance on specific tasks. Moreover, based on the performance of L-3, increasing the parameter scale of the baseline model also enhances the performance of the fine-tuned model. When studying the impact of fine-tuning dataset sample size on classification effectiveness (comparing L-2 with L-4 and L-5), it was found that increasing the fine-tuning sample size does improve the performance of the fine-tuned model to some extent. However, when the sample size is small, only a slight increase in the sample size affects the model’s output standardization rather than significantly improving classification accuracy. Additionally, when examining the impact of different prompt strategies on classification effectiveness (comparing L-3 with L-6 and L-7), it was found that background knowledge plays a crucial role in enhancing the model’s ability to perform mathematical reasoning tasks. Furthermore, the inclusion of standardized input-output examples in fine-tuning samples significantly impacts the standardization of the fine-tuned model’s outputs. The study found that significantly increasing the number of fine-tuning samples can bring substantial improvements to the fine-tuned model's performance; however, slight increases in sample size do not yield obvious performance gains. The trend of model performance changes with gradual increases in fine-tuning sample size can be quantitatively evaluated. Based on this, the optimal sample size that can significantly improve model performance and the maximum sample size where fine-tuning benefits reach saturation can be determined, providing scientific basis for subsequent model optimization and resource allocation. Moreover, applying fine-tuned generative large language models to specific downstream tasks essentially reduces the model's "hallucination" in a particular domain. Therefore, future work could combine knowledge enhancement with instruction supervised fine-tuning, utilizing the Retrieval Augmented Generation (RAG) framework to explore more efficient, reliable, stable, and controllable quantitative identification methods and tools for predatory journals. Conclusion This study demonstrates that fine-tuning language models can indeed be applied to the task of predatory journal identification. By fine-tuning the baseline model DeepSeek-R1-Distill-Qwen-7B with a dataset containing structured samples and background knowledge prompts, we significantly enhance the model's effectiveness in identifying predatory journals compared to traditional machine learning methods.The optimal fine-tuning solution identified in this study involves fine-tuning the baseline model DeepSeek-R1-Distill-Qwen-7B using a dataset composed of 398 structured samples containing background knowledge prompts. This approach not only improves the effectiveness of predatory journal identification significantly but also maintains comparable deployment and computational costs to those of the 1.5B parameter fine-tuned model. Furthermore, it can be easily implemented using the Llama-Factory framework and automated via Python, making it both feasible and easy to deploy.The optimal fine-tuning solution identified in this study involves fine-tuning the baseline model DeepSeek-R1-Distill-Qwen-7B using a dataset composed of 398 structured samples containing background knowledge prompts. This approach not only improves the effectiveness of predatory journal identification significantly but also maintains comparable deployment and computational costs to those of the 1.5B parameter fine-tuned model. Furthermore, it can be easily implemented using the Llama-Factory framework and automated via Python, making it both feasible and easy to deploy. Based on the findings, this study concludes that fine-tuning the baseline model DeepSeek-R1-Distill-Qwen-7B using 398 structured samples with background knowledge prompts offers the most effective solution for predatory journal identification. This method significantly outperforms traditional machine learning models, with comparable deployment and computational costs to the 1.5B parameter fine-tuned model. The approach is easily implementable and can be automated within the Llama-Factory framework, offering a novel quantitative solution for predatory journal identification distinct from existing methods. Additionally, the study recommends selecting fine-tuning strategies and deployment methods based on practical factors such as hardware, deployment costs, and time constraints. Key recommendations include: ① selecting baseline models with larger parameter scales for superior performance; ② constructing fine-tuning datasets with more samples and higher data quality; ③ designing comprehensive and reasonable prompts to provide essential background knowledge. This study has the following theoretical and practical implications. First, this study experimentally developed an effective predatory journal identification tool based on bibliometric indicators, demonstrating the feasibility of using distilled models fine-tuned with structured samples for predatory journal identification. The findings reveal the influence of factors such as sample size and prompt design on the fine-tuning effectiveness of distilled models in binary classification tasks, thereby providing methodological insights for subsequent research. Second, in contrast to existing studies (Albana, 2020 ; Wu, 2023) that rely on traditional machine learning approaches and require large-scale datasets to achieve satisfactory identification performance, the fine-tuning strategy adopted in this study leverages the intrinsic reasoning capabilities of distilled models. By training on a relatively small set of high-quality samples composed of key indicators, the proposed approach is able to achieve competitive performance, partially alleviating the dependence of conventional machine learning methods on extensive data collection. From a practical perspective, researchers and institutions with predatory journal identification needs can adopt the workflow proposed in this study to deploy identification models locally according to their available hardware resources. Furthermore, human–model interactive identification can be implemented using the Llama-Factory framework, facilitating practical application and extension. This article has several limitations: (1) This study employed a stratified sampling strategy with sample size increments of twofold, resulting in only three types of datasets with sample sizes of 100, 200, and 400. As a result, the research on the "impact of sample size on classification effectiveness" still has certain limitations. Future studies could explore more granular sample levels to obtain more comprehensive insights, such as constructing a fine-tuning dataset with 300 samples. (2) This study selected bibliometric indicators that are widely used in existing research. Future work could benefit from integrating characteristics specific to predatory journals to develop new quantitative indicators for experiments, potentially yielding more convincing results. (3) This study found that fine-tuning with 1-shot prompt samples did not lead to a significant improvement in model performance. However, the impact of incorporating additional examples in prompts was not fully explored. Future research could design prompts with more examples to investigate how the number of examples influences the model's performance after fine-tuning. Declarations Funding This work was (partially) supported by the Research on the Construction of China's Characteristic Philosophy and Social Science Evaluation System under the Perspective of Academic “Full Evaluation” of China (No. 24&ZD323). Author Contribution Zhang. wrote the main manuscript text and prepared figures. Chen. made substantial contributions to the conception and design of the work, revised it critically for important intellectual content.All authors reviewed the manuscript. Data Availability All data supporting the findings of this study are available within the paper and its Supplementary Information. References Al-Moghrabi, D., Arqub, S. A., Maroulakos, M. P., et al. (2024). Can ChatGPT identify predatory biomedical and dental journals? A cross-sectional content analysis. Journal of Dentistry. Journal of Dentistry. https://doi.org/10.1016/j.jdent.2024.104840 Albana, B. Q. (2020). Avoiding publishing in predatory journals: An evaluation algorithm. Journal on Efficiency and Responsibility in Education and Science , 13(3), 154–163. https://doi.org/10.7160/eriesj.2020.130303 Anisuzzaman, D. M., Malins, J. G., Friedman, P. A., et al. (2025). Fine-tuning large language models for specialized use cases. Mayo Clinic Proceedings: Digital Healt , 3(1). .https://doi.org/10.1016/j.mcpdig.2024.11.005 Aviv, B., & Ramakanth, K. (2025). How important is domain-specific language model pretraining and instruction finetuning for biomedical relation extraction? Natural Language Processing and Information Systems , 15836: 80–94. https://doi.org/10.1007/978-3-031-97141-9_6 Beall, J. (2013). Predatory publishing is just one of the consequences of gold open access. Learned Publishing, 2013, 26(2), 79–84. https://doi.org/10.1087/20130203 Chen, L. X., Su, S. W., Liao, C. H., et al. (2023). An open automation system for predatory journal detection. Scientific Reports , 13(1), 2976. https://doi.org/10.1038/s41598-023-30045-9 Chen, M., Wang, L. Z. (2022). An Altmetrics and citation analysis of selected predatory journals in library and information science field. The Journal of Academic Librarianship, (48):102628. https://doi.org/10.1016/j.acalib.2022.102618 Dadkhah, M., Rahimnia, F., Rafati, N. S., et al. (2022). Jourchain: Using blockchain to avoid questionable journals. Irish Journal of Medical Science , 191(3), 1435–1439. https://doi.org/10.1007/s11845-021-02784-z DeepSeek-AI. (2025). DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. Retrieved August 14, 2025, from https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf DOAJ. (n.d.). Directory of open access journals. https://doaj.org/ . Retrieved July 30, 2025. Dorfner, F. J., MSc, A. D., Busch, F., et al. (2025). Evaluating the effectiveness of biomedical fine-tuning for large language models on clinical tasks. Journal of the American Medical , 1–10. https://doi.org/10.1093/jamia/ocaf045 Hosseini, M., Horbach, S. P. J. M., Holmes, K., et al. (2025). Open science at the generative AI turn: An exploratory analysis of challenges and opportunities. Quantitative Science Studies , 6, 22–45. https://doi.org/10.1162/qss_a_00338 Hu, E. J., Shen, Y., Wallis, P., et al. (2021). LoRA: Low-rank adaptation of large language models. ArXiv . https://doi.org/10.48550/arXiv.2106.09685 Huang, Y., Tang, K., & Chen, M. (2024). Leveraging large language models for enhanced NLP task performance through knowledge distillation and optimized training strategies. ArXiv . https://doi.org/10.48550/arXiv.2402.09282 Kang, M., Lee, S., Baek, J., Kawaguchi, K., & Hwang, S. J. (2023). Knowledge-augmented reasoning distillation for small language models in knowledge-intensive tasks. In Proceedings of the 37th International Conference on Neural Information Processing Systems (NIPS '23) (Article 2109, pp. 1–30). Curran Associates Inc. https://doi.org/10.48550/arXiv.2305.18395 Leslie, C., Darius, C., Michael, E., et al. (2002). Budapest Open Access Initiative. Retrieved January 24, 2026, from https://www.budapestopenaccessinitiative.org/read/ Lin, T. Y., Wang, Y. X., Liu, X. Y., et al. (2021). A survey of transformers. AI Open , (3): 111–132. https://doi.org/10.1016/j.aiopen.2021.07.002 Luo, P. C., Hong, L. Z., & Nie, L. (2025). Automatic classification of research data sets into the Chinese Library Classification with generative large language model. The Electronic Library , 43(4), 600–618. https://doi.org/10.1108/EL-02-2025-0042 . Majdik, Z. P., Graham, S. S., Edward, J. C. S., et al. (2024). Sample size considerations for fine-tuning large language models for named entity recognition tasks: Methodological study. JMIR AI , 3: e52095. https://doi.org/10.2196/52095 Pagnoni, A., Balachandran, V., & Tsvetkov, Y. (2021). Understanding factuality in abstractive summarization with FRANK: A benchmark for factuality metrics. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologie s (pp. 4812–4829). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.naacl-main.383 Qiu, H., Zhang, S., He, H., et al. (2024). Facilitating pornographic text detection for open-domain dialogue systems via knowledge distillation of large language models. ArXiv . https://doi.org/10.48550/arXiv.2403.13250 Radford, A., & Narasimhan, K. (2018). Improving language understanding by generative pre-training. OpenAI . https://api.semanticscholar.org/CorpusID:49313245 Sakhi, T. (2023). *Effectiveness of natural language processing techniques in categorizing scientific articles by research methodology* [Master’s thesis, University of Twente]. Shamseer, L., Moher, D., Maduekwe, O., et al. (2017). Potential predatory and legitimate biomedical journals: Can you tell the difference? A cross-sectional comparison. BMC Medicine , 15(1), 28–42. https://doi.org/10.1186/s12916-017-0785-9 Siler, K., Vincent-Lamarre, P., Sugimoto, C. R., et al. (2021). Predatory publishers’ latest scam: Bootlegged and rebranded papers. Nature , 598(7882), 563–565. https://doi.org/10.1038/d41586-021-02866-z Tonmoy, S. M., Zaman, S. M., Jain, V., et al. (2024). A comprehensive survey of hallucination mitigation techniques in large language models. Retrieved August 14, 2025, from https://arxiv.org/abs/2401.01313 van Dis, E. A. M., Bollen, J., Zuidema, W., et al. (2023). ChatGPT: Five priorities for research. Nature , 614(7947), 224–226. https://doi.org/10.1038/d41586-023-00288-7 Wang, H., Fu, T., Du, Y., et al. (2023). Scientific discovery in the age of artificial intelligence. Nature , 620(7972), 47–60. https://doi.org/10.1038/s41586-023-06221-2 Wang, W., & Qiao, H. (2024). How natural language processing enables AIGC recognition? Latest trends and future prospects. In Proceedings of the 2024 7th International Conference on Software Engineering and Information Management (pp. 103–109). https://doi.org/10.1145/3647722.3647738 Wu, J. H., Liu, T. Y., Mu, K. L., et al. (2024). Identification and causal analysis of predatory open access journals based on interpretable machine learning. Scientometrics , 129(4), 2121–2158. https://doi.org/10.1007/s11192-024-05027-x Zhang, Y. F., Wang, Z. Y., He, Z. T., et al. (2024). BB-GeoGPT: A framework for learning a large language model for geographic information science. Information Processing & Management , 61(5), 103808. https://doi.org/10.1016/j.ipm.2024.103808 Zheng, Y., Zhang, R., Zhang, J., et al. (2024). LlamaFactory: Unified efficient fine-tuning of 100 + language models. ArXiv . https://doi.org/10.48550/arXiv.2403.13372 Additional Declarations No competing interests reported. Supplementary Files data.zip Cite Share Download PDF Status: Posted Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-9029371","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":608775315,"identity":"697d33f5-95c1-4dc6-97ff-e0cd74d350d5","order_by":0,"name":"Fanrui Zhang","email":"","orcid":"","institution":"Nanjing University","correspondingAuthor":false,"prefix":"","firstName":"Fanrui","middleName":"","lastName":"Zhang","suffix":""},{"id":608775317,"identity":"c63bd9b9-9d01-47dd-aacf-30b38c15fa6b","order_by":1,"name":"Ming Chen","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA1klEQVRIiWNgGAWjYDACCRBhwMDAz8CQAGQxk6BFsoE0LSBdB8AUEVr4Zzcfe3Sj4I7d5tsNzyQYKqwTG9jPHsBvyZ1j6cY5Bs+St905kCbBcCY9sYEnLwGvFgOJHDPpHIPDyWY3EtIkGNsOJzZI8BgQ0JL/DazFeAZIyz+itOSwgbTYGUiAtDQQoUXiRhrYYQkSNxKSLRKAHmvjycGvhX9G8jPpnD+H7fln5CTe+FBjLdvPfga/FhhIbGDgSQBHJhtR6oHAnoGB/QCxikfBKBgFo2CEAQBqRkLoFTVgwwAAAABJRU5ErkJggg==","orcid":"","institution":"Nanjing University","correspondingAuthor":true,"prefix":"","firstName":"Ming","middleName":"","lastName":"Chen","suffix":""}],"badges":[],"createdAt":"2026-03-04 10:54:51","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-9029371/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-9029371/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":105053742,"identity":"0a976ee0-4c61-4dd7-8acb-fc2e69883665","added_by":"auto","created_at":"2026-03-20 11:05:55","extension":"jpg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":272126,"visible":true,"origin":"","legend":"\u003cp\u003eExample of a JSON Fine-tuning Dataset (Base Prompt)\u003c/p\u003e","description":"","filename":"1.jpg","url":"https://assets-eu.researchsquare.com/files/rs-9029371/v1/ca242bcbd75338e41af0eb48.jpg"},{"id":105053739,"identity":"bf2500a0-8eac-4405-91ef-72ef253ac54a","added_by":"auto","created_at":"2026-03-20 11:05:55","extension":"jpg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":196387,"visible":true,"origin":"","legend":"\u003cp\u003ePrompt without Background Knowledge\u003c/p\u003e","description":"","filename":"2.jpg","url":"https://assets-eu.researchsquare.com/files/rs-9029371/v1/8109543c8b0b89523267a244.jpg"},{"id":105053741,"identity":"1f983e86-074c-4822-87b7-977f4ecd7623","added_by":"auto","created_at":"2026-03-20 11:05:55","extension":"jpg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":291606,"visible":true,"origin":"","legend":"\u003cp\u003e1-Shot Prompt\u003c/p\u003e","description":"","filename":"3.jpg","url":"https://assets-eu.researchsquare.com/files/rs-9029371/v1/cb6a6f73cda52350b46d52cc.jpg"},{"id":105053740,"identity":"dc3f1957-dd57-4dd1-ba9b-1a71e893ae65","added_by":"auto","created_at":"2026-03-20 11:05:55","extension":"jpg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":113436,"visible":true,"origin":"","legend":"\u003cp\u003eFine-tuning Dataset Construction and Model Training\u003c/p\u003e","description":"","filename":"4.jpg","url":"https://assets-eu.researchsquare.com/files/rs-9029371/v1/cc3663dbe1857e9973e309a0.jpg"},{"id":105727717,"identity":"6af1eb0f-cb40-4657-8e92-07989b3847c3","added_by":"auto","created_at":"2026-03-30 11:01:52","extension":"jpg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":53572,"visible":true,"origin":"","legend":"\u003cp\u003eROC Curve for Three Machine Learning Models\u003c/p\u003e","description":"","filename":"5.jpg","url":"https://assets-eu.researchsquare.com/files/rs-9029371/v1/8b8cbc10827b407231bd5c14.jpg"},{"id":105053744,"identity":"bfedbf7f-f24f-44de-a06e-4ee81250ae0a","added_by":"auto","created_at":"2026-03-20 11:05:55","extension":"jpg","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":156235,"visible":true,"origin":"","legend":"\u003cp\u003eSummary of Specific Task Performance for Each Model\u003c/p\u003e","description":"","filename":"6.jpg","url":"https://assets-eu.researchsquare.com/files/rs-9029371/v1/417674c96fd2c35d0b73eeb0.jpg"},{"id":105730392,"identity":"1998495e-74cf-46d6-bb6c-55ec9f4d05a0","added_by":"auto","created_at":"2026-03-30 11:24:25","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1946081,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-9029371/v1/b3676f7f-2efc-42b9-b661-4ba24f67251b.pdf"},{"id":105562878,"identity":"0c5e7b8b-e2aa-4e63-8990-12f4235ebd19","added_by":"auto","created_at":"2026-03-27 12:45:06","extension":"zip","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":694032,"visible":true,"origin":"","legend":"","description":"","filename":"data.zip","url":"https://assets-eu.researchsquare.com/files/rs-9029371/v1/5fa72a3c05c894c24c59a9c3.zip"}],"financialInterests":"No competing interests reported.","formattedTitle":"A Method for Identifying Predatory Journals Driven by Large Language Models","fulltext":[{"header":"Introduction","content":"\u003cp\u003eOpen Access (OA) is a publishing model promoted by the international scholarly community since the late 1990s, aiming to address challenges in academic publishing arising from the rapid development of the Internet, to facilitate scholarly communication, and to enhance the societal impact of academic research (Leslie et al., \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2002\u003c/span\u003e). A key outcome of the OA movement is the emergence of OA journals. These journals are typically characterized by author-funded publication models in which authors retain copyright, while published articles are made freely accessible for reading and downloading (Beall, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2013\u003c/span\u003e). However, some journals exploit the features of the OA model by disregarding academic standards in pursuit of unreasonable profits; such journals are commonly referred to as \u0026ldquo;predatory journals\u0026rdquo; (Sumayyia et al., 2023). With the continued expansion of the OA movement, predatory journals have become a global and interdisciplinary concern, attracting increasing attention from the academic community (Shamseer et al., \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Siler et al., \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Both academic and professional communities have explored various approaches to predatory journal identification, ranging from early conceptual discussions to feature-based analysis and quantitative identification methods. Nevertheless, even regularly updated predatory journal lists continue to face limitations, including delays in updates and the risk of false negatives.\u003c/p\u003e \u003cp\u003eIn recent years, generative artificial intelligence has emerged as a prominent research focus, with growing interest in its potential applications across scholarly activities. Wang et al. (\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2023\u003c/span\u003e) argued that AI-supported \"AI for Science\" has already influenced the knowledge production processes of many researchers. It is anticipated that generative AI will become closely integrated into the lifecycle of scientific research, particularly in processes such as the creation and dissemination of research outputs. Existing studies (van Dis et al., \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Hosseini et al., \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2025\u003c/span\u003e) suggest that generative AI can be used to detect predatory publishing and predatory journals. However, these advancements are also accompanied by concerns about their potential to exacerbate the proliferation of low-quality content in academia. In this context, there is a pressing need to explore the application of generative AI in journal evaluation and academic quality assessment. In particular, investigating the feasibility of generative AI for predatory journal identification and developing AI-driven identification methods with improved effectiveness and efficiency have become increasingly important. This study will explore whether the AI -driven method can be applied to the task of identifying predatory journals. This study will apply the distillation model of generative large language models (LLMs)as a classifier to the task of predatory journal identification, perform instruction-supervised fine-tuning on the baseline distillation model based on LoRA, and test whether the fine-tuned model can be competent for the specific task. The subsequent research questions are formulated to direct the inquiry:\u003c/p\u003e \u003cp\u003eRQ1: Can the method of fine-tuning language models be applied to the task of identifying predatory journals?\u003c/p\u003e \u003cp\u003eRQ2: Which fine-tuning method is the optimal one?\u003c/p\u003e \u003cp\u003eRQ3: How do fine-tuning sample size and prompt strategies influence fine-tuning performance?\u003c/p\u003e"},{"header":"Review","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eGenerative Large Language Models and Their Applications\u003c/h2\u003e \u003cp\u003eIn 2018, Google released GPT-1, a pre-trained generative model based on the Transformer architecture. Since then, generative large language models (Generative LLMs) have gained prominence across various academic fields (Radford et al., 2018). Researchers (Zhyar et al., 2023; Kang et al., 2024; Zhang et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Huang et al., \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2024\u003c/span\u003e) have widely employed generative LLMs to perform tasks related to natural language processing, exploring their practical effectiveness in areas such as text summarization, recommendation systems, classification tasks, and knowledge processing, thereby unlocking their potential for broader applications.\u003c/p\u003e \u003cp\u003eHuang et al. (\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2024\u003c/span\u003e) combined chain-of-thought techniques with the large model GPT-4, using the annotated data generated by GPT-4 to pre-train a smaller model, BERT. They evaluated the performance of the model, which underwent two-stage training, on named entity recognition tasks and observed a significant improvement. Wang et al. (2024) discussed AI-generated content (AIGC) detection methods such as white-box detection, zero-shot detection, and fine-tuning. Qiu et al.(2024) developed a method for detecting obscene text in open-domain dialogue systems based on large language model knowledge distillation, and constructed the first dataset, CENSORCHAT, designed specifically for detecting obscene text in human-machine dialogue scenarios. Dorfner et al.(2025) constructed a specialized dataset to evaluate the performance of fine-tuned models in biomedical knowledge tasks such as case report generation and diagnostic reasoning. Their study indicated that models fine-tuned with domain-specific knowledge may not outperform general models. This conclusion challenges the assumption that fine-tuning inherently improves LLM performance on domain-specific tasks and highlights the issue of \"catastrophic forgetting,\" where overfitting to fine-tuning data leads to errors in general knowledge comprehension. Anisuzzaman et al.(2025) provided practical examples of fine-tuned models in the medical field, detailing the fine-tuning methods used and summarizing the advantages and disadvantages of this approach.\u003c/p\u003e \u003cp\u003eRecent research in the application of generative LLMs has primarily focused on investigating the effectiveness of fine-tuned models for specific domain tasks. While many studies have demonstrated that fine-tuning can enhance the performance of baseline models across various tasks, the approach remains a subject of debate. Fine-tuning large models requires performance validation to mitigate the risk of \"catastrophic forgetting,\" and the costs associated with model deployment and computation also need to be considered.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eAI-driven Quantitative Identification of Predatory Journals\u003c/h3\u003e\n\u003cp\u003eWith the deepening of research into the characteristics of predatory journals and the advancement of artificial intelligence technologies, academic research on the quantitative identification of predatory journals has seen numerous methodological innovations. The academic research on the quantitative identification of predatory journals has made more methodological innovations based on the design of quantitative indicators for identification and the construction of mathematical models for identification.\u003c/p\u003e \u003cp\u003eAlbana (\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2020\u003c/span\u003e) introduced a question-based evaluation algorithm for predatory journals, providing researchers with a new approach to identifying such journals. Dadkhah et al. (\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2022\u003c/span\u003e) proposed the concept of \"Jourchain,\" leveraging blockchain technology to create a semi-private journal blockchain. By continuously expanding the blockchain records, this system improves the efficiency of identifying suspicious journals. Chen Li-Xian et al. (2023) utilized machine learning algorithms to analyze textual information from predatory journal websites, ultimately developing a practical system known as \"AJPC.\" Al-Moghrabi et al. (\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2024\u003c/span\u003e) combined machine learning methods with ChatGPT models to generate evaluations based on journal names. The evaluations were then standardized into 20 positive and negative indicators, which were analyzed to identify the key factors affecting the model\u0026rsquo;s accuracy in distinguishing predatory journals. Their findings indicated that ChatGPT can accurately identify predatory journals based on the provided journal names.\u003c/p\u003e \u003cp\u003eIn summary, with advancements in quantitative research technologies and shifting academic priorities, the methods for identifying predatory journals have evolved, incorporating techniques such as basic mathematical models, machine learning algorithms, blockchain technology, and large language models, leading to promising results. However, research on AI-driven identification methods remains limited, and no studies have yet developed a classification system for predatory journal identification using model fine-tuning.\u003c/p\u003e \u003cp\u003eTherefore, this study aims to employ a distilled generative LLM model as a classifier for the predatory journal identification task. We apply LoRA-based instruction-supervised fine-tuning to the baseline distilled model and assess whether the fine-tuned model can effectively handle the task. Additionally, to explore the most efficient fine-tuning strategy under the current research conditions, we designed multiple comparative experiments using different fine-tuning strategies and baseline models. Given that traditional classifiers such as machine learning models have already achieved good results in predatory journal classification tasks, we also conducted comparative experiments to evaluate the performance of the fine-tuned model against machine learning models and general-purpose large language models. The final fine-tuned model developed was designed to be easily deployable, providing an efficient solution for predatory journal identification and offering a new approach to the field.\u003c/p\u003e"},{"header":"Methods","content":"\u003cp\u003eAs discussed above, existing study (Kang et al., 2024) have demonstrated the feasibility of using distilled models as classification systems. In addition, fine-tuning large language models such as GPT-3 for specific downstream tasks has been shown to significantly improve baseline model performance. Zhyar et al. (2023) demonstrated that fine-tuning pretrained transformers for text classification tasks resulted in substantial gains in accuracy, precision, recall, and F1-score, outperforming traditional machine learning models like SVM and CNN. Their studies revealed that fine-tuned models consistently achieved better classification results, especially when trained with larger datasets and optimized hyperparameters, thus highlighting the effectiveness of fine-tuning in enhancing task-specific model performance. Building on this evidence, this study employs distilled models for data-driven reasoning and examines whether their performance exceeds that of baseline models.\u003c/p\u003e\n\u003ch3\u003eBaseline Distilled Model Selection\u003c/h3\u003e\n\u003cp\u003eGenerative large language models (LLMs) represent a major development in the field of artificial intelligence. Their core functionality lies in predicting and generating text sequences based on statistical regularities, thereby enabling the comprehension and generation of natural language (Lin et al., \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). LLMs have been widely applied across a range of domains and tasks, including text generation and mathematical reasoning (Sakhi, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Zhang et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Huang et al., \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). The generative models examined in this study are the DeepSeek-R1 model and its distilled variants. DeepSeek-R1 is trained using large-scale reinforcement learning, and its reasoning process incorporates iterative reflection and verification, endowing it with strong logical reasoning capabilities in tasks such as mathematical validation and code generation.\u003c/p\u003e \u003cp\u003eThe distilled versions of DeepSeek-R1 are small, dense models derived through knowledge distillation. These models are based on the open-source Qwen2.5 architecture and fine-tuned using samples generated by the DeepSeek-R1 model. Compared with the teacher model, the distilled models retain essential reasoning abilities while offering advantages in terms of reduced model size and lower computational cost (DEEPSEEK-AI, \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). Considering the experimental environment and available hardware resources, this study selects DeepSeek-R1-Distill-Qwen-1.5B and DeepSeek-R1-Distill-Qwen-7B as baseline models. During the experimental phase, the performance of these fine-tuned distilled models is compared with that of the DeepSeek-R1 model in the predatory journal identification task. And Python is used to invoke the DeepSeek-R1 model via API for task execution.\u003c/p\u003e\n\u003ch3\u003eFine-tuning Method Selection\u003c/h3\u003e\n\u003cp\u003eHallucination is a well-recognized phenomenon in the application of generative large language models (LLMs). To reduce its occurrence, prior studies have proposed several effective approaches, including prompt optimization, knowledge augmentation, and model fine-tuning (Tonmoy et al., \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Existing research on large-model adaptation has explored various fine-tuning strategies, such as full-parameter fine-tuning, parameter-efficient fine-tuning, reinforcement learning, and contrastive learning. Anisuzzaman (2025) have shown that parameter-efficient fine-tuning methods, exemplified by Low-Rank Adaptation (LoRA), modify only a limited number of parameters while substantially reducing computational and storage costs, without compromising the core representational capacity of the base model. In addition, instruction-supervised fine-tuning has been proposed as an effective strategy to enhance a model\u0026rsquo;s ability to interpret and follow task-specific instructions (Aviv et al., 2025).\u003c/p\u003e \u003cp\u003eIn light of these findings, this study adopts an instruction-supervised fine-tuning paradigm implemented through a parameter-efficient approach. Specifically, LoRA (Hu et al., \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2021\u003c/span\u003e) is employed to introduce low-rank adaptation matrices into the baseline model. Supervised fine-tuning is then conducted using carefully constructed instruction-based structured data, during which only the parameters associated with the low-rank matrices are updated. This approach enables the model to adapt to domain-specific classification tasks with efficiency comparable to full-parameter fine-tuning, while maintaining relatively low computational and deployment costs.\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eExperimental Design\u003c/h2\u003e \u003cp\u003eFirst, an automated program was employed to collect bibliometric indicators for known predatory journals and legitimate open access journals, thereby constructing an initial dataset. Second, a binary label, Predatory, was defined, with predatory journals assigned a value of \u0026ldquo;1\u0026rdquo; and legitimate OA journals assigned a value of \u0026ldquo;0.\u0026rdquo; Based on this label, equal numbers of journals from each category were randomly sampled and divided into training and test sets. These datasets were used for fine-tuning dataset construction, fine-tuned model evaluation, and the training and evaluation of machine learning models. During instruction-supervised fine-tuning, the outputs of the distilled models were constrained to either \u0026ldquo;0\u0026rdquo; or \u0026ldquo;1\u0026rdquo; to achieve a classification objective. Third, model predictions on the test set were compared with the original Predatory labels to compute accuracy, recall, precision, and F1-score. To reduce randomness in the experimental results, each evaluation was repeated three times, and the mean values of all metrics were reported. Finally, performance differences among models fine-tuned under different strategies were analyzed, and comparisons were conducted between traditional machine learning models, baseline models, and fine-tuned models to identify the optimal configuration and fine-tuning strategy for predatory journal classification.\u003c/p\u003e \u003cp\u003eAll experiments were conducted in a Python 3.11 environment with CUDA 12.9 and PyTorch 2.6, running on a Windows-based server equipped with an Intel Core i7-12800HX CPU and an NVIDIA GeForce RTX 4070 GPU. Given the available hardware resources, this study selected the knowledge-distilled small dense models DeepSeek-R1-Distill-Qwen-1.5B (denoted as L-1) and DeepSeek-R1-Distill-Qwen-7B (denoted as L-7B) as baseline models for empirical evaluation. The Llama-Factory framework (Zheng et al., \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2024\u003c/span\u003e) was used for fine-tuning and performance assessment. To ensure reproducibility and operational efficiency, Python scripts were developed to automate data collection, pre-processing, dataset construction, and model evaluation.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eData Collection and Preprocessing\u003c/h3\u003e\n\u003cp\u003ePredatory journals typically lack rigorous peer-review processes, focusing more on economic gain than the academic quality of papers (Shamseer et al., \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). Bibliometric indicators that measure a journal's academic influence can quantify its academic level. Therefore, in existing predatory journal identification research, bibliometric indicators are often used as the subject of study. Considering the hardware requirements for instruction fine-tuning and the quality of the fine-tuning dataset, this study chose to collect five quantitative indicators for journals: Impact Factor Without Self-Citations, Cited Half-Life, Article Influence Score, Immediacy Index, and CiteScore. The performance of predatory journals in terms of academic influence measured by these indicators is believed to differ from that of legitimate open access journals (Wu et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Therefore, this study uses these indicators as the basic basis for identifying predatory journals. The predatory journals in this study are sourced from Beall's list of predatory journals (Butler, 2013). Legitimate open access journals were randomly selected from the Directory of Open Access Journals (DOAJ) database (DOAJ, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). Indicators for both types of journals were collected from the Journal Citation Reports (JCR) database, Web of Science (WoS), and Scopus(Wu et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). After data collection and cleaning, this study ultimately constructed a total dataset containing 498 journal data samples, with 249 samples each from predatory journals and legitimate OA journals. The \"Predatory\" label for predatory journal samples was assigned \u0026ldquo;1\u0026rdquo;, while legitimate OA journals were assigned \u0026ldquo;0\u0026rdquo;.\u003c/p\u003e\n\u003ch3\u003ePrompt Engineering and Fine-tuning Dataset\u003c/h3\u003e\n\u003cp\u003eResearch by Luo et al. (\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2025\u003c/span\u003e) demonstrated that prompts have a significant impact on a large language model\u0026rsquo;s understanding of a given task. In addition, the Llama-Factory framework requires instruction-supervised fine-tuning datasets to be provided as structured data in JSON format, with each data instance adhering to a uniform key\u0026ndash;value schema, namely a JSON triple consisting of \u0026rdquo;instruction\u0026rdquo;, \u0026ldquo;input\u0026rdquo;, and \u0026ldquo;output\u0026rdquo;. Accordingly, this study designed a dedicated instruction to serve as the basic prompt for instruction-based fine-tuning. An example of the JSON-formatted data array is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eTo enable the model to acquire the necessary background knowledge about predatory journals, this study calculated the average values of each indicator for both predatory journal samples and legitimate OA journal samples, and incorporated these averages into the instruction as a reference for the model to assess the relative magnitude of indicator values. Existing research (Feroze et al., 2024) noted that models with fewer parameters may lack the fundamental cognitive capabilities possessed by large-parameter models. Accordingly, to examine the impact of background knowledge on model performance in classification tasks, this study designed an alternative instruction that excluded the average values of each indicator; an example is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. In addition, to investigate the effect of including example question\u0026ndash;answer pairs in the instruction on classification performance, a 1-shot prompt was designed, as illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFollowing the Llama-Factory framework specifications, this study used the training set to construct the instruction fine-tuning dataset. From the 498 samples, 100 were randomly selected to form the test set, while the remaining 398 formed the training set. To explore the impact of different sample sizes on the classification effectiveness of fine-tuned models, this study conducted random sampling in equal proportions from predatory journal samples and legitimate OA journal samples and constructed structured datasets containing 100 samples (50 samples per journal type) and 200 samples (proportions analogous to the 100-sample set).\u003c/p\u003e \u003cp\u003eBased on different instruction conditions, this study constructed five JSON format fine-tuning data training sets, namely:\u003c/p\u003e \u003cp\u003e① Dataset \u003cem\u003eA\u003c/em\u003e, consisting of 398 samples containing the basic prompt;\u003c/p\u003e \u003cp\u003e② Dataset \u003cem\u003eB\u003c/em\u003e, consisting of 200 samples containing the basic prompt;\u003c/p\u003e \u003cp\u003e③ Dataset \u003cem\u003eC\u003c/em\u003e, consisting of 100 samples containing the basic prompt;\u003c/p\u003e \u003cp\u003e④ Dataset \u003cem\u003eD\u003c/em\u003e, consisting of 398 samples with background knowledge prompts removed;\u003c/p\u003e \u003cp\u003e⑤ Dataset \u003cem\u003eE\u003c/em\u003e, consisting of 398 samples containing the 1-shot prompt.\u003c/p\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eBased on the above data and models, this study designed six experiments:\u003c/h2\u003e \u003cp\u003e① Fine-tune baseline model L-1 using dataset \u003cem\u003eA\u003c/em\u003e to obtain model L-2, and compare their performances;\u003c/p\u003e \u003cp\u003e② Fine-tune baseline model L-7B using dataset \u003cem\u003eA\u003c/em\u003e to obtain L-3 and test its performance;\u003c/p\u003e \u003cp\u003e③ Fine-tune model L-1 using datasets \u003cem\u003eB\u003c/em\u003e and \u003cem\u003eC\u003c/em\u003e respectively to obtain models L-4 and L-5, and compare their performances;\u003c/p\u003e \u003cp\u003e④ Fine-tune model L-7B using datasets \u003cem\u003eD\u003c/em\u003e and \u003cem\u003eE\u003c/em\u003e respectively to obtain models L-6 and L-7, and compare their performance differences with model L-3;\u003c/p\u003e \u003cp\u003e⑤ Train three machine learning models using the 398-sample training set and observe their classification effectiveness using the test set;\u003c/p\u003e \u003cp\u003e⑥ Use Python to implement API calls to the general large model DeepSeek-R1, perform classification tasks on the test set, and compare its effectiveness with fine-tuned models.\u003c/p\u003e \u003cp\u003eThe process for constructing the generative large language model fine-tuning dataset and model fine-tuning in this study is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eImpact of Instruction Fine-tuning on Classification Effectiveness\u003c/h2\u003e \u003cp\u003eFine-tuning baseline models using high-quality datasets can significantly optimize the performance of large language models on specific tasks (Zhang et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). In this study, the prompts and input data from the test set were sequentially provided to the baseline model L-1. The model's output, either \"1\" or \"0,\" was then recorded. Precision, recall, and F1-score were calculated based on the original \"Predatory\" label attribute (\"1\" or \"0\") for each journal, allowing for an evaluation of L-1's classification effectiveness.\u003c/p\u003e \u003cp\u003eSimultaneously, LoRA technology was employed for instruction-supervised fine-tuning of L-1. The fine-tuning dataset, referred to as dataset \u003cem\u003eA\u003c/em\u003e in Section 2.3.2, was used with the following relevant parameters for instruction-supervised fine-tuning: quantization level 4, quantization method (bitsandbytes), learning rate \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\text{5e-5}\\)\u003c/span\u003e\u003c/span\u003e, maximum gradient norm 1.0, and 3 training epochs. After fine-tuning, the process of inputting the test set data and calculating corresponding metrics was repeated.\u003c/p\u003e \u003cp\u003eThe parameters for evaluating the performance of the fine-tuned model included: a temperature coefficient of 0.1 and a Top-p sampling value of 0.9. The results from the identification task on the test set are summarized in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. It is evident that model L-2, after instruction-supervised fine-tuning, achieved an accuracy of 76% in the predatory journal identification task, demonstrating a marked improvement over the non-fine-tuned baseline model. These experimental results indicate that distilled models fine-tuned through instruction-supervised fine-tuning, using a sufficiently large and high-quality dataset, outperform baseline models of the same parameter size in specific tasks. Even smaller-parameter dense models, after acquiring the necessary background knowledge, exhibit logical reasoning capabilities adequate to support binary classification tasks with fewer variables.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003ePerformance Comparison of Large Language Models Based on Instruction Supervised Fine-tuning\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003etype\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eA\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eP\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eR\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eF\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eL-1\u003csup\u003e*\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e61%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.6078\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.6200\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.6139\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.6122\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.6000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.6061\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAvg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.6100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.6100\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.6100\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eL-2\u003csup\u003e*\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e76%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.6912\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.9400\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.7966\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.9062\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.5800\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.7073\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAvg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.7987\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.7600\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.7520\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"6\"\u003eNote: L-1: DeepSeek-R1-Distill-Qwen-1.5B; L-2: DeepSeek-R1-Distill-Qwen-1.5B model fine-tuned with 398 samples.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eImpact of Parameter Scale on Classification Effectiveness\u003c/h2\u003e \u003cp\u003eTo explore the impact of model parameter scale on the performance of fine-tuned versions in classification tasks, this study conducted a comparative experiment between the fine-tuned DeepSeek-R1-Distill-Qwen-1.5B model and the fine-tuned DeepSeek-R1-Distill-Qwen-7B model. The baseline model L-7B was fine-tuned using dataset \u003cem\u003eA\u003c/em\u003e through instruction-supervised fine-tuning to obtain the fine-tuned model L-3. Its classification effectiveness was then tested using the same test set. The fine-tuning parameters for this process were consistent with those in the previous study, and the experimental results are shown in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. Model L-3, which has a larger parameter scale than L-2, outperformed it in the same task, achieving an accuracy of 92%. In the test set classification task, all legitimate OA journals were correctly identified (no false negatives), while predatory journals still had some instances of missed detection (false negatives). This may be due to the characteristics of the test set data, as not all predatory journals perform below average on all five bibliometric indicators, and some individual journals may show significantly higher performance on certain indicators compared to others of their type. Alternatively, it may be due to the inherent limitations in the mathematical and logical reasoning capabilities of distilled models for specific tasks.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003ePerformance Comparison of Models Based on Different Parameter Scales\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003etype\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eA\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eP\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eR\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eF\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eL-3*\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e92%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.8621\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.0000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.9259\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1.0000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.8400\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.9310\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAvg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.9310\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.9259\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.9195\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"6\"\u003eNote: L-3: DeepSeek-R1-Distill-Qwen-7B model fine-tuned with 398 samples.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eImpact of Sample Size on Classification Effectiveness\u003c/h2\u003e \u003cp\u003eExisting research (Majdik et al., \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2024\u003c/span\u003e) has demonstrated that instruction supervised fine-tuning based on datasets with different sample sizes may yield varying optimization effects on baseline model performance. Based on this, this study designed a performance comparison study for fine-tuned models based on the same baseline model but fine-tuned with datasets of different sample sizes.\u003c/p\u003e \u003cp\u003eFirst, baseline model L-1 was fine-tuned using instruction supervised fine-tuning with dataset \u003cem\u003eC\u003c/em\u003e and dataset \u003cem\u003eD\u003c/em\u003e (constructed in section 2.3.2) to obtain models L-4 and L-5, respectively. Then, the performance of these two fine-tuned models on the test set was examined. Fine-tuning parameters for this part were the same as above; specific research results are shown in Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e. The study found that after fine-tuning L-1 with dataset \u003cem\u003eC\u003c/em\u003e, its performance in the classification task was even worse than L-1, with its identification of predatory journals showing characteristics of random events. This might be due to overfitting caused by fine-tuning with very few samples, exacerbating \"hallucination\" in the small dense model due to increased noise in the fine-tuning data. (Dorfner et al., \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2025\u003c/span\u003e) also found in their research that small models have inherently limited pre-trained knowledge. If fine-tuning forcibly adjusts parameters with a small number of samples, it may disrupt existing correct knowledge, i.e., cause \"catastrophic forgetting.\" Therefore, performance degradation of the model after fine-tuning with few samples can occur.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003ePerformance Comparison of Models Based on Different Fine-tuning Sample Sizes\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003etype\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eA\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eP\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eR\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eF\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eL-4*\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e49%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.4909\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.5400\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.5143\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.4889\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.4400\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.4632\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAvg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.4899\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.4900\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.4887\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eL-5*\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e45%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.4706\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.8000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.5926\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.3333\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.1000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.1538\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAvg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.4020\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.4500\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.3732\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"6\"\u003eNote: L-4: DeepSeek-R1-Distill-Qwen-1.5B model fine-tuned with 100 samples; L-5: DeepSeek-R1-Distill-Qwen-1.5B model fine-tuned with 200 samples.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eSimilarly, the model obtained after fine-tuning with dataset \u003cem\u003eB\u003c/em\u003e also showed unstable performance in the same task.\u003c/p\u003e \u003cp\u003eFurthermore, when model L-4 executed the test set data, its output did not follow the instruction \"do not output the reasoning process, answer directly\" for 100% of the total samples. For model L-5, this proportion was only 6% during task execution, while for model L-2 it was 0%. This indicates that with extremely small sample sizes, increasing the fine-tuning data sample size through instruction supervised fine-tuning may not effectively improve the model's overall performance, but can optimize its output format based on structured fine-tuning data. This part of the experiment also shows that the 398-sample fine-tuning dataset \u003cem\u003eA\u003c/em\u003e used in this study could significantly improve the baseline model's performance on the specific task.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003eImpact of Prompts on Classification Effectiveness\u003c/h2\u003e \u003cp\u003eExisting research has demonstrated that prompt engineering can significantly reduce the likelihood of generative LLMs producing erroneous information (hallucination)(Pagnoni et al., \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). To explore the impact of prompts during instruction-supervised fine-tuning, this study was designed to compare the fine-tuning effects using different prompts, while keeping the baseline model and dataset consistent. Two different prompt instructions were designed, as described in Section 2.3.2. One was a 0-shot prompt with the related background knowledge removed, while the other was a 1-shot prompt with an added question-answer example to the original prompt. Models L-6 and L-7 were obtained by fine-tuning model L-7B with dataset \u003cem\u003eD\u003c/em\u003e (instructions without background knowledge) and dataset \u003cem\u003eE\u003c/em\u003e (instructions with one example), respectively, and their performance was tested. The baseline model L-7B used in this part of the study was DeepSeek-R1-Distill-Qwen-7B. The fine-tuning parameters were consistent with those in previous experiments, and the research results are presented in Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eCompared to the original fine-tuning dataset (dataset \u003cem\u003eA\u003c/em\u003e), the performance of the model fine-tuned with dataset \u003cem\u003eD\u003c/em\u003e (without background knowledge) decreased significantly, with an accuracy of only 63%, which was lower than the performance of model L-2 under the original prompt condition. Other metrics also showed varying degrees of decline.\u003c/p\u003e \u003cp\u003eUnder the 1-shot prompt condition, the fine-tuned model\u0026rsquo;s performance on the predatory journal identification task did not show significant improvement compared to the original prompt condition, achieving an accuracy of 88%, with other metrics slightly worse than the model's performance under the original prompt condition. By analyzing the characteristics of the predatory journal identification task and the impact of prompt engineering on generative large language model outputs, this study suggests that for generative LLMs, the predatory journal identification task is akin to mathematical logical reasoning based on background knowledge. A single example in the prompt condition does not provide sufficient background knowledge or enhance the model's \"understanding\" of the task. Moreover, the effectiveness of prompts is inherently limited by the capabilities of the large models themselves. While distilled models with smaller parameters possess basic logical reasoning abilities, they still cannot match the performance of commercial large models.\u003c/p\u003e \u003cp\u003eBased on the results of this section, this study concludes that in classification tasks involving mathematical logical reasoning, distilled models require background knowledge to reduce \"hallucinations.\" Additionally, structured output examples may be more effective in standardizing the model\u0026rsquo;s output format.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003ePerformance Comparison of Large Language Models Based on Different Prompt Conditions\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003etype\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eA\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eP\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eR\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eF\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eL-6*\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e63%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.6032\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.7600\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.6726\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.6757\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.5000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.5747\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAvg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.6394\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.6300\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.6236\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eL-7*\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e88%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.8167\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.9800\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.8909\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.9750\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.7800\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.8667\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAvg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.8958\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.8800\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.8788\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"6\"\u003eNote: L-6: DeepSeek-R1-Distill-Qwen-7B model fine-tuned with samples without background knowledge; L-7: DeepSeek-R1-Distill-Qwen-7B model fine-tuned with 1-shot prompt samples.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003eFine-tuned Models Compared to Traditional Classifiers\u003c/h2\u003e \u003cp\u003eExisting research has employed machine learning algorithms with quantitative bibliometric indicators to identify predatory journals, achieving good performance. (Albana, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Wu, 2023) To determine whether fine-tuned models outperform traditional classifiers on the same task, this study trained and evaluated three classic machine learning models\u0026mdash;Gaussian Na\u0026iuml;ve Bayes (GaussianNB), Random Forest (RandomForestClassifier), and Support Vector Machine (SVM)\u0026mdash;using the training and test set data. Among these, the Gaussian Na\u0026iuml;ve Bayes model is a generative model based on Bayes' theorem, with hyperparameters set to their default values. The Random Forest algorithm, a classic ensemble learning algorithm, was configured with 400 decision trees to balance model performance and computational cost, considering the sample size of the training set in this study. The Support Vector Machine algorithm, a standard supervised learning model, used default hyperparameters for classification tasks, with the Radial Basis Function (RBF) as the kernel function and regularization strength set to the default value of 1.0. The accuracy and other performance metrics of these three machine learning models in the predatory journal identification task are presented in Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e, and the ROC curve is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eAs shown, the accuracy rates of the three machine learning models were 69%, 79%, and 77%, respectively. Compared to model L-2, both the Random Forest and Support Vector Machine algorithms achieved slightly higher accuracy, but the difference was minimal. The Gaussian Na\u0026iuml;ve Bayes algorithm correctly identified most predatory journals, but it frequently misclassified legitimate OA journals as predatory. In comparison to model L-3, the performance of all three machine learning models was inferior, with a significant performance gap. This suggests that, under the same training set conditions, the 7B parameter baseline model used in this study, after instruction-supervised fine-tuning with 398 structured samples as the fine-tuning dataset, outperformed traditional machine learning models in the predatory journal identification task.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003ePerformance Comparison of Three Machine Learning Models\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003etype\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eA\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eP\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eR\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eF\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eGaussianNB\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e69%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.9524\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.4000\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.5634\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.6203\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.9800\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.7597\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAvg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.7863\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.6900\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.6615\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eRandomForest\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e79%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.7736\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.8200\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.7961\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.8085\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.7600\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.7835\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAvg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.7910\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.7900\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.7898\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eSVM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e77%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.7647\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.7800\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.7723\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.7755\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.7600\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.7677\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAvg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.7701\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.7700\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.7700\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003eFine-tuned Models Compared to General Large Models\u003c/h2\u003e \u003cp\u003eThis study utilized Python to make API calls to DeepSeek-R1 and, using JSON-formatted test set data, assessed its performance in predatory journal identification. The model was provided only with journal indicator values and background knowledge, without being informed of the journal names. The parameters for the API call included a temperature coefficient of 0.1 and a Top-p sampling value of 0.9. The research results are summarized in Table\u0026nbsp;\u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e. The large language model, which was not fine-tuned with domain-specific structured data, performed the predatory journal identification task with only marginally better results than the non-fine-tuned 1.5B parameter distilled model (L-1). However, the performance of model L-2 slightly exceeded that of the full-parameter general large model, further demonstrating that instruction-supervised fine-tuning based on LoRA technology can enable distilled models to achieve sufficient performance for domain-specific tasks.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab6\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 6\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eDeepSeek-R1 Specific Task Performance\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eType\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eA\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eP\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eR\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eF\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eDeepSeek-R1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e68%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.6500\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.7800\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.7091\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.7250\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.5800\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.6444\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAvg\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.6394\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.6800\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.6768\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eThrough instruction supervised fine-tuning using different strategies, this study obtained six fine-tuned models: the DeepSeek-R1-Distill-Qwen-1.5B model fine-tuned with 398 samples (L-2), the DeepSeek-R1-Distill-Qwen-7B model fine-tuned with 398 samples (L-3), the DeepSeek-R1-Distill-Qwen-1.5B model fine-tuned with 100 samples (L-4), the DeepSeek-R1-Distill-Qwen-1.5B model fine-tuned with 200 samples (L-5), the DeepSeek-R1-Distill-Qwen-7B model fine-tuned with samples without background knowledge (L-6), and the DeepSeek-R1-Distill-Qwen-7B model fine-tuned with samples using the 1-Shot prompt strategy (L-7). The research results are summarized in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e. Through performance testing on the same task, this study explored the impact of conditions such as whether to fine-tune, different model parameter sizes, different fine-tuning sample quantities, and different prompt strategies on the performance of distilled models on specific tasks. Additionally, machine learning model training and evaluation experiments were conducted using the training set from the fine-tuning experiments to compare the performance advantages and disadvantages between fine-tuned models and traditional machine learning models. API calls were also made to DeepSeek-R1 to evaluate its performance on the specific task.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe results indicate that model L-3, achieved by using DeepSeek-R1-Distill-Qwen-7B as the baseline and performing instruction-supervised fine-tuning with 398 structured samples, delivered the best performance on the predatory journal identification task. Under the same fine-tuning conditions, the fine-tuned 1.5B model from the same series also performed similarly to traditional machine learning models and slightly outperformed the general large model without knowledge enhancement or fine-tuning.\u003c/p\u003e \u003cp\u003eAs shown in the experimental results, instruction-supervised fine-tuning indeed improves the baseline model's performance on specific tasks. Moreover, based on the performance of L-3, increasing the parameter scale of the baseline model also enhances the performance of the fine-tuned model. When studying the impact of fine-tuning dataset sample size on classification effectiveness (comparing L-2 with L-4 and L-5), it was found that increasing the fine-tuning sample size does improve the performance of the fine-tuned model to some extent. However, when the sample size is small, only a slight increase in the sample size affects the model\u0026rsquo;s output standardization rather than significantly improving classification accuracy. Additionally, when examining the impact of different prompt strategies on classification effectiveness (comparing L-3 with L-6 and L-7), it was found that background knowledge plays a crucial role in enhancing the model\u0026rsquo;s ability to perform mathematical reasoning tasks. Furthermore, the inclusion of standardized input-output examples in fine-tuning samples significantly impacts the standardization of the fine-tuned model\u0026rsquo;s outputs.\u003c/p\u003e \u003cp\u003eThe study found that significantly increasing the number of fine-tuning samples can bring substantial improvements to the fine-tuned model's performance; however, slight increases in sample size do not yield obvious performance gains. The trend of model performance changes with gradual increases in fine-tuning sample size can be quantitatively evaluated. Based on this, the optimal sample size that can significantly improve model performance and the maximum sample size where fine-tuning benefits reach saturation can be determined, providing scientific basis for subsequent model optimization and resource allocation. Moreover, applying fine-tuned generative large language models to specific downstream tasks essentially reduces the model's \"hallucination\" in a particular domain. Therefore, future work could combine knowledge enhancement with instruction supervised fine-tuning, utilizing the Retrieval Augmented Generation (RAG) framework to explore more efficient, reliable, stable, and controllable quantitative identification methods and tools for predatory journals.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThis study demonstrates that fine-tuning language models can indeed be applied to the task of predatory journal identification. By fine-tuning the baseline model DeepSeek-R1-Distill-Qwen-7B with a dataset containing structured samples and background knowledge prompts, we significantly enhance the model's effectiveness in identifying predatory journals compared to traditional machine learning methods.The optimal fine-tuning solution identified in this study involves fine-tuning the baseline model DeepSeek-R1-Distill-Qwen-7B using a dataset composed of 398 structured samples containing background knowledge prompts. This approach not only improves the effectiveness of predatory journal identification significantly but also maintains comparable deployment and computational costs to those of the 1.5B parameter fine-tuned model. Furthermore, it can be easily implemented using the Llama-Factory framework and automated via Python, making it both feasible and easy to deploy.The optimal fine-tuning solution identified in this study involves fine-tuning the baseline model DeepSeek-R1-Distill-Qwen-7B using a dataset composed of 398 structured samples containing background knowledge prompts. This approach not only improves the effectiveness of predatory journal identification significantly but also maintains comparable deployment and computational costs to those of the 1.5B parameter fine-tuned model. Furthermore, it can be easily implemented using the Llama-Factory framework and automated via Python, making it both feasible and easy to deploy.\u003c/p\u003e \u003cp\u003eBased on the findings, this study concludes that fine-tuning the baseline model DeepSeek-R1-Distill-Qwen-7B using 398 structured samples with background knowledge prompts offers the most effective solution for predatory journal identification. This method significantly outperforms traditional machine learning models, with comparable deployment and computational costs to the 1.5B parameter fine-tuned model. The approach is easily implementable and can be automated within the Llama-Factory framework, offering a novel quantitative solution for predatory journal identification distinct from existing methods.\u003c/p\u003e \u003cp\u003eAdditionally, the study recommends selecting fine-tuning strategies and deployment methods based on practical factors such as hardware, deployment costs, and time constraints. Key recommendations include:\u003c/p\u003e \u003cp\u003e① selecting baseline models with larger parameter scales for superior performance;\u003c/p\u003e \u003cp\u003e② constructing fine-tuning datasets with more samples and higher data quality;\u003c/p\u003e \u003cp\u003e③ designing comprehensive and reasonable prompts to provide essential background knowledge.\u003c/p\u003e \u003cp\u003eThis study has the following theoretical and practical implications. First, this study experimentally developed an effective predatory journal identification tool based on bibliometric indicators, demonstrating the feasibility of using distilled models fine-tuned with structured samples for predatory journal identification. The findings reveal the influence of factors such as sample size and prompt design on the fine-tuning effectiveness of distilled models in binary classification tasks, thereby providing methodological insights for subsequent research.\u003c/p\u003e \u003cp\u003eSecond, in contrast to existing studies (Albana, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Wu, 2023) that rely on traditional machine learning approaches and require large-scale datasets to achieve satisfactory identification performance, the fine-tuning strategy adopted in this study leverages the intrinsic reasoning capabilities of distilled models. By training on a relatively small set of high-quality samples composed of key indicators, the proposed approach is able to achieve competitive performance, partially alleviating the dependence of conventional machine learning methods on extensive data collection.\u003c/p\u003e \u003cp\u003eFrom a practical perspective, researchers and institutions with predatory journal identification needs can adopt the workflow proposed in this study to deploy identification models locally according to their available hardware resources. Furthermore, human\u0026ndash;model interactive identification can be implemented using the Llama-Factory framework, facilitating practical application and extension.\u003c/p\u003e \u003cp\u003eThis article has several limitations: (1) This study employed a stratified sampling strategy with sample size increments of twofold, resulting in only three types of datasets with sample sizes of 100, 200, and 400. As a result, the research on the \"impact of sample size on classification effectiveness\" still has certain limitations. Future studies could explore more granular sample levels to obtain more comprehensive insights, such as constructing a fine-tuning dataset with 300 samples. (2) This study selected bibliometric indicators that are widely used in existing research. Future work could benefit from integrating characteristics specific to predatory journals to develop new quantitative indicators for experiments, potentially yielding more convincing results. (3) This study found that fine-tuning with 1-shot prompt samples did not lead to a significant improvement in model performance. However, the impact of incorporating additional examples in prompts was not fully explored. Future research could design prompts with more examples to investigate how the number of examples influences the model's performance after fine-tuning.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eFunding\u003c/h2\u003e \u003cp\u003eThis work was (partially) supported by the Research on the Construction of China's Characteristic Philosophy and Social Science Evaluation System under the Perspective of Academic \u0026ldquo;Full Evaluation\u0026rdquo; of China (No. 24\u0026amp;ZD323).\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eZhang. wrote the main manuscript text and prepared figures. Chen. made substantial contributions to the conception and design of the work, revised it critically for important intellectual content.All authors reviewed the manuscript.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eAll data supporting the findings of this study are available within the paper and its Supplementary Information.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAl-Moghrabi, D., Arqub, S. A., Maroulakos, M. P., et al. (2024). Can ChatGPT identify predatory biomedical and dental journals? A cross-sectional content analysis. Journal of Dentistry. \u003cem\u003eJournal of Dentistry.\u003c/em\u003e\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.jdent.2024.104840\u003c/span\u003e\u003cspan address=\"10.1016/j.jdent.2024.104840\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlbana, B. Q. (2020). Avoiding publishing in predatory journals: An evaluation algorithm. \u003cem\u003eJournal on Efficiency and Responsibility in Education and Science\u003c/em\u003e, 13(3), 154\u0026ndash;163. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.7160/eriesj.2020.130303\u003c/span\u003e\u003cspan address=\"10.7160/eriesj.2020.130303\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAnisuzzaman, D. M., Malins, J. G., Friedman, P. A., et al. (2025). Fine-tuning large language models for specialized use cases. \u003cem\u003eMayo Clinic Proceedings: Digital Healt\u003c/em\u003e, 3(1). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e.https://doi.org/10.1016/j.mcpdig.2024.11.005\u003c/span\u003e\u003cspan address=\".10.1016/j.mcpdig.2024.11.005\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAviv, B., \u0026amp; Ramakanth, K. (2025). How important is domain-specific language model pretraining and instruction finetuning for biomedical relation extraction? \u003cem\u003eNatural Language Processing and Information Systems\u003c/em\u003e, 15836: 80\u0026ndash;94. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/978-3-031-97141-9_6\u003c/span\u003e\u003cspan address=\"10.1007/978-3-031-97141-9_6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBeall, J. (2013). Predatory publishing is just one of the consequences of gold open access. Learned Publishing, 2013, 26(2), 79\u0026ndash;84. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1087/20130203\u003c/span\u003e\u003cspan address=\"10.1087/20130203\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen, L. X., Su, S. W., Liao, C. H., et al. (2023). An open automation system for predatory journal detection. \u003cem\u003eScientific Reports\u003c/em\u003e, 13(1), 2976. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41598-023-30045-9\u003c/span\u003e\u003cspan address=\"10.1038/s41598-023-30045-9\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen, M., Wang, L. Z. (2022). An Altmetrics and citation analysis of selected predatory journals in library and information science field. The Journal of Academic Librarianship, (48):102628. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.acalib.2022.102618\u003c/span\u003e\u003cspan address=\"10.1016/j.acalib.2022.102618\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDadkhah, M., Rahimnia, F., Rafati, N. S., et al. (2022). Jourchain: Using blockchain to avoid questionable journals. \u003cem\u003eIrish Journal of Medical Science\u003c/em\u003e, 191(3), 1435\u0026ndash;1439. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s11845-021-02784-z\u003c/span\u003e\u003cspan address=\"10.1007/s11845-021-02784-z\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDeepSeek-AI. (2025). DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. Retrieved August 14, 2025, from \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf\u003c/span\u003e\u003cspan address=\"https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDOAJ. (n.d.). Directory of open access journals. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doaj.org/\u003c/span\u003e\u003cspan address=\"https://doaj.org/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Retrieved July 30, 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDorfner, F. J., MSc, A. D., Busch, F., et al. (2025). Evaluating the effectiveness of biomedical fine-tuning for large language models on clinical tasks. \u003cem\u003eJournal of the American Medical\u003c/em\u003e, 1\u0026ndash;10. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/jamia/ocaf045\u003c/span\u003e\u003cspan address=\"10.1093/jamia/ocaf045\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHosseini, M., Horbach, S. P. J. M., Holmes, K., et al. (2025). Open science at the generative AI turn: An exploratory analysis of challenges and opportunities. \u003cem\u003eQuantitative Science Studies\u003c/em\u003e, 6, 22\u0026ndash;45. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1162/qss_a_00338\u003c/span\u003e\u003cspan address=\"10.1162/qss_a_00338\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHu, E. J., Shen, Y., Wallis, P., et al. (2021). LoRA: Low-rank adaptation of large language models. \u003cem\u003eArXiv\u003c/em\u003e. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.2106.09685\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.2106.09685\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuang, Y., Tang, K., \u0026amp; Chen, M. (2024). Leveraging large language models for enhanced NLP task performance through knowledge distillation and optimized training strategies. \u003cem\u003eArXiv\u003c/em\u003e. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.2402.09282\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.2402.09282\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKang, M., Lee, S., Baek, J., Kawaguchi, K., \u0026amp; Hwang, S. J. (2023). Knowledge-augmented reasoning distillation for small language models in knowledge-intensive tasks. In Proceedings of \u003cem\u003ethe 37th International Conference on Neural Information Processing Systems (NIPS '23)\u003c/em\u003e (Article 2109, pp. 1\u0026ndash;30). Curran Associates Inc. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.2305.18395\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.2305.18395\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLeslie, C., Darius, C., Michael, E., et al. (2002). Budapest Open Access Initiative. Retrieved January 24, 2026, from \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.budapestopenaccessinitiative.org/read/\u003c/span\u003e\u003cspan address=\"https://www.budapestopenaccessinitiative.org/read/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLin, T. Y., Wang, Y. X., Liu, X. Y., et al. (2021). A survey of transformers. \u003cem\u003eAI Open\u003c/em\u003e, (3): 111\u0026ndash;132. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.aiopen.2021.07.002\u003c/span\u003e\u003cspan address=\"10.1016/j.aiopen.2021.07.002\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLuo, P. C., Hong, L. Z., \u0026amp; Nie, L. (2025). Automatic classification of research data sets into the Chinese Library Classification with generative large language model. \u003cem\u003eThe Electronic Library\u003c/em\u003e, 43(4), 600\u0026ndash;618. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1108/EL-02-2025-0042\u003c/span\u003e\u003cspan address=\"10.1108/EL-02-2025-0042\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMajdik, Z. P., Graham, S. S., Edward, J. C. S., et al. (2024). Sample size considerations for fine-tuning large language models for named entity recognition tasks: Methodological study. \u003cem\u003eJMIR AI\u003c/em\u003e, 3: e52095. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2196/52095\u003c/span\u003e\u003cspan address=\"10.2196/52095\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePagnoni, A., Balachandran, V., \u0026amp; Tsvetkov, Y. (2021). Understanding factuality in abstractive summarization with FRANK: A benchmark for factuality metrics. In \u003cem\u003eProceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologie\u003c/em\u003es (pp. 4812\u0026ndash;4829). Association for Computational Linguistics. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.18653/v1/2021.naacl-main.383\u003c/span\u003e\u003cspan address=\"10.18653/v1/2021.naacl-main.383\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eQiu, H., Zhang, S., He, H., et al. (2024). Facilitating pornographic text detection for open-domain dialogue systems via knowledge distillation of large language models. \u003cem\u003eArXiv\u003c/em\u003e. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.2403.13250\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.2403.13250\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRadford, A., \u0026amp; Narasimhan, K. (2018). Improving language understanding by generative pre-training. \u003cem\u003eOpenAI\u003c/em\u003e. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://api.semanticscholar.org/CorpusID:49313245\u003c/span\u003e\u003cspan address=\"https://api.semanticscholar.org/CorpusID:49313245\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSakhi, T. (2023). *Effectiveness of natural language processing techniques in categorizing scientific articles by research methodology* [Master\u0026rsquo;s thesis, University of Twente].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShamseer, L., Moher, D., Maduekwe, O., et al. (2017). Potential predatory and legitimate biomedical journals: Can you tell the difference? A cross-sectional comparison. \u003cem\u003eBMC Medicine\u003c/em\u003e, 15(1), 28\u0026ndash;42. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s12916-017-0785-9\u003c/span\u003e\u003cspan address=\"10.1186/s12916-017-0785-9\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSiler, K., Vincent-Lamarre, P., Sugimoto, C. R., et al. (2021). Predatory publishers\u0026rsquo; latest scam: Bootlegged and rebranded papers. \u003cem\u003eNature\u003c/em\u003e, 598(7882), 563\u0026ndash;565. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/d41586-021-02866-z\u003c/span\u003e\u003cspan address=\"10.1038/d41586-021-02866-z\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTonmoy, S. M., Zaman, S. M., Jain, V., et al. (2024). A comprehensive survey of hallucination mitigation techniques in large language models. Retrieved August 14, 2025, from \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://arxiv.org/abs/2401.01313\u003c/span\u003e\u003cspan address=\"https://arxiv.org/abs/2401.01313\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003evan Dis, E. A. M., Bollen, J., Zuidema, W., et al. (2023). ChatGPT: Five priorities for research. \u003cem\u003eNature\u003c/em\u003e, 614(7947), 224\u0026ndash;226. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/d41586-023-00288-7\u003c/span\u003e\u003cspan address=\"10.1038/d41586-023-00288-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, H., Fu, T., Du, Y., et al. (2023). Scientific discovery in the age of artificial intelligence. \u003cem\u003eNature\u003c/em\u003e, 620(7972), 47\u0026ndash;60. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41586-023-06221-2\u003c/span\u003e\u003cspan address=\"10.1038/s41586-023-06221-2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang, W., \u0026amp; Qiao, H. (2024). How natural language processing enables AIGC recognition? Latest trends and future prospects. \u003cem\u003eIn Proceedings of the 2024 7th International Conference on Software Engineering and Information Management\u003c/em\u003e (pp. 103\u0026ndash;109). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1145/3647722.3647738\u003c/span\u003e\u003cspan address=\"10.1145/3647722.3647738\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWu, J. H., Liu, T. Y., Mu, K. L., et al. (2024). Identification and causal analysis of predatory open access journals based on interpretable machine learning. \u003cem\u003eScientometrics\u003c/em\u003e, 129(4), 2121\u0026ndash;2158. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s11192-024-05027-x\u003c/span\u003e\u003cspan address=\"10.1007/s11192-024-05027-x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang, Y. F., Wang, Z. Y., He, Z. T., et al. (2024). BB-GeoGPT: A framework for learning a large language model for geographic information science. \u003cem\u003eInformation Processing \u0026amp; Management\u003c/em\u003e, 61(5), 103808. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.ipm.2024.103808\u003c/span\u003e\u003cspan address=\"10.1016/j.ipm.2024.103808\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZheng, Y., Zhang, R., Zhang, J., et al. (2024). LlamaFactory: Unified efficient fine-tuning of 100\u0026thinsp;+\u0026thinsp;language models. \u003cem\u003eArXiv\u003c/em\u003e. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.2403.13372\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.2403.13372\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Predatory Journals, Large Language Models, Machine Learning, Open Access","lastPublishedDoi":"10.21203/rs.3.rs-9029371/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9029371/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThis study investigates whether the method of fine-tuning language models can be applied to the task of predatory journal identification and seeks to identify the optimal fine-tuning strategy feasible in similar practical environments. This study employs the Low-Rank Adaptation (LoRA) method to perform instruction-supervised fine-tuning on open-source distilled models (such as DeepSeek-R1-Distill-Qwen-1.5B) based on different strategies. Additionally, three machine learning algorithms and a general large language model API call solution were introduced to compare the performance differences between fine-tuned models, traditional classifiers, and non-fine-tuned large models. The results indicate that a 1.5B model fine-tuned with 398 structured samples surpassed the performance of non-fine-tuned general large models in the specific task, achieving an accuracy of 76%. A 7B model fine-tuned using the same strategy achieved an accuracy of 92%. The comparison revealed that fine-tuning can enhance the performance of distilled models in executing domain-specific tasks, and an increase in the parameter scale of the baseline model can significantly improve the performance of its fine-tuned version in the specific task.\u003c/p\u003e","manuscriptTitle":"A Method for Identifying Predatory Journals Driven by Large Language Models","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-03-20 11:05:50","doi":"10.21203/rs.3.rs-9029371/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"df336f2c-e7e2-49fd-a3af-ba8513ef1e43","owner":[],"postedDate":"March 20th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2026-03-20T11:05:50+00:00","versionOfRecord":[],"versionCreatedAt":"2026-03-20 11:05:50","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9029371","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9029371","identity":"rs-9029371","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

Text is read by the "Ask this paper" AI Q&A widget below. Extraction quality varies by source — PMC NXML preserves structure cleanly, OA-HTML may include some navigation residue, and OA-PDF can have broken hyphenation. The publisher copy (via DOI) is the canonical version.

My notes (saved in your browser only)

Ask this paper AI returns verbatim quotes from the full text · source: preprint-html

Answers must be backed by verbatim quotes from this paper's full text. Hallucinated quotes are dropped automatically; if no verbatim passage answers the question, we say so. How this works

Citation neighborhood (no data yet)

We don't have any in-corpus citations linked to this paper yet. This is a recent paper (2026) — citers typically take a year or two to land, and the OpenAlex reference graph may still be filling in.

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00