ClinMED-LLAMA3 A Large Model of Clinical Medicine Based on LLAMA3 | Research Square

Research Article

MingYi Wei, Xin Liu

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4703651/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted. Version 1 posted.

Abstract

ClinMED-LLAMA3 is a large language model designed specifically for the field of clinical medicine. Based on the LLAMA3-8B-Instruct model, it has been adapted to the clinical domain through supervised fine-tuning and LoRA technology. The novelty of this model lies in the construction of its dedicated clinical medicine 50K dataset and the establishment of the Alpaca dataset. Dual verification by humans and AI ensures the quality and diversity of the data, providing a rich set of clinical interactions for model training.
The model also supports offline private deployment, which addresses medical institutions' concerns about data privacy and security. ClinMED-LLAMA3 has been optimized in terms of energy consumption, reducing the demand for computational resources while maintaining high performance.

Keywords: Large language model, LoRA, Supervised fine-tuning, LLAMA3, Clinical medical engineering

1. Introduction

Clinical Medical Engineering is an interdisciplinary field that combines medical knowledge, engineering techniques, and data analysis to enhance the quality and efficiency of medical services. With the explosive growth of medical data and the advancement of artificial intelligence technology, large models play an increasingly crucial role in this field [1][2][3][4][5]. Large models, with their vast data capacity and complex network structures, can capture and learn subtle patterns and deep features in medical data. Compared to traditional machine learning models, large models demonstrate higher flexibility and accuracy when dealing with high-dimensional data. They can automatically extract features using deep learning technology, significantly reducing the dependency on specialized knowledge while improving the model's generalization ability.

In the field of clinical medicine, large models refer to machine learning models with complex network structures trained on massive data. These models, with their powerful data processing capabilities, pattern recognition, and prediction accuracy, provide unprecedented support for clinical decision-making. They can automatically extract deep features from medical data, identifying patterns and associations that may be difficult for human experts to detect, and they play a significant role in disease diagnosis, treatment recommendation, and patient monitoring.
The self-learning and continuous optimization capabilities of large models enable them to adapt to changing medical environments, improving the efficiency and quality of medical services. Large models exhibit exceptional capabilities in handling clinical data. They can process and analyze large-scale medical datasets from various sources, such as Electronic Health Records (EHR), medical imaging, and genomic sequences. These data usually feature high dimensionality and complexity, making them challenging to handle with traditional analysis methods. Large models, through deep learning technology, can automatically identify and extract key features, providing support for clinical decision-making. For example, in radiology, large models can identify and categorize lesions, assisting physicians in making more accurate diagnoses. Large models have a significant advantage in pattern recognition. They can discover potential patterns and associations from complex medical data, which may be difficult for human experts to identify. In disease diagnosis, large models can predict the occurrence and progression of diseases by analyzing physiological data and medical history of patients. Furthermore, they can identify subtle connections between different diseases, providing a basis for personalized treatment. Large models surpass traditional methods in prediction accuracy. They can handle datasets with high-dimensional features, providing more precise prediction results. In clinical practice, accurate predictions are crucial for early diagnosis of diseases, formulation of treatment plans, and evaluation of patient prognosis. Large models, by learning from extensive historical data, can predict a patient's response to specific treatments, thereby assisting physicians in selecting the most appropriate treatment plan. Large models can automate many clinical processes, enhancing the efficiency of medical services. 
They can be used for patient classification, risk assessment, resource allocation, and other tasks, reducing the workload of doctors and nurses and allowing them to focus on more complex clinical decisions. Automated processes not only speed up medical services but also reduce human errors, improving the quality and safety of patient care. Unlike traditional static models, large models possess the ability for continuous learning and self-optimization. With the continuous input of new data, they can constantly adjust and optimize their parameters to adapt to changes in the medical environment. This ability enables large models to quickly adapt and provide effective solutions when dealing with emerging diseases or rare cases. Large models can integrate knowledge and data from different disciplines, providing comprehensive solutions for complex clinical problems [ 6 ][ 7 ][ 8 ] . They can fuse data and knowledge from fields such as bioinformatics, radiology, and epidemiology, providing comprehensive support for disease diagnosis, treatment, and prevention. Interdisciplinary integration not only promotes the exchange and innovation of knowledge but also provides patients with more comprehensive and personalized medical services. MedGPT is a medical large language model based on the Transformer architecture, specifically designed for predicting medical concepts in clinical narratives [ 9 ] . Unlike traditional prediction methods, MedGPT can extract valuable medical information from unstructured Electronic Health Records (EHRs) for disease prediction and diagnostic assistance. MedGPT has demonstrated superior performance, particularly in handling noisy and fine-grained data, with an accuracy ranging from 0.344 to 0.640, a significant improvement over the LSTM model's 0.329 to 0.633. 
A medical large model developed jointly by Huimei Technology and Intel, based on CPU large model inference technology, has achieved seamless integration with the Clinical Decision Support System (CDSS) deployed in hospitals [ 10 ] . This deployment method not only reduces costs but also enhances the accessibility and practicality of large models in medical institutions. The Huimei medical large model already possesses the ability for differential diagnosis and automatic generation of medical records, and is expected to show its potential in more diagnostic and treatment processes in the future. MedicalGPT is a Chinese-English medical question-answering model based on LLaMA-13B, fine-tuned using Low-Rank Adaptation (LoRA) technology. The project implements a four-stage training process: secondary pre-training, supervised fine-tuning, reward modeling, and reinforcement learning training. The training of Chimed-gpt used a large amount of Chinese medical data to enhance the model's performance in medical question answering [ 11 ] . Moreover, the Chimed-gpt project also provides a call example based on the textgen library, facilitating developers to use and integrate the LLaMA model for medical question answering. The code and model of Chimed-gpt have been open-sourced on GitHub for interested researchers and developers. Current medical large models have some significant deficiencies and challenges in research and application, which limit their widespread application and deep development in the field of clinical medicine. Medical large models have limitations in realizing private offline deployment, mainly because the architectural design of some models does not take into account the sensitivity of medical data and the high demands of medical institutions for data privacy. 
Medical institutions often need to process data locally to ensure compliance, but existing models may require cloud infrastructure support, restricting their applicability under strict data protection policies. The high demand of medical large models for computational resources leads to dual pressures of cost and environment in actual application. These models usually require high-performance GPU clusters for training and inference, which not only increases the financial burden of medical institutions but also contradicts global carbon reduction targets. Therefore, researchers need to explore more efficient algorithms and hardware optimizations to reduce the model's dependence on computational resources. Although medical large models have made progress in general medical information processing, their application in specific clinical medical fields is still lacking. The field of clinical medicine is highly segmented, with each area having its unique terminology, processes, and diagnostic standards. Existing models often lack in-depth learning and understanding of these segmented fields, resulting in limited accuracy and applicability in actual clinical applications. Some medical large models are based on older versions of LLaMA or other early model architectures. These architectures may not fully utilize the latest technological advances, such as innovative algorithms in natural language processing and machine learning. As medical knowledge continues to update, older architectures may not adapt to new data representation and processing needs, resulting in a lack of accuracy in understanding complex clinical problems. The performance of medical large models largely depends on the quality and diversity of training data. However, existing data may have biases, such as insufficient representation of patient groups or bias in data collection methods, which may limit the model's generalization ability in specific situations. 
Additionally, the timeliness and accuracy of data are critical factors; outdated or incorrect data will directly affect the quality of the model's output. In response to the limitations of existing medical large models, we propose ClinMed-LLAMA3, a large model aimed at providing more accurate, secure, and efficient clinical medical solutions through a series of innovative methods. One of the core contributions of ClinMed-LLAMA3 is the establishment of its dedicated clinical medical 50K dataset. This dataset is meticulously curated through in-depth analysis and collection of daily hospital consultation records, combined with the knowledge of human experts and the advanced language understanding capabilities of GPT4. This dual-channel data collection and screening mechanism ensures the high quality and coverage of the dataset, enabling the model to more accurately understand and respond to complex scenarios in clinical consultations. ClinMed-LLAMA3 adopts the latest LLAMA3-8B-Instruct model as its base, a large language model with 8 billion parameters, possessing powerful understanding and generation capabilities. Through specialized fine-tuning in the field of clinical medicine, ClinMed-LLAMA3 not only inherits the strong performance of the base model but also further adapts to human consultation processes and habits. It can clearly understand the descriptive language of patients' conditions and provide more professional and humanized medical consultations. ClinMed-LLAMA3 supports offline private deployment, a feature that effectively addresses the concerns of medical institutions regarding data privacy and security. The model can run on the local servers of medical institutions, ensuring that all sensitive data is retained within the institution, preventing leakage or unauthorized access, and meeting the strict requirements of the medical industry for data protection. 
In terms of energy consumption, ClinMED-LLAMA3 has undergone special optimization to achieve a lower energy consumption ratio. Through algorithm improvement and model simplification, ClinMED-LLAMA3 significantly reduces the demand for computational resources and energy while maintaining high performance, which matters for the operating costs and environmental impact of medical institutions.

The development of the ClinMED-LLAMA3 large model is an important supplement and improvement to existing medical large models. Through the establishment of a dedicated dataset, the application of advanced base models, the support for private deployment, and the optimization of the energy consumption ratio, it provides a more accurate, secure, and efficient intelligent tool for the field of clinical medicine. With continuous technological advancement and deepening application, ClinMED-LLAMA3 is expected to become an important aid in the medical industry, promoting the improvement of medical service quality and the development of medical innovation.

2. Related Work

2.1 Evolution of the LLAMA Series Models

The LLAMA series represents significant advancements in the field of large language models, with each iteration enhancing capabilities and addressing limitations of its predecessor. This section outlines the developmental trajectory from LLAMA1 to LLAMA3 [12][13].

The initial LLAMA model laid the groundwork for advanced language processing capabilities. It employed a robust architecture to tackle complex natural language understanding tasks. LLAMA1 introduced a series of features that were cutting-edge at the time, setting the benchmark for subsequent models. However, like all pioneering models, it had its limitations, including limited context-awareness and a lack of nuanced understanding of specialized fields, such as clinical medicine.
Building on LLAMA1, the second-generation model aimed to address some of the shortcomings of the initial model. LLAMA2 expanded the model's understanding of context, allowing for more coherent and relevant responses across a wider range of topics. Improvements in training techniques enabled the model to generate more human-like text and understand complex language structures. Despite these advancements, LLAMA2 still faced challenges in specialized fields where subtle language differences and domain-specific knowledge required more refined handling. The most recent evolution, LLAMA3, took significant strides in overcoming the limitations of previous models. With a substantial increase in model scale, LLAMA3, boasting an 8-billion parameter configuration, significantly enhanced its language understanding and generation capabilities. The model's architecture was fine-tuned to better capture the subtleties of human language, including idiomatic expressions and domain-specific terminology. The instruct adjustment feature of LLAMA3 was a game-changer, enabling it to follow instructions more accurately and generate responses that are not only contextually relevant but also in line with user intent. In the context of clinical medicine, the evolution from LLAMA1 to LLAMA3 is particularly noteworthy. The enhanced language understanding and generation capabilities of the latest model are crucial for interpreting complex and subtle dialogues occurring in clinical settings. The move towards more specialized and fine-tuned models like ClinMed-LLAMA3 signifies a commitment to harnessing the most advanced language models for clinical applications, where accuracy and understanding are paramount. The evolution of the LLAMA series reflects the ongoing progress of artificial intelligence and its growing relevance in specialized fields like clinical medicine. Each new version brings a deeper understanding of language and a greater ability to generate context-appropriate responses. 
Looking forward, integrating the capabilities of LLAMA3 into models like ClinMED-LLAMA3 brings immense hope for transforming clinical practice through AI-driven insights and support.

2.2 SFT and LoRA Fine-Tuning Techniques

As the application of large language models (LLMs) extends to specialized fields such as medicine, fine-tuning these models to meet specific needs becomes increasingly important. Supervised Fine-Tuning (SFT) is a widely used approach for adapting pre-trained models to target tasks [14][15]. It involves training the model on a dataset specific to the task, allowing the model to learn patterns relevant to the task and adjust its weights accordingly. In the context of medical LLMs, SFT enables the model to absorb medical knowledge and terminology from datasets such as electronic health records or biomedical literature. This process typically involves presenting the model with examples of the desired task, such as medical question-answer pairs, and adjusting the model's parameters to minimize the difference between the model's predictions and the actual answers.

Low-Rank Adaptation (LoRA) is a parameter-efficient way to carry out SFT. Instead of updating every weight, it trains a small number of additional parameters attached to selected linear layers while the pre-trained weights stay frozen. By adjusting the pre-trained weights with a low-rank matrix, LoRA allows the model to adapt to new tasks without significantly increasing computational costs or losing generalization capabilities. This method is particularly useful for large models like LLaMA, where the number of parameters is vast and full fine-tuning would be resource-intensive. SFT provides the model with a deep understanding of the medical field through extensive training on task-specific data, while LoRA ensures this adaptation is achieved with minimal computational overhead.
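The low-rank idea can be made concrete with a small numerical sketch. It assumes the update form W_new = W + U V^T, with U of shape m x r and V of shape n x r, and uses plain Python lists rather than any deep learning library. The helper names (`matmul`, `transpose`, `lora_update`, `count_params`) and the toy sizes are illustrative assumptions, not the authors' code.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


def lora_update(w, u, v):
    """Return W_new = W + U V^T without modifying the frozen matrix W."""
    delta = matmul(u, transpose(v))  # m x n matrix of rank at most r
    return [[wij + dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def count_params(m, n, r):
    """Trainable values for a full m x n update versus the factors U and V."""
    return m * n, r * (m + n)


# Toy example: a 6 x 4 weight matrix adapted with rank r = 2 (hypothetical sizes).
rng = random.Random(0)
m, n, r = 6, 4, 2
W = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(m)]
U = [[rng.gauss(0, 0.1) for _ in range(r)] for _ in range(m)]
V = [[rng.gauss(0, 0.1) for _ in range(r)] for _ in range(n)]
W_new = lora_update(W, U, V)

# For a 4096 x 4096 projection and r = 8, compare the two parameter budgets.
full, low_rank = count_params(4096, 4096, 8)
print(full, low_rank)
```

Only U and V would receive gradients during training, so the number of trained values grows with r * (m + n) rather than m * n. In this picture SFT supplies the training signal and LoRA restricts which parameters receive it.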
Together, SFT and LoRA enable the model to develop a nuanced understanding of clinical concepts and medical language, crucial for high-quality medical AI applications. The benefits of using SFT and LoRA include performance improvements on specialized tasks, enhanced fine-tuning efficiency, and preservation of the model's original capabilities.

In essence, LoRA is a technique for fine-tuning large pre-trained models. It achieves parameter updates by introducing a low-rank matrix into specific parts of the model, rather than updating the entire model's parameters. This approach significantly reduces the demand for computational resources and maintains the model's generalization capabilities. The central idea of LoRA is to add the product of two low-rank matrices to the model's weight matrix rather than updating the weights directly. Specifically, given an original weight matrix W, LoRA introduces two low-rank matrices U and V, and the update can be expressed as Eq. (1):

W_new = W + U V^T    (1)

Here W_new is the updated weight matrix, W is the original weight matrix, U is an m x r matrix, m is the number of rows in W, r is the rank of the update (much smaller than m and n), V is an n x r matrix, n is the number of columns in W, and V^T is the transpose of V. Because U and V together hold only r(m + n) values instead of the m x n values in W, LoRA adjusts the weights using far fewer parameters, thereby reducing the demand for computational resources. LoRA adapts to new tasks through small adjustments rather than massive changes to the model's parameters, which helps maintain the model's generalization capabilities.

3. Method

3.1 Establishment of the Alpaca Dataset

The dataset used in this study, Alpaca, was obtained through a rigorous process of manual and GPT4-assisted review and cleaning. Our data collection process began with the selection of common clinical medical questions from actual doctor consultation records.
These questions were identified based on their ubiquity and relevance in the clinical setting. To ensure the quality and accuracy of the questions, a team of medical professionals thoroughly reviewed and validated the extracted data. Following the initial manual screening, these questions were further refined using the state-of-the-art language model GPT4. The task of GPT4 was to learn from the screened questions and generate answers that are both medically accurate and consistent with clinical practice. This dual screening process involving manual and AI review ensured high standard quality and diversity of the dataset. The Alpaca dataset comprises approximately 50,000 question-answer pairs, each meticulously designed to reflect real clinical scenarios. The dataset aims to cover a wide range of clinical topics, ensuring the model can learn to handle various medical consultations. To keep pace with the latest advancements in language model training, the Alpaca dataset employs an instruction format to structure the questions. This format aims to provide clear, concise prompts to guide the model in understanding the context and intent of each question. The instruction format also allows for the inclusion of specific directives to fine-tune the model's responses as per the clinical application's requirements. The establishment of the Alpaca dataset is a critical step in the development of the ClinMed-LLAMA3 model. By providing a rich and diverse set of clinical interactions, the dataset enables the model to learn the complexities of medical dialogues. Moreover, adherence to the instruction format ensures the model can effectively interpret and respond to complex clinical questions, making it a valuable asset in the field of clinical medicine. In summary, the Alpaca dataset, with its stringent manual and AI screening process and adherence to the instruction format, forms the cornerstone of our methodology. 
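The instruction format mentioned above can be illustrated with a short sketch. The paper does not reproduce its exact template, so the wording below follows the publicly known Stanford Alpaca prompt convention as an assumption; the field names `instruction`, `input`, and `output`, the function `build_prompt`, and the sample record are likewise illustrative.

```python
# Templates modeled on the Stanford Alpaca convention (assumed, not confirmed
# by the paper). One variant carries extra context in an "input" field.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_prompt(record):
    """Render one question-answer record as the text the model is trained on."""
    if record.get("input"):
        prompt = PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    else:
        prompt = PROMPT_NO_INPUT.format(instruction=record["instruction"])
    # The reference answer is appended after the "### Response:" marker.
    return prompt + record.get("output", "")


# Hypothetical record; the answer text is a placeholder, not real dataset content.
example = {
    "instruction": "Answer the patient's question as a clinician would.",
    "input": "I have trouble falling asleep and wake up easily. What should I check?",
    "output": "<answer reviewed by medical professionals>",
}
print(build_prompt(example))
```

Whatever the exact template, every question-answer pair would pass through one such function before it enters the dataset.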
The dataset aims to facilitate the training of a comprehensive and context-aware clinical medical language model, laying the foundation for advancements in AI-assisted clinical decision-making.

3.2 LoRA Fine-tuning for Localized LLM

Fine-tuning the LLAMA3-8B model with Low-Rank Adaptation (LoRA) allows us to efficiently adjust the large language model (LLM) locally to accommodate the specifics of the Alpaca dataset. LoRA modifies the weights of the pre-trained model by introducing a low-rank matrix, focusing on the specific components that significantly impact the model's performance.

The fine-tuning process begins with model and tokenizer initialization, setting up the computational device, and defining key hyperparameters. The base_model parameter specifies the LLAMA3-8B model variant to be used, while data_path points to the location of the Alpaca dataset. The parameters lora_r, lora_alpha, and lora_dropout in the LoraConfig class determine the rank, scaling factor, and dropout rate of the LoRA adaptation, respectively.

The model is then prepared for fine-tuning by setting the LoRA configuration, which adapts specific modules of the model's attention mechanism: the query (q_proj), key (k_proj), value (v_proj), and output (o_proj) projection layers. The function prepare_model_for_int8_training prepares the model for 8-bit precision training, reducing computational demands.

During training, custom training parameters are used with transformers.Trainer. The batch_size and micro_batch_size define the overall and per-device batch sizes, respectively, while gradient_accumulation_steps determines how many micro-batches are aggregated into a single training step. The learning_rate sets the pace of learning, and num_epochs specifies the number of passes through the entire dataset. The LoRA configuration is integrated by applying the get_peft_model function.
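How the hyperparameters named above relate to each other can be sketched in plain Python. None of the numeric values below are reported in the paper: they are placeholder defaults chosen for illustration, the file paths are hypothetical, and the projection shapes and layer count are assumptions about the 8B LLAMA3 architecture. The sketch derives the gradient accumulation steps from the two batch sizes, the LoRA scaling factor from lora_alpha and lora_r, and the number of trainable LoRA parameters for the four targeted projections.

```python
from dataclasses import dataclass


@dataclass
class FinetuneConfig:
    base_model: str = "path/to/llama3-8b-instruct"  # hypothetical path
    data_path: str = "path/to/alpaca_clinical.json"  # hypothetical path
    batch_size: int = 128        # effective batch size per optimizer step
    micro_batch_size: int = 4    # samples per device per forward pass
    num_epochs: int = 3
    learning_rate: float = 3e-4
    lora_r: int = 8              # rank of the low-rank update
    lora_alpha: int = 16         # scaling numerator
    lora_dropout: float = 0.05
    lora_target_modules: tuple = ("q_proj", "k_proj", "v_proj", "o_proj")

    @property
    def gradient_accumulation_steps(self):
        """Micro-batches accumulated before each optimizer step."""
        return self.batch_size // self.micro_batch_size

    @property
    def lora_scaling(self):
        """Factor applied to the low-rank update (alpha / r)."""
        return self.lora_alpha / self.lora_r


# Assumed (in_features, out_features) of each attention projection, and the
# assumed number of transformer layers, for an 8B LLAMA3-style model.
PROJECTION_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}
NUM_LAYERS = 32


def lora_trainable_params(cfg):
    """Each adapted layer adds r * (in_features + out_features) parameters."""
    per_layer = sum(
        cfg.lora_r * (d_in + d_out)
        for name, (d_in, d_out) in PROJECTION_SHAPES.items()
        if name in cfg.lora_target_modules
    )
    return per_layer * NUM_LAYERS


cfg = FinetuneConfig()
print(cfg.gradient_accumulation_steps, cfg.lora_scaling, lora_trainable_params(cfg))
```

Under these assumptions the adapter adds a few million trainable parameters to a model with roughly 8 billion, which is the source of the reduced resource demand described in the text.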
If a resume_from_checkpoint is provided, any existing checkpoint weights are loaded using the set_peft_model_state_dict function, allowing training to resume from a specific point. Optionally, training progress can be logged and monitored using Weights and Biases (WandB) through the wandb_project, wandb_run_name, and other related parameters. The model's performance is evaluated periodically, with checkpoints saved to the output_dir. The best-performing model based on the validation set is saved for future use.

Table 1 summarizes the key parameters of the LoRA fine-tuning process, highlighting the critical steps and their associated parameters, providing a clear overview of the methodology applied to the LLAMA3-8B model using the Alpaca dataset.

Table 1 LoRA finetune process

Step | Description | Parameters
1 | Initialize model and tokenizer | base_model, tokenizer
2 | Prepare model for LoRA | lora_r, lora_alpha, lora_dropout
3 | Configure training parameters | batch_size, learning_rate, num_epochs
4 | Integrate LoRA | lora_target_modules
5 | Train model | train_data, val_data
6 | Evaluate and log (optional) | wandb_project, wandb_run_name
7 | Save checkpoints and final model | output_dir

3.3 System Architecture

As depicted in Fig. 1, the system architecture for fine-tuning the LLAMA3-8B model using the Alpaca dataset is organized into distinct levels, each built upon the preceding one, forming a coherent and efficient training environment.

The topmost layer is the user interface, serving as the point of interaction for users and researchers. It provides access to the system's functionalities, enabling the initiation of the training process, monitoring progress, and managing fine-tuning parameters. Directly beneath the user interface is the LLAMA3-8B fine-tuning layer.
This layer encompasses the fine-tuning process of the LLAMA3-8B model, responsible for executing the training routine, managing the data flow from the Alpaca dataset, and applying the LoRA technique for efficient parameter updates.

The PEFT adapter layer is an intermediary component that facilitates the application of the LoRA method. It acts as a bridge between the LLAMA3-8B model and the fine-tuning process, ensuring the low-rank matrix adaptation is correctly integrated into the model's parameters.

The LoRA layer represents the core of the LoRA technology. Here, the low-rank matrix is defined and applied to the target modules of the LLAMA3-8B model. This layer is responsible for effectively adjusting the model's weights to adapt to the clinical language and patterns present in the Alpaca dataset.

The LLAMA3-8B layer is the foundational layer, representing the pre-trained model itself. This is a robust language model with a multitude of parameters pre-trained on various datasets. This layer provides the foundation for the fine-tuning process.

At the bottom of the architecture is the Alpaca dataset. It serves as the source of training data, consisting of a curated collection of clinical medical question-answer pairs. This dataset is crucial for training the model to understand and generate medically relevant responses.

In summary, the system architecture aims to simplify the fine-tuning process of the LLAMA3-8B model for clinical applications. From the data provided by the Alpaca dataset to user interaction via the user interface, each layer plays a pivotal role, converging into an efficient and effective training system capable of producing specialized language models for clinical use.

4. Result

4.1 Alpaca Dataset Prompt Analysis

As depicted in Fig. 2, this section analyzes the prompts from the Alpaca dataset in depth, exploring the impact of text length on model performance.
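A length analysis of this kind can be sketched with the standard library. The paper does not say how prompt length is measured, so the sketch assumes character counts; the function name `prompt_length_stats`, the bin width, and the sample prompts are hypothetical.

```python
from collections import Counter
from statistics import mean, median


def prompt_length_stats(prompts, bin_width=10):
    """Summarize prompt lengths (in characters) and bucket them into bins."""
    lengths = [len(p) for p in prompts]
    # Map each length to the lower edge of its bin, e.g. 18 -> 10 for width 10.
    bins = Counter((n // bin_width) * bin_width for n in lengths)
    return {
        "count": len(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        "histogram": dict(sorted(bins.items())),
    }


# Hypothetical prompts standing in for real dataset entries.
sample_prompts = [
    "What causes insomnia?",
    "I have had trouble falling asleep for two weeks and wake up often.",
    "Is caffeine bad for sleep?",
]
stats = prompt_length_stats(sample_prompts)
print(stats["count"], stats["histogram"])
```

Running the same function over the full dataset would give the histogram, mean, and median directly, so the central tendency would not have to be inferred from a plot.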
The graph in Fig. 2 illustrates the relationship between the length of prompts and the number of fully covered examples, revealing the diversity of prompt lengths within the dataset. Although specific numerical values are not given, the graph indicates a broad distribution of prompt lengths, ranging from shorter to longer texts, which is beneficial for training a model capable of adapting to varying input lengths. Further, although not directly deducible from the graph, we can infer that the average and median length of prompts likely reside in the central region of the distribution, offering an intuitive understanding of the dataset's central tendency.

The length of the text directly influences the model's comprehension and generative abilities: shorter prompts necessitate a stronger generalization capability from the model, while longer prompts provide richer contextual information, aiding in-depth understanding. Based on this analysis, we have derived some strategies for optimizing prompts: firstly, concise prompts assist the model in quickly capturing the core of the problem; secondly, for situations requiring more context, increasing the text length appropriately is necessary; finally, maintaining a balance of different length prompts in the training data prevents the model from overfitting to a specific length range.

In conclusion, the analysis of text length in the Alpaca dataset's prompts provides valuable insights, guiding us in constructing a robust clinical medical language model. The diversity and balance of text lengths are crucial for ensuring the model's efficacy in handling various clinical issues. Future work will take these factors into account to further optimize the dataset and model training strategies.

4.2 LoRA Fine-tuning on Alpaca Dataset

As demonstrated in Fig. 3, the trend in the loss function during LoRA fine-tuning on the Alpaca dataset provides crucial insights into the model's learning dynamics. Initially, with the model facing a completely new dataset, the loss value is relatively high because the model has not yet captured the features of the data. As training progresses, the model gradually adapts to the dataset's characteristics, which is reflected in the decreasing loss value.

In the early stages of training, the loss value starts at 2.1883, reflecting the model's preliminary fit to the dataset. As training continues, the loss decreases at each reporting interval. For instance, when 10% of the samples have been processed, the loss value drops to 2.11. When 20% of the samples have been processed, the loss value drops to 1.7616, and further to 1.4807 at 50%. The rate of decrease then begins to slow, but a steady downward trend is maintained. This continuous decrease indicates the model's ongoing optimization of its parameters to better adapt to the data distribution.

In the mid-to-late phase of training, the decline in the loss value becomes more gradual but continues. For example, when 80% of the samples have been processed, the loss value reduces to 1.3245, indicating a deeper understanding of the dataset by the model. Finally, towards the end of training, the loss value stabilizes, suggesting that the model may have approached or achieved its optimal performance on the current dataset.

It is noteworthy that the decrease in loss value during training is not always smooth.
At certain stages the loss may rise slightly, most likely because a batch contains complex or anomalous samples; such fluctuations are usual in mini-batch training and do not by themselves indicate overfitting. With appropriate regularization and learning-rate adjustment, the model continues to learn and improve. The decrease in training loss must also be read alongside the model's generalization ability: a good model performs well on the training data and also generalizes to unseen data. While monitoring the training loss, we therefore also need to track performance on the validation set to ensure that the model does not lose generalizability through overfitting. In summary, analysing the loss during LoRA fine-tuning on the Alpaca dataset gives a clear picture of the model's learning progress and informs further adjustment of model parameters, model structure, and training strategy.

4.3 Professional Q&A Dialogue with ClinMED-LLAMA3

Fig. 4 and Fig. 5 show ClinMED-LLAMA3, a large language model fine-tuned specifically for clinical medicine, in professional Q&A dialogues; we evaluate its performance from the content of these figures. In terms of accuracy, ClinMED-LLAMA3 gives a detailed and professional analysis in response to questions about sleep disorders. The model identifies the issues the user describes, such as difficulty falling asleep, frequent dreaming, and easy awakening, and suggests relevant tests to confirm insomnia or other conditions such as depression and anxiety. It also offers lifestyle adjustments to improve sleep quality, such as keeping a regular sleep schedule, avoiding overeating and caffeine, and taking appropriate exercise.
These suggestions are based on clinical medical knowledge and meet the standards of professional medical advice. In terms of professionalism, the responses of ClinMED-LLAMA3 reflect the expertise and depth of clinical medicine. The model's answers include a brief introduction to clinical medicine, emphasizing its application in the diagnosis, treatment, rehabilitation, and prevention of diseases, as well as the importance of promoting health. The responses were generated with decoding parameters that balance creativity against accuracy: creativity set to 0.6, top-p to 0.9, top-k to 50, and penalty to 1.2, all chosen to improve the professionalism and relevance of the output. ClinMED-LLAMA3 also demonstrates professionalism in its self-introduction. The model describes itself as a clinical medicine expert with extensive practical experience and theoretical knowledge, able to answer questions about health, disease prevention, and treatment, and it encourages users to pose specific questions. In terms of user experience, the responses are clear and logical, making them easy to understand and follow. The answers provide solutions and also guide users to seek professional medical help promptly when self-help measures are ineffective, reflecting a sense of responsibility and care in handling medical issues. In summary, ClinMED-LLAMA3 demonstrates high accuracy and professionalism in professional Q&A dialogues, providing users with reliable medical information and advice.

5. Conclusion and Future Work

The development of the ClinMED-LLAMA3 model marks a significant step forward in the application of large language models in the field of clinical medicine.
By combining a carefully designed Clinical Medicine 50K dataset with LoRA fine-tuning, ClinMED-LLAMA3 displays high accuracy and professionalism in professional Q&A dialogues. The model's capability for offline private deployment addresses the high standards of data privacy and security required by medical institutions. In addition, the energy-consumption optimization of ClinMED-LLAMA3 reduces the demand for computational resources while maintaining high performance, which lowers operating costs and environmental impact for medical institutions. Future work will focus on the continuous iteration and optimization of ClinMED-LLAMA3 so that it keeps pace with the evolving field of clinical medicine.

Declarations

Author Contribution
Wei Mingyi contributed to the overall methodology by collecting, cleaning, and checking data. Liu Xin was responsible for the implementation of the experiment and the drawing of charts.

Data Availability
Data is provided within the manuscript or supplementary information files.

References
[1] Mak, Kit-Kay, and Mallikarjuna Rao Pichika. "Artificial intelligence in drug development: present status and future prospects." Drug Discovery Today 24, no. 3 (2019): 773-780.
[2] Wang, Ding-Qiao, Long-Yu Feng, Jin-Guo Ye, Jin-Gen Zou, and Ying-Feng Zheng. "Accelerating the integration of ChatGPT and other large-scale AI models into biomedical research and healthcare." MedComm–Future Medicine 2, no. 2 (2023): e43.
[3] Haug, Charlotte J., and Jeffrey M. Drazen. "Artificial intelligence and machine learning in clinical medicine, 2023." New England Journal of Medicine 388, no. 13 (2023): 1201-1208.
[4] Weissler, E. Hope, Tristan Naumann, Tomas Andersson, Rajesh Ranganath, Olivier Elemento, Yuan Luo, Daniel F. Freitag et al. "The role of machine learning in clinical research: transforming the future of evidence generation." Trials 22 (2021): 1-15.
[5] Shehab, Mohammad, Laith Abualigah, Qusai Shambour, Muhannad A. Abu-Hashem, Mohd Khaled Yousef Shambour, Ahmed Izzat Alsalibi, and Amir H. Gandomi. "Machine learning in medical applications: A review of state-of-the-art methods." Computers in Biology and Medicine 145 (2022): 105458.
[6] Shah, Pratik, Francis Kendall, Sean Khozin, Ryan Goosen, Jianying Hu, Jason Laramie, Michael Ringel, and Nicholas Schork. "Artificial intelligence and machine learning in clinical development: a translational perspective." NPJ Digital Medicine 2, no. 1 (2019): 69.
[7] Zhou, Tongxue, Su Ruan, and Stéphane Canu. "A review: Deep learning for medical image segmentation using multi-modality fusion." Array 3 (2019): 100004.
[8] Shen, Dinggang, Guorong Wu, and Heung-Il Suk. "Deep learning in medical image analysis." Annual Review of Biomedical Engineering 19, no. 1 (2017): 221-248.
[9] Kraljevic, Zeljko, Anthony Shek, Daniel Bean, Rebecca Bendayan, James Teo, and Richard Dobson. "MedGPT: Medical concept prediction from clinical narratives." arXiv preprint arXiv:2107.03134 (2021).
[10] Levkovich, Inbar, and Zohar Elyoseph. "Identifying depression and its determinants upon initiating treatment: ChatGPT versus primary care physicians." Family Medicine and Community Health 11, no. 4 (2023).
[11] Tian, Yuanhe, Ruyi Gan, Yan Song, Jiaxing Zhang, and Yongdong Zhang. "ChiMed-GPT: A Chinese medical large language model with full training regime and better alignment to human preferences." arXiv preprint arXiv:2311.06025 (2023).
[12] Touvron, Hugo, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière et al. "LLaMA: Open and efficient foundation language models." arXiv preprint arXiv:2302.13971 (2023).
[13] Touvron, Hugo, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov et al. "Llama 2: Open foundation and fine-tuned chat models." arXiv preprint arXiv:2307.09288 (2023).
[14] Santacroce, Michael, Yadong Lu, Han Yu, Yuanzhi Li, and Yelong Shen. "Efficient RLHF: Reducing the memory usage of PPO." arXiv preprint arXiv:2309.00754 (2023).
[15] Hu, Edward J., Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. "LoRA: Low-rank adaptation of large language models." arXiv preprint arXiv:2106.09685 (2021).

Additional Declarations
No competing interests reported.
Liu","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA+ElEQVRIiWNgGAWjYBACfv7GxgcfKmzq7dsbwAKMDYS0SM44fNhwxpm0BAOeA0RqMTiQlibN23Y4wUAigUgtDAfOGEjwnDmcZy75/OFjHgYb2Q0HmJ89wKeDsbnHwECiIr3YcnaOsTEPQ5rxhgNs5gb4tDAznDFIMDhjzdhwO4dNmofhcOKGAzxsEvi0sDHkGBxIbGNmbLh5/PlvHob/hLUAnZLYcLDNOXHDDQYzZh6GA4S1SEgcPszYcCbNWLInx1hyjkGy8czDbGZ4tdifb2z//afCRo6f/fjDD28q7GT7jjc/w6sFDYCCipkE9aNgFIyCUTAKsAMA/fRO7H99TFAAAAAASUVORK5CYII=","orcid":"","institution":"Southeast University","correspondingAuthor":true,"prefix":"","firstName":"Xin","middleName":"","lastName":"Liu","suffix":""}],"badges":[],"createdAt":"2024-07-08 07:58:29","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-4703651/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4703651/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":61926292,"identity":"ad43504d-4567-474a-bf40-a0d1b96c7b53","added_by":"auto","created_at":"2024-08-07 07:14:01","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":15523,"visible":true,"origin":"","legend":"\u003cp\u003eSystem architecture of ClinMED-LLAMA3.\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-4703651/v1/cd40c12bf13cb3a258e224d3.png"},{"id":61926288,"identity":"9341ab14-69ff-4bc2-8a0b-9910208fe68c","added_by":"auto","created_at":"2024-08-07 07:14:00","extension":"jpeg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":22388,"visible":true,"origin":"","legend":"\u003cp\u003eClinMED-LLAMA3 prompt and guide word text analysis.\u003c/p\u003e","description":"","filename":"floatimage2.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-4703651/v1/f7a7bb3d1853820b3d8999d0.jpeg"},{"id":61927053,"identity":"a5695760-cad8-4103-bac4-345a9c50d241","added_by":"auto","created_at":"2024-08-07 
07:22:00","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":19873,"visible":true,"origin":"","legend":"\u003cp\u003eAs the training progress is fine tuned, the loss changes.\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-4703651/v1/d933548c6aee4b8f8987bc51.png"},{"id":61926290,"identity":"452d12cb-03ed-44ea-92c1-001a6a2a35f3","added_by":"auto","created_at":"2024-08-07 07:14:01","extension":"jpeg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":44063,"visible":true,"origin":"","legend":"\u003cp\u003eThe model provides self introduction.\u003c/p\u003e","description":"","filename":"floatimage4.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-4703651/v1/88fc23b00341cf0a676a1e6c.jpeg"},{"id":61926291,"identity":"d0505646-c19a-486e-bf6f-4bbb5d673725","added_by":"auto","created_at":"2024-08-07 07:14:01","extension":"jpeg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":55628,"visible":true,"origin":"","legend":"\u003cp\u003eThe model provides consultation suggestions.\u003c/p\u003e","description":"","filename":"floatimage5.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-4703651/v1/743fd43f573c832426142d47.jpeg"},{"id":68135622,"identity":"6e4d80a0-5701-4752-959d-d988352b0091","added_by":"auto","created_at":"2024-11-04 03:31:59","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":497671,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4703651/v1/bc2c0ad7-5b6a-4d5d-b3c1-3f2cd8eec4c8.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"ClinMED-LLAMA3 A Large Model of Clinical Medicine Based on LLAMA3","fulltext":[{"header":"1. 
Introduction","content":"\u003cp\u003eClinical Medical Engineering is an interdisciplinary field that combines medical knowledge, engineering techniques, and data analysis to enhance the quality and efficiency of medical services. With the explosive growth of medical data and the advancement of artificial intelligence technology, large models play an increasingly crucial role in this field \u003csup\u003e[\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e][\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e][\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e][\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e][\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]\u003c/sup\u003e. Large models, with their vast data capacity and complex network structures, can capture and learn subtle patterns and deep features in medical data. Compared to traditional machine learning models, large models demonstrate higher flexibility and accuracy when dealing with high-dimensional data. They can automatically extract features using deep learning technology, significantly reducing the dependency on specialized knowledge while improving the model's generalization ability. In the field of clinical medicine, large models refer to machine learning models with complex network structures trained on massive data. These models, with their powerful data processing capabilities, pattern recognition, and prediction accuracy, provide unprecedented support for clinical decision-making. They can automatically extract deep features from medical data, identifying patterns and associations that may be difficult for human experts to detect, playing a significant role in disease diagnosis, treatment recommendation, and patient monitoring. 
The self-learning and continuous optimization capabilities of large models enable them to adapt to changing medical environments, improving the efficiency and quality of medical services. Large models exhibit exceptional capabilities in handling clinical data. They can process and analyze large-scale medical datasets from various sources, such as Electronic Health Records (EHR), medical imaging, and genomic sequences. These data usually feature high dimensionality and complexity, making them challenging to handle with traditional analysis methods. Large models, through deep learning technology, can automatically identify and extract key features, providing support for clinical decision-making. For example, in radiology, large models can identify and categorize lesions, assisting physicians in making more accurate diagnoses. Large models have a significant advantage in pattern recognition. They can discover potential patterns and associations from complex medical data, which may be difficult for human experts to identify. In disease diagnosis, large models can predict the occurrence and progression of diseases by analyzing physiological data and medical history of patients. Furthermore, they can identify subtle connections between different diseases, providing a basis for personalized treatment. Large models surpass traditional methods in prediction accuracy. They can handle datasets with high-dimensional features, providing more precise prediction results. In clinical practice, accurate predictions are crucial for early diagnosis of diseases, formulation of treatment plans, and evaluation of patient prognosis. Large models, by learning from extensive historical data, can predict a patient's response to specific treatments, thereby assisting physicians in selecting the most appropriate treatment plan. Large models can automate many clinical processes, enhancing the efficiency of medical services. 
They can be used for patient classification, risk assessment, resource allocation, and other tasks, reducing the workload of doctors and nurses and allowing them to focus on more complex clinical decisions. Automated processes not only speed up medical services but also reduce human errors, improving the quality and safety of patient care. Unlike traditional static models, large models possess the ability for continuous learning and self-optimization. With the continuous input of new data, they can constantly adjust and optimize their parameters to adapt to changes in the medical environment. This ability enables large models to quickly adapt and provide effective solutions when dealing with emerging diseases or rare cases. Large models can integrate knowledge and data from different disciplines, providing comprehensive solutions for complex clinical problems \u003csup\u003e[\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e][\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e][\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]\u003c/sup\u003e. They can fuse data and knowledge from fields such as bioinformatics, radiology, and epidemiology, providing comprehensive support for disease diagnosis, treatment, and prevention. Interdisciplinary integration not only promotes the exchange and innovation of knowledge but also provides patients with more comprehensive and personalized medical services.\u003c/p\u003e \u003cp\u003eMedGPT is a medical large language model based on the Transformer architecture, specifically designed for predicting medical concepts in clinical narratives\u003csup\u003e[\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]\u003c/sup\u003e. Unlike traditional prediction methods, MedGPT can extract valuable medical information from unstructured Electronic Health Records (EHRs) for disease prediction and diagnostic assistance. 
MedGPT has demonstrated superior performance, particularly in handling noisy and fine-grained data, with an accuracy ranging from 0.344 to 0.640, a significant improvement over the LSTM model's 0.329 to 0.633. A medical large model developed jointly by Huimei Technology and Intel, based on CPU large model inference technology, has achieved seamless integration with the Clinical Decision Support System (CDSS) deployed in hospitals\u003csup\u003e[\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]\u003c/sup\u003e. This deployment method not only reduces costs but also enhances the accessibility and practicality of large models in medical institutions. The Huimei medical large model already possesses the ability for differential diagnosis and automatic generation of medical records, and is expected to show its potential in more diagnostic and treatment processes in the future. MedicalGPT is a Chinese-English medical question-answering model based on LLaMA-13B, fine-tuned using Low-Rank Adaptation (LoRA) technology. The project implements a four-stage training process: secondary pre-training, supervised fine-tuning, reward modeling, and reinforcement learning training. The training of Chimed-gpt used a large amount of Chinese medical data to enhance the model's performance in medical question answering\u003csup\u003e[\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]\u003c/sup\u003e. Moreover, the Chimed-gpt project also provides a call example based on the textgen library, facilitating developers to use and integrate the LLaMA model for medical question answering. The code and model of Chimed-gpt have been open-sourced on GitHub for interested researchers and developers. Current medical large models have some significant deficiencies and challenges in research and application, which limit their widespread application and deep development in the field of clinical medicine. 
Medical large models have limitations in realizing private offline deployment, mainly because the architectural design of some models does not take into account the sensitivity of medical data and the high demands of medical institutions for data privacy. Medical institutions often need to process data locally to ensure compliance, but existing models may require cloud infrastructure support, restricting their applicability under strict data protection policies. The high demand of medical large models for computational resources leads to dual pressures of cost and environment in actual application. These models usually require high-performance GPU clusters for training and inference, which not only increases the financial burden of medical institutions but also contradicts global carbon reduction targets. Therefore, researchers need to explore more efficient algorithms and hardware optimizations to reduce the model's dependence on computational resources. Although medical large models have made progress in general medical information processing, their application in specific clinical medical fields is still lacking. The field of clinical medicine is highly segmented, with each area having its unique terminology, processes, and diagnostic standards. Existing models often lack in-depth learning and understanding of these segmented fields, resulting in limited accuracy and applicability in actual clinical applications. Some medical large models are based on older versions of LLaMA or other early model architectures. These architectures may not fully utilize the latest technological advances, such as innovative algorithms in natural language processing and machine learning. As medical knowledge continues to update, older architectures may not adapt to new data representation and processing needs, resulting in a lack of accuracy in understanding complex clinical problems. 
The performance of medical large models largely depends on the quality and diversity of training data. However, existing data may have biases, such as insufficient representation of patient groups or bias in data collection methods, which may limit the model's generalization ability in specific situations. Additionally, the timeliness and accuracy of data are critical factors; outdated or incorrect data will directly affect the quality of the model's output.\u003c/p\u003e \u003cp\u003eIn response to the limitations of existing medical large models, we propose ClinMed-LLAMA3, a large model aimed at providing more accurate, secure, and efficient clinical medical solutions through a series of innovative methods. One of the core contributions of ClinMed-LLAMA3 is the establishment of its dedicated clinical medical 50K dataset. This dataset is meticulously curated through in-depth analysis and collection of daily hospital consultation records, combined with the knowledge of human experts and the advanced language understanding capabilities of GPT4. This dual-channel data collection and screening mechanism ensures the high quality and coverage of the dataset, enabling the model to more accurately understand and respond to complex scenarios in clinical consultations. ClinMed-LLAMA3 adopts the latest LLAMA3-8B-Instruct model as its base, a large language model with 8\u0026nbsp;billion parameters, possessing powerful understanding and generation capabilities. Through specialized fine-tuning in the field of clinical medicine, ClinMed-LLAMA3 not only inherits the strong performance of the base model but also further adapts to human consultation processes and habits. It can clearly understand the descriptive language of patients' conditions and provide more professional and humanized medical consultations. ClinMed-LLAMA3 supports offline private deployment, a feature that effectively addresses the concerns of medical institutions regarding data privacy and security. 
The model can run on the local servers of medical institutions, ensuring that all sensitive data is retained within the institution, preventing leakage or unauthorized access, and meeting the strict requirements of the medical industry for data protection. In terms of energy consumption, ClinMed-LLAMA3 has undergone special optimization to achieve a lower energy consumption ratio. Through algorithm improvement and model simplification, ClinMed-LLAMA3 significantly reduces the demand for computational resources while maintaining high performance, reducing energy consumption, which is of significant importance for reducing the operating costs and environmental impact of medical institutions. The development of the ClinMed-LLAMA3 large model is an important supplement and improvement to existing medical large models. Through the establishment of a dedicated dataset, the application of advanced base models, the support for private deployment, and the optimization of the energy consumption ratio, it provides a more accurate, secure, and efficient intelligent tool for the field of clinical medicine. With continuous technological advancement and deepening application, ClinMed-LLAMA3 is expected to become an important aid in the medical industry, promoting the improvement of medical service quality and the development of medical innovation.\u003c/p\u003e "},{"header":"2. Related Work","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1 Evolution of the LLAMA Series Models\u003c/h2\u003e \u003cp\u003eThe LLAMA series represents significant advancements in the field of large language models, with each iteration enhancing capabilities and addressing limitations on the basis of its predecessor. This section outlines the developmental trajectory from LLAMA1 to LLAMA3\u003csup\u003e[\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e][\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]\u003c/sup\u003e. 
The initial LLAMA model laid the groundwork for advanced language processing capabilities. It employed a robust architecture to tackle complex natural language understanding tasks. LLAMA1 introduced a series of cutting-edge features at that time, setting the benchmark for subsequent models. However, like all pioneering models, it had its limitations, including limited context-awareness and a lack of nuanced understanding of specialized fields, such as clinical medicine. Building on LLAMA1, the second-generation model aimed to address some of the shortcomings of the initial model. LLAMA2 expanded the model's understanding of context, allowing for more coherent and relevant responses across a wider range of topics. Improvements in training techniques enabled the model to generate more human-like text and understand complex language structures. Despite these advancements, LLAMA2 still faced challenges in specialized fields where subtle language differences and domain-specific knowledge required more refined handling. The most recent evolution, LLAMA3, took significant strides in overcoming the limitations of previous models. With a substantial increase in model scale, LLAMA3, boasting an 8-billion parameter configuration, significantly enhanced its language understanding and generation capabilities. The model's architecture was fine-tuned to better capture the subtleties of human language, including idiomatic expressions and domain-specific terminology. The instruct adjustment feature of LLAMA3 was a game-changer, enabling it to follow instructions more accurately and generate responses that are not only contextually relevant but also in line with user intent. In the context of clinical medicine, the evolution from LLAMA1 to LLAMA3 is particularly noteworthy. The enhanced language understanding and generation capabilities of the latest model are crucial for interpreting complex and subtle dialogues occurring in clinical settings. 
The move towards more specialized and fine-tuned models like ClinMed-LLAMA3 signifies a commitment to harnessing the most advanced language models for clinical applications, where accuracy and understanding are paramount. The evolution of the LLAMA series reflects the ongoing progress of artificial intelligence and its growing relevance in specialized fields like clinical medicine. Each new version brings a deeper understanding of language and a greater ability to generate context-appropriate responses. Looking forward, integrating the capabilities of LLAMA3 into models like ClinMed-LLAMA3 brings immense hope for transforming clinical practice through AI-driven insights and support.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2 SFT and LoRA Fine-Tuning Techniques\u003c/h2\u003e \u003cp\u003eAs the application of large language models (LLMs) extends to specialized fields such as medicine, fine-tuning these models to meet specific needs becomes increasingly important. Supervised Fine-Tuning (SFT) is a widely used approach for adapting pre-trained models to target tasks\u003csup\u003e[\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e][\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]\u003c/sup\u003e. It involves training the model on a dataset specific to the task, allowing the model to learn patterns relevant to the task and adjust its weights accordingly. In the context of medical LLMs, SFT enables the model to absorb medical knowledge and terminology from datasets such as electronic health records or biomedical literature. This process typically involves presenting the model with examples of the desired task, such as medical question-answer pairs, and adjusting the model's parameters to minimize the difference between the model's predictions and the actual answers. 
Layer-wise Relevance Analysis (LoRA) is an efficient SFT fine-tuning method that modifies only a small portion of the model parameters, specifically the output weights of the linear layers. By adjusting the pre-trained weights using a low-rank matrix, LoRA allows the model to adapt to new tasks without significantly increasing computational costs or losing generalization capabilities. This method is particularly useful for large models like LLaMA, where the number of parameters is vast, and full fine-tuning would be resource-intensive. SFT provides the model with a deep understanding of the medical field through extensive training on task-specific data, while LoRA ensures this adaptation is achieved with minimal computational overhead. This synergy enables the model to develop a nuanced understanding of clinical concepts and medical language, crucial for high-quality medical AI applications. The benefits of using SFT and LoRA include performance improvements on specialized tasks, enhanced fine-tuning efficiency, and preservation of the model's original capabilities. In essence, LoRA is a technique for fine-tuning large pre-trained models. It achieves parameter updates by introducing a low-rank matrix into specific parts of the model, rather than updating the entire model's parameters. This approach significantly reduces the demand for computational resources and maintains the model's generalization capabilities. The central idea of LoRA is to multiply a low-rank matrix on the model's weight matrix to adjust the weights rather than directly updating them. Specifically, if we have an original weight matrix WW, LoRA introduces two low-rank matrices UU and VV. The formula can be expressed as \u003cb\u003eEq.\u0026nbsp;(1)\u003c/b\u003e, where W_new is the updated weight matrix, W is the original weight matrix, U is an m * r matrix, m is the number of rows in W, r is the low-rank, V is an n * r matrix, n is the number of columns in W, and V^T is the transpose of V. 
LoRA updates the parameters by adding the product of two low-rank matrices to the model's weight matrix. Because U and V together contain only r * (m + n) entries, compared with the m * n entries of W, LoRA adjusts weights using far fewer parameters, thereby reducing the demand for computational resources. LoRA adapts to new tasks through small adjustments rather than massively changing the model's parameters, which helps maintain the model's generalization capabilities.\u003c/p\u003e \u003cp\u003eW_new\u0026thinsp;=\u0026thinsp;W\u0026thinsp;+\u0026thinsp;U * V^T (1)\u003c/p\u003e \u003c/div\u003e"},{"header":"3. Method","content":" \u003ch2\u003e3.1 Establishment of the Alpaca Dataset\u003c/h2\u003e \u003cp\u003eThe dataset used in this study, Alpaca, was obtained through a rigorous process of manual and GPT-4-assisted review and cleaning. Our data collection process began with the selection of common clinical medical questions from actual doctor consultation records. These questions were selected for their prevalence and relevance in the clinical setting. To ensure the quality and accuracy of the questions, a team of medical professionals thoroughly reviewed and validated the extracted data. Following the initial manual screening, these questions were further refined using the language model GPT-4. The task of GPT-4 was to learn from the screened questions and generate answers that are both medically accurate and consistent with clinical practice. This dual screening process involving manual and AI review ensured the high quality and diversity of the dataset. The Alpaca dataset comprises approximately 50,000 question-answer pairs, each designed to reflect real clinical scenarios. The dataset aims to cover a wide range of clinical topics, ensuring the model can learn to handle various medical consultations. To keep pace with the latest advancements in language model training, the Alpaca dataset employs an instruction format to structure the questions. 
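To make the instruction format concrete, here is a hedged sketch of what one record and its rendered prompt might look like. The field names (instruction, input, output) and the section markers follow the public Stanford Alpaca convention, which the paper's dataset is assumed to mirror; the record text is illustrative and is not taken from the paper's data.

```python
# Hypothetical record: field names follow the public Alpaca convention,
# and the clinical text is made up for illustration only.
record = {
    'instruction': 'Answer the following clinical question.',
    'input': 'I have had difficulty falling asleep for two weeks. What can I do?',
    'output': 'Keep a regular sleep schedule and limit caffeine; '
              'see a doctor if the problem persists.',
}


def build_prompt(rec):
    # Render one record as an instruction-style prompt; the model is
    # trained to continue the text after the response marker.
    parts = ['### Instruction:', rec['instruction']]
    if rec.get('input'):
        parts += ['', '### Input:', rec['input']]
    parts += ['', '### Response:', '']
    return '\n'.join(parts)


prompt = build_prompt(record)
print(prompt)
```

Records without extra context simply omit the input section, so the same template covers both bare questions and questions with accompanying details.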
This format aims to provide clear, concise prompts to guide the model in understanding the context and intent of each question. The instruction format also allows for the inclusion of specific directives to fine-tune the model's responses as per the clinical application's requirements. The establishment of the Alpaca dataset is a critical step in the development of the ClinMed-LLAMA3 model. By providing a rich and diverse set of clinical interactions, the dataset enables the model to learn the complexities of medical dialogues. Moreover, adherence to the instruction format ensures the model can effectively interpret and respond to complex clinical questions, making it a valuable asset in the field of clinical medicine. In summary, the Alpaca dataset, with its stringent manual and AI screening process and adherence to the instruction format, forms the cornerstone of our methodology. It aims to facilitate the training of a comprehensive and context-aware clinical medical language model, laying the foundation for advancements in AI-assisted clinical decision-making.\u003c/p\u003e \u003ch2\u003e3.2 LoRA Fine-tuning for Localized LLM\u003c/h2\u003e \u003cp\u003eThe fine-tuning of the LLaMA3-8B model using Low-Rank Adaptation (LoRA) is a complex process that allows us to efficiently adjust the large language model (LLM) locally to accommodate the specifics of the Alpaca dataset. LoRA modifies the weights of the pre-trained model by introducing a low-rank matrix, focusing on those specific components that significantly impact the model's performance. The fine-tuning process initiates with the model and tokenizer initialization, setting up the computational device, and defining key hyperparameters. The base_model parameter specifies the LLaMA3-8B model variant to be used, while the data_path points to the location of the Alpaca dataset. 
The parameters lora_r, lora_alpha, and lora_dropout in the LoraConfig class determine the rank, scaling factor, and dropout rate of the LoRA adaptation, respectively. The model is then prepared for fine-tuning by setting the LoRA configuration, which targets specific modules of the model's attention mechanism: the query (q_proj), key (k_proj), value (v_proj), and output (o_proj) projection layers. The function prepare_model_for_int8_training prepares the model for training with 8-bit weights, reducing memory demands. During training, custom training parameters are passed to transformers.Trainer. The batch_size and micro_batch_size define the overall and per-device batch sizes, respectively, while gradient_accumulation_steps specifies how many micro-batches are accumulated into a single optimizer step. The learning_rate sets the step size of each update, and num_epochs specifies the number of passes through the entire dataset. The LoRA configuration is integrated by applying the get_peft_model function. If a resume_from_checkpoint is provided, any existing checkpoint weights are loaded using the set_peft_model_state_dict function, allowing training to resume from a specific point. Optionally, training progress can be logged and monitored using Weights and Biases (WandB) through the wandb_project, wandb_run_name, and other related parameters. The model's performance is evaluated periodically, with checkpoints saved to the output_dir. The best-performing model based on the validation set is saved for future use. 
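The relationship between batch_size, micro_batch_size, and gradient_accumulation_steps can be sketched with simple arithmetic. The helper names and all numeric values below are hypothetical, since the paper does not report its hyperparameters; the dataset size is rounded to the stated 50,000 pairs and ignores any validation split.

```python
def accumulation_steps(batch_size, micro_batch_size, num_devices=1):
    # Number of micro-batches accumulated before each optimizer step.
    per_pass = micro_batch_size * num_devices
    if batch_size % per_pass != 0:
        raise ValueError('batch_size must be a multiple of '
                         'micro_batch_size * num_devices')
    return batch_size // per_pass


def optimizer_steps(num_examples, batch_size, num_epochs):
    # Counts full batches only; a trailing partial batch is ignored here.
    return (num_examples // batch_size) * num_epochs


# Hypothetical settings, not taken from the paper.
steps = accumulation_steps(batch_size=128, micro_batch_size=4)
total = optimizer_steps(num_examples=50_000, batch_size=128, num_epochs=3)
print(steps, total)
```

Keeping micro_batch_size small lets the run fit in limited GPU memory, while the accumulated gradient still corresponds to the larger effective batch.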
Table \u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e summarizes the key parameters of the LoRA fine-tuning process, highlighting the critical steps and their associated parameters, providing a clear overview of the methodology applied to the LLaMA3-8B model using the Alpaca dataset.\u003c/p\u003e \u003cp\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e Table 1 \u003cp\u003eLoRA finetune process\u003c/p\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cthead\u003e \u003ctr\u003e \u003cth colname=\"c1\"\u003e \u003cp\u003eStep\u003c/p\u003e \u003c/th\u003e \u003cth colname=\"c2\"\u003e \u003cp\u003eDescription\u003c/p\u003e \u003c/th\u003e \u003cth colname=\"c3\"\u003e \u003cp\u003eParameters\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd colname=\"c1\"\u003e \u003cp\u003e1.\u003c/p\u003e \u003c/td\u003e \u003ctd colname=\"c2\"\u003e \u003cp\u003eInitialize model and tokenizer\u003c/p\u003e \u003c/td\u003e \u003ctd colname=\"c3\"\u003e \u003cp\u003ebase_model, tokenizer\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd colname=\"c1\"\u003e \u003cp\u003e2.\u003c/p\u003e \u003c/td\u003e \u003ctd colname=\"c2\"\u003e \u003cp\u003ePrepare model for LoRA\u003c/p\u003e \u003c/td\u003e \u003ctd colname=\"c3\"\u003e \u003cp\u003elora_r, lora_alpha, lora_dropout\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd colname=\"c1\"\u003e \u003cp\u003e3.\u003c/p\u003e \u003c/td\u003e \u003ctd colname=\"c2\"\u003e \u003cp\u003eConfigure training parameters\u003c/p\u003e \u003c/td\u003e \u003ctd colname=\"c3\"\u003e \u003cp\u003ebatch_size, learning_rate, num_epochs\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd colname=\"c1\"\u003e \u003cp\u003e4.\u003c/p\u003e \u003c/td\u003e \u003ctd colname=\"c2\"\u003e \u003cp\u003eIntegrate LoRA\u003c/p\u003e \u003c/td\u003e \u003ctd 
colname=\"c3\"\u003e \u003cp\u003elora_target_modules\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd colname=\"c1\"\u003e \u003cp\u003e5.\u003c/p\u003e \u003c/td\u003e \u003ctd colname=\"c2\"\u003e \u003cp\u003eTrain model\u003c/p\u003e \u003c/td\u003e \u003ctd colname=\"c3\"\u003e \u003cp\u003etrain_data, val_data\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd colname=\"c1\"\u003e \u003cp\u003e6.\u003c/p\u003e \u003c/td\u003e \u003ctd colname=\"c2\"\u003e \u003cp\u003eEvaluate and log (optional)\u003c/p\u003e \u003c/td\u003e \u003ctd colname=\"c3\"\u003e \u003cp\u003ewandb_project, wandb_run_name\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd colname=\"c1\"\u003e \u003cp\u003e7.\u003c/p\u003e \u003c/td\u003e \u003ctd colname=\"c2\"\u003e \u003cp\u003eSave checkpoints and final model\u003c/p\u003e \u003c/td\u003e \u003ctd colname=\"c3\"\u003e \u003cp\u003eoutput_dir\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e \u003c/p\u003e \u003ch2\u003e3.3 System Architecture\u003c/h2\u003e \u003cp\u003eAs depicted in Fig. \u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, the system architecture for fine-tuning the LLAMA3-8B model using the Alpaca dataset is organized into distinct levels, each built upon the preceding one, forming a coherent and efficient training environment. The topmost layer is the user interface, serving as the point of interaction for users and researchers. It provides access to the system's functionalities, enabling the initiation of the training process, monitoring progress, and managing fine-tuning parameters. Directly beneath the user interface is the LLAMA3-8B fine-tuning layer. 
This layer encompasses the fine-tuning process of the LLAMA3-8B model, responsible for executing the training routine, managing the data flow from the Alpaca dataset, and applying the LoRA technique for efficient parameter updates. The Peft adapter layer is an intermediary component that facilitates the application of the LoRA method. It acts as a bridge between the LLAMA3-8B model and the fine-tuning process, ensuring the low-rank matrix adaptation is correctly integrated into the model's parameters. The LoRA layer represents the core of the LoRA technology. Here, the low-rank matrix is defined and applied to the target modules of the LLAMA3-8B model. This layer is responsible for effectively adjusting the model's weights to adapt to the clinical language and patterns present in the Alpaca dataset. The LLAMA3-8B layer is the foundational layer, representing the pre-trained model itself. This is a robust language model with a multitude of parameters pre-trained on various datasets. This layer provides the foundation for the fine-tuning process. At the bottom of the architecture is the Alpaca dataset. It serves as the source of training data, consisting of a curated collection of clinical medical question-answer pairs. This dataset is crucial for training the model to understand and generate medically relevant responses. In summary, the system architecture aims to simplify the fine-tuning process of the LLAMA3-8B model for clinical applications. From the data provided by the Alpaca dataset to user interaction via the user interface, each layer plays a pivotal role, converging into an efficient and effective training system capable of producing specialized language models for clinical use.\u003c/p\u003e "},{"header":"4. Result","content":" \u003ch2\u003e4.1 Alpaca Dataset Prompt Analysis\u003c/h2\u003e \u003cp\u003eAs depicted in Fig. 
\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, this section delves into an in-depth analysis of the prompts from the Alpaca dataset, exploring the impact of text length on model performance. The provided graph illustrates the relationship between the length of prompts and the number of fully covered examples, revealing the diversity of prompt lengths within the dataset. Despite the absence of specific numerical values, the visual analysis of the graph indicates a broad distribution of prompt lengths, ranging from shorter to longer texts, which is highly beneficial for training a model capable of adapting to varying input lengths. Further, although not directly deducible from the graph, we can infer that the average and median length of prompts likely reside in the central region of the distribution, offering an intuitive understanding of the dataset's central tendency. The length of the text directly influences the model's comprehension and generative abilities: shorter prompts necessitate a stronger generalization capability from the model, while longer prompts provide richer contextual information, aiding in-depth understanding. Based on this analysis, we have derived some strategies for optimizing prompts: firstly, concise prompts assist the model in quickly capturing the core of the problem; secondly, for situations requiring more context, increasing the text length appropriately is necessary; finally, maintaining a balance of different length prompts in the training data prevents the model from overfitting to a specific length range. In conclusion, the analysis of text length in the Alpaca dataset's prompts provides valuable insights, guiding us in constructing a robust clinical medical language model. The diversity and balance of text lengths are crucial for ensuring the model's efficacy in handling various clinical issues. 
Future work will take these factors into account to further optimize the dataset and model training strategies.\u003c/p\u003e \u003ch2\u003e4.2 LoRA Fine-tuning on Alpaca Dataset\u003c/h2\u003e \u003cp\u003eAs demonstrated in Fig. \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e, the trend in the loss function during LoRA fine-tuning on the Alpaca dataset provides crucial insights into the model's learning dynamics. At the start, with the model facing a completely new dataset, the loss value is relatively high because the model has not yet captured the features of the data. As the training progresses, the model gradually adapts to the dataset's characteristics, a process reflected in the decreasing loss value. The loss starts at 2.1883, a value that reflects the model's preliminary fit to the dataset. When 10% of the training samples have been processed, the loss has dropped to 2.11, a modest initial decrease. The steepest decline follows: the loss reaches 1.7616 at 20% of the samples and falls further to 1.4807 at 50%. This continuous decrease indicates the model's ongoing optimization of its parameters to better adapt to the data distribution. In the mid-to-late phase of training, the decline becomes more gradual but continues. For example, at 80% of the samples the loss is 1.3245, indicating a closer fit of the model to the dataset. 
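A short worked calculation makes the shape of this curve easier to read. The sketch below uses only the loss values reported above and assumes that the initial value 2.1883 corresponds to 0% of the samples; the function name is hypothetical.

```python
# (percent of training samples processed, reported loss).
# Assumption: the initial loss 2.1883 is taken at 0% of the samples.
points = [(0, 2.1883), (10, 2.11), (20, 1.7616), (50, 1.4807), (80, 1.3245)]


def drop_per_10_percent(pts):
    # Average loss decrease per 10% of samples within each reported interval.
    rates = []
    for (p0, l0), (p1, l1) in zip(pts, pts[1:]):
        rates.append(round((l0 - l1) / (p1 - p0) * 10, 4))
    return rates


rates = drop_per_10_percent(points)
steepest = rates.index(max(rates))
print(rates)
print('steepest interval:', points[steepest][0], 'to', points[steepest + 1][0])
```

On these numbers the largest per-interval decrease lies between 10% and 20% of the samples, after which the rate of improvement tapers off.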
Finally, towards the end of training, the loss value stabilizes, suggesting that the model may be approaching the best performance it can reach on the current dataset. It is noteworthy that the decrease in loss value during training is not always smooth. At certain stages, we may observe a slight increase in the loss value, possibly because the model encounters complex or anomalous samples in the dataset. However, through appropriate regularization techniques and learning rate adjustments, the model can continue to learn and improve. Moreover, the decrease in loss value is closely related to the model's generalization ability. A good model not only performs well on training data but also generalizes to unseen data. Therefore, while monitoring the loss value, we also need to pay attention to the model's performance on the validation set to ensure that the model does not lose its generalizability due to overfitting. In summary, a careful analysis of the loss value changes during the LoRA fine-tuning process on the Alpaca dataset provides a clear picture of the model's learning progress and performance. This information is crucial for further adjusting model parameters, improving model structure, and optimizing training strategies.\u003c/p\u003e \u003ch2\u003e4.3 Professional Q\u0026amp;A Dialogue with ClinMED-LLAMA3\u003c/h2\u003e \u003cp\u003eAs illustrated in Fig. \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e and Fig. \u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e, ClinMED-LLAMA3 is a large-scale language model specifically fine-tuned for the clinical medicine field. Based on the content shown in these figures, we can evaluate the performance of this model in professional Q\u0026amp;A dialogues. From an accuracy perspective, ClinMED-LLAMA3 provides detailed and professional analyses in response to questions about sleep disorders. 
The model not only identifies the issues faced by the user, such as difficulty falling asleep, frequent dreaming, and easy awakening, but also suggests relevant tests to confirm the presence of insomnia or other conditions, such as depression and anxiety. Moreover, the model provides lifestyle adjustment suggestions to improve sleep quality, such as maintaining regular sleep schedules, avoiding overeating and caffeine intake, and engaging in appropriate exercise. These suggestions are based on clinical medical knowledge and meet the standards of professional medical advice. From a professionalism standpoint, the responses of ClinMED-LLAMA3 reflect the expertise and depth of clinical medicine. The model's answers include a brief introduction to clinical medicine, emphasizing its application in the diagnosis, treatment, rehabilitation, and prevention of diseases, as well as the importance of promoting health. Additionally, the model sets parameters to adjust the creativity and accuracy of its responses, such as setting creativity at 0.6, top-p at 0.9, top-k at 50, and penalty at 1.2, all aimed at optimizing the professionalism and relevance of the model's output. In its self-introduction, ClinMED-LLAMA3 also demonstrates its professionalism. The model describes itself as a clinical medicine expert with extensive practical experience and theoretical knowledge, capable of answering questions about health, disease prevention, and treatment, and encourages users to pose specific questions for assistance. From a user experience perspective, the responses of the ClinMED-LLAMA3 model are clear and logical, making it easy for users to understand and follow. The model's answers not only provide solutions to the problems but also guide users to seek professional medical help promptly when self-help measures are ineffective, reflecting the model's sense of responsibility and care in handling medical issues. 
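The decoding settings mentioned above (0.6, top-p 0.9, top-k 50, penalty 1.2) can be illustrated with a minimal sketch of how such settings typically shape the next-token distribution. It assumes that creativity refers to the sampling temperature and that penalty refers to a repetition penalty; neither reading is confirmed by the text, and the toy vocabulary, logits, and function names are hypothetical.

```python
import math


def adjust_logits(logits, generated, temperature=0.6, penalty=1.2):
    # Penalize tokens that were already generated, then apply temperature.
    out = {}
    for tok, logit in logits.items():
        if tok in generated:
            logit = logit / penalty if logit > 0 else logit * penalty
        out[tok] = logit / temperature
    return out


def top_k_top_p(logits, top_k=50, top_p=0.9):
    # Keep the top_k highest logits, then the smallest prefix whose
    # cumulative probability reaches top_p, and renormalize the rest.
    ranked = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    peak = ranked[0][1]
    weights = [(tok, math.exp(logit - peak)) for tok, logit in ranked]
    total = sum(w for _, w in weights)
    kept, cumulative = [], 0.0
    for tok, w in weights:
        kept.append((tok, w))
        cumulative += w / total
        if cumulative >= top_p:
            break
    norm = sum(w for _, w in kept)
    return {tok: w / norm for tok, w in kept}


# Toy next-token logits; 'sleep' was already generated once.
toy_logits = {'sleep': 2.0, 'rest': 1.0, 'tea': 0.0, 'run': -1.0}
probs = top_k_top_p(adjust_logits(toy_logits, generated={'sleep'}))
print(probs)
```

A lower temperature and a smaller top-p both concentrate probability on the most likely tokens, which trades variety for consistency in the generated answers.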
In summary, ClinMED-LLAMA3 demonstrates high accuracy and professionalism in professional Q\u0026amp;A dialogues, providing users with reliable and professional medical information and advice.\u003c/p\u003e "},{"header":"5. Conclusion and Future Work","content":"\u003cp\u003eThe development of the ClinMED-LLAMA3 model marks a significant step forward in the application of large-scale language models in the field of clinical medicine. Through the combination of a carefully designed Clinical Medicine 50K dataset and LoRA fine-tuning technology, ClinMED-LLAMA3 displays high accuracy and professionalism in professional Q\u0026amp;A dialogues. The model's capability for offline private deployment effectively addresses the high standards for data privacy and security required by medical institutions. Additionally, ClinMED-LLAMA3's efforts in energy consumption optimization reduce the demand for computational resources while maintaining high performance, which is significant for reducing operating costs and environmental impact for medical institutions. Future work will focus on the continuous iteration and optimization of the ClinMED-LLAMA3 model to ensure it maintains cutting-edge performance in the ever-evolving field of clinical medicine.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eWei Mingyi contributed to the overall methodology by collecting, cleaning, and checking data. Liu Xin was responsible for the implementation of the experiment and the drawing of charts\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eData is provided within the manuscript or supplementary information files\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eMak, Kit-Kay, and Mallikarjuna Rao Pichika. \u0026quot;Artificial intelligence in drug development: present status and future prospects.\u0026quot; Drug discovery today 24, no. 
3 (2019): 773-780.\u003c/li\u003e\n\u003cli\u003eWang, Ding‐Qiao, Long‐Yu Feng, Jin‐Guo Ye, Jin‐Gen Zou, and Ying‐Feng Zheng. \u0026quot;Accelerating the integration of ChatGPT and other large‐scale AI models into biomedical research and healthcare.\u0026quot; MedComm\u0026ndash;Future Medicine 2, no. 2 (2023): e43.\u003c/li\u003e\n\u003cli\u003eHaug, Charlotte J., and Jeffrey M. Drazen. \u0026quot;Artificial intelligence and machine learning in clinical medicine, 2023.\u0026quot; New England Journal of Medicine 388, no. 13 (2023): 1201-1208.\u003c/li\u003e\n\u003cli\u003eWeissler, E. Hope, Tristan Naumann, Tomas Andersson, Rajesh Ranganath, Olivier Elemento, Yuan Luo, Daniel F. Freitag et al. \u0026quot;The role of machine learning in clinical research: transforming the future of evidence generation.\u0026quot; Trials 22 (2021): 1-15.\u003c/li\u003e\n\u003cli\u003eShehab, Mohammad, Laith Abualigah, Qusai Shambour, Muhannad A. Abu-Hashem, Mohd Khaled Yousef Shambour, Ahmed Izzat Alsalibi, and Amir H. Gandomi. \u0026quot;Machine learning in medical applications: A review of state-of-the-art methods.\u0026quot; Computers in Biology and Medicine 145 (2022): 105458.\u003c/li\u003e\n\u003cli\u003eShah, Pratik, Francis Kendall, Sean Khozin, Ryan Goosen, Jianying Hu, Jason Laramie, Michael Ringel, and Nicholas Schork. \u0026quot;Artificial intelligence and machine learning in clinical development: a translational perspective.\u0026quot; NPJ digital medicine 2, no. 1 (2019): 69.\u003c/li\u003e\n\u003cli\u003eZhou, Tongxue, Su Ruan, and St\u0026eacute;phane Canu. \u0026quot;A review: Deep learning for medical image segmentation using multi-modality fusion.\u0026quot; Array 3 (2019): 100004.\u003c/li\u003e\n\u003cli\u003eShen, Dinggang, Guorong Wu, and Heung-Il Suk. \u0026quot;Deep learning in medical image analysis.\u0026quot; Annual review of biomedical engineering 19, no. 
1 (2017): 221-248.\u003c/li\u003e\n\u003cli\u003eKraljevic, Zeljko, Anthony Shek, Daniel Bean, Rebecca Bendayan, James Teo, and Richard Dobson. \u0026quot;MedGPT: Medical concept prediction from clinical narratives.\u0026quot; arXiv preprint arXiv:2107.03134 (2021).\u003c/li\u003e\n\u003cli\u003eLevkovich, Inbar, and Zohar Elyoseph. \u0026quot;Identifying depression and its determinants upon initiating treatment: ChatGPT versus primary care physicians.\u0026quot; Family Medicine and Community Health 11, no. 4 (2023).\u003c/li\u003e\n\u003cli\u003eTian, Yuanhe, Ruyi Gan, Yan Song, Jiaxing Zhang, and Yongdong Zhang. \u0026quot;Chimed-gpt: A chinese medical large language model with full training regime and better alignment to human preferences.\u0026quot; arXiv preprint arXiv:2311.06025 (2023).\u003c/li\u003e\n\u003cli\u003eTouvron, Hugo, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timoth\u0026eacute;e Lacroix, Baptiste Rozi\u0026egrave;re et al. \u0026quot;Llama: Open and efficient foundation language models.\u0026quot; arXiv preprint arXiv:2302.13971 (2023).\u003c/li\u003e\n\u003cli\u003eTouvron, Hugo, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov et al. \u0026quot;Llama 2: Open foundation and fine-tuned chat models.\u0026quot; arXiv preprint arXiv:2307.09288 (2023).\u003c/li\u003e\n\u003cli\u003eSantacroce, Michael, Yadong Lu, Han Yu, Yuanzhi Li, and Yelong Shen. \u0026quot;Efficient rlhf: Reducing the memory usage of ppo.\u0026quot; arXiv preprint arXiv:2309.00754 (2023).\u003c/li\u003e\n\u003cli\u003eHu, Edward J., Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 
\u0026quot;Lora: Low-rank adaptation of large language models.\u0026quot; arXiv preprint arXiv:2106.09685 (2021).\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Large language model, LoRA, Supervised fine-tuning, LLAMA3, Clinical medical engineering","lastPublishedDoi":"10.21203/rs.3.rs-4703651/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4703651/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eClinMED-LLAMA3 is a large language model designed specifically for the field of clinical medicine. Based on the LLAMA3-8B-Instruct model, it has been professionally adapted through supervised fine-tuning and LoRA technology. The novelty of this model lies in the construction of its dedicated clinical medicine 50K dataset and the establishment of the Alpaca dataset. Through dual verification by humans and AI, the high quality and diversity of the data are ensured, providing a rich set of clinical interactions for model training. The model also possesses the capability for offline private deployment, effectively addressing medical institutions' concerns about data privacy and security. ClinMED-LLAMA3 has been optimized in terms of energy consumption, reducing the demand for computational resources, while maintaining high performance.\u003c/p\u003e","manuscriptTitle":"ClinMED-LLAMA3 A Large Model of Clinical Medicine Based on LLAMA3","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-08-07 07:13:56","doi":"10.21203/rs.3.rs-4703651/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"69462262-661b-442a-8466-04b2a39e8853","owner":[],"postedDate":"August 7th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2024-11-04T03:23:50+00:00","versionOfRecord":[],"versionCreatedAt":"2024-08-07 07:13:56","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-4703651","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4703651","identity":"rs-4703651","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}