Unlocking Scholarly Article Insights: Creating a Scholarly Article Content Extraction Tool – Objectives, Methodology, and a Comparative Analysis of Summary Results

Method Article

Wei Ma, Michael E. Spagna

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-5961778/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

We developed an intuitive prototype scholarly article content extraction tool to help individuals and students extract key content from scholarly articles. Although scholarly articles typically include an abstract, the abstract often lacks depth. Existing summarization tools are not always user-friendly and often require setup and learning processes. Most existing tools are designed for the information industry to extract metadata and key content efficiently.
This paper introduces our innovative Scholarly Article Content Extraction Tool (SACET), which extracts content from a single scholarly article and instantly returns an accurate natural language summary. SACET is accessible anytime and anywhere, without setup or extensive learning. We compare SACET's summaries with those from ChatGPT, highlighting the differences, value, and uniqueness of our innovation. Publishing our innovation is crucial due to its potential to advance content extraction applications. This article discusses the objectives, architecture, content extraction methodology, underlying algorithms, operational workflow, and additional work needed to improve the tool.

Keywords: Information Retrieval and Management; Library Science; Artificial Intelligence and Machine Learning; Artificial Intelligence (AI); Scholarly article content extraction; Summarize key content of a scholarly article; Text summarization tools

Introduction

Research articles are often lengthy and filled with technical language, making them challenging for students and nonprofessionals to fully understand. This can discourage readers, particularly undergraduate students, from engaging in research learning. Universities in the United States are encouraging and supporting faculty to provide opportunities for students to engage in relevant research. This initiative aims to promote student success, increase graduation rates, and prepare students for the demands of the workplace, thereby enhancing their critical thinking skills. As a result, college students frequently need to read and comprehend primary research and scholarly articles for their coursework, but many students struggle to do so effectively. Faculty members often note that students lack critical thinking skills, such as analysis and synthesis, which are essential for understanding research processes.
Additionally, as highlighted by Nature magazine, “There is an information overload in scientific literature” (Landhuis, 2016). The sheer volume of information often overwhelms students, making scholarly articles even harder to read. It also presents a significant challenge for faculty, professionals, and scientists who must read and comprehend a large quantity of research reports and articles. There is a clear need for a system that can extract relevant information from scholarly articles and transform it into more readable and understandable formats. As Hong (2021) notes, “In recent years, significant progress has been made by the computer science community on techniques for automated information extraction from free text. Yet, transformative application of these techniques to scientific literature remains elusive – due not to a lack of interest or effort but to technical and logistical challenges.”

Additionally, while some summarization tools exist, few offer the capabilities needed for developing comprehensive summaries of scholarly articles, and even fewer are easily accessible for individual or student use. Most of these tools were designed for the information industry to efficiently extract metadata and key sentence summaries for information discovery and retrieval.

Recognizing these needs and challenges, and motivated by the gaps between state-of-the-art information extraction methods and their practical application to scientific texts, we embarked on developing a scholarly article content extraction tool. Our goal was to create a tool that provides better summaries and descriptions of scholarly article content, while being user-friendly and accessible to individual student users, particularly undergraduate students, without requiring additional applications, equipment, or learning processes.
Related Work

Although research scientists and technology companies have made significant progress in recent years in extracting information from scientific literature, our literature review focused solely on single scholarly article summarization. It excluded metadata extraction, key sentence extraction, and AI article summarization tools released after 2019. We reviewed online platforms, company projects, and academic research projects prior to 2019. The purpose of this literature review was to highlight the significant systems available at the time we started this project and to demonstrate the contributions our Extraction Tool could make to this field.

Early summarization systems focused on extracting metadata, finding the most salient sentences in source documents (Mihalcea, 2004) (Erkan, 2004), highlighting important sentences, and adding annotations. Examples include CiteSeer (developed and made public in 1998) (Wikipedia contributors, 2024), MALLET (released in the early 2000s), ParsCit (developed in the mid-2000s), and CERMINE (released in 2014) (Tkaczyk, 2024).

More advanced systems followed, such as MEAD, which was released in early 2001. MEAD includes two baseline summarizers (lead-based and random-based systems) and served as an experimental platform for multi-document, multilingual, and individual document summarization (Radev, 2001). SUMMA, a text summarization toolkit released in 2014, provided an adaptive summarization application, algorithms for computing various sentence relevance features, and functionality for single- and multi-document summarization in various languages (Saggion, 2014). Mendeley was a reference manager and academic social network that allowed users to extract highlights, annotations, and important sections from individual scholarly articles (Reiswig, 2010).
GROBID (GeneRation Of Bibliographic Data), launched in 2015, was a high-performing software environment for extracting metadata, bibliographic references, or entities from scientific texts. The extracted content was not much more than the original article’s Abstract (Lopez P. R., 2010).

The Semantic Scholar tool, launched in 2015, is a huge aggregate corpus that consists of rich metadata, paper abstracts, resolved bibliographic references, and the structured full text of around 8.1M open access papers. The system allowed a user to search and retrieve relevant papers only from Semantic Scholar’s aggregated document pool and used Semantic Scholar as the sole search engine. The major problem with this application is that Semantic Scholar deprives students of the opportunity to learn the research processes and to explore and search subject databases in their learning areas.

SummerTime, released in 2021 (a couple of years after our 2019 cutoff), was a complete toolkit for text summarization, including various models, datasets, and evaluation metrics, for a full spectrum of summarization-related tasks (Ni, 2021). However, users have to learn the “simple” API code and read explanations of the models and evaluation metrics to understand model behavior and select the models that best suit their needs.

By 2019, numerous toolkits had been developed to perform document content summarization, mostly for the information industry to produce metadata and brief summaries for information discovery and retrieval. While a few systems allowed end users to search for papers on specific subjects and extract sections like abstracts, figures, and references (Arum, 2016), they did not enable users to extract content from scholarly articles of their own choosing, such as a single journal article that a student selected for a research assignment. Most of these toolkits required users to download apps and learn their structures and modules before use.
Additionally, since these toolkits were designed for large-scale operations, they could be intimidating for new users, particularly undergraduate students. We believe that such extensive toolkits and document aggregators can hinder students from fully engaging in the research process. It is essential for students, particularly undergraduates, to develop skills in topic analysis, searching relevant subject databases, and selecting appropriate resources for their research assignments. This hands-on experience is crucial for their academic growth.

Our Scholarly Article Content Extraction Tool (SACET)

Recognizing these needs and challenges, and motivated by the gaps between state-of-the-art information extraction methods and their practical application to scientific texts, we began exploring the development of a scholarly article content extraction tool in 2019. By 2020, we had built our first prototype of SACET using Python. However, it did not perform as expected and had numerous limitations and technical issues. We restarted the project in late 2021, using the Java programming language and Apache open-source software to convert PDF to text. By March 2022, we had successfully developed a new version and began testing the prototype (Prototype v 0.16.0).

This web-based, cloud-hosted tool is designed for users without a background in natural language processing (NLP) or API coding. The user interface is very simple and accessible anytime and anywhere. Users can access SACET from a personal computer or mobile device to extract content from individually selected scholarly articles. The tool executes the extraction instantly and delivers the content directly to the user’s computer within the same browser window. SACET generates comprehensive summaries that provide the depth needed for a thorough understanding of the original article. Users can obtain the most essential content using our tool.
We believe it is crucial to publish our innovation due to its significant potential to advance the field of content extraction. However, we do not intend to replace any existing summarization work. Instead, we aim to offer student users an easy-to-use tool to facilitate their literature searches, enhance their research learning experience, and assist with their research assignments or projects.

Method

Our SACET Tool comprises three modules, which we describe in detail in the following sections. Section 1 covers the operating system, developed using the Java programming language and Apache open-source software to convert PDF to text. This operating system was programmed based on the Content Extraction Process & Governing Algorithm described below. Section 2 shows the user document submission interface, and Section 3 is the summary result display interface.

Section 1: Operating system

The operating system was developed using the Java programming language to execute the Content Extraction Process & Governing Algorithm. In this section, we describe the architecture, content extraction methodology, underlying algorithms, and operational workflow, which form the core of our SACET.

Figure 1. Content Extraction Process & Governing Algorithm

Description of the Content Extraction Process & Governing Algorithm (for more details, see Appendix: https://docs.google.com/document/d/10KZiKDS0PCfdtYNV2oizWqjz8jjShYEv/edit?usp=sharing&ouid=105352857917939044173&rtpof=true&sd=true ):

Step 1: Upload the PDF article to the server.

Step 2: Convert the original article (PDF) to text format using Apache PDFBox™ OpenSource. (See Appendix p.2)

Step 3: Clean the article text body: (See Appendix p.2)
1) Remove article keywords.
2) Remove the “Categories and Subject” section.
3) If there is an “Article Info” section, remove it along with all sections containing information about the author(s).
4) Remove/navigate publication platforms. (See Appendix p.2 for more info)
5) Remove footnotes. (See Appendix p.3)
6) Remove the section(s) at the end of the article body, which mark the stopping point for document scanning. (See Appendix p.3)
7) Remove the “.” from page numbering. (See Appendix p.4)
8) Remove in-text citation formats. (See Appendix p.4)
9) Remove in-text citations of the form [##] or (#,#), but only when the citation is the last item before the “.”. (See Appendix p.5)
10) Clean and protect abbreviations that contain “.” if the “.” appears in the middle of a sentence. The list of abbreviations is here: https://docs.google.com/document/d/1YsehkF8x3hNgTR7V31VRqsF4TwsDTu-t32BAI4opjaw/edit?usp=sharing (See Appendix p.5)
11) Protect special characters used in different subject areas. The list of special characters is here: https://docs.google.com/document/d/1ZxId6knRwGqFd10136-Q1ciBIB4BGXJJdNpdVQgkyfA/edit?usp=sharing (See Appendix p.6)

Step 4: Keyword extraction scheme (at this point the text has been cleaned and only the article body is left). Start scanning the text body to pull out the keywords. (See Appendix p.6)
· Keyword “List 1” is the keywords from the title and abstract:
  o Scan and list all words from the title and abstract, excluding the stop-words, but including nouns, adjectives, and verbs (as List 1).
  o Ignore stop-words. The stop-word list is here: https://docs.google.com/document/d/17X7OaUaZgBJoNwNKtrYOevkVCsg9tTwV/edit?usp=sharing&ouid=105352857917939044173&rtpof=true&sd=true
· Keyword “List 2” is the keywords from the article body:
  o Start scanning the article body from the paragraph after the Abstract, from the 1st paragraph, or from the Introduction paragraph.
  o Scan and list all words in the whole article body, excluding all stop-words (the same stop-word list as above), and name the result “List 2”.
· Match “List 1” against “List 2” to obtain “List 3” (List 3 is the keywords that are matched between List 1 and List 2).

Step 5: Sentence extraction scheme: (See Appendix p.7)
· Start scanning from the 1st paragraph (or Introduction).
· Extract and list the sentences that include the selected keywords, based on the following scheme:
  o Select the 3 most repeated keywords in List 3 and the following 4 (or 5) medium-repeated keywords in List 3.
  o Extract the sentences that include ONE (1) of the 3 most repeated keywords + ONE (1) of the 4 (or 5) medium-repeated keywords:
    1 of 3 most repeated + 1 of 4 (or 5) medium repeated = sentence to be extracted

Step 6: Content extraction: (See Appendix p.7)
· Sentence starting point and stopping point
  o Extract ONLY the sentences with the selected and qualified keywords (described in the keyword extraction scheme above).
    § Sentence extraction starts after one of the following: xxxx .^ xxxx!^ xxxx?^ xxxx.”^
    § Sentence ends at one of the following: xxxx .^ xxxx!^ xxxx?^ xxxx.”^
· Extract section headings if there are section headings (See Appendix p.8)
  o Section headings are recognized as:
    § An extra line space, followed by boldface words with a capitalized 1st letter (fewer than 9 capitalized words). Example: one line spacing (sometimes two line spacing) from the previous paragraph:
      Introduction
      Literature Review
      Methodology
      Result
      Further Discussion
  o List the extracted sentences under the appropriate section headings (if the article has section headings).

Step 7: Convert the extracted text content to PDF. (See Appendix p.8)

Step 8: Push the PDF of the extracted content to the user. (See Appendix p.8)

The end.

Section 2: User Interface

The user interface features a web-based document submission form. Users upload their PDF articles through this form, as illustrated in Figure 2. The form then sends the document to the Java system for processing and extraction.
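
To make this processing concrete, the keyword and sentence selection rules of Steps 4 and 5 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not SACET's Java code: the function names, the tiny stop-word list, and the choice to rank List 3 by how often each keyword is repeated in the article body are all assumptions made for the example, and the sentence splitter covers only the end-of-sentence patterns listed in Step 6, without abbreviation protection.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); SACET uses its own, longer list.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "was",
              "for", "with", "without"}


def content_words(text):
    """Return lowercased word tokens with stop-words removed."""
    tokens = re.findall(r"[a-z][a-z-]*", text.lower())
    return [w for w in tokens if w not in STOP_WORDS]


def split_sentences(text):
    """Split after '.', '!' or '?' (optionally plus a closing quote) and whitespace."""
    parts = re.split(r'(?<=[.!?])\s+|(?<=[.!?]["”])\s+', text.strip())
    return [p for p in parts if p]


def matched_keywords(title_and_abstract, body):
    """List 3: body words that also occur in the title/abstract, with body counts."""
    list1 = set(content_words(title_and_abstract))   # List 1
    list2 = content_words(body)                      # List 2 (repeats kept)
    return Counter(w for w in list2 if w in list1)   # List 3


def extract_sentences(title_and_abstract, body, n_top=3, n_mid=4):
    """Keep sentences containing one top keyword and one medium keyword (Step 5)."""
    ranked = [w for w, _ in matched_keywords(title_and_abstract, body).most_common()]
    top = set(ranked[:n_top])
    mid = set(ranked[n_top:n_top + n_mid])
    selected = []
    for sentence in split_sentences(body):
        present = set(content_words(sentence))
        if present & top and present & mid:
            selected.append(sentence)
    return selected


# Hypothetical toy input, with smaller group sizes so the effect is visible.
title_and_abstract = "Porphyrin toxicity in mice"
body = ("The porphyrin was synthesized. Porphyrin toxicity was low. "
        "Mice tolerated the porphyrin without toxicity. The weather was pleasant.")
print(extract_sentences(title_and_abstract, body, n_top=1, n_mid=1))
```

With the toy input, only the two sentences that mention both the most repeated keyword and the next most repeated keyword are kept. Requiring a keyword from each group is what keeps the output shorter than a plain single-keyword match.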
The system follows the steps outlined in Figure 1 to extract content from the scholarly article.

Figure 2. User Web-based submission form

Section 3: The Interface Displays Summary Document

This section covers the article summary document in PDF format, which is returned to the user after processing. After the user submits a PDF article using the Web submission form (see Figure 2), the system processes it through eight steps (see Figure 1) to extract the content from the original article. The system then returns a summary version of the article in PDF format. At the top of the PDF summary document, the user will see the original file name, the word count of the original article, the word count of the extracted content, the extracted word count as a percentage of the original word count, and the summary content. Please refer to Figure 3.

Figure 3. This interface displays the result of the extracted information to the user.

The URL of the Prototype Scholarly Article Content Extraction Tool (SACET) is below. This URL takes a user to the Web submission form.
http://www.extract-article.com:8080/

Results

We invited faculty and students to test our Tool by submitting scholarly articles and reviewing the extracted content they received. We received positive responses.

We compared our SACET’s article summary result with ChatGPT’s summary result. We found that our Tool offers a distinct advantage: it extracts better content than ChatGPT’s summary version. A user can get more information out of our SACET’s summary and may be able to skip reading the lengthy original article. ChatGPT’s summary, by contrast, is so brief (not much more than the abstract) that a user would need to read the original article to get the necessary details.

The following is one example of our SACET’s summary result, which we compared with ChatGPT’s summary. The original article’s word count is 3939. SACET’s extracted content word count is 1079, which is 27% of the original article’s word count.
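
The reported percentage follows directly from the two word counts. The short Python sketch below reproduces that arithmetic and the header lines shown at the top of the summary document; the function names and the whitespace-based word count are assumptions for illustration, since the article does not state how SACET counts words.

```python
def word_count(text):
    """Count whitespace-separated words (an assumed definition of a word)."""
    return len(text.split())


def percent_of_original(extracted_words, original_words):
    """Extracted word count as a whole-number percentage of the original."""
    return round(100 * extracted_words / original_words)


def summary_header(file_name, original_words, extracted_words):
    """Build the header lines that precede the extracted content."""
    pct = percent_of_original(extracted_words, original_words)
    return "\n".join([
        f"PDF file name: {file_name}",
        f"Original article word count: {original_words}",
        f"Extracted content word count: {extracted_words} "
        f"({pct}% of the original word count)",
    ])


# Values reported for the example article in this section.
print(summary_header("new SOD mimic-original.pdf", 3939, 1079))
```

For the example article, 1079 / 3939 is about 0.274, which rounds to the 27% shown in the summary header.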
Figure 4 is the front page of the original article in PDF format. Table 1 is the comparison of the article summaries generated by our SACET and ChatGPT.

Table 1. A comparison of the article summaries generated by our SACET and ChatGPT (two columns: SACET’s summary, then ChatGPT’s summary)

SACET’s summary:

PDF file name: new SOD mimic-original.pdf
Original article word count: 3939
Extracted content word count: 1079 (27% of the original word count)
Title of the article: A new SOD mimic, Mn(III) ortho N-butoxyethylpyridylporphyrin, combines superb potency and lipophilicity with low toxicity

The content extracted is as follows:

Title: A new SOD mimic, Mn(III) ortho N-butoxyethylpyridylporphyrin, combines superb potency and lipophilicity with low toxicity
Authors: Zrinka Rajic, Artak Tovmasyan, Ivan Spasojevic, Huaxin Sheng, Miaomiao Lu, Alice M. Li, Edith B. Gralla, David S. Warner, Ludmil Benov, Ines Batinic-Haberle
Journal: Free Radical Biology & Medicine, Volume 52, 2012, Pages 1828–1834

Introduction
Mn porphyrin-based SOD mimics, proximities scavengers, and redox modulators of cellular signaling pathways have been developed for over 20 years. .- Based on SAR and our simple O2-specie in vivo model of the aerobic growth of SOD-deficient E. coli, the ortho isomeric Mn(III) N-substituted pyridylporphyrins have emerged as the most potent and sat able SOD mimics with log K cat ~8. Among them, MnTE-2-PyP 5+ has been the most frequently studies compound. Among the lipophilic analogs, MnTnHex-2-PyP 5+ has been the most frequently studied porphyrin. The toxicity of MnTnHex-2-PyP 5+ at higher concentrations/doses is at least in part due to its micellar property, and thus ability to damage membranes. We already observed that the replacement of a CH2 group by oxygen atom in each of the four butyl chains suppressed the toxicity of MnTnBu-2-PyP5+, but unfortunately, it greatly decreased its lipophilicity also.
We applied the same approach to the modification of a lipophilic MnTnHex2PyP 5+ , attaching methoxy groups at the end of the hexyl chains, hoping that in this case the lipophilicity of the longer hexyl chains would outbalance the polarity of oxygens. In this work, we showed that if the oxygen atoms are buried within the long-alkyl chains (closer to the pyridyl rings) of MnTnBuOE-2-PyP 5+ , and are thus protected from the extensive solvation, the high lipophilicity is fully preserved. The toxicity of MnTnBuOE-2-PyP 5+ to both mice and Saccharomyces cerevisiae is greatly decreased relative to either MnTnHex-2-PyP (bearing the same number of carbon atoms in pyridyl substituents) or MnTnHep-2-PyP 5+ (of the same length of pyridyl substituents). Experimental The porphyrins (MnTE-2-PyP 5+ , MnTnHex-2-PyP 5+ , and MnTnHep- 2-PyP 5+ ) used throughout this study were synthesized according to the procedures described earlier. While stereoisomers with lipophilic analogs of the ortho alkyl series separate on the TLC plate, under the same conditions with H2TnBuOE-2-PyP 4+ and its Mn complex they do not. Aqueous solutions of Mn porphyrins were filter-sterilized (0.22-um filter, Whatman, Middlesex, UK). In order to more accurately compare the toxicity among MnTnBuOE-2- PyP 5+ , MnTnHex-2 -PyP 5+ , and MnTnHep-2-PyP 5+ , an additional four mice per dose were also injected with single injection of 2.5 or 5 mg/kg MnTnHex-2-PyP 5+ , 5 or 10 mg/ kg MnTnBuOE-2-PyP 5+ , and 2.5, 5 or 10 mg/kg MnTnHep-2-PyP 5+ . Results and discussion To understand the origin of much higher amount of the porphyrin with three me thoxyhexyl groups and one methyl group, we synthesized a number of porphyrins with different chain lengths and different positions of the oxygen atoms in the alkyl chains. 
The different position of the oxygens in alkoxyalkyl p-toluenesulfonate leads to the formation of cycles of different size and therefore different stability via intramolecular rearrangement; consequently, more or less of the porphyrin species bearing one undesirable alkyl chain will be formed. With porphyrins that contain long alkyl chains, the steric hindrance prevented the easy approach of Mn ion to the porphyrin ring; the metalation of H 2 TnHex2-PyP 5+ occurs at 100°C, while within a few hours at room temperature in the case of MnTnBuOE-2-PyP 5+ . When the pyridyl substituents are lipophilic, the cationic charges on nitrogen’s are not excessively solvated, and in turn exert a stronger electron-withdrawing effect on the Mn site: both MnTnHex-2PyP 5+ and MnTnHep-2-PyP 5+ have ~ 100 mV more positive E1/2 than MnTE-2-PyP 5+ , indicating the less electron deficiency of the latter than of the former two porphyrins (Table 3). However, due to the presence of oxygens in alkyl chains, but oxyalkyl chains are more solvated than hexyl or heptyl chains, which in turn hinders nitrogen charges (and thus sup presses their electron withdrawing effect on the Mn site; consequently, the E1/2 is less positive (Table 3). Additionally, the decrease in E1/2 may occur as a consequence of the electron-donating properties of the alkoxyalkyl substituents. Further, the solvation of the porphyrin cavity formed by pyridyl substituents benefits the catalysis of O 2 − dismutation as it involves the interaction of ionic species: singly charged Mn site and superoxide. Consequently, MnTnBuOE-2-PyP5 + is ~ 1.5- to 2-fold more potent catalyst of O 2 . − dismutation than its alkyl analogs, which has either the same number of carbon atoms in pyridyl substituents (MnTnHex-2-5+), or whose pyridyl substituents are of similar length (MnTnHep-2-PyP5+). 
They exert toxicity at > 1 M concentrations, which is in part due to their high cellular accumulation, micellar properties, and different redox activity of Mn site. We have, however, observed that mammalian cells are less sensitive to lipophilic Mn porphyrins. Our preliminary data indicated that eukaryotic yeast is also much less sensitive to the lipophilic Mn porphyrins and can tolerate wellupto30 M MnTnHex2-PyP 5+ . If injected ip at2 mg/ kg, MnTnHex-2-PyP 5+ exerted transient toxicity; mice shivered and sat hunched for 60–90 min. The toxicity was more pronouncedwith2.5 mg/kg; mice were barely walking and sometimes showed tail twist. Immediately on the single ip injection of 10 mg/kg of MnTnBuOE-2-PyP 5+ , the mice looked sleepy and did not spontaneously ambulate; the next day they appeared better but not fully recovered. A tail twist and body shaking were seen wit h one mouse, but no mice died. Thus, the toxicity of 10 mg/kgMnTnBuOE-2PyP 5+ appears similar or lower than that observed with 2.5 mg/kg of MnTnHex-2-PyP 5+ . MnTnBuOE-2-PyP 5+ was then injected ip twice daily for 7 days at doses up to 4.5 mg/kg (9 mg/kg total dose per day); no signs of toxicity were observed as measured by rotarod and body weight loss (Fig. 5). The lower toxicity of MnTnBuOE-2-PyP 5+ relative to it s hexyl and heptyl analogs is not fully understood at this point. Further experiments are under way to uncover possible relationships among the toxicity data, E1/2 values for Mn III P/Mn II P redox couple, solvation, charge distribution, and Mn-oxo chemistry of these water-soluble Mn porphyrin-based therapeutics. Conclusions While studying the mechanistic aspects involved in the preparation of the alkoxyalkyl series of Mn porphyrins (MnTMOHex-2(and 3)-PyP 5+ ), with the goal to reduce the toxicity of lipophilic Nalkylpyridylporphyrins, serendipity led us to the preparation of a new porphyrin molecule with a high therapeutic potential, MnTnBuOE-2-PyP 5+ . 
The presence of the oxygen atoms and their positioning deeper within the hydrophobic long-alkyl chains make them less exposed to the solvent molecules, which in turn results in: (1) preserved lipophilicity of the molecule; (2) increased kcat due to the more solvated cavity around the metal site; (3) disrupted micellar character which in turn resulted in a greatly diminished toxicity of MnTnBuOE-2-PyP 5+ relative to its lipophilic alkyl analogs, MnTnHex-2-PyP 5+ and MnTnHep-2-PyP 5+ in both S. cerevisiae and mouse toxicity studies. In a very simple O. 2 – specific in vivo model, MnTnBuOE-2-PyP5 + proved efficacious in supporting aerobic growth of SOD-deficient S. cerevisiae to the extent of a wild yeast strain counterpart.

ChatGPT’s summary:

Abstract: The study introduces Mn(III) meso-tetrakis(N-n-butoxyethylpyridinium-2-yl)porphyrin (MnTnBuOE-2-PyP5+), a new superoxide dismutase (SOD) mimic with high catalytic activity and lipophilicity but low toxicity. This compound was developed by modifying the structure of a previous SOD mimic, MnTnHex-2-PyP5+, to reduce toxicity associated with its micellar properties. The new compound demonstrated protection in SOD-deficient yeast and lower toxicity in mice compared to its predecessor.

Introduction: Mn porphyrins are SOD mimics that can also scavenge peroxynitrite and modulate redox reactions. The development process included three phases:
1. Achieving high SOD enzyme potency.
2. Enhancing lipophilicity to improve bioavailability and efficacy in treating central nervous system injuries.
3. Reducing toxicity by modifying the compound's structure to prevent micelle formation.

Experimental Section: The study describes the synthesis and characterization of MnTnBuOE-2-PyP5+. This included standard chemical synthesis techniques and analyses such as UV/vis spectroscopy and elemental analysis.

Results: MnTnBuOE-2-PyP5+ retained high SOD-like activity and exhibited significantly lower toxicity in biological models compared to MnTnHex-2-PyP5+.
It was protective in yeast at concentrations of 5–30 µM, whereas MnTnHex-2-PyP5+ was toxic within this range. In mouse studies, MnTnBuOE-2-PyP5+ showed lower toxicity.

Conclusion: The new Mn porphyrin, MnTnBuOE-2-PyP5+, is a potent and lipophilic SOD mimic with reduced toxicity, making it a promising therapeutic candidate for oxidative stress-related conditions and diseases. This summary highlights the key points of the article, focusing on the development, characterization, and advantages of the new Mn porphyrin compound.

Table 1 highlights the differences clearly: ChatGPT provided the original article abstract along with a one- or two-sentence summary of each section. Our SACET provided a fuller yet still concise summary of each section, emphasizing the main research findings. Each summary’s word count is approximately 20–27% of the original article’s section content. Users felt that our SACET summaries gave them sufficient information and the depth necessary for a thorough understanding of the original article, whereas they needed to read the original article after reviewing ChatGPT’s summary to obtain the necessary details.

Conclusion

We unveiled our innovative Scholarly Article Content Extraction Tool (SACET), detailing its architecture, content extraction methodology, underlying algorithms, and operational workflow. Additionally, we compared SACET’s summary outputs with those generated by ChatGPT. Users of SACET have noted that it generates more comprehensive summaries, providing the necessary depth for thorough understanding. We believe our research offers significant contributions to the field of scholarly publication summarization.

Our SACET is designed for simplicity and user-friendliness. Users can instantly obtain summary results directly on the browser screen without needing to install any apps or access another application.
When comparing the summary results of our tool with those of ChatGPT, we found that SACET provides students with a basic understanding of the original article, often eliminating the need to read the original full text. In contrast, ChatGPT’s summaries can be too brief, resembling the original article abstract and lacking the depth required for a comprehensive understanding. Consequently, users may still need to read the original articles after reviewing ChatGPT’s summaries to grasp the full scope of the research. While ChatGPT’s summaries are useful for those seeking a brief overview, our SACET content extraction strategy and algorithm offer more detailed insights.

Meanwhile, using our SACET does not disrupt the research learning process for students, especially undergraduates. They can still learn essential skills such as topic analysis, searching subject databases or literature, and selecting resources in which they are interested for their research assignments.

By sharing the development strategy and algorithm of our SACET, we aim to drive innovation in creating more effective scholarly publication summary tools. Our goal is to support research and development in scholarly content extraction and to enhance the advancement of these tools. We believe our contributions will enable researchers and developers to produce more advanced tools, facilitating better access to academic research, promoting content dissemination, and enhancing student research learning.

Discussion and Further Research

There are still areas in SACET that need improvement:

Section heading extraction: More work is needed, especially when section headings and subheadings are inconsistent.

Non-text content extraction: Further development is required to enable the extraction of relevant non-text content, such as tables, figures, and mathematical expressions.
We observed that when converting single PDF articles to text using the open-source Apache PDFBox™ library, the journal title could be mistaken for the article title if the journal title is set in a larger font. This limitation of the Apache PDFBox™ conversion step needs to be addressed. Additionally, old or image-based PDF articles continue to pose challenges for accurate extraction, which is a well-known issue in the content recognition industry.

Declarations

Consent to participate and publish was obtained.

References

Arum, N. S. (2016). A look at Semantic Scholar and Google Scholar. https://www.academia.edu/download/51616762/1712353.pdf

Erkan, G., & Radev, D. R. (2004). LexRank: Graph-based lexical centrality as salience in text summarization. Journal of Artificial Intelligence Research, 22, 457–479. https://www.jair.org/index.php/jair/article/download/10396/24901/

Hong, Z. W. (2021). Challenges and advances in information extraction from scientific literature: A review. JOM, 73(11), 3383–3400. https://doi.org/10.1007/s11837-021-04902-9

Landhuis, E. (2016). Scientific literature: Information overload. Nature, 535, 457–458. https://doi.org/10.1038/nj7612-457a

Lopez, P., & Romary, L. (2010, July). HUMB: Automatic key term extraction from scientific articles in GROBID. Proceedings of the 5th International Workshop on Semantic Evaluation. Retrieved from https://aclanthology.org/S10-1055.pdf

Mihalcea, R., & Tarau, P. (2004). TextRank: Bringing order into text. Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (pp. 404–411). Barcelona, Spain: Association for Computational Linguistics. Retrieved from https://aclanthology.org/W04-3252.pdf

Ni, A., Azerbayev, Z., et al. (2021). SummerTime: Text summarization toolkit for non-experts. arXiv preprint arXiv:2108.12738. https://arxiv.org/pdf/2108.12738

Radev, D. R. (2001). Experiments in single and multidocument summarization using MEAD. In First Document Understanding Conference (pp. 1–7).

Reiswig, J. (2010). Mendeley.
Journal of the Medical Library Association: JMLA, 98(2), 193. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2859264/

Saggion, H. (2014, May). Creating summarization systems with SUMMA. In LREC (pp. 4157–4163).

Tkaczyk, D. S. (2024). Cermine--automatic extraction of metadata and references from scientific literature. 11th IAPR International Workshop on Document Analysis Systems (pp. 217–221). IEEE. Retrieved from https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=d60802552c232bcfe486c781bbe3c71ccbcadf5b

Wikipedia contributors. (2024, October 13). CiteSeerX. In Wikipedia. Retrieved from https://en.wikipedia.org/w/index.php?title=CiteSeerX&oldid=1221864764

Additional Declarations

The authors declare no competing interests.
Author affiliations: Wei Ma (corresponding author) and Michael E. Spagna, California State University, Dominguez Hills.

© Research Square 2026 | ISSN 2693-5015 (online)
or 10 mg/ kg MnTnBuOE-2-PyP\u003csup\u003e5+\u003c/sup\u003e, and 2.5, 5 or 10 mg/kg MnTnHep-2-PyP\u003csup\u003e5+\u003c/sup\u003e.\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003eResults and discussion\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003eTo understand the origin of much higher amount of the porphyrin with three me thoxyhexyl groups and one methyl group, we synthesized a number of porphyrins with different chain lengths and different positions of the oxygen atoms in the alkyl chains.\u003c/p\u003e\n \u003cp\u003eThe different position of the oxygens in alkoxyalkyl p-toluenesulfonate leads to the formation of cycles of different size and therefore different stability via intramolecular rearrangement; consequently, more or less of the porphyrin species bearing one undesirable alkyl chain will be formed.\u003c/p\u003e\n \u003cp\u003eWith porphyrins that contain long alkyl chains, the steric hindrance prevented the easy approach of Mn ion to the porphyrin ring; the metalation of H\u003csub\u003e2\u003c/sub\u003eTnHex2-PyP\u003csup\u003e5+\u003c/sup\u003eoccurs at 100\u0026deg;C, while within a few hours at room temperature in the case of MnTnBuOE-2-PyP\u003csup\u003e5+\u003c/sup\u003e.\u003c/p\u003e\n \u003cp\u003eWhen the pyridyl substituents are lipophilic, the cationic charges on nitrogen\u0026rsquo;s are not excessively solvated, and in turn exert a stronger electron-withdrawing effect on the Mn site: both MnTnHex-2PyP\u003csup\u003e5+\u003c/sup\u003e and MnTnHep-2-PyP\u003csup\u003e5+\u003c/sup\u003ehave ~\u0026thinsp;100 mV more positive E1/2 than MnTE-2-PyP\u003csup\u003e5+\u003c/sup\u003e, indicating the less electron deficiency of the latter than of the former two porphyrins (Table\u0026nbsp;3). 
However, due to the presence of oxygens in alkyl chains, but oxyalkyl chains are more solvated than hexyl or heptyl chains, which in turn hinders nitrogen charges (and thus sup presses their electron withdrawing effect on the Mn site; consequently, the E1/2 is less positive (Table\u0026nbsp;3).\u003c/p\u003e\n \u003cp\u003eAdditionally, the decrease in E1/2 may occur as a consequence of the electron-donating properties of the alkoxyalkyl substituents. Further, the solvation of the porphyrin cavity formed by pyridyl substituents benefits the catalysis of O\u003csub\u003e2\u003c/sub\u003e\u003csup\u003e\u0026minus;\u003c/sup\u003e dismutation as it involves the interaction of ionic species: singly charged Mn site and superoxide.\u003c/p\u003e\n \u003cp\u003eConsequently, MnTnBuOE-2-PyP5\u0026thinsp;+\u0026thinsp;is ~\u0026thinsp;1.5- to 2-fold more potent catalyst of O\u003csub\u003e2\u003c/sub\u003e.\u003csup\u003e\u0026minus;\u003c/sup\u003e dismutation than its alkyl analogs, which has either the same number of carbon atoms in pyridyl substituents (MnTnHex-2-5+), or whose pyridyl substituents are of similar length (MnTnHep-2-PyP5+).\u003c/p\u003e\n \u003cp\u003eThey exert toxicity at \u0026gt;\u0026thinsp;1 M concentrations, which is in part due to their high cellular accumulation, micellar properties, and different redox activity of Mn site.\u003c/p\u003e\n \u003cp\u003eWe have, however, observed that mammalian cells are less sensitive to lipophilic Mn porphyrins. Our preliminary data indicated that eukaryotic yeast is also much less sensitive to the lipophilic Mn porphyrins and can tolerate wellupto30 M MnTnHex2-PyP\u003csup\u003e5+\u003c/sup\u003e .\u003c/p\u003e\n \u003cp\u003eIf injected ip at2 mg/ kg, MnTnHex-2-PyP\u003csup\u003e5+\u003c/sup\u003eexerted transient toxicity; mice shivered and sat hunched for 60\u0026ndash;90 min. 
The toxicity was more pronouncedwith2.5 mg/kg; mice were barely walking and sometimes showed tail twist.\u003c/p\u003e\n \u003cp\u003eImmediately on the single ip injection of 10 mg/kg of MnTnBuOE-2-PyP\u003csup\u003e5+\u003c/sup\u003e, the mice looked sleepy and did not spontaneously ambulate; the next day they appeared better but not fully recovered. A tail twist and body shaking were seen wit\u003csub\u003eh\u003c/sub\u003e one mouse, but no mice died. Thus, the toxicity of 10 mg/kgMnTnBuOE-2PyP\u003csub\u003e5+\u003c/sub\u003eappears similar or lower than that observed with 2.5 mg/kg of MnTnHex-2-PyP\u003csub\u003e5+\u003c/sub\u003e.\u003c/p\u003e\n \u003cp\u003eMnTnBuOE-2-PyP\u003csup\u003e5+\u003c/sup\u003ewas then injected ip twice daily for 7 days at doses up to 4.5 mg/kg (9 mg/kg total dose per day); no signs of toxicity were observed as measured by rotarod and body weight loss (Fig.\u0026nbsp;5).\u003c/p\u003e\n \u003cp\u003eThe lower toxicity of MnTnBuOE-2-PyP\u003csup\u003e5+\u003c/sup\u003erelative to it\u003csup\u003es\u003c/sup\u003e hexyl and heptyl analogs is not fully understood at this point. Further experiments are under way to uncover possible relationships among the toxicity data, E1/2 values for Mn\u003csup\u003eIII\u003c/sup\u003eP/Mn\u003csup\u003eII\u003c/sup\u003eP redox couple, solvation, charge distribution, and Mn-oxo chemistry of these water-soluble Mn porphyrin-based therapeutics.\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003eConclusions\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003eWhile studying the mechanistic aspects involved in the preparation of the alkoxyalkyl series of Mn porphyrins (MnTMOHex-2(and 3)-PyP\u003csup\u003e5+\u003c/sup\u003e), with the goal to reduce the toxicity of lipophilic Nalkylpyridylporphyrins, serendipity led us to the preparation of a new porphyrin molecule with a high therapeutic potential, MnTnBuOE-2-PyP\u003csup\u003e5+\u003c/sup\u003e. 
The presence of the oxygen atoms and their positioning deeper within the hydrophobic long-alkyl chains make them less exposed to the solvent molecules, which in turn results in: (1) preserved lipophilicity of the molecule; (2) increased kcat due to the more solvated cavity around the metal site; (3) disrupted micellar character which in turn resulted in a greatly diminished toxicity of MnTnBuOE-2-PyP\u003csup\u003e5+\u003c/sup\u003erelative to its lipophilic alkyl analogs, MnTnHex-2-PyP\u003csup\u003e5+\u003c/sup\u003eand MnTnHep-2-PyP\u003csup\u003e5+\u003c/sup\u003ein both S. cerevisiae and mouse toxicity studies. In a very simple O.\u003csub\u003e2\u003c/sub\u003e \u003csup\u003e\u0026ndash;\u003c/sup\u003e specific in vivo model, MnTnBuOE-2-PyP5\u0026thinsp;+\u0026thinsp;proved efficacious in supporting aerobic growth of SOD-deficient S. cerevisiae to the extent of a wild yeast strain counterpart.\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eAbstract\u003c/strong\u003e: The study introduces Mn(III) meso-tetrakis(N-n-butoxyethylpyridinium-2-yl)porphyrin (MnTnBuOE-2-PyP5+), a new superoxide dismutase (SOD) mimic with high catalytic activity and lipophilicity but low toxicity. This compound was developed by modifying the structure of a previous SOD mimic, MnTnHex-2-PyP5+, to reduce toxicity associated with its micellar properties. The new compound demonstrated protection in SOD-deficient yeast and lower toxicity in mice compared to its predecessor.\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003eIntroduction\u003c/strong\u003e: Mn porphyrins are SOD mimics that can also scavenge peroxynitrite and modulate redox reactions. The development process included three phases:\u003c/p\u003e\n \u003cp\u003e1. Achieving high SOD enzyme potency.\u003c/p\u003e\n \u003cp\u003e\u0026bull; Enhancing lipophilicity to improve bioavailability and efficacy in treating central nervous system injuries.\u003c/p\u003e\n \u003cp\u003e3. 
Reducing toxicity by modifying the compound\u0026apos;s structure to prevent micelle formation.\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003eExperimental Section\u003c/strong\u003e: The study describes the synthesis and characterization of MnTnBuOE-2-PyP5+. This included standard chemical synthesis techniques and analyses such as UV/vis spectroscopy and elemental analysis.\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003eResults\u003c/strong\u003e: MnTnBuOE-2-PyP5+ retained high SOD-like activity and exhibited significantly lower toxicity in biological models compared to MnTnHex-2-PyP5+. It was protective in yeast at concentrations of 5\u0026ndash;30 \u0026micro;M, whereas MnTnHex-2-PyP5+ was toxic within this range. In mouse studies, MnTnBuOE-2-PyP5+ showed lower toxicity.\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003eConclusion\u003c/strong\u003e: The new Mn porphyrin, MnTnBuOE-2-PyP5+, is a potent and lipophilic SOD mimic with reduced toxicity, making it a promising therapeutic candidate for oxidative stress-related conditions and diseases.\u003c/p\u003e\n \u003cp\u003e\u003cbr\u003e\u003c/p\u003e\n \u003cp\u003eThis summary highlights the key points of the article, focusing on the development, characterization, and advantages of the new Mn porphyrin compound.\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003eTable \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e highlights the differences clearly:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eChatGPT provided the original article abstract along with a one- or two-sentence summary of each section.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eOur SACET provided a concise yet more detailed summary of each section, emphasizing the main research findings. 
Each summary\u0026rsquo;s word count is approximately 20\u0026ndash;27% of the word count of the corresponding section in the original article.\u003c/p\u003e\n \u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eUsers felt that our SACET summaries gave them sufficient information and the necessary depth for a thorough understanding of the original article, whereas they needed to read the original article after reviewing ChatGPT\u0026rsquo;s summary to obtain the necessary details.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eWe unveiled our innovative Scholarly Article Content Extraction Tool (SACET), detailing its architecture, content extraction methodology, underlying algorithms, and operational workflow. Additionally, we compared SACET\u0026rsquo;s summary outputs with those generated by ChatGPT. Users of SACET have noted that it generates more comprehensive summaries, providing the necessary depth for a thorough understanding. We believe our research offers significant contributions to the field of scholarly publication summarization.\u003c/p\u003e \u003cp\u003eOur SACET is designed for simplicity and user-friendliness. Users can instantly obtain summary results directly on the browser screen without the need to upload or install any apps or access another application. When comparing the summary results of our tool with those of ChatGPT, we found that SACET provides students with a basic understanding of the original article, often eliminating the need to read the original full text. In contrast, ChatGPT\u0026rsquo;s summaries can be too brief, resembling the original article abstract and lacking the depth required for a comprehensive understanding. Consequently, users may still need to read the original articles after reviewing ChatGPT\u0026rsquo;s summaries to grasp the full scope of the research. While ChatGPT\u0026rsquo;s summaries are useful for those seeking a brief overview, our SACET content extraction strategy and algorithm offer more detailed insights. 
Meanwhile, using our SACET does not disrupt the research learning process for students, especially undergraduates. They can still learn essential skills such as topic analysis, searching subject databases or literature, and selecting resources in which they are interested for their research assignments.\u003c/p\u003e \u003cp\u003eBy sharing the development strategy and algorithm of our SACET, we aim to drive innovation in creating more effective scholarly publication summary tools. Our goal is to support research and development in scholarly content extraction and to enhance the advancement of these tools. We believe our contributions will enable researchers and developers to produce more advanced tools, facilitating better access to academic research, promoting content dissemination, and enhancing student research learning.\u003c/p\u003e"},{"header":"Discussion and Further Research","content":"\u003cp\u003eThere are still areas in SACET that need improvement:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eSection heading extraction: More work is needed, especially when section headings and subheadings are inconsistent.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eNon-text content extraction: Further development is required to enable the extraction of relevant non-text content, such as tables, figures, and mathematical expressions.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eWe observed that when converting single PDF articles to text format using Apache PDFBox\u0026trade; OpenSource, the journal title could be mistaken for the article title if the journal title\u0026rsquo;s font is larger. This limitation of Apache PDFBox\u0026trade; OpenSource needs to be addressed. 
Additionally, old or image-based PDF articles continue to pose challenges for accurate extraction, which is a well-known issue in the content recognition industry.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003eConsent to participate and publish was obtained.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n \u003cli\u003eArum, N. S. (2016). \u003cem\u003eA look at semantic scholar and Google scholar.\u003c/em\u003e https://www.academia.edu/download/51616762/1712353.pdf\u003c/li\u003e\n \u003cli\u003eErkan, G., \u0026amp; Radev, D. R. (2004). LexRank: Graph-based lexical centrality as salience in text summarization. \u003cem\u003eJournal of Artificial Intelligence Research\u003c/em\u003e, 22, 457\u0026ndash;479. https://www.jair.org/index.php/jair/article/download/10396/24901/\u003c/li\u003e\n \u003cli\u003eHong, Z. W. (2021). Challenges and advances in information extraction from scientific literature: A review. \u003cem\u003eJOM\u003c/em\u003e, 73(11), 3383\u0026ndash;3400. https://doi.org/10.1007/s11837-021-04902-9\u003c/li\u003e\n \u003cli\u003eLandhuis, E. (2016). Scientific literature: Information overload. \u003cem\u003eNature\u003c/em\u003e, 535, 457\u0026ndash;458. https://doi.org/10.1038/nj7612-457a\u003c/li\u003e\n \u003cli\u003eLopez, P. R., et al. (2010, July). HUMB: Automatic key term extraction from scientific articles in GROBID. \u003cem\u003eProceedings of the 5th international workshop on semantic evaluation.\u003c/em\u003e Retrieved from https://aclanthology.org/S10-1055.pdf\u003c/li\u003e\n \u003cli\u003eMihalcea, R., \u0026amp; Tarau, P. (2004). TextRank: Bringing Order into Text. \u003cem\u003eProceedings of the 2004 Conference on Empirical Methods in Natural Language Processing\u003c/em\u003e (pp. 404\u0026ndash;411). Barcelona, Spain: Association for Computational Linguistics. Retrieved from https://aclanthology.org/W04-3252.pdf\u003c/li\u003e\n \u003cli\u003eNi, A., Azerbayev, Z., et al. (2021). 
SummerTime: Text summarization toolkit for non-experts. \u003cem\u003earXiv preprint arXiv:2108.12738\u003c/em\u003e. https://arxiv.org/pdf/2108.12738\u003c/li\u003e\n \u003cli\u003eRadev, D. R. (2001). Experiments in single and multidocument summarization using MEAD. In \u003cem\u003eFirst document understanding conference\u003c/em\u003e (pp. 1\u0026ndash;7).\u003c/li\u003e\n \u003cli\u003eReiswig, J. (2010). Mendeley. \u003cem\u003eJournal of the Medical Library Association: JMLA\u003c/em\u003e, 98(2), 193. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2859264/\u003c/li\u003e\n \u003cli\u003eSaggion, H. (2014, May). Creating Summarization Systems with SUMMA. In \u003cem\u003eLREC\u003c/em\u003e (pp. 4157\u0026ndash;4163).\u003c/li\u003e\n \u003cli\u003eTkaczyk, D. S. (2024). Cermine--automatic extraction of metadata and references from scientific literature. \u003cem\u003e11th IAPR international workshop on document analysis systems, IEEE\u003c/em\u003e (pp. 217\u0026ndash;221). Retrieved from https://citeseerx.ist.psu.edu/document?repid=rep1\u0026amp;type=pdf\u0026amp;doi=d60802552c232bcfe486c781bbe3c71ccbcadf5b\u003c/li\u003e\n \u003cli\u003eWikipedia contributors. (2024, October 13). \u003cem\u003eWikipedia\u003c/em\u003e. 
Retrieved from Wikipedia: https://en.wikipedia.org/w/index.php?title=CiteSeerX\u0026amp;oldid=1221864764\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"California State University, Dominguez Hills","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Artificial Intelligence (AI), Scholarly article content extraction, Summarize key content of a scholarly article, Text summarization tools","lastPublishedDoi":"10.21203/rs.3.rs-5961778/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-5961778/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eWe developed an intuitive prototype scholarly article content extraction tool to help individuals and students extract key content from scholarly articles. Although scholarly articles typically include an abstract, they often lack depth. Existing summarization tools are not always user-friendly and often require setup and learning processes. 
Most existing tools are designed for the information industry to extract metadata and key content efficiently.\u003c/p\u003e \u003cp\u003eThis paper introduces our innovative Scholarly Article Content Extraction Tool (SACET), which extracts content from a single scholarly article and instantly returns an accurate natural language summary. SACET is accessible anytime/anywhere without setup or extensive learning. We compare SACET's summaries with those from ChatGPT, highlighting the differences, value, and uniqueness of our innovation. Publishing our innovation is crucial due to its potential to advance content extraction applications. This article discusses the objectives, architecture, content extraction methodology, underlying algorithms, operational workflow, and additional work needed to improve the tool.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e","manuscriptTitle":"Unlocking Scholarly Article Insights: Creating a Scholarly Article Content Extraction Tool –Objectives, Methodology, and a Comparative Analysis of Summary Results","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-02-11 09:16:12","doi":"10.21203/rs.3.rs-5961778/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"7144d314-e0c6-4bd0-9a59-20293f939c15","owner":[],"postedDate":"February 11th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":44154567,"name":"Information Retrieval and 
Management"},{"id":44154568,"name":"Library Science"},{"id":44154569,"name":"Artificial Intelligence and Machine Learning"}],"tags":[],"updatedAt":"2025-02-11T09:16:14+00:00","versionOfRecord":[],"versionCreatedAt":"2025-02-11 09:16:12","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-5961778","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-5961778","identity":"rs-5961778","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

License: CC-BY-4.0