An automated decision-making procedure for ranking and selecting species in biodiversity projects | Research Square
Torsten H. Struck, Thomas Marcussen, Astrid Böhne, Rosa Fernández, and 14 more
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7957242/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1 posted 7
Abstract
In large-scale biodiversity genomics projects, the number of species that could be sequenced exceeds the resources available. Species selection is therefore a crucial component, requiring clear criteria and procedures. In a bottom-up approach, the Biodiversity Genomics Europe project implemented an Automated Decision-Making (ADM) process for species selection based on objective criteria and tested it on simulated and empirical data. Here, we present our species ranking ADM process.
It includes three stages: exclusion, ranking, and feasibility check. The composition of selected species retained the diversity of the community-nominated species pool for key taxonomic, geographic, and demographic assessment criteria while reducing bias. Feasibility and funding limits influenced the final selection more than other factors, indicating that investments in these areas would improve available reference genome diversity. The ADM achieved species selection for genome sequencing in a large-scale biodiversity project in a relatively objective manner consistent with the broader European biodiversity genomic community’s priorities.
Biological sciences/Computational biology and bioinformatics; Biological sciences/Ecology; Earth and environmental sciences/Ecology; Biological sciences/Genetics
Introduction
Molecular biodiversity studies, most notably large-scale genome sequencing initiatives 1–6, are typically resource-intensive (see also Box 1). However, the number of potential target species is, in general, much higher than the actual number of species that can be processed. Consequently, data are generated for a limited subset of species within a project’s funding period. This normally requires that species are prioritised and selected for further analyses by a complex decision-making procedure based on various criteria, which may be scientific, technical, or social 7, 8. Numerous criteria are potentially relevant for the ranking of candidate species for genome sequencing. These can involve characteristics of the species itself, such as taxonomic representation, characteristics of the available specimen(s), such as karyotype (sex), or geographical distribution. Technical criteria, e.g., those related to sampling procedures such as access to liquid nitrogen or maintenance of a cold chain, can be crucial.
Political or strategic decisions within the consortium, such as regional and institutional representation, can also be relevant. These criteria can also conflict with one another, making objective decision-making difficult. Accordingly, there is a high chance that inherently subjective components enter the decision process in such species selection procedures. For example, there are generally intrinsic advantages for English native speakers in many decision processes 9. Unconscious biases can also exist towards, e.g., gender 10. Similar biases can also exist towards more iconic and better-known taxa, or economically important taxa, which is to the detriment of both conservation goals 11 and the goals of the Earth BioGenome Project (EBP) 12. Ideally, species selection procedures should be implemented that are transparent and objective. Automated decision-making (ADM) processes are widely used across various fields in society to ensure consistency, efficiency, and accuracy in decision-making when the number of criteria is high and the evaluation is complex 13, 14. A distinction may be made between prioritisation procedures, by which targets are queued for action, and ranking procedures, by which targets are ordered for evaluation. Automated prioritisation procedures are used in, e.g., emergency room triage, cybersecurity threat detection, and task management systems to continuously ensure that high-value or urgent items receive attention first 15. Automated ranking procedures are used to sort options based on pre-defined criteria such as value or relevance, e.g., in search engines, recommendation systems, and accreditation rankings 16, 17. The purpose of implementing ADM processes is to save time, reduce human biases, and ensure a higher degree of consistency in decision-making. Specifically, the implementation of an ADM requires a set of clearly defined criteria.
This set of criteria can be selected through mechanisms such as a general vote or other bottom-up, transparent decisions. We therefore argue that an ADM process may ease resource allocation in biodiversity genome projects, while also making the process more transparent and democratic. The Biodiversity Genomics Europe (BGE) project (EU Horizon Europe BGE Project, Grant Agreement No 101059492) was based on large community involvement. Accordingly, one goal of the BGE project was to sequence a set of reference genomes through a community-driven approach. In this approach, scientists nominated a species and then, if it was selected, provided samples/specimens. To this end, BGE created a transparent ADM process for selecting species from the pool of nominated species based on a set of a priori objective criteria. To develop the ADM process and its criteria, BGE leveraged its foundation in the European Reference Genome Atlas (ERGA) community and its experience with a community-based pilot project 6, 7. The ERGA community brings together broad taxonomic expertise across the eukaryotic tree of life, along with deep knowledge of the laboratory methods needed for sequencing reference genomes. Moreover, through its governance structure, which includes a council of national representatives, ERGA stands for broad representation across Europe and democratic decisions. Here, we present the ADM procedure for ranking nominated species from across Europe as implemented by BGE. As part of this presentation, we contribute (1) a community-defined criteria set, (2) a reproducible ranking algorithm, and (3) a validation of the algorithm with simulated and empirical data. To our knowledge, such an ADM-based ranking procedure is unique among biodiversity genomics projects, as is the open, bottom-up process by which it was decided on.
Therefore, we anticipate that this outcome from BGE has value to other large-scale biodiversity projects, and hence we provide (4) some guidance for other consortia. Automated species selection process A prerequisite for any selection procedure is that the number of candidates exceeds the number that can be processed. Within the available funding for the community-driven BGE task, sequencing was limited to a total of 150 Gb of estimated haploid genome size. To obtain a list of candidate species for genome sequencing we held two consecutive open calls, in which scientists and citizens could nominate European species for reference genome sequencing. This nomination was completed by filling in an online questionnaire with information that was later used to rank the species (see “Questionnaire for the nomination in round #2” in the Supplementary Information). The nomination process resulted in a list of 593 species nominations, and, based on the information provided in the nomination forms, summed to an estimated haploid genome size of about 1000 Gigabases (Gb). This means that the total genome size of the nominations was ca. seven times higher than the project's capacity. After the ranking procedure, 99 species (17% of the total) with a combined projected 150 Gb genome size were selected for genome sequencing. The species selection process implemented in the BGE project was a three-stage ADM process (Fig. 1 ) that comprises (1) an exclusion stage, (2) a general ranking stage employing a decision-tree and additional ranking for country and individual researcher representation, and (3) a feasibility check followed by an additional adjustment for genera with multiple species suggestions. The result of the ADM process is different lists including the one of selected species and a waiting list. 1st stage. Species exclusion . Three criteria at the first stage needed to be fulfilled for a species to proceed to stage 2. 
If any of them was not fulfilled, the species was excluded. These specific criteria are “Availability of published reference genomes”, “Redundancy”, and “Availability of voucher specimen” (Table 1 & Supplementary Data 01, please also see “Individual selection criteria” in the Supplementary Information for details). At this stage, 148 species were excluded, and 445 species moved forward to stage 2 (Fig. 1 ). 2nd stage. Species ranking . In the second stage, all species were first ranked using a decision model based on 11 criteria (Table 1 & Supplementary Data 01), information for those was obtained from the sample providers when they nominated the species. These 11 criteria are grouped into six categories to make the process easier. Moreover, some of the individual criteria assess different aspects of the same topic. The six categories are: “Taxonomic representation” (1 criterion; “Taxonomic representation on Tree of Life”), “Certainty” (2 criteria; “Certainty of species identification” & “Type locality”), “Country representation” (2 criteria; “A minimum of one species per country” & “Countries with fewer genomic resources”), “JEDI” (3 criteria; “Gender of researcher or gender balance of team”, “Researcher of underrepresented minority” & “Diversity & inclusiveness of team”), “Novel leader” (1 criterion; “Novel sample coordinator”), and “Applicability” (2 criteria; “Application of the genomes” & “Reach/breadth of community”). A weighting scheme is assigned to each category (explained in detail in “Weighting scheme of each category” in the Supplementary Information). In brief, the scheme of the category “Taxonomic representation”, ranging from genus to phylum (and beyond) level applied weights from 1–5, with higher weights for higher taxonomic units. In the one of “Certainty”, weights of 1, 2, 5 & 6 depended on how close to the type locality the sample was collected and how certain the species identification could be accomplished. 
Higher weights were given to species with higher taxonomic certainty. In the category “Country representation”, one criterion contributes to the knowledge transfer and hence has been linked to the country of the sample coordinator. The other one contributes to an equal representation of species from across Europe and is accordingly connected to the collection site. This means one species can represent two countries in this category. Weights of 1, 4 & 5 were applied with higher weights for countries from the list of EU Widening countries and the single species being nominated from the country. In the category “JEDI”, weights of 1–5 are applied depending on the sample coordinator’s gender, minority background, and the inclusiveness of the genome team, with higher weights assigned to increase diversity and inclusiveness. The category “Novel leader” used weights of 0 & 2 to reflect if the sample coordinator has already gotten a species sequenced via BGE/ERGA resources. Finally, the category “Applicability” had weights of 0–2 depending on how quickly and broadly the reference genome could be applied outside the nominating group. Concerning the selection model of the ADM process, eight different ranking models had been developed and tested (for more details on this step of the process, please see “Development of different prioritization models” in the Supplementary Information). Seven of these were variations of the final model presented below. These explored different combinations of “Certainty” and “Country representation” and of “Novel leader” and “Applicability” at different levels of a decision tree. The eighth model did not implement a decision tree but summed up the weights of all categories and ranked the species based on this sum. To explore the effect of the different models on the selection process and whether they generated the expected outcome, the models were applied to simulated and empirical data. 
For a detailed description of the procedures and results, please see “Simulation studies” and “Empirical data” in the Supplementary Information. For the simulated data, three probability distributions for the weighting scheme of each category were used to randomly assign a weight for each category and simulated species. In one distribution, all weights were drawn with equal probability, and in the other two, the probability was skewed towards either low or high weights. All possible combinations of these probability distributions across the six categories were generated, resulting in 729 (3^6) combinations. For each combination, 100 datasets with 1,000 species were generated, resulting in 72,900 simulated datasets. The eight models described above were applied to each dataset to select either the top 100, 200, or 300 species, resulting in 218,700 sets of selected species per model. For each dataset, model, and number of top-selected species, an enrichment factor was calculated for each category. The enrichment factor measures how strongly a category is enriched among the top-selected species relative to all species: it is the difference between the mean weight of the top-selected species and the mean weight of all 1,000 species, divided by the maximum possible difference. Hence, it is a relative measure of the maximally possible positive enrichment of a category in the dataset, with a maximal value of 1. Negative values are not bounded and indicate a lower representation in the top-selected species. A heatmap of the mean enrichment factor across the 729 combinations for each model and number of top-selected species for each category showed that model #8 is set apart from all other models (Fig. 2A). Moreover, all categories are more or less equally enriched, with the strongest enrichment in “Certainty” and the least in “Applicability”. Among the seven other models, the next split sets apart the selection of only 100 out of 1,000 species (10%).
The models #1 and #3 emphasizing “Country representation” over “Certainty” in the species selection are thereby placed a bit closer to the other numbers of top-selected species than the other five. The major difference between these is that “Country representation” is more strongly enriched than “Certainty”, while it is the other way around in the models #2, #4–7. Moreover, in the models #5–7, which assess both categories together, “Certainty” has a stronger influence than “Country representation”. Also, for 200 and 300 selected species, the models #1 and #3 are set apart from the other five, but less prominently. Generally, a similar pattern occurs as with the 100 selected species. In summary, models #1 and #3, #2 and #4, as well as #5–7 have no strong differences, while model #8 is set apart by having less emphasis on the “Taxonomic representation”. For the empirical data, we used the 230 suggested species of the first round of suggestions by the community that passed the exclusion stage. Applying the eight different models to the dataset, we roughly selected 10% (25), 20% (50) and 30% (75) of the species, calculated the enrichment factor as above and also determined the intersection of the selected species between the models with the same number of selected species (i.e., for 25, 50 or 75 species). As in the simulation data, model #8 is different from all other models (Fig. 2 B). Within model #8, the differences are more pronounced with fewer species selected. The strongest enrichment is “Certainty”, followed by “Country representation”, and “Applicability”. While “JEDI” and “Taxon representation” have a low enrichment or none, “Novel leader” has negative values. In the case of the other seven models, the separation is between the number of selected species. Within the 25 selected species, there is no difference between the models. “Taxon representation” is strongly enriched, and “Applicability” is intermediate, while all others are not enriched. 
Selecting 50 species, enrichment of “Taxon representation” is not as strong, while enrichment of “Certainty”, “Country representation”, and “Applicability” are increased to intermediate values. “JEDI” shows no enrichment, while “Novel leader” has strong negative values. Moreover, models #2 and #4 are slightly different from the other five. Finally, the picture with 75 species looks similar, but for “Taxon representation” and “Novel leader” the values are not so strong any longer, and the differences of models #2 and #4, as well as #1 and #3, are a bit more pronounced. Looking at the intersections of the 50 selected species shows that the overlap of selected species between the models is actually very high, with equal to or more than 62% of species shared among any pair of models (Fig. 2 C). Nonetheless, three groups can be recognized. One contains again only model #8. The other ones are models #2 and #4, which select the same species. The last group contains the remaining five models, which also select the same species. The similarity in species selection between these two last groups is also very high, with 94%. Hence, like with the simulation studies, the similarity between the models #1–7 is very high. However, a more detailed investigation of the empirical data also showed that the country representation after the decision model was very biased towards certain countries, independent of the applied model (Fig. 3 ). For example, Croatia was very strongly represented and relatively more than among the nominating countries. Similarly, Poland and Italy were also highly represented. The original models also introduced a strong bias towards individual researchers as they had predominantly suggested species from underrepresented taxonomic groups (for more details, see “Additional step for prioritization” in Supplementary Information). 
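The enrichment factor used in the simulated and empirical comparisons above can be illustrated with a minimal sketch. It assumes the verbal definition translates to (mean of top-selected minus mean of all) divided by (maximum weight minus mean of all); the function name, the argument names, and the toy weights are hypothetical and are not taken from the BGE code.

```python
from statistics import mean


def enrichment_factor(all_weights, top_weights, max_weight):
    """Relative enrichment of one category among the top-selected species.

    Assumed reading of the text: the difference between the mean weight of
    the top-selected species and the mean weight of all species, divided by
    the largest difference that could be reached (max_weight - mean of all).
    """
    mean_all = mean(all_weights)
    max_diff = max_weight - mean_all
    if max_diff == 0:
        # Every species already carries the maximum weight; nothing to enrich.
        raise ValueError("enrichment factor is undefined for this dataset")
    return (mean(top_weights) - mean_all) / max_diff


# Toy category with weights 1-5 for eight simulated species (mean = 3).
all_weights = [1, 2, 3, 4, 5, 5, 1, 3]

best_case = enrichment_factor(all_weights, [5, 5], max_weight=5)   # 1.0
partial = enrichment_factor(all_weights, [4, 5], max_weight=5)     # 0.75
depleted = enrichment_factor(all_weights, [1, 1], max_weight=5)    # -1.0
```

Under this reading, a value of 1 means that every selected species carries the maximum weight, while the lower end depends on how far the dataset mean lies above the minimum weight, which matches the statement that negative values have no fixed bound.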
Given the results from the simulated and empirical studies, a final decision by the ERGA council was made on two of the eight models tested (i.e., models #7 and #8) and additional rounds of ranking (for more details, see “Decisions by the ERGA council” in the Supplementary Information). Model #7 was elected. The final model, hence, comprised four decision levels. The first level of ranking is based on the category “Taxonomic representation” as this was regarded as the most important category by both the larger scientific community and the BGE project (Fig. 1 ). Hence, species with a higher score in this category were ranked above those with a lower score. Species with the same score in this category are then ranked amongst them based on the combined categories “Certainty” and “Country representation”. The scores for these two categories are summed for each species. Then the species were ranked based on the sum. All species that had the same scores in the two higher levels were ranked at the third level based on the category “JEDI”. Finally, all species that had the same scores in the three top levels were ranked at the last level based on the summed scores of the categories of “Novel leader” and “Applicability”. Additionally, after the ranking by the decision tree, additional rounds of ranking were conducted (Fig. 1 ). The ranking of the top-10 species was left unchanged. Next, the best-ranked species of each country that was not included in the top-10 species was ranked after the top-10 species. The order of ranking within these best-ranked species by country is based on the ranking by the decision tree. The country of interest in this process is the country of the collection site and not the country of the institution of the sample coordinator. This addition aimed to achieve a better representation of species from different European countries among the selected species. 
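The four decision levels and the country round described above can be sketched as a lexicographic sort followed by a reordering step. This is a minimal illustration and not the BGE implementation: the field names, the helper `make`, the country codes, and the weights are hypothetical, and the country round is read as "promote the best-ranked species of every collection country not yet represented among the fixed top species".

```python
def decision_tree_key(sp):
    """Four-level key: a later level only matters if all earlier levels tie."""
    return (
        sp["taxonomic"],                            # level 1
        sp["certainty"] + sp["country_rep"],        # level 2 (summed scores)
        sp["jedi"],                                 # level 3
        sp["novel_leader"] + sp["applicability"],   # level 4 (summed scores)
    )


def rank_by_decision_tree(species):
    # Higher scores rank first; sorted() is stable, so full ties keep input order.
    return sorted(species, key=decision_tree_key, reverse=True)


def promote_best_per_country(ranked, n_fixed=10):
    """Keep the first n_fixed species, then place the best-ranked species of
    each collection country not yet represented, then all remaining species."""
    head = list(ranked[:n_fixed])
    seen = {sp["country"] for sp in head}
    promoted, rest = [], []
    for sp in ranked[n_fixed:]:
        if sp["country"] in seen:
            rest.append(sp)
        else:
            seen.add(sp["country"])
            promoted.append(sp)
    return head + promoted + rest


def make(name, country, tax, cert, crep, jedi=1, novel=0, appl=0):
    # Hypothetical helper to build a toy nomination record.
    return dict(name=name, country=country, taxonomic=tax, certainty=cert,
                country_rep=crep, jedi=jedi, novel_leader=novel,
                applicability=appl)


species = [
    make("A", "HR", 5, 2, 1), make("B", "HR", 5, 6, 1), make("C", "HR", 4, 1, 1),
    make("D", "PL", 3, 1, 1), make("E", "HR", 3, 5, 1), make("F", "IT", 1, 1, 4),
]
ranked = rank_by_decision_tree(species)
# n_fixed=2 stands in for the top-10 rule on this six-species toy list.
final = promote_best_per_country(ranked, n_fixed=2)
print([sp["name"] for sp in ranked])  # ['B', 'A', 'C', 'E', 'D', 'F']
print([sp["name"] for sp in final])   # ['B', 'A', 'D', 'F', 'C', 'E']
```

Using a tuple as the sort key keeps the hierarchy explicit: "Taxonomic representation" always dominates, and the summed lower levels only break ties, which is what distinguishes this model from the pure sum used in model #8.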
After this country-based ranking, the best-ranked species of each researcher, suggesting a species that was not included in the top-10 species or the best-ranked country species, is ranked after these species. The procedure is the same as for the country ranking. This ranking aimed to distribute the benefits of the BGE project as widely as possible across the scientific community in Europe. Finally, the remaining species are ranked following the ranking from the decision model. The ranked list of all species after these processes was then transferred to the next stage. No species are excluded during the ranking stage. 3rd stage. Feasibility. This stage consists of 10 criteria that assess whether a suggested species is feasible to be sequenced, given the available funding, current methodological capabilities 8 , and sample availability (Table 1). The feasibility thresholds were determined together with the sequencing centers. Species failing any of the feasibility criteria were flagged as non-feasible and removed. After the feasibility check, a final round of ranking was conducted. An examination of the selected species after the check had shown that more than one species was present for very few genera, due to the fact that multiple species had been suggested for these same genera. This was assessed as not being in agreement with the goal to accomplish a broad taxonomic representation of the eukaryotic biodiversity. Accordingly, the same principle as for the best-ranked species for countries and individual researchers was applied to these genera. The best-ranked species of each genus was kept at its position in the ranking order; all other species were moved to the bottom of the list. Among all the species across the different genera that were placed at the bottom of the list, the same ranking order was kept as it had been in the original list. After the final ranking, a ranked list of feasible and non-excluded species is available. 
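The genus adjustment described above, in which the best-ranked species of each genus keeps its position and all congeners move to the bottom in their original relative order, can be sketched as a single pass over the ranked list. The function name and the placeholder binomials are hypothetical; the genus is simply taken as the first word of each name.

```python
def demote_extra_congeners(ranked_names):
    """Keep the best-ranked species per genus in place and move all further
    species of the same genus to the bottom, preserving their original order."""
    seen_genera = set()
    kept, demoted = [], []
    for name in ranked_names:
        genus = name.split()[0]
        if genus in seen_genera:
            demoted.append(name)
        else:
            seen_genera.add(genus)
            kept.append(name)
    return kept + demoted


# Placeholder names, already ordered by the earlier ranking rounds.
ranked_names = ["Aus bus", "Xus yus", "Aus cus", "Zus wus", "Xus vus"]
adjusted = demote_extra_congeners(ranked_names)
print(adjusted)
# ['Aus bus', 'Xus yus', 'Zus wus', 'Aus cus', 'Xus vus']
```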
From this list, the top-ranked species are selected for sequencing, given the limit imposed by the sequencing capacity, plus additional species for a waiting list. In our case, we applied 50% of the sequencing capacity as the threshold for the waiting list. Hence, the selection process until stage 3 generates lists of feasible species (selected, on waiting list, or non-selected species), non-feasible species, and excluded species (due to lack of a voucher specimen or the availability of a reference genome, or an ongoing reference-genome project). Check of permits. The check of all necessary legal documentation is placed before the actual sequencing. The legal responsibility that all permits necessary to access the sample for genome generation and genome data release are present, is with the sample coordinator (usually the researcher nominating the species). Before the sample can be sent to the sequencing centres, the sample coordinator must also submit a standardised set of metadata to BGE 7 , 18 . Upon submission, the sample coordinator acknowledges that all necessary permits are present and must upload them. This crucial part has deliberately been integrated late into the process as the package of necessary permits may depend on the attributed sequencing centre and the country it is based in (e.g., export and import permits). Hence, they can only be provided at this stage of the process. Results from the final species ranking Effect of the selection process As mentioned above, we thoroughly tested the selection process and the different possibilities at different stages. Here, we will only present the outcome of the entire ADM process (i.e., the finally selected species) in comparison to the pool of the nominated species and the species that would have been selected without applying stage 3 (Figs. 4 & 5 ). We will also briefly present the results of the feasibility check and the reasons why species were not feasible. 
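The capacity-based selection and the waiting list described above can be sketched as a walk down the ranked list while summing estimated haploid genome sizes. This is an illustration under stated assumptions: the walk is strictly top-down (a species that does not fit is not skipped in favour of a smaller one further down), the waiting list is filled until the running total reaches 150% of the capacity, and the names and sizes are made up.

```python
def split_by_capacity(ranked, capacity_gb, waiting_fraction=0.5):
    """Split a ranked list of (name, genome_size_gb) pairs into selected,
    waiting-list, and non-selected species based on cumulative genome size."""
    selected, waiting, not_selected = [], [], []
    waiting_limit = capacity_gb * (1 + waiting_fraction)
    used = 0.0
    for name, size_gb in ranked:
        used += size_gb  # running total over the whole ranked prefix
        if used <= capacity_gb:
            selected.append(name)
        elif used <= waiting_limit:
            waiting.append(name)
        else:
            not_selected.append(name)
    return selected, waiting, not_selected


# Toy example with a 10 Gb capacity instead of the 150 Gb used in BGE.
ranked = [("sp1", 4.0), ("sp2", 5.0), ("sp3", 3.0), ("sp4", 1.5), ("sp5", 2.0)]
selected, waiting, not_selected = split_by_capacity(ranked, capacity_gb=10.0)
print(selected, waiting, not_selected)
# ['sp1', 'sp2'] ['sp3', 'sp4'] ['sp5']
```

Because the cut depends on cumulative size, a few very large genomes near the top of the list sharply reduce the number of species that fit, which is the effect reported for the list without the feasibility check.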
BGE funding allows for the sequencing of 150 Gb in the task “T5.5 Critical biodiversity community sequencing”. If the selection process had been done without the feasibility check directly after stage 2, 67 species would have added up to 150 Gb and hence been selected. Adding up the predicted genome sizes of the ranked species after the feasibility check (stage 3), 99 species were finally selected. The lower number of possibly selected species after stage 2 is due to the fact that this list would have included some species with very large genomes, such as Somniosus microcephalus (Chordata) with a genome size of ~ 10 Gb, Calotriton asper (Chordata) of ~ 27 Gb, or Apocalathium malmogiense (Myzozoa) of ~ 30 Gb. These three species alone would have comprised about 45% of the total 150 Gb. However, the increased number of species is not reflected in a broader taxonomic representation. In contrast, the selected 99 species comprise 15 phyla of the originally 24 nominated phyla (Fig. 4 ). The 67 species at stage 2 would have comprised 18 phyla. Due to the feasibility check, the only species representing Chlorophyta, Rhodophyta, Myzozoa, and Tardigrada were excluded, while a representative of Bacillariophyta was added. The phyla Arthropoda (35 species instead of 21), Chordata (19 instead of 9), and especially Mollusca (9 instead of 1) benefited the most from the feasibility check. Finally, even though 148 out of the original 593 nominated species were excluded at stage 1, only one representative of Acanthocephala was removed. Considering the phyla originally nominated with more than 10 species, Arthropoda and Annelida are roughly represented at the same percentage in the finally selected species as in the nominated species. Chordata, Streptophyta, and Basidiomycota are less represented, as relatively more species have been nominated, while Mollusca benefited from the process. Additionally, Platyhelminthes is also more strongly represented in the final list. 
Without the feasibility check, Arthropoda, Chordata, and Mollusca would also have been less represented compared with the original nominations, with the effect being especially prominent for the latter two. Concerning the countries, all nominated species comprised 32 countries (Fig. 5 ). The species from Cyprus was excluded at stage 1. Selecting species directly at stage 2 would have resulted in 25 countries. Hence, six countries would have dropped out at this stage (i.e., the Czech Republic, Georgia, Hungary, Montenegro, Sweden, and Slovenia). The actual species selection after the feasibility check reduced the number of selected countries to 22. Specifically, Belgium, the Faroe Islands, Georgia, Iceland, the Netherlands, Romania, Serbia, Slovakia, and the United Kingdom were excluded. Hence, the Czech Republic, Hungary, Montenegro, Sweden, and Slovenia were not excluded as would have happened at stage 2. Generally, the distribution across the other countries is relatively even after both stages, with 14 countries having more than one species considered. However, Spain, France, Croatia, Italy, Switzerland, Germany, and Portugal benefited from the feasibility check with a substantial increase in species coming from these countries. Hence, the spread became less even after the feasibility check. The distribution of individual researchers among the selected 99 species was very even, with a total of 92 researchers providing species and only five with two or three species. A similar pattern is also found for the 67 species at stage 2. This is in contrast to all species, where several individuals nominated multiple species. Hence, the goal to distribute the generation of reference genomes across Europe at both the country and individual levels has been generally accomplished, even though it was not perfect at the country level. 
Only 5% of the selected 99 species had known taxonomic problems, but on the other hand, only about 7% of the 445 species after stage 1 had known taxonomic problems. Similarly, the proportion of species collected at or close to the type locality is about 49%. This is slightly better than all species after stage 1, with 45%. The relative composition of the procedures applied to identify species is not substantially different between the species after stage 1 and the selected species. Nonetheless, identification procedures applying only one approach are generally reduced among the selected species, especially if it is only the identification by a taxonomic expert for the group. Among the selected species, most (~ 75%) applied more than one method of species identification. In contrast, among all nominated species, “only” ~64% applied more than one method. With respect to the gender distribution, while male researchers comprised about 49% across the species after stage 1, among the selected species, there was an increase to 54%. Similarly, researchers not disclosing their gender also increased among the selected species (~ 5% from ~ 3%). Accordingly, female and non-binary researchers were both less represented. The proportion of underrepresented minorities (2.5%) or persons who did not want to disclose it (6.5%) was small in the pool among all species after stage 1, but it still slightly increased due to the process to 3.0% and 7.1%. Moreover, among the selected species, about 52% of the genome teams are purely scientific, while it is only 45% for all species after stage 1. Otherwise, the composition of the genome teams in the selected species is not strongly different from the nominations. Hence, concerning JEDI criteria, the ADM process did not enrich these criteria and sometimes had to some degree the opposite effect. 
General effect of the feasibility check at stage 3
Of the non-feasible species, 21.6% were excluded due to a genome size larger than 6 Gb, and 22.1% due to a sample size smaller than a Drosophila fly or less than 100,000 nucleated cells. 41.3% of the species were excluded because they fulfilled one of the two criteria or both. 2.4% of the species have a small body size with large genomes. All of these species are challenging even for new sequencing technologies. Moreover, 29.1% of the species were excluded exclusively due to small body size and/or large genome size. Additionally, 19.7% of the species did not fulfill the criterion that they were already collected or easy to obtain (i.e., they were either common, widespread, and abundant or rare, but locally abundant). Of these 19.7%, 26.2% were also regarded as non-feasible due to the other criteria, but 73.8% of them, or 14.6% of all non-feasible species, are just challenging to collect. Hence, almost half (29.1% + 14.6% + 3.3% = 47.0%) of the species were considered not feasible purely due to biological properties. Considering non-biological reasons, 45.3% of the species were considered not feasible due to criteria related to the sampling and sample processing. 29.3% of the species could not be snap-frozen or frozen on dry ice for sample preservation. For 21.3% of the non-feasible species, it was not possible to preserve them within 5 minutes of their death, and for 27.2% it was not possible to maintain a strict cold chain at -70°C. Of the non-feasible species, 31.6% were excluded purely because they could not be properly preserved or maintained. Hence, about one-third of the exclusions happened just due to non-biological criteria.
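The feasibility check summarised above can be sketched as a set of pass/fail tests per nomination, where failing any one of them flags the species as non-feasible. The sketch covers only the six criteria whose thresholds are named in this section, not the full set of 10 criteria in Table 1; the class, field, and criterion names are hypothetical.

```python
from dataclasses import dataclass

MAX_GENOME_SIZE_GB = 6.0  # species with larger genomes were flagged


@dataclass
class Nomination:
    name: str
    genome_size_gb: float
    enough_material: bool          # at least Drosophila-sized or >= 100,000 nucleated cells
    collected_or_easy_to_obtain: bool
    can_be_snap_frozen: bool       # snap-frozen or frozen on dry ice
    preserved_within_5_min: bool   # preserved within 5 minutes of death
    cold_chain_kept: bool          # strict cold chain at -70 degrees C


def failed_criteria(n):
    """Return the names of all criteria that the nomination fails."""
    checks = {
        "genome_size": n.genome_size_gb <= MAX_GENOME_SIZE_GB,
        "sample_size": n.enough_material,
        "availability": n.collected_or_easy_to_obtain,
        "snap_freezing": n.can_be_snap_frozen,
        "time_to_preservation": n.preserved_within_5_min,
        "cold_chain": n.cold_chain_kept,
    }
    return [criterion for criterion, passed in checks.items() if not passed]


def is_feasible(n):
    # A single failed criterion is enough to flag the species as non-feasible.
    return not failed_criteria(n)


ok = Nomination("sp_ok", 1.2, True, True, True, True, True)
big = Nomination("sp_big", 27.0, True, True, True, False, True)
print(is_feasible(ok), failed_criteria(big))
# True ['genome_size', 'time_to_preservation']
```

Returning the list of failed criteria rather than a single flag makes it possible to tabulate the reasons for exclusion, as done in the percentages above.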
Discussion

We showed that (i) automated decision-making (ADM) worked on a large-scale biodiversity project, including community input at all levels, (ii) feasibility and funding shaped the final set more than anything else, and (iii) representation improved in places but still reflected the original composition of the nomination pool. Moreover, before a final decision on the ADM process is made, it can be tested using both simulated and empirical data to assess whether the desired outcome is achieved. For example, for the ERGA and BGE community, taxonomic representation across the tree of life was the most important factor, in line with the goals of EBP in general 12. The next most important goals were a broad representation across the European scientific community (at both the country and the individual level) and taxonomic certainty in the species identification. The former is in line with the EU goals for knowledge transfer to the so-called widening countries 19. The latter strengthens the goals of good taxonomic practice and reliable species identification as the foundation of biological science 20–24. These priorities are reflected not only in the applied decision model but also in the additional ranking steps included in the process, which aim to increase the representation of taxa, countries, and individuals, as well as in the requirement for specimen vouchers. Across both rounds of selection in the BGE project, the process generally accomplished these goals. However, taxonomic and country representation were affected to different degrees by the different stages of the process. While about 25% of species were excluded at stage 1, only one phylum and one country were no longer represented. Hence, stage 1 had little impact on the taxonomic and country representation. In contrast, the two following stages had a stronger impact on both.
Nonetheless, both taxonomic and country representation remained relatively high and broad thanks to the ADM process and were not biased by subjective decisions. As we show in the following, the inclusion and exclusion of phyla and countries can essentially be explained by the combination of feasibility, funding, and nomination.

The effect of feasibility criteria and funding limitations on species selection.

Feasibility, either direct or indirect, severely affected which species were selected for genome sequencing and, as a consequence, conflicts with the EBP’s goal to sequence every eukaryotic species on Earth 12. A large proportion of the BGE samples were omitted because they failed to meet one or several of the a priori defined feasibility criteria. Interestingly, even though substantially more species were selected after stage 3 than would have been after stage 2, the number of phyla selected decreased further rather than increasing with the higher number of species. Several of the feasibility criteria are linked to features of the species, such as abundance, genome size, and size of the sample, and about half of the non-feasible species were excluded due to these biological criteria. These characteristics are also prone to systematic and phylogenetic bias; for example, meiofaunal species are not evenly distributed across the animal tree of life 25–27, genome size is not evenly distributed across the eukaryotic tree of life 28, and protists pose challenges to genomics in several respects 29. Accordingly, the four phyla excluded due to the feasibility check had either too large genomes (Myzozoa), too small body size (Tardigrada), or were too difficult to sample (Chlorophyta, Rhodophyta). However, in the BGE data, we did not see a clear phylum-level pattern, although sample size and nominations limit what can be concluded. Final coverage largely tracked what was nominated per phylum.
For example, only a few species were nominated from phyla with small body sizes, such as Tardigrada, Rotifera, or Nematoda, and none at all from Gnathostomulida, Gastrotricha, or several unicellular eukaryotic phyla. Hence, an implicit filtering could have already occurred at the nomination level, as we clearly and transparently communicated the feasibility criteria. Enforcing upper or lower limits on these biological criteria may be valid for practical reasons, but it obviously conflicts with EBP’s goal to sequence every species on Earth 12. It is therefore of pivotal importance that methods be developed or improved to reduce potentially systematic bias 8. The accumulation of expertise in processing diverse samples in the lab has led to the development of improved ultra-low-input extraction protocols that can better handle smaller organisms with larger genomes 30–32. The development of specific protocols for different organism groups is also helpful in this matter 8, 33–35. The ever-decreasing costs of genome sequencing help to gradually lower the threshold for what genome sequencing consortia consider an unfeasibly large genome. Indirectly, the large genomes would have exerted a strong impact on the selection of species at stage 2 and hence on the species included. Because funding limits the total genome size that can be sequenced, the inclusion of large genomes would have substantially reduced the number of species compared to the actual selection. This reduction in numbers would have had a strong impact on country representation. All six countries that would not have been considered at stage 2 would only have been included if the sequencing capacity had been three times higher. This effect is also highlighted by the fact that five of them were re-included in the actual selection, in which these large genomes were excluded as non-feasible, among other reasons, due to funding limitations.
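The interaction between a total genome-size budget and large genomes can be made concrete with a small sketch. It assumes, purely for illustration, that species are taken in rank order and that a species is skipped when it no longer fits the remaining budget; the function name, the example data, and the skip-and-continue rule are hypothetical and are not taken from the BGE script.

```python
from typing import Optional


def select_within_budget(
    ranked: list[tuple[str, float]],
    budget_gb: float,
    max_genome_gb: Optional[float] = None,
) -> list[str]:
    """Walk a ranked list of (name, genome size in Gb) and keep species that fit.

    If max_genome_gb is given, larger genomes are treated as non-feasible
    and removed before the budget is applied.
    """
    selected: list[str] = []
    used = 0.0
    for name, size_gb in ranked:
        if max_genome_gb is not None and size_gb > max_genome_gb:
            continue  # excluded by the feasibility cap
        if used + size_gb <= budget_gb:
            selected.append(name)
            used += size_gb
    return selected


# Invented example: species A to F in rank order with genome sizes in Gb.
example_ranking = [
    ("A", 20.0), ("B", 1.0), ("C", 0.5),
    ("D", 12.0), ("E", 2.0), ("F", 1.5),
]

without_cap = select_within_budget(example_ranking, budget_gb=22.0)
with_cap = select_within_budget(example_ranking, budget_gb=22.0, max_genome_gb=6.0)
```

In this toy example, the single 20 Gb genome at the top of the ranking consumes most of the 22 Gb budget, so only three species are selected without a cap, whereas four are selected once genomes above 6 Gb are excluded.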
Our results also identify the ability to snap-freeze the collected sample and maintain a cold chain as a seriously limiting factor for the feasibility of genome sequencing, as almost one-third of the species were considered not feasible due to these factors. The proportion of species from EU widening countries among the non-feasible species is almost identical to their proportion among all species included in stage 2. In general, for country representation, the original number of species nominated and included in stage 2 had a much stronger impact. Countries that were on the borderline of being included had usually nominated less than 1% of all species. The exceptions were Sweden, Hungary, and Romania, with 2.5%, 2.5%, and 1.8% of the nominated species, respectively. The former two were re-included at the feasibility check, while Romania dropped out. On the other hand, all countries that benefited from the feasibility check through an increased number of selected species had nominated 5–18% of all species. Perhaps unsurprisingly, as with taxonomic representation, the number of nominated species exerted a strong impact on the final country representation. Nonetheless, the problem of maintaining a cold chain can disproportionately affect the selection of species occurring in remote locations, such as the deep sea or oceanic islands like the Azores, which also frequently harbor a high proportion of endemic biota. An obvious solution is access to dry-shippers, i.e., shipping containers whose cooling elements work by absorbing liquid nitrogen with no risk of spillage, which allows them to be brought on regular public transport, including planes. Additionally, this problem can be mitigated through careful logistical planning, e.g., by “bioblitz” excursions with intensive sampling of taxa within a remote location. Finally, the development of new laboratory methods and chemicals to preserve DNA at higher temperatures might counteract this type of bias.
Applicability of our criteria to other projects

Adopting exclusion criteria similar to those used herein might be sensible for other genome projects as well. For example, to accomplish the goals of EBP, redundancy across different projects should be avoided as much as possible, allowing a more effective use of resources. In the same vein, redundancy is easier to avoid if the species is identified accurately and proof of the identification can be provided in the form of a voucher. This will ensure high certainty that each genome belongs to the species it is supposed to represent. It will also allow better guidance of future sequencing efforts across the tree of life. Concerning the ranking criteria, taking the distribution across the tree of life into account will also be beneficial for other genome projects in accomplishing the first phase of the EBP, generating reference genomes for each eukaryotic family, as well as for later phases, generating genomes for all species. Such a criterion will more quickly result in a better representation of genomic resources across the tree of life. Accordingly, knowledge gaps in our understanding of the evolution and ecology of biodiversity on Earth, and thus in how to protect and preserve it, can be closed. Similarly, knowledge transfer and capacity building from economically and scientifically stronger countries to less strong countries (e.g., from EU non-widening countries to EU widening ones) can also enable broader participation in biodiversity genomics and generate new initiatives and funding opportunities to produce genomic resources for different questions in biodiversity research. A diverse input of knowledge and viewpoints into biodiversity genomics is not only a value in itself but also opens the way to a new understanding of biodiversity and to new research avenues.
This can lead to out-of-the-box hypotheses, a shifting focus and view on biodiversity and nature, and new frontiers in research. Following this, consideration of applicability and anchorage in the communities (e.g., local, research, taxonomic) will also facilitate such new research and enhance existing research, as more people are interested in using the public goods generated by such publicly funded genome projects. Feasibility checks are important for any larger genome project to avoid the unnecessary waste of limited resources. More importantly, however, feasibility checks as conducted herein can also reveal the obstacles still in the way of obtaining reference genomes for certain species. Analysing the rejected species can show whether they were rejected due to limitations in our sequencing technology, such as genome or sample size, or in the necessary sampling procedure, such as snap-freezing them and keeping a strict cold chain at all times. This can guide R&D efforts for such species.

Community engagement in biodiversity genomics.

Another strength of the ADM process and its implementation, as presented herein as part of the BGE project, was that it was explicitly designed to perform each of the key steps using a bottom-up approach. The general community nominated all of the species and decided on the ranking criteria and the ranking mechanism applied to the species. This effort ensured that the ADM process is less subjective, that it is based upon the needs of the ERGA community at large and hence extends beyond the BGE project, and that the communication about the different criteria and how they are applied was transparent from the very beginning. The BGE efforts of generating reference genomes are widely recognized among individual researchers across Europe, as there were almost as many individual researchers as selected species (i.e., 92 for 99).
Moreover, the ability to test the outcomes a priori allowed adjustment of the process early on, minimizing the risk of unintended outcomes that contradict the desired main goals. Despite the challenges with feasibility criteria and funding limitations, the ADM process also ensured a relatively high taxonomic and country representation, with 62.5% of the nominated phyla and 68.8% of the nominated countries being considered. However, an unevenness in both remained due to strong biases in the nominated species with respect to the number of species nominated for a phylum or a country. Hence, while the ADM process was able to level the playing field to some degree, it could not resolve the initial bias in preferences. This could also be considered a strength of the approach, as the nomination process shows where the majority of the community sees a need for reference genomes. Alternatively, independent of the limitations imposed by the feasibility criteria, it could indicate that more efforts are needed to engage the scientific community from the underrepresented countries and phyla. Finally, even though it was not the main goal of this ADM process, one aim was to improve JEDI aspects in the distribution of the BGE resources across Europe. This is in line with the broader goal of the ERGA community to consider such aspects 36. However, due to the low weight assigned to this criterion in the decision model and the absence of additional steps addressing it, no enrichment in these factors was achieved. In fact, in some cases, such as gender balance, the opposite occurred. The reason for this is that the JEDI category became relevant for only very few species, namely those on the borderline of being included (if at all), and only when the species were identical in the previous two levels of the model.
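The tie-breaking role of the JEDI category can be illustrated with a lexicographic sort, in which a lower level is consulted only when all higher levels are tied. This is a minimal sketch under stated assumptions: the field names `level1`, `level2`, and `jedi`, the integer scores, and the example candidates are hypothetical, and the actual decision model may weight and combine its levels differently.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    level1: int  # hypothetical score at the first level of the model
    level2: int  # hypothetical score at the second level
    jedi: int    # hypothetical JEDI score, used only as a tie-breaker


def rank_lexicographically(candidates: list[Candidate]) -> list[Candidate]:
    """Sort by level1, then level2, then jedi, each from high to low."""
    return sorted(candidates, key=lambda c: (-c.level1, -c.level2, -c.jedi))


def decided_by_jedi(a: Candidate, b: Candidate) -> bool:
    """True if the relative order of a and b depends on the JEDI score alone."""
    return (a.level1, a.level2) == (b.level1, b.level2) and a.jedi != b.jedi


# Invented example candidates.
example_candidates = [
    Candidate("sp_a", 3, 2, 0),
    Candidate("sp_b", 3, 2, 1),
    Candidate("sp_c", 3, 1, 5),
    Candidate("sp_d", 2, 9, 9),
]

example_order = [c.name for c in rank_lexicographically(example_candidates)]
```

In this toy ranking, `sp_b` precedes `sp_a` only because the two are tied at the first two levels, while `sp_c` and `sp_d` cannot move up regardless of how high their JEDI scores are.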
Hence, the overall impact of the JEDI category is limited, and accordingly, any intrinsic bias present in the species scoring highest in the first two levels is carried through the process. The reasons for this can be multiple. For example, concerning gender balance, male researchers are slightly less represented among the non-feasible species, with 48% instead of 49%, slightly favoring them for the final list of selected species. Similarly, considering all species at stage 2, male researchers were present from 26 countries and female researchers from 22 countries. Given the goal to increase country representation, among the selected species, male researchers are present from 18 countries and female researchers from 13 countries. Hence, given the low sample size of 99 species, these small intrinsic biases in gender representation among countries and feasible species in the pool of nominated species can explain the slight relative increase in male researchers. Again, a balanced representation already in the nomination pool seems more crucial here than tweaking the ADM process itself. This can, for example, be accomplished in the future by targeted outreach to underrepresented groups and countries, encouraging their participation and lowering barriers to nominating species through an even better flow of information.

Future directions of ADMs in biodiversity genomics.

Herein, we applied a fixed ranking list with only two rounds of nomination. However, for future (and ongoing) biodiversity projects, it is possible to envision a dynamic list of candidate species. The ranking could run on a rolling basis as new species come in. Species are added to the initial candidate list, and the species ranking is repeated each time. A process with repeated (or continuous) species additions will be more similar to triage, i.e., a prioritisation process 15.
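A rolling re-ranking of this kind could be sketched as follows. The sketch assumes that species leave the candidate list once processing starts and that a candidate loses a taxonomic novelty bonus when its family is already being processed; the class, the family-level granularity, and the bonus value are hypothetical illustrations rather than the BGE implementation.

```python
from dataclasses import dataclass, field


@dataclass
class RollingCandidateList:
    """Candidate list that is re-ranked whenever its contents change."""
    candidates: dict = field(default_factory=dict)  # name -> (family, base score)
    in_process: dict = field(default_factory=dict)  # name -> family

    def add(self, name: str, family: str, base_score: float) -> None:
        """Add a newly nominated species to the candidate list."""
        self.candidates[name] = (family, base_score)

    def start_processing(self, name: str) -> None:
        """Move a species from the candidate list to the in-process set."""
        family, _ = self.candidates.pop(name)
        self.in_process[name] = family

    def ranking(self, novelty_bonus: float = 10.0) -> list[str]:
        """Rank remaining candidates; families already in process earn no bonus."""
        covered = set(self.in_process.values())

        def score(item):
            _, (family, base) = item
            return base + (0.0 if family in covered else novelty_bonus)

        # Highest score first; ties are broken alphabetically by name.
        ordered = sorted(self.candidates.items(), key=lambda it: (-score(it), it[0]))
        return [name for name, _ in ordered]
```

Calling `ranking()` after every `add` or `start_processing` reproduces the idea that species already being processed are never stopped, while new arrivals that fill a taxonomic gap can overtake older candidates.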
Specifically, species that have started being processed will not be stopped, but newly added species may be prioritised over species that were already on the list. This could be the situation if, for instance, the project aims at acquiring species by means of successive bioblitzes, i.e., intensive sampling events where numerous species are collected at once and repeatedly throughout the project. This would ensure a more effective use of the capacity in line with the set goals. For example, important gaps in taxonomic representation could be filled much faster by such an approach, as species filling such gaps would immediately be prioritized and processed. In principle, such an approach could already be conducted with the ADM process presented here. New species are added to the initial list, and species already being processed are removed from the list. Additionally, scores such as taxonomic representation are adjusted for all species in light of new information. For example, the taxonomic representation of a species could be set to a lower taxonomic level if a species of the same higher taxonomic level is already being processed within the project or by other projects. Running the ADM itself takes only a few minutes, and the maintenance of the list is probably more time-demanding than the ADM.

Conclusion

With the BGE project, we show that involving the community in the nomination and selection of species is an efficient and equitable way to maximise and ensure diversity in sampling species for biodiversity studies. The lack of reduction in diversity in the two key ranking criteria highlighted here, i.e., taxonomic representation and country representation, demonstrates that the automated species ranking applied by BGE respected the diversity of needs from the broader ERGA community.
These findings reflect the strength of our approach and its capacity to balance different interests, which might be beneficial for other consortia to consider when designing similar selection processes, even if their specific goals require different ranking criteria.

Declarations

Author contributions

THS, AB & RAO conceived the study. All authors contributed to the development of the species selection process, including the ADM and its criteria and weighting schemes. THS developed the R script of the ADM and conducted the analyses of the simulated and empirical data. TM, CdG and RM compiled all empirical data and checked its quality. THS and TM were major contributors in writing a first draft of the manuscript, to which the other authors contributed and on which they commented. All authors read and approved the final manuscript.

Funding Declaration

Biodiversity Genomics Europe (Grant no. 101059492) is funded by Horizon Europe under the Biodiversity, Circular Economy and Environment call (REA.B.3); co-funded by the Swiss State Secretariat for Education, Research and Innovation (SERI) under contract numbers 22.00173 and 24.00054; and by the UK Research and Innovation (UKRI) under the Department for Business, Energy and Industrial Strategy’s Horizon Europe Guarantee Scheme. This study was also funded by the Research Council of Norway (Grant no. 300587 to T.H.S.). The funders played no role in study design, data collection, analysis and interpretation of data, or the writing of this manuscript.

Acknowledgements

For their valuable discussions concerning the selection process and the individual criteria, we would like to acknowledge the contributions by Ana Riesgo Gil (Museo Nacional de Ciencias Naturales), Bernhard Hausdorf (Leibniz Institute for the Analysis of Biodiversity Change) and Alice Minotto (Earlham Institute).

Competing interests

All authors declare no financial or non-financial competing interests.
Data and code availability

The R script for the ADM process, as well as datasets to test the script, are available at: https://github.com/torstenstruck/BGE_species_priorization.

References

1. Rhie, A. et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature 592, 737–746, doi: 10.1038/s41586-021-03451-0 (2021).
2. The Darwin Tree of Life Project Consortium et al. Sequence locally, think globally: The Darwin Tree of Life Project. Proc. Natl. Acad. Sci. 119, e2115642118, doi: 10.1073/pnas.2115642118 (2022).
3. Marcussen, T. & Jakobsen, K. S. En nasjonal dugnad for å kartlegge genomene til alle artene i Norge - EBP-Nor. Biolog 1, 6–12 (2023).
4. Mazzoni, C. J., Ciofi, C. & Waterhouse, R. M. Biodiversity: an atlas of European reference genomes. Nature 619, 252, doi: 10.1038/d41586-023-02229-w (2023).
5. Formenti, G. et al. The era of reference genomes in conservation genomics. Trends Ecol. Evol. 37, 197–202, doi: 10.1016/j.tree.2021.11.008 (2022).
6. Mc Cartney, A. M. et al. The European Reference Genome Atlas: piloting a decentralised approach to equitable biodiversity genomics. npj Biodiversity 3, 28, doi: 10.1038/s44185-024-00054-6 (2024).
7. Böhne, A. et al. Contextualising samples: supporting reference genomes of European biodiversity through sample and associated metadata collection. npj Biodiversity 3, 26, doi: 10.1038/s44185-024-00053-7 (2024).
8. Howard, C. et al. On the path to reference genomes for all biodiversity: laboratory protocols and lessons learned from processing over 2,000 species in the Sanger Tree of Life. GigaScience 14, doi: 10.1093/gigascience/giaf119 (2025).
9. Amano, T. et al. The manifold costs of being a non-native English speaker in science. PLoS Biol. 21, e3002184, doi: 10.1371/journal.pbio.3002184 (2023).
10. Régner, I., Thinus-Blanc, C., Netter, A., Schmader, T. & Huguet, P. Committees with implicit biases promote fewer women when they do not believe gender bias exists. Nature Human Behaviour 3, 1171–1179, doi: 10.1038/s41562-019-0686-3 (2019).
11. Hochkirch, A. et al. A strategy for the next decade to address data deficiency in neglected biodiversity. Conserv Biol 35, 502–509, doi: 10.1111/cobi.13589 (2021).
12. Lewin, H. A. et al. Earth BioGenome Project: Sequencing life for the future of life. Proc. Natl. Acad. Sci. 115, 4325–4333, doi: 10.1073/pnas.1720115115 (2018).
13. Mökander, J., Morley, J., Taddeo, M. & Floridi, L. Ethics-Based Auditing of Automated Decision-Making Systems: Nature, Scope, and Limitations. Science and Engineering Ethics 27, 44, doi: 10.1007/s11948-021-00319-4 (2021).
14. Rizk, A. & Lindgren, I. 237–253 (Springer Nature Switzerland).
15. Bazyar, J., Farrokhi, M., Salari, A. & Khankeh, H. R. The Principles of Triage in Emergencies and Disasters: A Systematic Review. Prehospital and Disaster Medicine 35, 305–313, doi: 10.1017/S1049023X20000291 (2020).
16. Mohini, T. Demystifying search indexing and ranking. International Journal of Research in Computer Applications and Information Technology (IJRCAIT) 7, 166–174 (2024).
17. Zahir, E. & Henda, J. Comparative Analysis of Page Ranking Algorithms for Efficient Information Retrieval. American Journal of Information Science and Technology 9, 15–23, doi: 10.11648/j.ajist.20250901.12 (2025).
18. Leonard, J. A. et al. Sample Manifest Standard Operating Procedure - Version: 2.5.1. 34 (2025).
19. European Commission: Directorate-General for Research and Innovation. Horizon Europe – Widening participation and spreading excellence across Europe – Boosting research and innovation performance throughout the Union. (Publications Office of the European Union, 2021).
20. Huber, J. T. The importance of voucher specimens, with practical guidelines for preserving specimens of the major invertebrate phyla for identification. J. Nat. Hist. 32, 367–385, doi: 10.1080/00222939800770191 (1998).
21. Thomson, S. A. et al. Taxonomy based on science is necessary for global conservation. PLoS Biol. 16, e2005075, doi: 10.1371/journal.pbio.2005075 (2018).
22. Godfray, H. C. J., Knapp, S. & Wilson, E. O. Taxonomy as a fundamental discipline. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 359, 739–739, doi: 10.1098/rstb.2003.1440 (2004).
23. Favret, C. The 5 ‘D’s of Taxonomy: A User’s Guide. The Quarterly Review of Biology 99, 131–156, doi: 10.1086/732044 (2024).
24. Rivera, D. et al. What is in a name? The need for accurate scientific nomenclature for plants. Journal of Ethnopharmacology 152, 393–402, doi: 10.1016/j.jep.2013.12.022 (2014).
25. Cerca, J., Purschke, G. & Struck, T. H. Marine connectivity dynamics: clarifying cosmopolitan distributions of marine interstitial invertebrates and the meiofauna paradox. Mar. Biol. 165, 123, doi: 10.1007/s00227-018-3383-2 (2018).
26. Schmidt-Rhaesa, A. Guide to the Identification of Marine Meiofauna. (München: Dr. Friedrich Pfeil, 2020).
27. Martínez, A. et al. Fundamental questions in meiofauna research highlight how small but ubiquitous animals can improve our understanding of Nature. Communications Biology 8, 449, doi: 10.1038/s42003-025-07888-1 (2025).
28. Oliver, M. J., Petrov, D., Ackerly, D., Falkowski, P. & Schofield, O. M. The mode and tempo of genome size evolution in eukaryotes. Genome Res. 17, 594–601, doi: 10.1101/gr.6096207 (2007).
29. Figuerola, B. et al. Interactive effects of ocean acidification and warming disrupt calcification and microbiome composition in bryozoans. Communications Biology 8, 1135, doi: 10.1038/s42003-025-08524-8 (2025).
30. Roberts, N. G., Gilmore, M. J., Struck, T. H. & Kocot, K. M. Multiple Displacement Amplification Facilitates SMRT Sequencing of Microscopic Animals and the Genome of the Gastrotrich Lepidodermella squamata (Dujardin 1841). Genome Biol. Evol. 16, doi: 10.1093/gbe/evae254 (2024).
31. Laumer, C. Picogram input multimodal sequencing (PiMmS), <https://dx.doi.org/10.17504/protocols.io.rm7vzywy5lx1/v1> (2023).
32. Bein, B. et al. Long-read sequencing and genome assembly of natural history collection samples and challenging specimens. Genome Biology 26, 25, doi: 10.1186/s13059-025-03487-9 (2025).
33. Nishii, K. et al. A high quality, high molecular weight DNA extraction method for PacBio HiFi genome sequencing of recalcitrant plants. Plant Methods 19, 41, doi: 10.1186/s13007-023-01009-x (2023).
34. Angthong, P. et al. Optimization of high molecular weight DNA extraction methods in shrimp for a long-read sequencing platform. PeerJ 8, e10340, doi: 10.7717/peerj.10340 (2020).
35. Petersen, C. et al. High molecular weight DNA extraction methods lead to high quality filamentous ascomycete fungal genome assemblies using Oxford Nanopore sequencing. Microbial Genomics 8, doi: 10.1099/mgen.0.000816 (2022).
36. Struck, T. H., Hessling, R. & Purschke, G. The phylogenetic position of the Aeolosomatidae and Parergodrilidae, two enigmatic oligochaete-like taxa of the ‘Polychaeta’, based on molecular data from 18SrDNA sequences. J. Zool. Syst. Evol. Res. 40, 155–163 (2002).
37. Cowie, R. H., Bouchet, P. & Fontaine, B. The Sixth Mass Extinction: fact, fiction or speculation? Biol. Rev. 97, 640–663, doi: 10.1111/brv.12816 (2022).
38. Lewin, H. A. et al. The Earth BioGenome Project 2020: Starting the clock. Proc. Natl. Acad. Sci. 119, e2115635118, doi: 10.1073/pnas.2115635118 (2022).

Tables

Tables 1 and 2 are available in the Supplementary Files section.

Box 1

Box 1 is available in the Supplementary Files section.

Additional Declarations

No competing interests reported.
Supplementary Files

SupplementaryData01.xlsx (Supplementary_Data_01.xlsx) – a more detailed description of the selection criteria
SupplementaryInformation.pdf – a more detailed description of the results and methods
Box1.docx
Tables.docx

Status: Under Review
Editorial decision: Revision requested, 11 Feb, 2026
Reviews received at journal: 14 Nov, 2025
Reviewers agreed at journal: 14 Nov, 2025
Reviewers invited by journal: 12 Nov, 2025
Editor assigned by journal: 03 Nov, 2025
Submission checks completed at journal: 01 Nov, 2025
First submitted to journal: 26 Oct, 2025
09:23:42","extension":"xlsx","order_by":3,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":16796,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementaryData01.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/49fc80453e96f385a5ad698f.xlsx"},{"id":96596778,"identity":"11047e42-34dd-4fc9-b5a6-518c69d030b8","added_by":"auto","created_at":"2025-11-24 07:40:04","extension":"xml","order_by":4,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":133542,"visible":true,"origin":"","legend":"","description":"","filename":"cce2fdeaadd943b38d4e28bf461328d51enriched.xml","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/46216eb728712bdb26e71100.xml"},{"id":96596795,"identity":"5b3d101c-86ca-4f89-b439-ecc1d78f112b","added_by":"auto","created_at":"2025-11-24 07:40:05","extension":"png","order_by":6,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":295503,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/438ae32e1de581b05e19152e.png"},{"id":96596789,"identity":"1b26ca8c-55ae-428e-91f6-942c55cda65e","added_by":"auto","created_at":"2025-11-24 07:40:05","extension":"jpeg","order_by":7,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":1730786,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage2.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/2addf72b4ec2116e1f90ea09.jpeg"},{"id":96596777,"identity":"894cbd51-2357-4d74-a1c4-5b8b2c7df9f1","added_by":"auto","created_at":"2025-11-24 
07:40:04","extension":"jpeg","order_by":8,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":546008,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage3.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/312430f3069f82b9f585c3a5.jpeg"},{"id":96596791,"identity":"ce0f467c-0423-4351-a832-1ec7e8b1bc35","added_by":"auto","created_at":"2025-11-24 07:40:05","extension":"jpeg","order_by":9,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":552959,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage4.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/8c85e554c1f6b6fc1417e4b8.jpeg"},{"id":96596783,"identity":"64fa04d0-2824-4430-9891-1cb568ff4bb3","added_by":"auto","created_at":"2025-11-24 07:40:04","extension":"jpeg","order_by":10,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":266240,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage5.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/e71b895e58d8622f17edb03c.jpeg"},{"id":96596782,"identity":"d7a96ced-804b-4dbd-9f12-4288d2fdb1d6","added_by":"auto","created_at":"2025-11-24 07:40:04","extension":"jpeg","order_by":11,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":207988,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage6.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/97bd8d2b14fd79ad33d71a2d.jpeg"},{"id":96596771,"identity":"259a4ee9-be4d-4ad1-8125-613dc278fdaf","added_by":"auto","created_at":"2025-11-24 
07:40:04","extension":"jpeg","order_by":12,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":273616,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage7.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/c44e1fc1b8b04a41b1928fff.jpeg"},{"id":96596785,"identity":"9e1c05be-52d1-4b93-b060-44de6ac229b6","added_by":"auto","created_at":"2025-11-24 07:40:04","extension":"jpeg","order_by":13,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":675109,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage8.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/bce68c1f999cd754003ef7d7.jpeg"},{"id":96596766,"identity":"39976317-dcdd-4664-984b-b8ab5228ae15","added_by":"auto","created_at":"2025-11-24 07:40:03","extension":"png","order_by":14,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":93046,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/4de78e0654b63a0ecadc6e0a.png"},{"id":96596781,"identity":"f1115012-2f22-44ac-b812-ef3c46d5574b","added_by":"auto","created_at":"2025-11-24 07:40:04","extension":"png","order_by":15,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":337738,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/5a56f07a75284d75fe8800d1.png"},{"id":96596786,"identity":"5df1bf75-ea21-4146-adca-70690bb0f166","added_by":"auto","created_at":"2025-11-24 
07:40:04","extension":"png","order_by":16,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":98634,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/82d985a2af255ddeb9edcb66.png"},{"id":96596792,"identity":"68ec8029-305d-4e77-8d13-c91efacf226f","added_by":"auto","created_at":"2025-11-24 07:40:05","extension":"png","order_by":17,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":91708,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/106258404cccce356c4f549b.png"},{"id":96596768,"identity":"c58e978b-bed0-4125-9504-e165b3c79996","added_by":"auto","created_at":"2025-11-24 07:40:03","extension":"png","order_by":18,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":64109,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/49e6d8fde1ac417e25e7a4a1.png"},{"id":96596775,"identity":"fee50f5d-41b9-49ca-bd51-8a4abd9261fe","added_by":"auto","created_at":"2025-11-24 07:40:04","extension":"png","order_by":19,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":38388,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/7b5dfaa33f4149e29f2486fb.png"},{"id":96605369,"identity":"8692adb1-cf08-4f80-a301-0b99e90cd417","added_by":"auto","created_at":"2025-11-24 
09:22:36","extension":"png","order_by":20,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":48562,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/0bd1c0a182ac6119c289c474.png"},{"id":96596770,"identity":"9d2b36bc-a127-47ef-b9af-e75ce553c969","added_by":"auto","created_at":"2025-11-24 07:40:03","extension":"png","order_by":21,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":129470,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage8.png","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/a6b54dcfc68e5da8dc7ae8ce.png"},{"id":96596779,"identity":"7c0eaae9-9cc7-42e4-9018-8f1eea989fea","added_by":"auto","created_at":"2025-11-24 07:40:04","extension":"xml","order_by":22,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":133425,"visible":true,"origin":"","legend":"","description":"","filename":"cce2fdeaadd943b38d4e28bf461328d51structuring.xml","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/1bbce0d94ae8ad6a6e447223.xml"},{"id":96596790,"identity":"3169bdcd-dc8e-46a0-97e9-7950fce5a72f","added_by":"auto","created_at":"2025-11-24 07:40:05","extension":"html","order_by":23,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":151455,"visible":true,"origin":"","legend":"","description":"","filename":"earlyproof.html","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/632c1e1084d49d782280827d.html"},{"id":96605934,"identity":"f8fb0a64-ce81-4899-8b5c-f3f6e049ae31","added_by":"auto","created_at":"2025-11-24 09:24:24","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":218315,"visible":true,"origin":"","legend":"\u003cp\u003eFlow-chart of the automated decision-making (ADM) process for species selection of sequencing of reference genomes suggested by the community 
within the ERGA stream of the BGE project. The whole ADM process itself is highlighted by the grey box and subparts by white boxes. JEDI = Justice, Equity, Diversity, and Inclusion.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/64361df597673951f50b352d.png"},{"id":96596787,"identity":"3cb057ae-0030-48e2-a10c-aeadcf481932","added_by":"auto","created_at":"2025-11-24 07:40:05","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":202453,"visible":true,"origin":"","legend":"\u003cp\u003eA) Heatmap of simulation studies of the enrichment of the different categories given the models (first number, 1-8) and number of selected species (second number, 100, 200, or 300). B \u0026amp; C) Heatmaps for empirical data. B) Heatmap of the enrichment factor of the different species selection and models versus the categories. C) Heatmap of the intersection between the 50 species selected by the eight different models. 
App = Applicability, Cer = Certainty, Cou = Country representation, Jed = JEDI, Nov = Novel leader, Tax = Taxon representation.\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/040599ae014bfa28db1cec7f.png"},{"id":96596780,"identity":"09e65b5a-e905-4b60-9a98-4c6edee98ede","added_by":"auto","created_at":"2025-11-24 07:40:04","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":155415,"visible":true,"origin":"","legend":"\u003cp\u003eRepresentation of the countries where the species are collected, for all nominated species and the 50 selected species using models #7 or #8 in the test with empirical data.\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/060789829c05d0b213ab59dd.png"},{"id":96596767,"identity":"61b747f0-d90a-4f94-b89e-d39e9ce46e18","added_by":"auto","created_at":"2025-11-24 07:40:03","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":84888,"visible":true,"origin":"","legend":"\u003cp\u003ePhylum representation at three stages of species ranking for the empirical BGE community dataset. From left to right: all 593 nominated species, before the 1st stage (exclusion); the 67 species that would have been selected after the decision model at the 2nd stage (ranking); the 99 species that passed the feasibility check and were selected. The taxonomy follows ENA. 
The absolute number of species is shown to the right when larger than 1.\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/133f8f778693b76558d76adf.png"},{"id":96596773,"identity":"a66a9b41-78ee-482b-935d-d930d92c2f08","added_by":"auto","created_at":"2025-11-24 07:40:04","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":109430,"visible":true,"origin":"","legend":"\u003cp\u003eCountry representation at three stages of species ranking for the empirical BGE community dataset. From left to right: all 593 nominated species, before the 1st stage (exclusion); the 67 species that would have been selected after the decision model at the 2nd stage (ranking); the 99 species that passed the feasibility check and were selected. Country names are shown as ISO 3166 country codes. The number of species is shown to the right when larger than 1.\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/b429e853011caa668ce5aff8.png"},{"id":96608535,"identity":"c7d46c49-9a0f-4cb7-b663-d37b5409a89c","added_by":"auto","created_at":"2025-11-24 09:28:54","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1310972,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/5428f80e-d05c-4547-a150-2ab10a226292.pdf"},{"id":96596793,"identity":"d8159a1c-2792-452c-8919-2482c8b863a1","added_by":"auto","created_at":"2025-11-24 07:40:05","extension":"xlsx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":16796,"visible":true,"origin":"","legend":"\u003cp\u003eSupplementary_Data_01.xlsx – a more detailed description of the selection 
criteria\u003c/p\u003e","description":"","filename":"SupplementaryData01.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/736be575f5926c561e011fbe.xlsx"},{"id":96605590,"identity":"f2816d5f-a863-4608-baca-95721b71cbc0","added_by":"auto","created_at":"2025-11-24 09:23:33","extension":"pdf","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":4035539,"visible":true,"origin":"","legend":"\u003cp\u003eSupplementaryInformation.pdf – a more detailed description of the results and methods\u003c/p\u003e","description":"","filename":"SupplementaryInformation.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/a532396c5e7e6c7d7dac076b.pdf"},{"id":96596788,"identity":"3a8fbc5d-2af9-440c-ba32-62217c310cec","added_by":"auto","created_at":"2025-11-24 07:40:05","extension":"docx","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":23699,"visible":true,"origin":"","legend":"","description":"","filename":"Box1.docx","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/7ffc56028bb6cfb57f094d35.docx"},{"id":96596797,"identity":"b39f4c77-08b5-4184-bd26-74b5e738ae6d","added_by":"auto","created_at":"2025-11-24 07:40:06","extension":"docx","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":17981,"visible":true,"origin":"","legend":"","description":"","filename":"Tables.docx","url":"https://assets-eu.researchsquare.com/files/rs-7957242/v1/2f64e4d38529614cab5e62a0.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"An automated decision-making procedure for ranking and selecting species in biodiversity projects","fulltext":[{"header":"Introduction","content":"\u003cp\u003eMolecular biodiversity studies, most notably large-scale genome sequencing initiatives\u003csup\u003e\u003cspan additionalcitationids=\"CR2 CR3 CR4 CR5\" citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u0026ndash;\u003cspan 
citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e, are typically resource-intensive (see also Box 1). However, the number of potential target species is, in general, much higher than the actual number of species that can be processed. Consequently, data are generated for a limited subset of species within a project\u0026rsquo;s funding period. This normally requires that species are prioritised and selected for further analyses by a complex decision-making procedure based on various criteria, which may be scientific, technical, or social\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e,\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e. Numerous criteria are potentially relevant for the ranking of candidate species for genome sequencing. These can involve characteristics of the species itself, such as taxonomic representation, characteristics of the available specimen(s), such as karyotype (sex), or geographical distribution. Technical criteria, e.g., related to sampling procedures such as access to liquid nitrogen or maintenance of a cold chain can be crucial. Political or strategic decisions within the consortium, such as regional and institutional representation, can also be relevant. These criteria can also conflict with one another, making objective decision making difficult. Accordingly, there is a high chance that in such species selection procedures, inherently subjective components can enter the decision process. For example, there are generally intrinsic advantages for English native speakers in many decision processes\u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e. Unconscious biases can also exist towards, e.g., gender\u003csup\u003e\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u003c/sup\u003e. 
Similar biases can also exist towards more iconic and better-known taxa, or economically important taxa, which is to the detriment of both conservation goals\u003csup\u003e\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u003c/sup\u003e and the goals of EBP\u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e. Ideally, species selection procedures should be transparent and objective.\u003c/p\u003e\u003cp\u003eAutomated decision-making (ADM) processes are widely used across various fields in society to ensure consistency, efficiency, and accuracy in decision-making when the number of criteria is high and the evaluation is complex\u003csup\u003e\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e,\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e\u003c/sup\u003e. A distinction may be made between \u003cem\u003eprioritisation\u003c/em\u003e procedures, by which targets are queued for action, and \u003cem\u003eranking\u003c/em\u003e procedures, by which targets are ordered for evaluation. Automated prioritisation procedures are used in, e.g., emergency room triage, cybersecurity threat detection, and task management systems to continuously ensure that high-value or urgent items receive attention first\u003csup\u003e\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e. Automated ranking procedures are used to sort options based on pre-defined criteria such as value or relevance, e.g., in search engines, recommendation systems, and accreditation rankings\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e,\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u003c/sup\u003e. The purpose of implementing ADM processes is to save time, reduce human biases, and ensure a higher degree of consistency in decision-making. 
Specifically, the implementation of an ADM requires a set of clearly defined criteria. The selection of this set of criteria can be achieved through mechanisms such as a general vote or other bottom-up, transparent decisions. We therefore argue that an ADM process may ease resource allocation in biodiversity genome projects, while also making the process more transparent and democratic.\u003c/p\u003e\u003cp\u003eThe Biodiversity Genomics Europe (BGE) project (EU Horizon Europe BGE Project, Grant Agreement No 101059492) was based on large community involvement. Accordingly, one goal of the BGE project was to sequence a set of reference genomes through a community-driven approach. In this approach, scientists nominated a species and then, if it was selected, provided samples/specimens. To this end, BGE created a transparent ADM process for selecting species from the pool of nominated species based on a set of \u003cem\u003ea priori\u003c/em\u003e objective criteria.\u003c/p\u003e\u003cp\u003eTo develop the ADM process and its criteria, BGE leveraged its foundation in the European Reference Genome Atlas (ERGA) community and its experience with a community-based pilot project\u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e,\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u003c/sup\u003e. The ERGA community brings together broad taxonomic expertise across the eukaryotic tree of life, along with deep knowledge of the laboratory methods needed for sequencing reference genomes. Moreover, through its governance structure, which includes a council of national representatives, ERGA stands for broad representation across Europe and democratic decision-making.\u003c/p\u003e\u003cp\u003eHere, we present the ADM procedure for ranking nominated species from across Europe as implemented by BGE. 
As part of this presentation, we contribute (1) a community-defined criteria set, (2) a reproducible ranking algorithm, and (3) a validation of the algorithm with simulated and empirical data. To our knowledge, such an ADM-based ranking procedure is unique among biodiversity genomics projects, as is the fact that it was decided on in an open, bottom-up process. Therefore, we anticipate that this outcome from BGE has value to other large-scale biodiversity projects, and hence we provide (4) some guidance for other consortia.\u003c/p\u003e"},{"header":"Automated species selection process","content":"\u003cp\u003eA prerequisite for any selection procedure is that the number of candidates exceeds the number that can be processed. Within the available funding for the community-driven BGE task, sequencing was limited to a total of 150 gigabases (Gb) of estimated haploid genome size. To obtain a list of candidate species for genome sequencing, we held two consecutive open calls, in which scientists and citizens could nominate European species for reference genome sequencing. This nomination was completed by filling in an online questionnaire with information that was later used to rank the species (see \u0026ldquo;Questionnaire for the nomination in round #2\u0026rdquo; in the Supplementary Information). The nomination process resulted in a list of 593 species nominations whose estimated haploid genome sizes, based on the information provided in the nomination forms, summed to about 1,000 Gb. This means that the total genome size of the nominations was ca. seven times higher than the project's capacity. 
After the ranking procedure, 99 species (17% of the total) with a combined projected genome size of 150 Gb were selected for genome sequencing.\u003c/p\u003e\u003cp\u003eThe species selection process implemented in the BGE project was a three-stage ADM process (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e) that comprised (1) an exclusion stage, (2) a general ranking stage employing a decision tree and additional ranking for country and individual researcher representation, and (3) a feasibility check followed by an additional adjustment for genera with multiple species suggestions. The ADM process produces several lists, including the list of selected species and a waiting list.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e\u003cb\u003e1st stage. Species exclusion\u003c/b\u003e. Three criteria at the first stage needed to be fulfilled for a species to proceed to stage 2. If any of them was not fulfilled, the species was excluded. These specific criteria are \u0026ldquo;Availability of published reference genomes\u0026rdquo;, \u0026ldquo;Redundancy\u0026rdquo;, and \u0026ldquo;Availability of voucher specimen\u0026rdquo; (Table\u0026nbsp;1 \u0026amp; Supplementary Data 01; please also see \u0026ldquo;Individual selection criteria\u0026rdquo; in the Supplementary Information for details). At this stage, 148 species were excluded, and 445 species moved forward to stage 2 (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e\u003cp\u003e\u003cb\u003e2nd stage. Species ranking\u003c/b\u003e. In the second stage, all species were first ranked using a decision model based on 11 criteria (Table\u0026nbsp;1 \u0026amp; Supplementary Data 01); the information for these criteria was obtained from the sample providers when they nominated the species. These 11 criteria are grouped into six categories to make the process easier. 
Moreover, some of the individual criteria assess different aspects of the same topic. The six categories are: \u0026ldquo;Taxonomic representation\u0026rdquo; (1 criterion; \u0026ldquo;Taxonomic representation on Tree of Life\u0026rdquo;), \u0026ldquo;Certainty\u0026rdquo; (2 criteria; \u0026ldquo;Certainty of species identification\u0026rdquo; \u0026amp; \u0026ldquo;Type locality\u0026rdquo;), \u0026ldquo;Country representation\u0026rdquo; (2 criteria; \u0026ldquo;A minimum of one species per country\u0026rdquo; \u0026amp; \u0026ldquo;Countries with fewer genomic resources\u0026rdquo;), \u0026ldquo;JEDI\u0026rdquo; (3 criteria; \u0026ldquo;Gender of researcher or gender balance of team\u0026rdquo;, \u0026ldquo;Researcher of underrepresented minority\u0026rdquo; \u0026amp; \u0026ldquo;Diversity \u0026amp; inclusiveness of team\u0026rdquo;), \u0026ldquo;Novel leader\u0026rdquo; (1 criterion; \u0026ldquo;Novel sample coordinator\u0026rdquo;), and \u0026ldquo;Applicability\u0026rdquo; (2 criteria; \u0026ldquo;Application of the genomes\u0026rdquo; \u0026amp; \u0026ldquo;Reach/breadth of community\u0026rdquo;).\u003c/p\u003e\u003cp\u003eA weighting scheme is assigned to each category (explained in detail in \u0026ldquo;Weighting scheme of each category\u0026rdquo; in the Supplementary Information). In brief, the scheme for the category \u0026ldquo;Taxonomic representation\u0026rdquo;, which ranges from genus to phylum (and beyond) level, applied weights of 1\u0026ndash;5, with higher weights for higher taxonomic units. In the category \u0026ldquo;Certainty\u0026rdquo;, weights of 1, 2, 5 \u0026amp; 6 depended on how close to the type locality the sample was collected and on how certain the species identification was. Higher weights were given to species with higher taxonomic certainty. 
In the category \u0026ldquo;Country representation\u0026rdquo;, one criterion contributes to the knowledge transfer and hence has been linked to the country of the sample coordinator. The other one contributes to an equal representation of species from across Europe and is accordingly connected to the collection site. This means one species can represent two countries in this category. Weights of 1, 4 \u0026amp; 5 were applied with higher weights for countries from the list of EU Widening countries and the single species being nominated from the country. In the category \u0026ldquo;JEDI\u0026rdquo;, weights of 1\u0026ndash;5 are applied depending on the sample coordinator\u0026rsquo;s gender, minority background, and the inclusiveness of the genome team, with higher weights assigned to increase diversity and inclusiveness. The category \u0026ldquo;Novel leader\u0026rdquo; used weights of 0 \u0026amp; 2 to reflect if the sample coordinator has already gotten a species sequenced via BGE/ERGA resources. Finally, the category \u0026ldquo;Applicability\u0026rdquo; had weights of 0\u0026ndash;2 depending on how quickly and broadly the reference genome could be applied outside the nominating group.\u003c/p\u003e\u003cp\u003eConcerning the selection model of the ADM process, eight different ranking models had been developed and tested (for more details on this step of the process, please see \u0026ldquo;Development of different prioritization models\u0026rdquo; in the Supplementary Information). Seven of these were variations of the final model presented below. These explored different combinations of \u0026ldquo;Certainty\u0026rdquo; and \u0026ldquo;Country representation\u0026rdquo; and of \u0026ldquo;Novel leader\u0026rdquo; and \u0026ldquo;Applicability\u0026rdquo; at different levels of a decision tree. The eighth model did not implement a decision tree but summed up the weights of all categories and ranked the species based on this sum. 
To explore the effect of the different models on the selection process and whether they generated the expected outcome, the models were applied to simulated and empirical data. For a detailed description of the procedures and results, please see \u0026ldquo;Simulation studies\u0026rdquo; and \u0026ldquo;Empirical data\u0026rdquo; in the Supplementary Information.\u003c/p\u003e\u003cp\u003eFor the simulated data, three probability distributions for the weighting scheme of each category were used to randomly assign a weight for each category and simulated species. In one distribution, all weights were drawn with equal probability, and in the other two, the probability was skewed towards either low or high weights. All possible combinations of these probability distributions were generated, resulting in 729 (3\u003csup\u003e6\u003c/sup\u003e) combinations. For each combination, 100 datasets with 1,000 species were generated, resulting in 72,900 simulated datasets. The eight models described above were applied to each dataset to select the top 100, 200, or 300 species, resulting in 218,700 sets of selected species. For each dataset, model, and number of top-selected species, an enrichment factor was calculated for each category. The enrichment factor measures the relative enrichment of a category among the top-selected species: it is the difference between the mean weight of the top-selected species and the mean weight of all 1,000 species, divided by the maximum possible difference. Hence, it expresses the enrichment as a fraction of the maximum possible positive enrichment of a category in the dataset, with a maximum value of 1. 
Negative values are unbounded and indicate a lower representation in the top-selected species.\u003c/p\u003e\u003cp\u003eA heatmap of the mean enrichment factor across the 729 combinations for each model and number of top-selected species for each category showed that model #8 is set apart from all other models (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA). Moreover, in model #8 all categories are more or less equally enriched, with the strongest enrichment in \u0026ldquo;Certainty\u0026rdquo; and the least in \u0026ldquo;Applicability\u0026rdquo;. Among the seven other models, the next split separates the selection of only 100 out of 1,000 species (10%) from the selections of 200 and 300 species. Within the 100-species selections, the models #1 and #3, which emphasize \u0026ldquo;Country representation\u0026rdquo; over \u0026ldquo;Certainty\u0026rdquo; in the species selection, are placed a bit closer to the other numbers of top-selected species than the other five. The major difference between these two groups of models is that \u0026ldquo;Country representation\u0026rdquo; is more strongly enriched than \u0026ldquo;Certainty\u0026rdquo; in models #1 and #3, while it is the other way around in the models #2, #4\u0026ndash;7. Moreover, in the models #5\u0026ndash;7, which assess both categories together, \u0026ldquo;Certainty\u0026rdquo; has a stronger influence than \u0026ldquo;Country representation\u0026rdquo;. For 200 and 300 selected species, the models #1 and #3 are also set apart from the other five, but less prominently. Generally, a similar pattern occurs as with the 100 selected species. 
In summary, models #1 and #3, #2 and #4, as well as #5\u0026ndash;7 have no strong differences, while model #8 is set apart by having less emphasis on the \u0026ldquo;Taxonomic representation\u0026rdquo;.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eFor the empirical data, we used the 230 suggested species of the first round of suggestions by the community that passed the exclusion stage. 
Applying the eight different models to the dataset, we roughly selected 10% (25), 20% (50) and 30% (75) of the species, calculated the enrichment factor as above and also determined the intersection of the selected species between the models with the same number of selected species (i.e., for 25, 50 or 75 species).\u003c/p\u003e\u003cp\u003eAs in the simulation data, model #8 is different from all other models (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eB). Within model #8, the differences are more pronounced with fewer species selected. The strongest enrichment is \u0026ldquo;Certainty\u0026rdquo;, followed by \u0026ldquo;Country representation\u0026rdquo;, and \u0026ldquo;Applicability\u0026rdquo;. While \u0026ldquo;JEDI\u0026rdquo; and \u0026ldquo;Taxon representation\u0026rdquo; have a low enrichment or none, \u0026ldquo;Novel leader\u0026rdquo; has negative values. In the case of the other seven models, the separation is between the number of selected species. Within the 25 selected species, there is no difference between the models. \u0026ldquo;Taxon representation\u0026rdquo; is strongly enriched, and \u0026ldquo;Applicability\u0026rdquo; is intermediate, while all others are not enriched. Selecting 50 species, enrichment of \u0026ldquo;Taxon representation\u0026rdquo; is not as strong, while enrichment of \u0026ldquo;Certainty\u0026rdquo;, \u0026ldquo;Country representation\u0026rdquo;, and \u0026ldquo;Applicability\u0026rdquo; are increased to intermediate values. \u0026ldquo;JEDI\u0026rdquo; shows no enrichment, while \u0026ldquo;Novel leader\u0026rdquo; has strong negative values. Moreover, models #2 and #4 are slightly different from the other five. 
Finally, the picture with 75 species looks similar, but for \u0026ldquo;Taxon representation\u0026rdquo; and \u0026ldquo;Novel leader\u0026rdquo; the values are no longer as strong, and the differences of models #2 and #4, as well as #1 and #3, are a bit more pronounced.\u003c/p\u003e\u003cp\u003eLooking at the intersections of the 50 selected species shows that the overlap of selected species between the models is very high, with at least 62% of species shared among any pair of models (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eC). Nonetheless, three groups can be recognized. One again contains only model #8. The second contains models #2 and #4, which select the same species. The last group contains the remaining five models, which also select the same species. The similarity in species selection between these last two groups is also very high, at 94%. Hence, as in the simulation studies, the similarity between the models #1\u0026ndash;7 is very high.\u003c/p\u003e\u003cp\u003eHowever, a more detailed investigation of the empirical data also showed that the country representation after the decision model was strongly biased towards certain countries, independent of the applied model (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). For example, Croatia was very strongly represented, and relatively more so than among the nominating countries. Similarly, Poland and Italy were also highly represented. 
The original models also introduced a strong bias towards individual researchers as they had predominantly suggested species from underrepresented taxonomic groups (for more details, see \u0026ldquo;Additional step for prioritization\u0026rdquo; in Supplementary Information).\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eGiven the results from the simulated and empirical studies, a final decision by the ERGA council was made on two of the eight models tested (i.e., models #7 and #8) and additional rounds of ranking (for more details, see \u0026ldquo;Decisions by the ERGA council\u0026rdquo; in the Supplementary Information). Model #7 was elected. The final model, hence, comprised four decision levels. The first level of ranking is based on the category \u0026ldquo;Taxonomic representation\u0026rdquo; as this was regarded as the most important category by both the larger scientific community and the BGE project (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Hence, species with a higher score in this category were ranked above those with a lower score. Species with the same score in this category are then ranked amongst them based on the combined categories \u0026ldquo;Certainty\u0026rdquo; and \u0026ldquo;Country representation\u0026rdquo;. The scores for these two categories are summed for each species. Then the species were ranked based on the sum. All species that had the same scores in the two higher levels were ranked at the third level based on the category \u0026ldquo;JEDI\u0026rdquo;. 
Finally, all species that had the same scores in the three top levels were ranked at the last level based on the summed scores of the categories of \u0026ldquo;Novel leader\u0026rdquo; and \u0026ldquo;Applicability\u0026rdquo;.\u003c/p\u003e\u003cp\u003eAdditionally, after the ranking by the decision tree, additional rounds of ranking were conducted (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). The ranking of the top-10 species was left unchanged. Next, the best-ranked species of each country that was not included in the top-10 species was ranked after the top-10 species. The order of ranking within these best-ranked species by country is based on the ranking by the decision tree. The country of interest in this process is the country of the collection site and not the country of the institution of the sample coordinator. This addition aimed to achieve a better representation of species from different European countries among the selected species. After this country-based ranking, the best-ranked species of each researcher, suggesting a species that was not included in the top-10 species or the best-ranked country species, is ranked after these species. The procedure is the same as for the country ranking. This ranking aimed to distribute the benefits of the BGE project as widely as possible across the scientific community in Europe. Finally, the remaining species are ranked following the ranking from the decision model. The ranked list of all species after these processes was then transferred to the next stage. No species are excluded during the ranking stage.\u003c/p\u003e\u003cp\u003e\u003cb\u003e3rd stage. 
Feasibility.\u003c/b\u003e This stage consists of 10 criteria that assess whether it is feasible to sequence a suggested species, given the available funding, current methodological capabilities\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e, and sample availability (Table\u0026nbsp;1). The feasibility thresholds were determined together with the sequencing centers. Species failing any of the feasibility criteria were flagged as non-feasible and removed.\u003c/p\u003e\u003cp\u003eAfter the feasibility check, a final round of ranking was conducted. An examination of the selected species after the check had shown that, for a few genera, more than one species was present because multiple species had been suggested for these genera. This was assessed as not being in agreement with the goal of accomplishing a broad taxonomic representation of the eukaryotic biodiversity. Accordingly, the same principle as for the best-ranked species for countries and individual researchers was applied to these genera. The best-ranked species of each genus was kept at its position in the ranking order; all other species were moved to the bottom of the list. Among all the species across the different genera that were placed at the bottom of the list, the same ranking order was kept as it had been in the original list. After the final ranking, a ranked list of feasible and non-excluded species is available. From this list, the top-ranked species are selected for sequencing, given the limit imposed by the sequencing capacity, plus additional species for a waiting list. In our case, we applied 50% of the sequencing capacity as the threshold for the waiting list. 
Hence, the selection process up to and including stage 3 generates lists of feasible species (selected, on waiting list, or non-selected species), non-feasible species, and excluded species (due to lack of a voucher specimen or the availability of a reference genome, or an ongoing reference-genome project).\u003c/p\u003e\u003cp\u003e\u003cb\u003eCheck of permits.\u003c/b\u003e The check of all necessary legal documentation is placed before the actual sequencing. The legal responsibility for ensuring that all permits necessary to access the sample for genome generation and to release the genome data are present lies with the sample coordinator (usually the researcher nominating the species). Before the sample can be sent to the sequencing centres, the sample coordinator must also submit a standardised set of metadata to BGE\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e,\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e. Upon submission, the sample coordinator acknowledges that all necessary permits are present and must upload them. This crucial part has deliberately been integrated late into the process as the package of necessary permits may depend on the attributed sequencing centre and the country it is based in (e.g., export and import permits). Hence, they can only be provided at this stage of the process.\u003c/p\u003e"},{"header":"Results from the final species ranking","content":"\u003cp\u003eEffect of the selection process\u003c/p\u003e\u003cp\u003eAs mentioned above, we thoroughly tested the selection process and the different possibilities at different stages. 
Here, we will only present the outcome of the entire ADM process (i.e., the finally selected species) in comparison to the pool of the nominated species and the species that would have been selected without applying stage 3 (Figs.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e \u0026amp; \u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e). We will also briefly present the results of the feasibility check and the reasons why species were not feasible.\u003c/p\u003e\u003cp\u003eBGE funding allows for the sequencing of 150 Gb in the task \u0026ldquo;T5.5 Critical biodiversity community sequencing\u0026rdquo;. If the selection process had been done without the feasibility check directly after stage 2, 67 species would have added up to 150 Gb and hence been selected. Adding up the predicted genome sizes of the ranked species after the feasibility check (stage 3), 99 species were finally selected. The lower number of possibly selected species after stage 2 is due to the fact that this list would have included some species with very large genomes, such as \u003cem\u003eSomniosus microcephalus\u003c/em\u003e (Chordata) with a genome size of ~\u0026thinsp;10 Gb, \u003cem\u003eCalotriton asper\u003c/em\u003e (Chordata) of ~\u0026thinsp;27 Gb, or \u003cem\u003eApocalathium malmogiense\u003c/em\u003e (Myzozoa) of ~\u0026thinsp;30 Gb. These three species alone would have comprised about 45% of the total 150 Gb.\u003c/p\u003e\u003cp\u003eHowever, the increased number of species is not reflected in a broader taxonomic representation. In contrast, the selected 99 species comprise 15 phyla of the originally 24 nominated phyla (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). The 67 species at stage 2 would have comprised 18 phyla. 
Due to the feasibility check, the only species representing Chlorophyta, Rhodophyta, Myzozoa, and Tardigrada were excluded, while a representative of Bacillariophyta was added. The phyla Arthropoda (35 species instead of 21), Chordata (19 instead of 9), and especially Mollusca (9 instead of 1) benefited the most from the feasibility check. Finally, even though 148 out of the original 593 nominated species were excluded at stage 1, only one phylum, Acanthocephala with its single representative, was lost at this stage.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eConsidering the phyla originally nominated with more than 10 species, Arthropoda and Annelida are represented at roughly the same percentage in the finally selected species as in the nominated species. Chordata, Streptophyta, and Basidiomycota are less represented, as relatively more species had been nominated, while Mollusca benefited from the process. Additionally, Platyhelminthes is also more strongly represented in the final list. Without the feasibility check, Arthropoda, Chordata, and Mollusca would also have been less represented compared with the original nominations, with the effect being especially prominent for the latter two.\u003c/p\u003e\u003cp\u003eConcerning the countries, all nominated species comprised 32 countries (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e). The species from Cyprus was excluded at stage 1. Selecting species directly at stage 2 would have resulted in 25 countries. Hence, six countries would have dropped out at this stage (i.e., the Czech Republic, Georgia, Hungary, Montenegro, Sweden, and Slovenia). The actual species selection after the feasibility check reduced the number of selected countries to 22. 
Specifically, Belgium, the Faroe Islands, Georgia, Iceland, the Netherlands, Romania, Serbia, Slovakia, and the United Kingdom were excluded. 
Hence, the Czech Republic, Hungary, Montenegro, Sweden, and Slovenia were not excluded as would have happened at stage 2. Generally, the distribution across the other countries is relatively even after both stages, with 14 countries having more than one species considered. However, Spain, France, Croatia, Italy, Switzerland, Germany, and Portugal benefited from the feasibility check with a substantial increase in species coming from these countries. Hence, the spread became less even after the feasibility check.\u003c/p\u003e\u003cp\u003eThe distribution of individual researchers among the selected 99 species was very even, with a total of 92 researchers providing species and only five with two or three species. A similar pattern is also found for the 67 species at stage 2. This is in contrast to all species, where several individuals nominated multiple species. Hence, the goal to distribute the generation of reference genomes across Europe at both the country and individual levels has been generally accomplished, even though it was not perfect at the country level.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eOnly 5% of the selected 99 species had known taxonomic problems, but on the other hand, only about 7% of the 445 species after stage 1 had known taxonomic problems. Similarly, the proportion of species collected at or close to the type locality is about 49%. This is slightly better than all species after stage 1, with 45%. The relative composition of the procedures applied to identify species is not substantially different between the species after stage 1 and the selected species. Nonetheless, identification procedures applying only one approach are generally reduced among the selected species, especially if it is only the identification by a taxonomic expert for the group. Among the selected species, most (~\u0026thinsp;75%) applied more than one method of species identification. 
In contrast, among all nominated species, \u0026ldquo;only\u0026rdquo; ~64% applied more than one method.\u003c/p\u003e\u003cp\u003eWith respect to the gender distribution, while male researchers comprised about 49% across the species after stage 1, among the selected species, there was an increase to 54%. Similarly, researchers not disclosing their gender also increased among the selected species (~\u0026thinsp;5% from ~\u0026thinsp;3%). Accordingly, female and non-binary researchers were both less represented. The proportion of underrepresented minorities (2.5%) or persons who did not want to disclose it (6.5%) was small in the pool among all species after stage 1, but it still slightly increased due to the process to 3.0% and 7.1%. Moreover, among the selected species, about 52% of the genome teams are purely scientific, while it is only 45% for all species after stage 1. Otherwise, the composition of the genome teams in the selected species is not strongly different from the nominations. Hence, concerning JEDI criteria, the ADM process did not enrich these criteria and sometimes had to some degree the opposite effect.\u003c/p\u003e\u003cp\u003eGeneral effect of the feasibility check at stage 3\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eOf the non-feasible species, 21.6% were excluded due to a genome size of larger than 6 Gb, and 22.1% due to a sample size smaller than a \u003cem\u003eDrosophila\u003c/em\u003e fly or less than 100,000 nucleated cells. 41.3% of the species were excluded because they fulfilled one of the two criteria or both. 2.4% of the species have a small body size with large genomes. All of these species are challenging even for new sequencing technologies. 
Moreover, 29.1% of the species were excluded exclusively due to small body size and/or large genome size.\u003c/p\u003e\u003cp\u003eAdditionally, 19.7% of the species did not fulfill the criterion that they were already collected or easy to obtain (i.e., they were either common, widespread, and abundant or rare, but locally abundant). Of these 19.7%, 26.2% were also regarded as non-feasible due to the other criteria, but 73.8% of them, or 14.6% of all non-feasible species, are just challenging to collect. Hence, almost half (29.1%+14.6%+3.3%=47.0%) of the species were considered not feasible purely due to biological properties.\u003c/p\u003e\u003cp\u003eConsidering non-biological reasons, 45.3% of the species were considered not feasible due to criteria related to the sampling and sample processing. 29.3% of the species could not be snap-frozen or frozen on dry ice for sample preservation. For 21.3% of the non-feasible species, it was not possible to preserve them within 5 minutes of their death, and for 27.2% it was not possible to maintain a strict cold chain at -70\u0026deg;C. Of the non-feasible species, 31.6% were excluded purely because they could not be properly preserved or maintained. Hence, about one-third of the exclusions happened just due to non-biological criteria.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eWe showed that (i) automated decision-making (ADM) worked on a large-scale biodiversity project, including community input at all levels, (ii) feasibility and funding shaped the final set more than anything else, and (iii) representation improved in places but still reflected the original composition of the nomination pool. Moreover, before a final decision on the ADM process is made, it can be tested using both simulated and empirical data to assess if the desired outcome is achieved. 
For example, for the ERGA and BGE community, taxonomic representation across the tree of life was the most important factor, in line with the goals of EBP in general\u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003eThe next important goals were a broad representation across the European scientific community (both on the country and individual level) as well as taxonomic certainty in the species identification. The former is in line with the EU goals for knowledge transfer to the so-called widening countries\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e. The latter strengthens the goals of good taxonomic practice and reliable species identification as the foundation of biological science\u003csup\u003e\u003cspan additionalcitationids=\"CR21 CR22 CR23\" citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u003c/sup\u003e. These goals are reflected not only in the applied decision model but also in the additional ranking steps included in the process, which aim to increase the representation of taxa, countries, and individuals, as well as in the requirement for specimen vouchers. Across both rounds of selection in the BGE project, the process generally accomplished these goals. However, both taxonomic and country representation were more strongly affected by the different stages of the process. While about 25% of species were excluded at stage 1, only one phylum and one country were no longer represented. Hence, stage 1 had little impact on the taxonomic and country representation. In contrast, both following stages had a stronger impact on both the taxonomic and the country representation. 
Nonetheless, both taxonomic and country representation were still relatively high and broad due to the ADM process and are not biased by subjective decisions. As we will show in the following, the inclusion and exclusion of phyla and countries can be essentially explained by the combination of feasibility, funding, and nomination.\u003c/p\u003e\u003cp\u003eThe effect of feasibility criteria and funding limitations on species selection.\u003c/p\u003e\u003cp\u003eFeasibility, either direct or indirect, severely affected which species were selected for genome sequencing and, as a consequence, conflicts with the EBP\u0026rsquo;s goal to sequence every eukaryotic species on Earth\u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e. A large proportion of the BGE samples were omitted because they failed to meet one or several of the \u003cem\u003ea priori\u003c/em\u003e defined feasibility criteria. Interestingly, even though substantially more species were selected after stage 3 than would have been after stage 2, the number of phyla selected reduced further instead of increasing due to the higher number of species. Several of the feasibility criteria are linked to features of the species, such as abundance, genome size, and size of the sample, and about half of the non-feasible species were excluded due to these biological criteria. 
These characteristics are also prone to have a systematic and phylogenetic bias; for example, meiofaunal species are not evenly distributed across the animal tree of life\u003csup\u003e\u003cspan additionalcitationids=\"CR26\" citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e, genome size is not evenly distributed across the eukaryotic tree of life\u003csup\u003e\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e, and protists pose challenges to genomics in several respects\u003csup\u003e\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e\u003c/sup\u003e. Accordingly, the four phyla excluded due to the feasibility check had either too large genomes (Myzozoa), too small body size (Tardigrada), or were too difficult to sample (Chlorophyta, Rhodophyta). However, in the BGE data, we did not see a clear phylum-level pattern, but sample size and nominations limited what one can conclude. Final coverage largely tracked what was nominated per phylum. For example, only a few species of phyla with small body sizes, such as Tardigrada, Rotifera, or Nematoda, were nominated, and none at all for Gnathostomulida, Gastrotricha, or several unicellular eukaryotic phyla. Hence, an implicit filtering could have already occurred at the nomination level, as we clearly and transparently communicated the feasibility criteria.\u003c/p\u003e\u003cp\u003eNonetheless, enforcing upper or lower limits to these biological criteria may be valid for practical reasons, but it is also obviously in conflict with EBP\u0026rsquo;s goal to sequence every species on Earth\u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e. 
It is therefore of pivotal importance that methods be developed or improved to reduce potentially systematic bias\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e. The accumulation of expertise in processing diverse samples in the lab has led to the development of improved ultra-low extraction protocols that can better handle smaller organisms with larger genomes\u003csup\u003e\u003cspan additionalcitationids=\"CR31\" citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e\u003c/sup\u003e. Also, the development of specific protocols for different organism groups is helpful in this matter\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e,\u003cspan additionalcitationids=\"CR34\" citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e\u003cp\u003eThe ever-decreasing costs of genome sequencing help to gradually raise the threshold for what is considered an unfeasibly large genome by genome sequencing consortia. Indirectly, the large genomes would have exerted a strong impact on the selection of species at stage 2 and hence on the species included. Because the funding limits the total genome size that can be sequenced, the inclusion of large genomes would have substantially reduced the number of species compared to the actual selection. This reduction in numbers would have had a strong impact on the country representation. All of the six countries that would not have been considered at stage 2 would only have been included if the sequencing capacity had been three times higher. 
This effect is also highlighted by the fact that five of them were re-included in the actual selection, which included the exclusion of these large genomes as non-feasible, among others, due to funding limitations.\u003c/p\u003e\u003cp\u003eOur results also identify the ability to snap-freeze and maintain a cold chain for the collected sample as a seriously limiting factor to the feasibility of samples for genome sequencing, as almost one-third of the species were considered not feasible due to these factors. The proportion of non-feasible species from EU widening countries among the non-feasible species is almost identical to the proportion of all species included in stage 2. In general, for country representation, the original number of species nominated and included in stage 2 had a much stronger impact. Countries, which were on the borderline of being included or not, had usually nominated less than 1% of all species. The exceptions were Sweden, Hungary, and Romania, with 2.5%, 2.5% and 1.8% nominated species. The former two were re-included at the feasibility check, while Romania dropped out. On the other hand, all countries benefiting from the feasibility check by an increased number of selected species had nominated 5\u0026ndash;18% of all species. Perhaps unsurprisingly, as with taxonomic representation, the number of nominated species exerted a strong impact on the final country representation.\u003c/p\u003e\u003cp\u003eNonetheless, the problem of maintaining a cold chain can disproportionately affect the selection of species occurring in remote locations, such as oceanic islands like the Azores, which also frequently harbor a high proportion of endemic biota, or the deep sea. An obvious solution is access to dry-shippers, i.e., shipping containers whose cooling elements work by absorbing liquid nitrogen with no risk of spillage, which allow them to be brought on regular public transport, including planes. 
Additionally, this can be avoided through careful logistic planning, e.g., by \u0026ldquo;bioblitz\u0026rdquo; excursions with intensive sampling of taxa within a remote location. Finally, the development of new laboratory methods and chemicals to preserve DNA at higher temperatures might counteract this type of bias.\u003c/p\u003e\u003cp\u003eApplicability of our criteria to other projects\u003c/p\u003e\u003cp\u003eAdopting similar exclusion criteria as herein might be sensible for other genome projects as well. For example, to accomplish the goals of EBP, redundancy across different projects should be avoided as much as possible, allowing a more effective use of resources. In the same vein, avoidance of redundancy is also enhanced if the species is identified accurately, and proof of the identification can be provided by the presence of a voucher for the identification. This will ensure high certainty that the genomes belong to the species it is supposed to represent. This will also allow better guidance of future sequencing efforts across the tree of life.\u003c/p\u003e\u003cp\u003eConcerning the ranking criteria, taking the distribution across the tree of life into account will also be beneficial for other genome projects in accomplishing the first phase of the EBP, generating reference genomes for each eukaryotic family, as well as for later phases, generating genomes for all species. Such a criterion will more quickly result in a better representation of genomic resources across the tree of life. 
Accordingly, knowledge gaps in our understanding of the evolution and ecology of biodiversity on Earth, and thus the lack of knowledge on how to protect and preserve it, can be closed.\u003c/p\u003e\u003cp\u003eSimilarly, knowledge transfer and capacity building from economically and scientifically stronger countries to less strong countries (e.g., from EU non-widening countries to EU widening ones) can also enable a broader participation in biodiversity genomics and generate new initiatives and funding opportunities to generate genomic resources for different questions in biodiversity research. A diverse input of knowledge and viewpoints into biodiversity genomics is not only a value in itself but shows ways of a new understanding of biodiversity and new research avenues. This can lead to out-of-the-box hypotheses, a shifting focus and view on biodiversity and nature, as well as pointing to new frontiers in research. Following this, consideration of applicability and anchorage in the communities (e.g., local, research, taxonomic) will also facilitate such new research and enhance existing research as more are interested in using the public goods generated by such publicly funded genome projects.\u003c/p\u003e\u003cp\u003eFeasibility checks are important for any larger genome project to avoid the unnecessary waste of limited resources. However, more importantly, feasibility checks as conducted herein can also reveal the obstacles still in the way of obtaining reference genomes for certain species. Analysing the rejected species can show if they are rejected due to limitations in our sequencing technology, such as genome or sample size, or in the necessary sampling procedure, such as snap-freezing them and keeping a strict cold chain at all times. 
This can guide R\u0026amp;D efforts for such species.\u003c/p\u003e\u003cp\u003eCommunity engagement in biodiversity genomics.\u003c/p\u003e\u003cp\u003eAnother strength of the ADM process and its implementation, as presented herein as part of the BGE project, is that each of the key steps was explicitly designed to follow a bottom-up approach. The wider community nominated all of the species and decided on the ranking criteria and ranking mechanism applied to them. This effort ensured that this ADM process is less subjective, based upon the needs of the ERGA community at large and hence beyond the BGE project, and that the communication about the different criteria and how they are applied was transparent from the very beginning. The BGE effort to generate reference genomes is widely recognized among researchers across Europe: almost as many individual researchers were involved as there were candidate species (92 researchers for 99 species). Moreover, the ability to test the outcomes \u003cem\u003ea priori\u003c/em\u003e allowed the process to be adjusted early on, minimizing the risk of unintended outcomes that contradict the main goals.\u003c/p\u003e\u003cp\u003eDespite the challenges with feasibility criteria and funding limitations, the ADM process also ensured a relatively high taxonomic and country representation, with 62.5% of the nominated phyla and 68.8% of the nominated countries being considered. However, an unevenness in both remained due to strong biases in the nominated species with respect to the number of species nominated for a phylum or a country. Hence, while the ADM process was able to level the playing field to some degree, it still could not resolve the initial bias in preferences. This could also be considered a strength of the approach, as the nomination process also shows where the majority of the community sees a need for reference genomes.
Alternatively, independent of the limitations imposed by the feasibility criteria, it could also indicate that more effort is needed to engage researchers from the underrepresented countries and those working on the underrepresented phyla.\u003c/p\u003e\u003cp\u003eFinally, even though not the main goal in this ADM process, one aim was also to improve JEDI aspects in the distribution of the BGE resources across Europe. This is in line with the broader goal of the ERGA community to consider such aspects\u003csup\u003e\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e\u003c/sup\u003e. However, due to the low weight assigned to this criterion in the decision model and the absence of additional steps addressing it, no enrichment in these factors was achieved. In fact, in some cases, such as gender balance, the opposite occurred. The reason for this is that the JEDI category became relevant only for very few species, namely those on the borderline of being included (if at all), and only when the species were tied in the previous two levels of the model. Hence, its overall impact is limited, and accordingly, any intrinsic bias present in the species scoring highest in the first two levels is carried through the process. Several factors can contribute to this. For example, concerning gender balance, male researchers are slightly less represented among the non-feasible species, with 48% instead of 49%, which slightly favors them for the final list of selected species. Similarly, male researchers were present from 26 countries and female researchers from 22 countries, considering all species at stage 2. Given the goal to increase country representation, among the selected species, male researchers are present from 18 countries and female researchers from 13 countries.
Hence, given the low sample size of 99 species, these small intrinsic biases with gender representation among countries and feasible species in the pool of nominated species can explain the slight relative increase in male researchers. Again, a balanced representation already in the nomination pool seems more crucial here than tweaking the ADM process itself. This can, for example, be accomplished in the future by targeted outreach in underrepresented groups and countries, encouraging their participation and lowering barriers to nominating species with an even better information flow.\u003c/p\u003e\u003cp\u003eFuture directions of ADMs in biodiversity genomics.\u003c/p\u003e\u003cp\u003eHerein, we applied a fixed ranking list with only two rounds of nomination. However, for future (and ongoing) biodiversity projects, it is possible to envision a dynamic list of candidate species. The ranking could run on a rolling basis as new species come in. Species will get added to the initial candidate list, and the species ranking is repeated each time. A process with repeated (or continuous) species additions will be more similar to triage, i.e., a prioritisation process \u003csup\u003e\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e. Specifically, species that have started being processed will not be stopped, but newly added species may be prioritised over species that were already on the list. This could be the situation if, for instance, the project aims at acquiring species by means of subsequent bioblitzes, i.e., intensive sampling events where numerous species are collected at once and repeatedly throughout the project. This would ensure a more effective usage of the capacity in line with the set goals. 
For example, important gaps in taxonomic representation could be filled much faster by such an approach, as species filling such gaps would immediately be prioritized and processed.\u003c/p\u003e\u003cp\u003eIn principle, such an approach could already be implemented with the ADM process presented here. New species would be added to the initial list, and species already being processed would be removed from it. Additionally, scores such as taxonomic representation would be adjusted for all species in light of new information. For example, taxonomic representation of a species could be set to a lower taxonomic level if a species of the same higher taxonomic level is already being processed within the project or by other projects. Running the ADM itself takes only a few minutes, so maintaining the list would probably be more time-consuming than the ranking.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eWith the BGE project, we show that involving the community in the nomination and selection of species is an efficient and equitable way to maximise and ensure diversity in sampling species for biodiversity studies. The absence of any reduction in diversity for the two key ranking criteria highlighted here, i.e., taxonomic representation and country representation, demonstrates that the automated species ranking applied by BGE respected the diversity of needs of the broader ERGA community. These findings reflect the strength of our approach and its capacity to balance different interests, which might be beneficial for other consortia to consider when designing similar selection processes, even if their specific goals require different ranking criteria.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eAuthor contributions\u003c/h2\u003e\n\u003cp\u003eTHS, AB \u0026amp; RAO conceived the study. All authors contributed to the development of the species selection process, including the ADM and its criteria and weighting schemes.
THS developed the R script of the ADM and conducted the analyses of the simulated and empirical data. TM, CdG and RM compiled all empirical data and checked its quality. THS and TM were major contributors in writing a first draft of the manuscript, to which the other authors contributed and on which they commented. All authors read and approved the final manuscript.\u003c/p\u003e\n\u003ch2\u003eFunding Declaration\u003c/h2\u003e\n\u003cp\u003eBiodiversity Genomics Europe (Grant no. 101059492) is funded by Horizon Europe under the Biodiversity, Circular Economy and Environment call (REA.B.3); co-funded by the Swiss State Secretariat for Education, Research and Innovation (SERI) under contract numbers 22.00173 and 24.00054; and by UK Research and Innovation (UKRI) under the Department for Business, Energy and Industrial Strategy\u0026rsquo;s Horizon Europe Guarantee Scheme. This study was also funded by the Research Council of Norway (Grant no. 300587 to T.H.S.). The funders played no role in study design, data collection, analysis and interpretation of data, or the writing of this manuscript.\u003c/p\u003e\n\u003ch2\u003eAcknowledgements\u003c/h2\u003e\n\u003cp\u003eFor their valuable discussions concerning the selection process and the individual criteria, we would like to acknowledge the contributions by Ana Riesgo Gil (Museo Nacional de Ciencias Naturales), Bernhard Hausdorf (Leibniz Institute for the Analysis of Biodiversity Change) and Alice Minotto (Earlham Institute).\u003c/p\u003e\n\u003ch2\u003eCompeting interests\u003c/h2\u003e\n\u003cp\u003eAll authors declare no financial or non-financial competing interests.\u003c/p\u003e\n\u003ch2\u003eData and code availability\u003c/h2\u003e\n\u003cp\u003eThe R script for the ADM process, as well as datasets to test the script, are available at: https://github.com/torstenstruck/BGE_species_priorization.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eRhie, A.
\u003cem\u003eet al.\u003c/em\u003e Towards complete and error-free genome assemblies of all vertebrate species. \u003cem\u003eNature\u003c/em\u003e 592, 737\u0026ndash;746, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41586-021-03451-0\u003c/span\u003e\u003cspan address=\"10.1038/s41586-021-03451-0\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eThe Darwin Tree of Life Project Consortium \u003cem\u003eet al.\u003c/em\u003e Sequence locally, think globally: The Darwin Tree of Life Project. \u003cem\u003eProc. Natl. Acad. Sci.\u003c/em\u003e 119, e2115642118, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1073/pnas.2115642118\u003c/span\u003e\u003cspan address=\"10.1073/pnas.2115642118\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2022).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMarcussen, T. \u0026amp; Jakobsen, K. S. En nasjonal dugnad for \u0026aring; kartlegge genomene til alle artene i Norge - EBP-Nor. \u003cem\u003eBiolog\u003c/em\u003e 1, 6\u0026ndash;12 (2023).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMazzoni, C. J., Ciofi, C. \u0026amp; Waterhouse, R. M. Biodiversity: an atlas of European reference genomes. \u003cem\u003eNature\u003c/em\u003e 619, 252, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/d41586-023-02229-w\u003c/span\u003e\u003cspan address=\"10.1038/d41586-023-02229-w\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2023).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eFormenti, G. \u003cem\u003eet al.\u003c/em\u003e The era of reference genomes in conservation genomics. \u003cem\u003eTrends Ecol.
Evol.\u003c/em\u003e 37, 197\u0026ndash;202, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.tree.2021.11.008\u003c/span\u003e\u003cspan address=\"10.1016/j.tree.2021.11.008\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2022).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMc Cartney, A. M. \u003cem\u003eet al.\u003c/em\u003e The European Reference Genome Atlas: piloting a decentralised approach to equitable biodiversity genomics. \u003cem\u003enpj Biodiversity\u003c/em\u003e 3, 28, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s44185-024-00054-6\u003c/span\u003e\u003cspan address=\"10.1038/s44185-024-00054-6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2024).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eB\u0026ouml;hne, A. \u003cem\u003eet al.\u003c/em\u003e Contextualising samples: supporting reference genomes of European biodiversity through sample and associated metadata collection. \u003cem\u003enpj Biodiversity\u003c/em\u003e 3, 26, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s44185-024-00053-7\u003c/span\u003e\u003cspan address=\"10.1038/s44185-024-00053-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2024).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHoward, C. \u003cem\u003eet al.\u003c/em\u003e On the path to reference genomes for all biodiversity: laboratory protocols and lessons learned from processing over 2,000 species in the Sanger Tree of Life. 
\u003cem\u003eGigaScience\u003c/em\u003e 14, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/gigascience/giaf119\u003c/span\u003e\u003cspan address=\"10.1093/gigascience/giaf119\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2025).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAmano, T. \u003cem\u003eet al.\u003c/em\u003e The manifold costs of being a non-native English speaker in science. \u003cem\u003ePLoS Biol.\u003c/em\u003e 21, e3002184, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1371/journal.pbio.3002184\u003c/span\u003e\u003cspan address=\"10.1371/journal.pbio.3002184\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2023).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eR\u0026eacute;gner, I., Thinus-Blanc, C., Netter, A., Schmader, T. \u0026amp; Huguet, P. Committees with implicit biases promote fewer women when they do not believe gender bias exists. \u003cem\u003eNature Human Behaviour\u003c/em\u003e 3, 1171\u0026ndash;1179, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s41562-019-0686-3\u003c/span\u003e\u003cspan address=\"10.1038/s41562-019-0686-3\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2019).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHochkirch, A. \u003cem\u003eet al.\u003c/em\u003e A strategy for the next decade to address data deficiency in neglected biodiversity. \u003cem\u003eConserv Biol\u003c/em\u003e 35, 502\u0026ndash;509, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1111/cobi.13589\u003c/span\u003e\u003cspan address=\"10.1111/cobi.13589\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLewin, H. A. 
\u003cem\u003eet al.\u003c/em\u003e Earth BioGenome Project: Sequencing life for the future of life. \u003cem\u003eProc. Natl. Acad. Sci.\u003c/em\u003e 115, 4325\u0026ndash;4333, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1073/pnas.1720115115\u003c/span\u003e\u003cspan address=\"10.1073/pnas.1720115115\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2018).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eM\u0026ouml;kander, J., Morley, J., Taddeo, M. \u0026amp; Floridi, L. Ethics-Based Auditing of Automated Decision-Making Systems: Nature, Scope, and Limitations. \u003cem\u003eScience and Engineering Ethics\u003c/em\u003e 27, 44, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s11948-021-00319-4\u003c/span\u003e\u003cspan address=\"10.1007/s11948-021-00319-4\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2021).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRizk, A. \u0026amp; Lindgren, I. 237\u0026ndash;253 (Springer Nature Switzerland).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBazyar, J., Farrokhi, M., Salari, A. \u0026amp; Khankeh, H. R. The Principles of Triage in Emergencies and Disasters: A Systematic Review. \u003cem\u003ePrehospital and Disaster Medicine\u003c/em\u003e 35, 305\u0026ndash;313, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1017/S1049023X20000291\u003c/span\u003e\u003cspan address=\"10.1017/S1049023X20000291\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2020).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMohini, T. Demystifying search indexing and ranking. 
\u003cem\u003eInternational Journal of Research in Computer Applications and Information Technology (IJRCAIT)\u003c/em\u003e 7, 166\u0026ndash;174 (2024).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZahir, E. \u0026amp; Henda, J. Comparative Analysis of Page Ranking Algorithms for Efficient Information Retrieval. \u003cem\u003eAmerican Journal of Information Science and Technology\u003c/em\u003e 9, 15\u0026ndash;23, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.11648/j.ajist.20250901.12\u003c/span\u003e\u003cspan address=\"10.11648/j.ajist.20250901.12\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2025).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLeonard, J. A. \u003cem\u003eet al.\u003c/em\u003e Sample Manifest Standard Operating Procedure - Version: 2.5.1. 34 (2025).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eEuropean Commission: Directorate-General for, R. \u0026amp; Innovation. \u003cem\u003eHorizon Europe \u0026ndash; Widening participation and spreading excellence across Europe \u0026ndash; Boosting research and innovation performance throughout the Union\u003c/em\u003e. (Publications Office of the European Union, 2021).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHuber, J. T. The importance of voucher specimens, with practical guidelines for preserving specimens of the major invertebrate phyla for identification. \u003cem\u003eJ. Nat. Hist.\u003c/em\u003e 32, 367\u0026ndash;385, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1080/00222939800770191\u003c/span\u003e\u003cspan address=\"10.1080/00222939800770191\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (1998).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eThomson, S. A. \u003cem\u003eet al.\u003c/em\u003e Taxonomy based on science is necessary for global conservation. 
\u003cem\u003ePLoS Biol.\u003c/em\u003e 16, e2005075, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1371/journal.pbio.2005075\u003c/span\u003e\u003cspan address=\"10.1371/journal.pbio.2005075\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2018).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGodfray, H. C. J., Knapp, S. \u0026amp; Wilson, E. O. Taxonomy as a fundamental discipline. \u003cem\u003ePhilos. Trans. R. Soc. Lond. B. Biol. Sci.\u003c/em\u003e 359, 739, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1098/rstb.2003.1440\u003c/span\u003e\u003cspan address=\"10.1098/rstb.2003.1440\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2004).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eFavret, C. The 5 \u0026lsquo;D\u0026rsquo;s of Taxonomy: A User\u0026rsquo;s Guide. \u003cem\u003eThe Quarterly Review of Biology\u003c/em\u003e 99, 131\u0026ndash;156, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1086/732044\u003c/span\u003e\u003cspan address=\"10.1086/732044\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2024).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRivera, D. \u003cem\u003eet al.\u003c/em\u003e What is in a name? The need for accurate scientific nomenclature for plants. \u003cem\u003eJournal of Ethnopharmacology\u003c/em\u003e 152, 393\u0026ndash;402, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.jep.2013.12.022\u003c/span\u003e\u003cspan address=\"10.1016/j.jep.2013.12.022\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2014).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCerca, J., Purschke, G. \u0026amp; Struck, T. H.
Marine connectivity dynamics: clarifying cosmopolitan distributions of marine interstitial invertebrates and the meiofauna paradox. \u003cem\u003eMar. Biol.\u003c/em\u003e 165, 123, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s00227-018-3383-2\u003c/span\u003e\u003cspan address=\"10.1007/s00227-018-3383-2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2018).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSchmidt-Rhaesa, A. \u003cem\u003eGuide to the Identification of Marine Meiofauna\u003c/em\u003e. (M\u0026uuml;nchen: Dr. Friedrich Pfeil, 2020).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMart\u0026iacute;nez, A. \u003cem\u003eet al.\u003c/em\u003e Fundamental questions in meiofauna research highlight how small but ubiquitous animals can improve our understanding of Nature. \u003cem\u003eCommunications Biology\u003c/em\u003e 8, 449, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s42003-025-07888-1\u003c/span\u003e\u003cspan address=\"10.1038/s42003-025-07888-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2025).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eOliver, M. J., Petrov, D., Ackerly, D., Falkowski, P. \u0026amp; Schofield, O. M. The mode and tempo of genome size evolution in eukaryotes. \u003cem\u003eGenome Res.\u003c/em\u003e 17, 594\u0026ndash;601, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1101/gr.6096207\u003c/span\u003e\u003cspan address=\"10.1101/gr.6096207\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2007).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eFiguerola, B. \u003cem\u003eet al.\u003c/em\u003e Interactive effects of ocean acidification and warming disrupt calcification and microbiome composition in bryozoans. 
\u003cem\u003eCommunications Biology\u003c/em\u003e 8, 1135, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/s42003-025-08524-8\u003c/span\u003e\u003cspan address=\"10.1038/s42003-025-08524-8\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2025).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRoberts, N. G., Gilmore, M. J., Struck, T. H. \u0026amp; Kocot, K. M. Multiple Displacement Amplification Facilitates SMRT Sequencing of Microscopic Animals and the Genome of the Gastrotrich Lepidodermella squamata (Dujardin 1841). \u003cem\u003eGenome Biol. Evol.\u003c/em\u003e 16, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/gbe/evae254\u003c/span\u003e\u003cspan address=\"10.1093/gbe/evae254\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2024).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLaumer, C. \u003cem\u003ePicogram input multimodal sequencing (PiMmS)\u003c/em\u003e, \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://dx.doi.org/10.17504/protocols.io.rm7vzywy5lx1/v1\u003c/span\u003e\u003cspan address=\"http://%3Chttps://dx.doi.org/10.17504/protocols.io.rm7vzywy5lx1/v1%3E\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2023).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBein, B. \u003cem\u003eet al.\u003c/em\u003e Long-read sequencing and genome assembly of natural history collection samples and challenging specimens. \u003cem\u003eGenome Biology\u003c/em\u003e 26, 25, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s13059-025-03487-9\u003c/span\u003e\u003cspan address=\"10.1186/s13059-025-03487-9\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2025).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eNishii, K.
\u003cem\u003eet al.\u003c/em\u003e A high quality, high molecular weight DNA extraction method for PacBio HiFi genome sequencing of recalcitrant plants. \u003cem\u003ePlant Methods\u003c/em\u003e 19, 41, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s13007-023-01009-x\u003c/span\u003e\u003cspan address=\"10.1186/s13007-023-01009-x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2023).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAngthong, P. \u003cem\u003eet al.\u003c/em\u003e Optimization of high molecular weight DNA extraction methods in shrimp for a long-read sequencing platform. \u003cem\u003ePeerJ\u003c/em\u003e 8, e10340, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.7717/peerj.10340\u003c/span\u003e\u003cspan address=\"10.7717/peerj.10340\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2020).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePetersen, C. \u003cem\u003eet al.\u003c/em\u003e High molecular weight DNA extraction methods lead to high quality filamentous ascomycete fungal genome assemblies using Oxford Nanopore sequencing. \u003cem\u003eMicrobial Genomics\u003c/em\u003e 8, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1099/mgen.0.000816\u003c/span\u003e\u003cspan address=\"10.1099/mgen.0.000816\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2022).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eStruck, T. H., Hessling, R. \u0026amp; Purschke, G. The phylogenetic position of the Aeolosomatidae and Parergodrilidae, two enigmatic oligochaete-like taxa of the \u0026lsquo;Polychaeta\u0026rsquo;, based on molecular data from 18SrDNA sequences. \u003cem\u003eJ. Zool. Syst. Evol. 
Res.\u003c/em\u003e 40, 155\u0026ndash;163 (2002).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCowie, R. H., Bouchet, P. \u0026amp; Fontaine, B. The Sixth Mass Extinction: fact, fiction or speculation? \u003cem\u003eBiol. Rev.\u003c/em\u003e 97, 640\u0026ndash;663, doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/brv.12816\u003c/span\u003e\u003cspan address=\"10.1111/brv.12816\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2022).\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLewin, H. A. \u003cem\u003eet al.\u003c/em\u003e The Earth BioGenome Project 2020: Starting the clock. \u003cem\u003eProc. Natl. Acad. Sci.\u003c/em\u003e 119, e2115635118, doi:doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1073/pnas.2115635118\u003c/span\u003e\u003cspan address=\"10.1073/pnas.2115635118\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (2022).\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"},{"header":"Tables","content":"\u003cp\u003eTables 1 and 2 are available in the Supplementary Files section.\u003c/p\u003e"},{"header":"Box 1","content":"\u003cp\u003eBox 1 is available in the Supplementary Files section\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"npj-biodiversity","isNatureJournal":false,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"npjbiodivers","sideBox":"Learn more about [npj Biodiversity](https://www.nature.com/npjbiodivers/)","snPcode":"44185","submissionUrl":"https://mts-npjbiodivers.nature.com/cgi-bin/main.plex","title":"npj Biodiversity","twitterHandle":"@npjbiodiversity","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"npj","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-7957242/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7957242/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eIn large-scale biodiversity genomics projects, the number of species that could be sequenced exceeds the resources available. Species selection is therefore a crucial component, requiring clear criteria and procedures. In a bottom-up approach, the Biodiversity Genomics Europe project implemented an Automated Decision-Making (ADM) process for species selection based on objective criteria and tested it on simulated and empirical data. Here, we present our species ranking ADM process. It includes three stages: exclusion, ranking, and feasibility-check. The composition of selected species retained the diversity of the community-nominated species pool for key taxonomic, geographic, and demographic assessment criteria while reducing bias. Feasibility and funding limits influenced the final selection more than other factors, indicating that investments in these areas would improve available reference genome diversity. 
The ADM achieved species selection for genome sequencing in a large-scale biodiversity project in a relatively objective manner consistent with the broader European biodiversity genomic community\u0026rsquo;s priorities.\u003c/p\u003e","manuscriptTitle":"An automated decision-making procedure for ranking and selecting species in biodiversity projects","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-11-24 07:39:56","doi":"10.21203/rs.3.rs-7957242/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2026-02-11T10:10:03+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-11-14T15:16:47+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"318215475140755357684440765554054218517","date":"2025-11-14T12:34:07+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-11-12T11:49:49+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-11-03T15:12:09+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-11-01T05:40:17+00:00","index":"","fulltext":""},{"type":"submitted","content":"npj Biodiversity","date":"2025-10-26T08:39:16+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"npj-biodiversity","isNatureJournal":false,"hasQc":false,"allowDirectSubmit":false,"externalIdentity":"npjbiodivers","sideBox":"Learn more about [npj Biodiversity](https://www.nature.com/npjbiodivers/)","snPcode":"44185","submissionUrl":"https://mts-npjbiodivers.nature.com/cgi-bin/main.plex","title":"npj Biodiversity","twitterHandle":"@npjbiodiversity","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"ejp","reportingPortfolio":"npj","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"c9d0a20a-e1f3-470c-b78d-34e29d96410d","owner":[],"postedDate":"November 24th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":58432340,"name":"Biological sciences/Computational biology and bioinformatics"},{"id":58432341,"name":"Biological sciences/Ecology"},{"id":58432342,"name":"Earth and environmental sciences/Ecology"},{"id":58432343,"name":"Biological sciences/Genetics"}],"tags":[],"updatedAt":"2026-05-22T09:10:20+00:00","versionOfRecord":[],"versionCreatedAt":"2025-11-24 07:39:56","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-7957242","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7957242","identity":"rs-7957242","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}