Introduction
Currently, immense efforts are under way to digitize natural history collections on
a large scale, including the associated information and metadata (e.g., Smith &
Blagoderov 2012; Hardisty et al. 2020; Belot et al. 2023; Groom et al. 2023; Ong et al.
2023). In these endeavors, the automatic capture of label data, among other things, plays
a central role (e.g., Beaman et al. 2006; Heidorn & Wei 2008; Lafferty and Landrum
2009; Granzow-de la Cerda and Beach 2010; Haston et al. 2012; Agarwal et al. 2018;
Alzuru et al. 2019, 2020; Alzuru 2020; Owen et al. 2020; Belot et al. 2023; Zhang 2023;
Takano et al. 2024). However, many of these very promising activities have long been
exclusive to large companies, museums or institutions with specialized technical
infrastructure and specially trained staff (e.g., Blagoderov et al. 2012) for the highly
customized implementations used (e.g., https://picturae.com/).
Most of the current digitization initiatives aim at a one-off retro-digitization of large
collections (Engledow et al. 2018; Hardisty et al. 2020; Helminger et al. 2020; De Smedt
et al. 2024). However, this approach comes with limitations: 1) collections are
continuously growing and developing (see also Balke et al. 2013); 2) through its ongoing
research on the specimens, the scientific community produces a large amount of
high-quality biodiversity data independently of the collection institutions, and
amateur scientists are heavily involved in this research (Löbl et al. 2023). This research
often involves studying the collection material remotely, away from the collections and
their large digitization pipelines. Especially in insects, taxonomic specialists are rare, and
Author-formatted, not peer-reviewed document posted on 06/11/2024. DOI: https://doi.org/10.3897/arphapreprints.e141113
specimens are often shipped overseas on loan to obtain the best identifications from
world-leading specialists. In this respect, working processes differ considerably from
those for vertebrates or plants, which often take the lead in new methodologies such as
large-scale digitization. However, these data often do not yet end up in big data
repositories, partly because taxonomists lack time and incentives and are overloaded with work.
Therefore, more flexible solutions are needed that allow more efficient data
processing, speed up biodiversity and species discovery, and help to
overcome the taxonomic impediment. This would be perfectly in line with the idea of
integrating specimen databases and revisionary systematics (Schuh 2012). In revision-based
digitization (see also Meier and Dikow 2004), in contrast to retro-digitization,
biodiversity data come from taxonomic revisionary studies rather
than from uncritical digitizing of museum specimen data. Its advantages are the following
(extended, based on Meier and Dikow (2004) as well as Schuh (2012)): 1) the data are provided in
association with the most accurate identifications, 2) the data have the most complete
taxonomic and geographic coverage, 3) the data satisfy these points in a cost-effective
way, and 4) data for occurrences and images are citable and acknowledgeable
(therefore, errors can be retracted and corrected).
Recently we realized that mobile devices, which are now in the hands of almost every
person, may assist in this aim of speeding up data collection and digitization, including
biodiversity discovery. Through simple, playful experimenting, we discovered how useful
mobile phones can be in combination with cloud-like environments (such as Google or
Apple). Since we think that these “workflows” can be really useful for a large audience,
we prepared this short paper to disseminate simple tutorials on how to read
out specimen label data rapidly and easily with a smartphone.
Most digitization approaches envision the capture of digital metadata (e.g., labels) with
the intermediate step of digital images (Nelson et al. 2012). This comes with other
difficulties and quite considerable costs for image processing and storage (Tann and
Flemons 2008; Hardisty et al. 2020b). To optimize the cost-benefit ratio, it
would therefore be more sustainable to skip this step if data can be read out and
spell-checked at the same moment without the burden of images. The latter are
scientifically and practically quite unnecessary (in terms of cost-benefit balance) for
non-type specimens.
Results
Table 1 summarizes some major characteristics of the data capture with these
methods, showing the directly pasted content and the amount of real-time spelling
correction the data required. While printed labels needed minimal subsequent spelling
correction, handwritten labels often needed more, depending
on the size and style of the handwriting. In these cases, scanning the labels separately from
the pin, without distortion, helped considerably (Fig. 1A, B). For printed labels, the direction
(Fig. 2C) and distortion (Fig. 2D) of the labels did not matter much. We were able to scan up to
three labels (from the distorted side view) while they were still mounted on the pin, without
flipping them out or removing them (Fig. 2D).
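The amount of correction can also be expressed numerically. The following is a minimal sketch, not a step of the workflows tested here, that compares a pasted string with its finalized version using Python's standard difflib module. The two strings are the first label line of the Figure 1B row in Table 1; the function names correction_ratio and changed_spans are hypothetical helpers introduced only for illustration.

```python
from difflib import SequenceMatcher


def correction_ratio(pasted: str, finalized: str) -> float:
    """Return the difflib similarity (0.0 to 1.0) between raw and corrected text."""
    return SequenceMatcher(None, pasted, finalized).ratio()


def changed_spans(pasted: str, finalized: str) -> list[tuple[str, str]]:
    """List (pasted fragment, finalized fragment) pairs where the texts differ."""
    matcher = SequenceMatcher(None, pasted, finalized)
    return [
        (pasted[i1:i2], finalized[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# First label line of the Figure 1B row in Table 1.
pasted = "Bolivia Buengvista Pereira X198"
finalized = "Bolivia Buenavista Pereira XI.48"

print(round(correction_ratio(pasted, finalized), 2))
print(changed_spans(pasted, finalized))
```

A ratio close to 1.0 indicates that little manual correction was needed, and the listed spans show where the corrections fell.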
Since low image resolution was not a problem, we could zoom in digitally on the labels
with the mobile phone until they almost filled the frame. However, the initial
testing was also done successfully with much smaller images (Fig. 1A-D).
Processing was very fast: the estimated time for full
data capture (including spelling correction) was 3-10 seconds per specimen. Processing
often took a little longer for badly handwritten labels, when an insect pin or other
labels covered parts of the label text, or when the internet connection was slow.
The total time gained per label was larger for labels containing much information or for
multiple labels. For example, for the labels shown in Figure 2B, typing the data by hand
into the computer required 60 seconds (including spell check), whereas the label scan with
approach 1 took 8 seconds (including manual spell check). For the data of Figure 2C,
manual typing and spell check required 121 seconds, while the label scan with
approach 1 took 10 seconds (including manual spell check). We refrained from larger
comparative timing experiments, since the time needed to type data
depends strongly on the typing skills of the person. The comparative numbers given here
refer to a scientist without typing training (performed by D.A.).
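The two timed examples can be restated as speed-up factors. The short calculation below only re-expresses the numbers reported above; the dictionary layout and the function name speedup are chosen here for illustration.

```python
# Timings in seconds as reported for the labels of Figures 2B and 2C.
timings = {
    "Figure 2B": {"manual_typing": 60, "label_scan": 8},
    "Figure 2C": {"manual_typing": 121, "label_scan": 10},
}


def speedup(manual_typing: float, label_scan: float) -> float:
    """Return how many times faster the label scan was than manual typing."""
    return manual_typing / label_scan


for figure, t in timings.items():
    saved = t["manual_typing"] - t["label_scan"]
    print(f"{figure}: {speedup(**t):.1f}x faster, {saved} s saved")
```

For these two examples the scan was 7.5 and 12.1 times faster than manual typing, saving 52 and 111 seconds, respectively.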
In some instances, in approach 1, we had to take a detour via a Google document*
because of a bad internet connection, when the copy process failed due to slow data transfer.
This was then usually two “clicks” (or seconds) slower, but not a big delay
compared to the amount of time required for manual typing.
The iPhone workflow test was done with a larger label (Fig. 3B). In workflow 2a)
above, using the Notes app on the Apple iPhone, the image recognition tried to identify
and focus on blocks of text within the label, but did not capture the label as a whole. To
capture multiple pieces of information, the process had to be repeated accordingly. Once
the data have been collected in Notes, further copy-paste editing is necessary to transfer
them to a database. Workflow 2b), using Shortcuts app automation (Fig. 4), scans the
whole label and also stores the data with a timestamp directly in a spreadsheet app.
Furthermore, the photos are stored in the user's Apple iCloud account (as backups for
potential later reference), but this step is optional in the algorithm. The result of the
scanning is shown in Fig. 3B, C. Note that incomplete text in the original caused
interpretation problems (truncated third line and partially hidden bottom part of "image
0355"). Scanning the label and filling the cells in the spreadsheet took less
than 10 s. The algorithm analyzed the label as lines of text and allocated one cell per
line in the spreadsheet, so the locality information in our example was split
across two cells. If several lines belong to the same block of information, the cells
have to be edited; depending on which further tasks the user wants to
accomplish, copy-paste processing of such splits will be necessary.
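Splits of this kind can also be repaired after export rather than by hand. The sketch below is a hypothetical post-processing step in Python, not part of the Shortcuts automation: it assumes that the recognized text is available as plain lines, mimics the row layout described above (timestamp first, then one cell per line), and merges the cells that the user knows belong together. The function names label_to_row and merge_cells and the example text are placeholders.

```python
from datetime import datetime


def label_to_row(recognized_text: str, taken_at: datetime) -> list[str]:
    """Build one spreadsheet row: timestamp first, then one cell per text line."""
    lines = [line.strip() for line in recognized_text.splitlines() if line.strip()]
    return [taken_at.strftime("%Y-%m-%d %H:%M:%S")] + lines


def merge_cells(row: list[str], start: int, end: int, sep: str = " ") -> list[str]:
    """Join row[start] to row[end] (inclusive) into one cell; keep the rest."""
    if not 0 <= start <= end < len(row):
        raise IndexError("cell range outside the row")
    return row[:start] + [sep.join(row[start:end + 1])] + row[end + 1:]


# Placeholder text standing in for the recognized label lines.
text = "collection name\nlocality, part 1\nlocality, part 2\ncollector"
row = label_to_row(text, datetime(2024, 1, 1, 12, 0, 0))

# Cells 2 and 3 hold the two halves of the locality and are merged into one.
print(merge_cells(row, 2, 3))
```

Passing the cell positions explicitly keeps the decision about which lines belong together with the user, who still has the specimen at hand.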
The approach using a Bluetooth connection between the mobile phone and the
computer appeared to take slightly longer (in terms of the number of “device clicks”), yet
still saved a considerable amount of time for scanning the label data. This matters given
the common situation that many collection storerooms are partly or entirely offline, or that
remotely working taxonomists may have difficulty obtaining a good internet connection.
Bulk approaches are available in the Google and Apple environments (see Fig. 4)
with the Google Keep and Notes applications, respectively. In both, images are
temporarily stored on the mobile device and can subsequently be either saved
or discarded. While these approaches save time on data transfer, they have the disadvantage
that potentially incomplete scans are only discovered when the specimens are already
out of hand.
Discussion
While new technology, including artificial intelligence, is entering our daily lives, its use
and application in biodiversity research is still rather limited, although approaches
using AI-powered label recognition have been developed (Johaadien & Torma 2023; Takano
et al. 2024; Weaver et al. 2023). Similar smartphone tutorials have already been
provided for specimen photography (Riyaz & Ignacimuthu 2023), although such methods may
already be in wide use without having been formally addressed in the scientific literature.
Here we addressed the scanning of label information using a smartphone under
different operating systems. To our knowledge, this has not so far been
explored and applied, particularly with insect collection specimens. There are solutions
for large-scale mass digitization of collections (Belot et al. 2023; Blagoderov et al. 2012;
Engledow et al. 2018; Tegelberg et al. 2014). All of these solutions require manual
separation of specimens and labels in order to photograph them separately. Initial trials
with robotic technology (e.g. Dupont & Price 2019) are promising but can only be used
by larger institutions with the appropriate budget.
By partly omitting the hitherto obligatory step of taking and permanently storing images
of the labels, this direct approach to data capture is more rapid and environmentally
more sustainable. In some of our procedures, images are nevertheless taken in the background
without delay, and there is the option to retain or discard them.
Especially for simple extraction of distribution data in the framework of taxonomic
revisions or faunistic studies, there is no scientific necessity to hold images of the
metadata labels of every specimen long term. Moreover, the extracted data can be
spell-checked while the specimen is still at hand, so that the data are
finalized once and for all after the first processing.
However, depending on individual needs and working conditions, the user can choose
the workflow. It is possible to scan 50 labels in a row (i.e., bulk
workflow) before transferring the data to the computer. In that case, a backup photo
is useful for quality assessment and spell-checking in critical cases.
Another great advantage is that these protocols use commercial devices that are
simple to handle and inexpensive to replace when they become technologically
outdated, which is also a matter of cybersecurity. Unfortunately, our experience in
biosystematics has often been that customized devices are overpriced and lag
behind technological advances (e.g. computer operating systems), often requiring
expensive updates and service.
This matters because a large share of biodiversity research is done by amateur scientists,
who do not have access to large or continuous funding (and even professionals always
lack funding for their “descriptive research”).
Other consequences: The high reliability of text recognition and the rapid data transfer
make the use of (only-)machine-readable barcode labels and QR codes in collection
management superfluous, since connected data can easily be inferred from numerical
voucher numbers on labels.
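As an illustration of this point, a voucher number can be picked out of scanned label text with a simple pattern. This is a minimal sketch under stated assumptions: the voucher format (an optional uppercase collection acronym followed by at least five digits), the function name find_voucher_numbers and the example string are all hypothetical, and each collection would need its own pattern.

```python
import re

# Hypothetical voucher format: optional uppercase acronym, then 5 or more digits.
VOUCHER_PATTERN = re.compile(r"\b(?:[A-Z]{2,6}[ -]?)?\d{5,}\b")


def find_voucher_numbers(label_text: str) -> list[str]:
    """Return all substrings of the scanned text that look like voucher numbers."""
    return VOUCHER_PATTERN.findall(label_text)


# Made-up example string; "ABCD 0012345" stands in for a voucher number.
print(find_voucher_numbers("Example label text ABCD 0012345 more text"))
```

Short digit groups such as years or day numbers in collecting dates are ignored by this pattern because they have fewer than five digits.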
The solutions and tutorials proposed here are very well suited for the fast and secure
recording of small quantities of collection objects, e.g. when visiting a collection or when
selecting individual objects. We are aware that habits, skills and specific workflows
influence the way we integrate such devices and text recognition capabilities. We are
convinced that they will make a significant contribution and help to alleviate the
taxonomic impediment (e.g., de Carvalho et al. 2005, 2007; Engel et al. 2021), as the
workload for taxonomists recording the material they study in databases will be reduced
at least tenfold.
Finally, it should be said that there might be even more options and possibilities to scan
labels with mobile devices. These options might evolve as quickly as mobile phones
and artificial intelligence technology in general. Nevertheless, we hope that potential
users will take this paper as an inspiration to continue exploring how to apply
this technology successfully in their established workflows.
References
Agarwal N, Ferrier N, Hereld M (2018) Towards automated transcription of label text
from pinned insect collections. 2018 IEEE Winter Conference on Applications of
Computer Vision (WACV), Lake Tahoe, NV, USA, pp. 189-198, doi:
10.1109/WACV.2018.00027.
Alzuru I (2020) Human-machine extraction of information from biological collections.
PhD thesis, University of Florida, 160 pp.
Alzuru I, Malladi A, Matsunaga A, Tsugawa M, Fortes JAB (2019) Human-Machine
Information Extraction Simulator for Biological Collections. 2019 IEEE
International Conference on Big Data (Big Data), Los Angeles, CA, USA, pp.
4565-4572, doi: 10.1109/BigData47090.2019.9005601.
Alzuru I, Matsunaga A, Tsugawa M, Fortes JAB (2020) General Self-aware Information
Extraction from Labels of Biological Collections. 2020 IEEE International
Conference on Big Data (Big Data), Atlanta, GA, USA, pp. 3035-3044, doi:
10.1109/BigData50022.2020.9377737.
Balke M, Schmidt S, Hausmann A et al. (2013) Biodiversity into your hands - A call for a
virtual global natural history ‘metacollection’. Frontiers in Zoology 10: 55
https://doi.org/10.1186/1742-9994-10-55
Beaman RS, Cellinese N, Heidorn PB, Guo Y, Green AM, Thiers B (2006) HERBIS:
Integrating digital imaging and label data capture for herbaria [Abstract]. Botany
2006, California State University – Chico. 28 July–2 August 2006.
http://www.2006.botanyconference.org/engine/search/index.php?func=detail&aid=402
Belot M, Preuss L, Tuberosa J, Claessen M, Svezhentseva O, Schuster F, Bölling C,
Léger T (2023) High Throughput Information Extraction of Printed Specimen
Labels from Large-Scale Digitization of Entomological Collections using a Semi-
Automated Pipeline. Biodiversity Information Science and Standards 7: e112466.
https://doi.org/10.3897/biss.7.112466
Blagoderov V, Kitching I, Livermore L, Simonsen T, Smith VS (2012) No specimen left
behind: industrial scale digitization of natural history collections. ZooKeys 209:
133-146. https://doi.org/10.3897/zookeys.209.3178
de Carvalho MR, Bockmann FA, Amorim DS, de Vivo M, de Toledo-Piza M, Menezes
NA, de Figueiredo JL, McEachran JD (2005) Revisiting the taxonomic
impediment. Science 307: 353-353. DOI:10.1126/science.307.5708.353b
de Carvalho MR, Bockmann FA, Amorim DS et al. (2007) Taxonomic Impediment or
Impediment to Taxonomy? A Commentary on systematics and the
cybertaxonomic-automation paradigm. Evolutionary Biology 34: 140–143
https://doi.org/10.1007/s11692-007-9011-6
De Smedt S, Bogaerts A, De Meeter N, Dillen M, Engledow H, Van Wambeke P, Leliaert
F, Groom Q (2024) Ten lessons learned from the mass digitisation of a herbarium
collection. PhytoKeys 244: 23-37. https://doi.org/10.3897/phytokeys.244.120112
Dupont S, Price BW (2019) ALICE, MALICE and VILE: High throughput insect specimen
digitisation using angled imaging techniques. Biodiversity Information Science
and Standards 3: e37141. https://doi.org/10.3897/biss.3.37141
Engel MS, Ceríaco LMP, Daniel GM, et al. (2021) The taxonomic impediment: a
shortage of taxonomists, not the lack of technical approaches. Zoological Journal
of the Linnean Society 193(2): 381–387.
https://doi.org/10.1093/zoolinnean/zlab072
Engledow H, De Smedt S, Groom Q, Bogaerts A, Stoffelen P, Sosef M, Van Wambeke P
(2018) Managing a mass digitization project at Meise Botanic Garden: From start
to finish. Biodiversity Information Science and Standards 2: e25912.
https://doi.org/10.3897/biss.2.25912
Granzow-de la Cerda Í, Beach JH (2010) Semi-automated workflows for acquiring
specimen data from label images in herbarium collections. Taxon 59: 1830-1842.
https://doi.org/10.1002/tax.596014
Groom Q, Dillen M, Addink W, Ariño AHH, Bölling C, Bonnet P, Cecchi L, Ellwood ER,
Figueira R, Gagnier P-Y, Grace OM, Güntsch A, Hardy H, Huybrechts P, Hyam R,
Joly AAJ, Kommineni VK, Larridon I, Livermore L, Lopes RJ, Meeus S, Miller JA,
Milleville K, Panda R, Pignal M, Poelen J, Ristevski B, Robertson T, Rufino AC,
Santos J, Schermer M, Scott B, Seltmann KC, Teixeira H, Trekels M, Gaikwad J
(2023) Envisaging a global infrastructure to exploit the potential of digitised
collections. Biodiversity Data Journal 11: e109439.
https://doi.org/10.3897/BDJ.11.e109439
Haston E, Cubey RWN, Pullan M, Atkins H, Harris D (2012) Developing integrated
workflows for the digitisation of herbarium specimens using a modular and
scalable approach. ZooKeys 209: 93-102.
https://doi.org/10.3897/zookeys.209.3121
Hardisty A, Saarenmaa H, Casino A, Dillen M, Gödderz K, Groom Q, Hardy H, Koureas
D, Nieva de la Hidalga A, Paul DL, Runnel V, Vermeersch X, van Walsum M,
Willemse L (2020a) Conceptual design blueprint for the DiSSCo digitization
infrastructure - DELIVERABLE D8.1. Research Ideas and Outcomes 6: e54280.
https://doi.org/10.3897/rio.6.e54280
Hardisty A, Livermore L, Walton S, Woodburn M, Hardy H (2020b) Costbook of the
digitisation infrastructure of DiSSCo. Research Ideas and Outcomes 6: e58915.
https://doi.org/10.3897/rio.6.e58915
Heidorn PB, Wei Q (2008) Automatic metadata extraction from museum specimen
labels. Pp. 57–68 in: Greenberg, J. & Klas, W. (eds.), Metadata for semantic and
social applications: Proceedings of the International Conference on Dublin Core
and Metadata Applications, Berlin, 22–26 September 2008, DC 2008: Berlin,
Germany. Göttingen: Universitätsverlag Göttingen.
Helminger T, Weber O, Braun P (2020) Digitisation of the LUX herbarium collection of
the National Museum of Natural History Luxembourg. Bulletin de la Société des
naturalistes luxembourgeois 122: 147-152.
Johaadien R, Torma M (2023) “Publish First”: A Rapid, GPT-4 based digitisation system
for small institutes with minimal resources. Biodiversity Information Science and
Standards 7: e112428. https://doi.org/10.3897/biss.7.112428
Lafferty D, Landrum LR (2009) SALIX, a semi-automatic label information extraction
system using OCR [Abstract]. Botany & Mycology 2009, Snowbird, Utah, 25–29
July 2009.
http://2009.botanyconference.org/engine/search/index.php?func=detail&aid=130
(accessed 21.X.2024).
Löbl I, Klausnitzer B, Hartmann M (2022) Das stille Aussterben von Arten und
Taxonomen – ein Appell an Wissenschaftspolitik und Legislative. Entomologische
Nachrichten und Berichte 66(3): 217-226.
Meier R, Dikow T (2004) Significance of specimen databases from taxonomic
revisions for estimating and mapping the global species diversity of invertebrates
and repatriating reliable specimen data. Conservation Biology 18: 478-488.
https://doi.org/10.1111/j.1523-1739.2004.00233.x
Nelson G, Paul D, Riccardi G, Mast A (2012) Five task clusters that enable efficient and
effective digitization of biological collections. ZooKeys 209: 19-45.
https://doi.org/10.3897/zookeys.209.3135
Ong S-Q, Mat Jalaluddin NS, Yong KT, Ong SP, Lim KF, Azhar S (2023) Digitization of
natural history collections: A guideline and nationwide capacity building workshop
in Malaysia. Ecology and Evolution 13: e10212.
https://doi.org/10.1002/ece3.10212
Owen D, Groom Q, Hardisty A, Leegwater T, Livermore L, van Walsum M, Wijkamp N,
Spasić I (2020) Towards a scientific workflow featuring Natural Language
Processing for the digitisation of natural history collections. Research Ideas and
Outcomes 6: e58030. https://doi.org/10.3897/rio.6.e58030
Riyaz M, Ignacimuthu S (2023) Smart phone-macro lens setup (SPMLS): a low-cost
and portable photography device for amateur taxonomists, biodiversity
researchers, and citizen enthusiasts. Bulletin of the National Research Centre
47: 143 https://doi.org/10.1186/s42269-023-01120-y
Smith V, Blagoderov V (2012) Bringing collections out of the dark. ZooKeys 209: 1-6.
https://doi.org/10.3897/zookeys.209.3699
Schuh R (2012) Integrating specimen databases and revisionary systematics. ZooKeys
209: 255-267. https://doi.org/10.3897/zookeys.209.3288
Takano A, Cole TCH, Konagai H (2024) A novel automated label data extraction and
data base generation system from herbarium specimen images using OCR and
NER. Scientific Reports 14(1): 112. https://doi.org/10.1038/s41598-023-50179-0
Tann J, Flemons P (2008) Data capture of specimen labels using volunteers. Australian
Museum.
http://australianmuseum.net.au/Uploads/Documents/23183/Data%20Capture%20of%20specimen%20labels%20using%20volunteers%20-%20Tann%20and%20Flemons%202008.pdf [accessed 21.X.2024]
Tegelberg R, Mononen T, Saarenmaa H (2014) High-Performance digitization of natural
history collections: Automated imaging lines for herbarium and insect specimens.
Taxon 63(6): 1307–1313. https://doi.org/10.12705/636.13
Weaver WN, Ruhfel BR, Lough KJ, Smith SA (2023) Herbarium specimen label
transcription reimagined with large language models: Capabilities, productivity,
and risks. American Journal of Botany 110(12).
https://doi.org/10.1002/ajb2.16256
Zhang Y (2023) Use of artificial intelligence (AI) in historical records transcription:
Opportunities, challenges, and future directions. Master thesis, McGill University,
24pp.
Table 1: Summary of label configuration/view (with reference to Figures 1-3) and
the resulting text in the final database. Text changed by real-time manual
correction is indicated in bold.
| Label configuration/view | Text as pasted from computer’s clipboard | Verbatim finalized data (after manual correction) |
| --- | --- | --- |
| Figure 1A (labels scanned on pin, distorted) | Belivr vista Peretra、インタ Museum Frey Tutzing Ex Coll. Frey, Basel, Switzer | “Bolivia Buenavista Pereira XI.48 / Museum Frey Tutzing/ Ex Coll. Frey, Basel, Switzerland” (CF). |
| Figure 1B (labels scanned separately, not distorted) | Bolivia Buengvista Pereira X198 Ex Coll. Frey, Basel, Switzerland Museum Frey Tutzing | “Bolivia Buenavista Pereira XI.48 / Ex Coll. Frey, Basel, Switzerland/ Museum Frey Tutzing” (CF). |
| Figure 1C (partly handwritten labels scanned on pin, distorted) | North IRAQ, KURDISTAN Duhok, Akre, Bjeel 2.V.2018, leg.1.H.Mudhafar Maladera del. D. Ahrens 2023 | “North IRAQ, KURDISTAN Duhok, Akre, Bjeel 2.V.2018, leg.1.H.Mudhafar/ Maladera insanbilis (Brsk.) det. D. Ahrens 2023” |
| Figure 1D (partly handwritten labels scanned separately, not distorted) | Maladus dusanabilis (Boy) det. D. Ahrens 2023 | Maladera insanabilis (Brsk) det. D. Ahrens 2023 |
| Figure 2C | Tucuman: Argentina. H.E.Box. Β.Μ.1930-238. Est. Expt Agric. No 2486 TUCUMAN 101/ AHRosenfeld Collector Astaena argentina Moser Ex Coll. Frey, Basel, Switzerland Museum Frey Tutzing | “Tucuman: Argentina. H.E.Box. Β.Μ.1930-238./ Est. Expt. Agric. No 2486/ TUCUMAN XI-I 191/ A H Rosenfeld Collector/ Astaena argentina Moser/ Ex Coll. Frey, Basel, Switzerland/ Museum Frey Tutzing” |
| Figure 2D | Argentiniel w.Wittmer L. Cabral Coral Salta 1160m 3.XII.1985 Ex Coll.NHM Basel, Switzerland | “Argentinien W. Wittmer/ L. Cabral Coral Salta 1160m 3.XII.1985/ Ex Coll. NHM Basel, Switzerland” (NHMB) |
| Figure 3A | 四川:峨嵋山چہ 19573131 中國科學院 | “四川:峨嵋山 1957.VII.31 中國科學院” |
Figure 1. Example specimens used for experimental real-time label scans: A -
printed labels scanned on pin; B - printed labels scanned separately; C - partly
handwritten labels scanned on pin; D - partly handwritten labels scanned separately.
Figure 2. Steps of real-time data collection by scanning (illustrated by screenshots from
the mobile phone), and examples of labels: A - step 1: marking the text to be
captured via the touch screen of the mobile phone (example - printed labels scanned on
pin); B - step 2: selecting “Copy to computer” from the menu bar (at the right side under
the three dots) (example - printed labels scanned separately). As can be seen, different labels
at different levels on the pin can be scanned simultaneously and do not need to be
removed from the pin; C - screenshot showing the capture of multidirectional printed
labels scanned separately from the specimen in Google Lens; D - screenshot showing
the capture of multiple distorted, printed labels scanned on the pinned specimen in
Google Lens; E - screenshot showing the initial capture of a printed label scanned
separately from the specimen in Google Keep; F - screenshot showing the extracted
data resulting from E.
Figure 3. Other exemplary specimens used for experimental label scans: A – for
Chinese language labels (printed); B - The printed Herpetology collection label that was
scanned in the test of the Apple Shortcuts app algorithm. Note the incomplete text in the
third text line and the cut off text "image 0355" below (compare to the corresponding
data entries in C); C - Screenshot of the automatically scanned collection label as
transferred into cells of the spreadsheet app Numbers. Although the text scan was very
reliable, incomplete text will need editing: the somewhat cut off text "image 0355" of the
label was interpreted as "Tmaee 0355". The time stamp in the first column corresponds
to the file name of the respective photo saved as backup in the Shortcuts directory.
Figure 4. iOS Shortcuts app algorithm. From top to bottom: The first step opens the
iPhone's Camera app and lets you photograph the label. The photo (“LABEL”) is then
resized (optional, to save storage space) and saved in the background to the Shortcuts
directory in your iCloud account with the current date (and time) as file name. Then the
text is extracted from the photo and stored in a text container. The next step opens the
spreadsheet "Test" in the Numbers app; an empty target spreadsheet file (here: "Test") must
be prepared beforehand and be available in the Shortcuts folder of your iCloud account.
Current Date and Text items are then collected in the “List”. The List items are finally
entered into different columns of the sheet named "A" in the spreadsheet file "Test".
Figure 5. Screenshot of bulk-scanned labels via Google Keep, inspected afterwards
directly from the computer interface, during the step of copying the label text to a
Google document (interface here in Portuguese).