Rotifer Detection and Tracking Framework Using Deep Learning for Automatic Culture Systems

doi:10.21203/rs.3.rs-4302742/v1


2024 · doi:10.21203/rs.3.rs-4302742/v1

preprint OA: closed


Naoto Ienaga, Toshinori Takashi, Hitoko Tamamizu, Kei Terayama

Research Article. This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-4302742/v1. This work is licensed under a CC BY 4.0 License. Status: Posted (Version 1).

Abstract

Although rotifers (Brachionus plicatilis sp. complex) are a very important first feed source in marine fish aquaculture, the management of rotifers is quite time consuming because their population and movements need to be monitored on a daily basis. This management is still performed manually, and automation is required. If recent breakthroughs in deep learning could be put to good use, the automation of a rotifer culture system could be realized.
We propose a deep learning framework for detecting and tracking rotifers as a basis for such automation and carefully verify its accuracy. Experimental results show that a mean average precision of 88.5% was achieved for detection and a higher order tracking accuracy of 88.7% was achieved for tracking, indicating the suitability of deep learning methods for predicting the state of rotifers. In addition, this research contributes to the development of the field by releasing the trained model and code for visualizing the tracking results as well as an annotated dataset with over 30K instances.

Keywords: Rotifer, Automatic measurement, Deep learning, Object detection, Multiple object tracking

Highlights

- A deep learning framework for rotifer detection and tracking is proposed.
- The general applicability of deep learning to rotifer data is investigated.
- The trained model and code for visualizing the tracking results have been released.
- An annotated dataset with over 30K instances is also available.

1. Introduction

A stable supply of healthy rotifers (Brachionus plicatilis sp. complex) is key to the success of artificial seedling production in marine fish aquaculture. The lorica of rotifers used for seedling production ranges from approximately 80 to 320 µm in size, and rotifers lay about 20 eggs during their 7- to 14-day life span. They are therefore widely used as an initial feed for various marine fish larvae. Some of the cultivated rotifers must be harvested periodically before the culture reaches saturation. Otherwise, the rotifers start to die from lack of oxygen (Yamasaki et al., 1987).
In addition, when large numbers of rotifers are cultivated, problems such as poor rotifer growth or a rapid decrease in culture density occur frequently. These problems have various causes, including culture feed of unsuitable quality or quantity, deterioration in water quality due to dissolved oxygen and ammonia-forming nitrogen, and the proliferation of ciliates and bacteria (Yu and Hirayama, 1986; Yu et al., 1990; Cheng et al., 1997). It is therefore important to measure the health of the rotifers using metrics such as the number of rotifers, eggs, and contaminating ciliates as well as the movement of rotifers, in order to monitor rotifer density and signs of culture failure.

In practice, this measurement is performed at many aquaculture sites on a daily basis, but it is very time consuming even for experts, despite the fact that the basic techniques of culturing rotifers are generally well established. The state of the rotifers is measured based on the experience and intuition of each expert, and as yet, there is no automatic system for measuring the state of rotifers that can be used for practical purposes. Automating the manual measurement process would not only save considerable time and labor, but would also enable non-specialists to perform objective measurements. Furthermore, it would lead to the automation of the rotifer culture process itself, which would have a significant impact on the fisheries industry.

Essential techniques for the realization of an automated rotifer culture system include rotifer classification, detection, and tracking. In some cases, rotifers are obstacles to be removed. Therefore, several systems have been proposed to automatically count or classify rotifers (with or without eggs). Fully automated systems for counting rotifers, measuring their size, and identifying females and males based on shape have been developed (Alver et al., 2007; Stelzer, 2009).
These systems also automatically aspirate water from rotifer rearing tanks and capture images. Similarly, a method for detecting rotifers moving in biofilms by calculating image differences (Saur et al., 2014) and a method for detecting rotifers mixed in spirulina by keypoint matching (Lakshmi et al., 2015) have been proposed. These systems rely on heuristics built from classical computer vision (CV) techniques that were developed before the breakthrough of deep learning. Therefore, their robustness to new datasets is questionable.

Since about 2012, deep learning, especially convolutional neural networks (CNNs) in the field of CV, has developed rapidly and been applied in various fields. Deep learning, a data-driven supervised learning method, should therefore be used to implement the basic techniques for an automatic rotifer culture system, and its applicability should be verified. Before this period, one study proposed a method that uses a neural network (NN) to classify rotifers into those without eggs, those with one egg, and those with two eggs (Yang and Chou, 2000). However, the inputs to the NN in this method were features, such as shape features, that were extracted using classical CV techniques (which is not surprising, since this study was performed in 2000). Another study verified the applicability of an NN to the detection of microalgae (Cerbin et al., 2012). More recently, a general object recognition model, YOLOv3 (Redmon and Farhadi, 2018), has been used to detect rotifers (Polumpung et al., 2022) and the algae that are a food source for the rotifers (Tsai et al., 2022). Other studies have used other CNNs to classify four species of zooplankton (Bochinski et al., 2019), 25 species of algae (Yuan et al., 2023), and 103 species of plankton (Lee et al., 2016).

As mentioned above, many efforts have been made to classify plankton and detect rotifers. However, all of these studies have focused on still images.
There are two problems with this approach:

(1) Some of the above studies collect images without killing the rotifers, but in general, it is necessary to kill the rotifers with Lugol's iodine to collect images of them. This wastes rotifers, even though only a small number, and adds extra effort.
(2) The movement of the rotifer is also an important index of its state. Analyzing single images tells us nothing about the movement of the rotifers.

Therefore, a system that uses video as input to predict the rotifer state is more desirable. To the best of our knowledge, only one study (a doctoral dissertation) has used video as input for comprehensive prediction of some rotifer states (Geng, 2021). That study performed rotifer counting, classification (egg-bearing females, non-egg-bearing females, and small particles such as ciliates), and velocity prediction. However, CNNs were used only for the classification, and deep learning was not used for detection and tracking. Moreover, the code and datasets are not publicly available.

The goal of this research is to propose a framework that predicts the number of egg-bearing and non-egg-bearing females and the movement of rotifers, and that uses CNNs in all processes. The contributions of this research include the following:

- Using recent CNN models for detection and tracking, we propose a framework to predict rotifer states. No other study has applied such recent CNN models to rotifer data in this way.
- We have made our trained model available on the Internet so that anyone can easily use our system. The only thing the user needs to do is to prepare a video of rotifers.
- We have also released an annotated video dataset with more than 30K instances. Such data have not been available before and have the potential to stimulate research on rotifers. The dataset will also assist readers interested in training and testing their own models.
The number of ciliates is also an important index of rotifer culture, but ciliates were much smaller than rotifers in our dataset and could not be photographed at a scale large enough to capture their visual features. They were also difficult to annotate, which made collecting ciliate data harder than collecting rotifer data. Nevertheless, we attempted to count ciliates as a pilot study.

2. Materials and methods

2.1. Overview of proposed system

This section provides a brief review of detection and tracking models. The YOLO series is one of the most popular families of general object recognition models today because of its accuracy and speed. Many researchers have improved YOLO since the first model was published (Redmon et al., 2016), and YOLOv8 was launched in April 2023 (Jocher et al., 2023).

Multi-object tracking (MOT) aims to detect and identify each object in a video, with the goal of keeping track of each object. MOT is also one of the most actively researched topics in the machine learning field, and a large number of models have been proposed. Some models perform both detection and identification in a single model, but tracking-by-detection models, in which identification is performed using the output of an object detection model, are highly accurate. The SORT series is one of the best known of these (Bewley et al., 2016). In fact, BoT-SORT, the latest member of the SORT series, achieved state-of-the-art performance at the time of its release in June 2022 (Aharon et al., 2022). We used YOLOv8 for detection and BoT-SORT for tracking.

Figure 1 shows an overview of the proposed system and experiments. When a test video is fed to an object detection model (YOLOv8 version 8.0.54) that has been trained on training data (annotated rotifer images), the model outputs detection results for each frame. Next, a tracking model (BoT-SORT) performs MOT using the output of the object detection model.
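The tracking-by-detection idea described above can be illustrated with a minimal sketch. This is not BoT-SORT and not the code released with this study: it is a deliberately simplified stand-in that links per-frame bounding boxes by greedy IoU association, without the motion model, appearance cues, or lost-track handling that real trackers use. The class name `GreedyIouTracker`, the helper `iou`, and the 0.3 association threshold are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class GreedyIouTracker:
    """Toy tracker: each detection inherits the ID of the best-overlapping
    box from the previous frame, otherwise it starts a new track."""
    iou_threshold: float = 0.3  # hypothetical value, not taken from the paper
    next_id: int = 1
    tracks: Dict[int, Box] = field(default_factory=dict)

    def update(self, detections: List[Box]) -> List[int]:
        """Return one track ID per detection of the current frame."""
        # Score every (previous track, current detection) pair, best first.
        pairs = sorted(
            ((iou(box, det), tid, j)
             for tid, box in self.tracks.items()
             for j, det in enumerate(detections)),
            reverse=True,
        )
        ids = [0] * len(detections)  # 0 means "not yet assigned"
        used_tracks = set()
        for score, tid, j in pairs:
            if score < self.iou_threshold:
                break  # remaining pairs overlap too little
            if tid in used_tracks or ids[j]:
                continue  # track or detection already matched
            ids[j] = tid
            used_tracks.add(tid)
        # Unmatched detections start new tracks.
        for j in range(len(detections)):
            if not ids[j]:
                ids[j] = self.next_id
                self.next_id += 1
        # Only tracks seen in this frame survive (no lost-track buffer).
        self.tracks = {ids[j]: det for j, det in enumerate(detections)}
        return ids
```

Feeding the tracker the detector output frame by frame yields a stable ID for each rotifer as long as consecutive boxes overlap. A production tracker additionally predicts motion and tolerates missed detections, which this sketch omits.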
In Experiment 1, the detection results for two classes (egg-bearing females and non-egg-bearing females) were evaluated. In Experiment 2, the tracking was evaluated. In Experiment 3, detection was evaluated for three classes by adding ciliates to the previous two classes, as described in the Introduction. Because Experiment 3 is a pilot experiment, this evaluation was performed last.

2.2. Dataset

The rotifers were cultured in artificial seawater for 3 weeks. Chlorella was fed and the water temperature was maintained at approximately 30°C. The videos in the dataset were recorded at 16× magnification using a microscope (Olympus SZX7) and a smartphone (Apple iPhone 11). Each video is in full HD. The dataset is available at https://github.com/naotoienaga/rotifer-tool/.

Only instances of objects that were moving and recognizable were annotated using CVAT (https://www.cvat.ai/), as shown in Fig. 1. Therefore, dead rotifers were not annotated. In addition, rotifers at the edge of the image with more than half of the body hidden were not annotated.

Table 1 lists the number of instances of each class in each video. The dataset contains five videos, each about 30 s long. The first 90 frames of each video were extracted and annotated (450 images in total). Three of the five videos were annotated with the ciliate class (ciliate) in addition to the non-egg-bearing female class (non-egg) and the egg-bearing female class (egg). The number of instances varied, but this is thought to be due to variation in the recording location rather than daily variation in the number of rotifers in the tank (the rotifers dropped into the watch dish were not evenly distributed). Note that the number of rotifers in Video 5 decreased significantly because they had not been fed since the day Video 3 was taken.

In addition to the detection annotation, Video 1 was also annotated for tracking.
The annotation for tracking differed from the annotation for detection in two ways: (1) the same ID was kept for the same rotifer only as long as the rotifer was in frame, because the whole water droplet in the watch dish was not captured; (2) the egg label was not used, and all instances were annotated as non-egg. This is because even an egg-bearing female can appear to have no eggs depending on its orientation. Moreover, to retain the same ID for the same instance, the labels must be consistent (at least in CVAT). Hence, all instances were labeled as non-egg. Note that the position of the bounding box (bbox) is consistent between the tracking and detection annotations.

The average size of the bboxes (in pixels) in our dataset was 39.3 (± 7.4) in width and 39.8 (± 7.9) in height for the non-egg class; 50.1 (± 10.8) in width and 50.8 (± 10.3) in height for the egg class; and 14.0 (± 4.2) in width and 13.4 (± 4.2) in height for the ciliate class. Note that because the orientation of the instances was not taken into account, the width and height are almost the same.

Table 1. Number of instances of each class in the dataset.

| Video ID | Non-egg | Egg | Ciliate |
|---|---|---|---|
| 1 | 2,925 | 542 | - |
| 2 | 7,268 | 435 | - |
| 3 | 3,057 | 774 | 947 |
| 4 | 4,015 | 492 | 4,381 |
| 5 | 1,891 | 363 | 2,951 |
| Total | 19,156 | 2,606 | 8,279 |

2.3. Training Method

This section describes how the object detection model was trained. The YOLOv8m model was used because it is a good compromise between accuracy and computational speed. Early stopping was applied when the validation loss did not improve for 50 epochs. Because non-egg-bearing females were on average less than 40 pixels in size in our dataset, as mentioned in Section 2.2, the model was trained on images without any reduction in size, and no data augmentation regarding image size was performed. In addition, the color of the rotifers was considered to be basically the same, and hence no data augmentation regarding color was applied.
By contrast, data augmentation consisting of vertical flipping, mixup (Zhang et al., 2017), copy paste (Ghiasi et al., 2021), and mosaic (Bochkovskiy et al., 2020) was applied with a 50% probability. These augmentations empirically improved the accuracy of rotifer detection. The batch size was two because the large input images were memory intensive (an NVIDIA GeForce RTX 3090 was used). The other hyperparameters were set to their default values. These conditions were the same for the two- and three-class classifications.

In the two-class classification, the accuracy of detection was evaluated by 5-fold cross-validation to reduce randomness. The model was trained and tested five times, as shown in Table 2. To ensure a sufficient amount of training data, the video with the fewest instances (other than the test video) was chosen for validation. For the three-class classification, 3-fold cross-validation was used, and no test data were set aside, in order to ensure sufficient training data. In other words, the results shown in Section 3.3 are the results of training with the optimal number of epochs. As described above, the three-class result is not the main contribution of this study. Because this experiment was positioned as a pilot experiment and a sufficient amount of data could not be prepared because of the difficulty of annotation, this evaluation was made against validation data.

Table 2. 5(3)-fold cross-validation details. The numbers in the table are the video IDs (Table 1). Folds with R in the fold ID are the two-class classification of non-egg vs. egg, and folds with C are the three-class classification with ciliate added.

| Fold ID | R1 | R2 | R3 | R4 | R5 | C1 | C2 | C3 |
|---|---|---|---|---|---|---|---|---|
| Train | 2, 3, 4 | 1, 3, 4 | 1, 2, 4 | 1, 2, 3 | 2, 3, 4 | 4, 5 | 3, 5 | 3, 4 |
| Validation | 5 | 5 | 5 | 5 | 1 | 3 | 4 | 5 |
| Test | 1 | 2 | 3 | 4 | 5 | - | - | - |

As described in Section 2.1, no additional training was required for tracking. Because the annotation for tracking was completed only for Video 1, the trained object detection model of fold R1 was used for tracking.

2.4. Evaluation Metrics

We used four evaluation metrics:

(1) Average precision (AP) is one of the most common metrics for evaluating object detection. There are several variants of the calculation method; we used the one from the PASCAL VOC 2012 Challenge (Everingham et al., 2015; Padilla et al., 2021). AP is calculated separately for each class (this is also true for accuracy in this study).
(2) Accuracy is the simplest metric and is the percentage of predictions that match the ground truth (GT).
(3) The F1-score (F1), the harmonic mean of precision and recall, is a metric of how correct the classes of the predictions are. The F1 is lower when there are many incorrect predictions, even if most of the GT is predicted, or conversely, when many GT instances are not predicted even if most of the predictions are correct.
(4) The higher order tracking accuracy (HOTA) is a metric for MOT. Five to nine metrics are typically used for MOT because tracking is a complex task, and the appropriate metric varies depending on the objective. However, a single metric is much easier to interpret than multiple metrics. HOTA was proposed to solve this problem (Luiten et al., 2021; Luiten and Hoffhues, 2023). HOTA scores have been shown to agree more closely with human perception of tracking quality than the conventional metrics. HOTA was used in this study to give the reader a clear picture of how well the rotifers were tracked.

All four evaluation metrics range from 0 to 100%, where higher scores indicate better performance. AP and accuracy were used for detection, whereas accuracy, F1, and HOTA were used for tracking. Note that a manually annotated bbox and its class are taken as the GT. When the intersection over union (IoU), which indicates the degree of overlap between the GT and the predicted bbox, is greater than or equal to 0.5, the prediction is considered to match the GT.

3. Results and Discussion

3.1. Rotifer detection

In addition to the position of the bboxes and classes, object detection models typically output a confidence value, which represents how reliable the prediction is. If the confidence threshold is too low, there will be too many incorrect bboxes, but if the threshold is too high, correct predictions will also be discarded. To determine the optimal confidence threshold for the dataset constructed in this study, we first evaluated the results on the validation dataset (Fig. 2). In Fig. 2, the values were calculated using only bboxes whose confidence values exceeded the threshold. Because of the very large amount of noise, the minimum threshold was set at 1%. Because of the way it is calculated, AP becomes higher as more predictions are retained, as confirmed in Fig. 2. The best values for each metric were 95.0% (at a threshold of 1%) for AP and 84.7% (at a threshold of 50%) for accuracy. Here, 0.5, the threshold that gave the maximum accuracy, was taken as the best confidence threshold and was used in subsequent experiments. Note that the AP was 89.8% at a 50% confidence threshold.

Figure 3 shows an example of detection. Overall, the predictions were accurate; for example, in the top center of the image, three rotifers overlap but are detected well. However, more errors tend to occur when the rotifers overlap or are at the edges of the image. When the rotifers were at the edge of the image and more than half of their bodies were hidden, they were not annotated (no GT bbox), but were still predicted in some cases. In these cases, the prediction was not actually incorrect, but was counted as an error for consistency in the evaluation.

Table 3 shows the AP and accuracy for the detection of the non-egg and egg classes. The AP and accuracy showed a generally similar trend.
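The interplay between the confidence threshold and the IoU ≥ 0.5 matching criterion of Section 2.4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation code used in this study: the function names are hypothetical, predictions are matched greedily to GT boxes in descending order of confidence, and accuracy is assumed here to be TP / (TP + FP + FN), which is one common reading of "the percentage of predictions that match the GT" when true negatives are undefined. Class labels are ignored to keep the example short.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels
Detection = Tuple[Box, float]             # (bbox, confidence in [0, 1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_matches(gt: List[Box], dets: List[Detection],
                  conf_threshold: float,
                  iou_threshold: float = 0.5) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) for one frame at a given confidence threshold."""
    # Keep only detections whose confidence exceeds the threshold.
    kept = sorted((d for d in dets if d[1] > conf_threshold),
                  key=lambda d: d[1], reverse=True)
    unmatched = list(gt)
    tp = 0
    for box, _ in kept:
        # Greedy matching (an assumption): take the best-overlapping free GT box.
        best = max(unmatched, key=lambda g: iou(box, g), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            unmatched.remove(best)
            tp += 1
    return tp, len(kept) - tp, len(unmatched)


def accuracy(tp: int, fp: int, fn: int) -> float:
    """Assumed definition: TP / (TP + FP + FN)."""
    total = tp + fp + fn
    return tp / total if total else 0.0


def best_threshold(gt: List[Box], dets: List[Detection],
                   candidates: Sequence[float]) -> float:
    """Pick the candidate confidence threshold that maximizes accuracy."""
    return max(candidates, key=lambda t: accuracy(*count_matches(gt, dets, t)))
```

With a very low threshold, spurious low-confidence boxes are kept and counted as false positives; with a very high one, correct boxes are discarded and counted as false negatives. Sweeping candidates on validation data and keeping the maximizer mirrors the selection procedure described above.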
The lower accuracy for the egg class than for the non-egg class can be attributed to the fact that substantially fewer data are available for the egg class than for the non-egg class. Table 4 shows the confusion matrices. It appears that debris and other materials were incorrectly predicted as non-egg-bearing females, and egg-bearing females were incorrectly predicted as non-egg-bearing females in many cases.

Table 3. AP and accuracy for the 2-class detection for each fold.

| Metric | Class | Fold R1 | Fold R2 | Fold R3 | Fold R4 | Fold R5 | Mean (Std.) |
|---|---|---|---|---|---|---|---|
| AP (%) | Non-egg | 93.3 | 96.9 | 98.0 | 95.8 | 97.8 | 96.4 (1.7) |
| AP (%) | Egg | 61.1 | 90.8 | 76.3 | 88.1 | 89.0 | 81.1 (11.2) |
| AP (%) | Mean | 77.2 | 93.8 | 87.2 | 91.9 | 93.4 | 88.7 (11.1) |
| Accuracy (%) | Non-egg | 84.7 | 95.2 | 89.9 | 85.6 | 94.1 | 89.9 (4.3) |
| Accuracy (%) | Egg | 58.7 | 82.9 | 75.6 | 75.8 | 87.1 | 76.0 (9.7) |
| Accuracy (%) | Mean | 71.7 | 89.0 | 82.7 | 80.7 | 90.6 | 83.0 (10.2) |

Table 4. Confusion matrices for each class and all instances. Unlike in Table 3, each confusion matrix shows the total, not the average. For example, the number of cases in which a non-egg-bearing female was predicted but no non-egg-bearing female was actually there (or an egg-bearing female was there) was 1,587, and the number of cases in which no non-egg-bearing female was predicted but a non-egg-bearing female was actually there was 407. Note that "true negative" is not defined in object detection (because there are countless such cases).

| Prediction | Non-egg: actual true | Non-egg: actual false | Egg: actual true | Egg: actual false | Total: actual true | Total: actual false |
|---|---|---|---|---|---|---|
| True | 18,749 | 1,587 | 2,093 | 185 | 20,842 | 1,772 |
| False | 407 | - | 513 | - | 920 | - |

3.2. Rotifer tracking

For tracking, the trained model of fold R1 described in Section 3.1 was used (this is also the published trained model). Figure 4 confirms that the tracking was highly accurate, and a HOTA of 88.7% was achieved.

As mentioned in Section 2.2, the labels are all non-egg in the manually annotated tracking data. Hence, we rechecked which of the instances labeled non-egg were actually egg-bearing females.
By contrast, in the prediction, each instance was predicted to be either non-egg or egg in each frame (the tracking ID was preserved even if the instance was predicted to be of a different class in different frames). Hence, a rotifer that was predicted to be egg even once was classified as egg. Of course, the tracking IDs of the GT instances and the predictions do not match. Therefore, we manually mapped these IDs. The mapping was made using an IoU of 0.5 as a guide in the last frame of the test video, as shown in Fig. 4.

As shown in Table 5, there were 45 GT rotifers in total (36 non-egg and 9 egg). A passable accuracy of 70.4% was achieved, which is consistent with the 38 correct entries (30 + 8) out of the 54 entries in Table 5. The F1 for the non-egg class was 82.2%, and the F1 for the egg class was 61.5%. The main reason why high accuracy could not be achieved was the low precision for the egg class (the percentage of instances predicted as egg that were actually egg; 8 of 17 in Table 5). By contrast, the recall for the egg class was high (the percentage of instances that were predicted as egg out of the instances that were actually egg; 8 of 9 in Table 5). This is the opposite of the result in Table 4, and is probably due to the classification rule; that is, a rotifer that is predicted as egg in even one frame is classified as an instance of egg. How to classify instances in video tracking is a topic for future work. Overall, we can conclude that for this system, the detection was acceptable and the tracking was highly accurate.

Table 5. Confusion matrix for the tracking results. "None" indicates that there was no predicted/GT bbox. There were no prediction omissions (the third row).

| Prediction \ Actual | Non-egg | Egg | None |
|---|---|---|---|
| Non-egg | 30 | 1 | 6 |
| Egg | 6 | 8 | 3 |
| None | 0 | 0 | 0 |

3.3. Ciliate detection

As described in Section 3.1, we evaluated the detection performance for three classes, including ciliate. The qualitative result is shown in Fig. 5 and the quantitative results are listed in Tables 6 and 7. As can be seen from Fig. 5, many ciliates were not predicted, and many predictions were wrong.
Ciliates were very small, and there was a lot of similarly sized debris in this dataset, which probably made the ciliates difficult to detect. As can be seen from the tables, the results for the non-egg class are almost the same as those in Section 3.1, but the number of egg instances has dropped. The best results for the ciliate class were an AP of 68.0% and an accuracy of 52.9%, but the average AP was 43.1%, which is not an acceptable level.

Table 6. AP and accuracy for 3-class detection for each fold.

| Metric | Class | Fold C1 | Fold C2 | Fold C3 | Mean (Std.) |
|---|---|---|---|---|---|
| AP (%) | Non-egg | 97.6 | 94.5 | 97.7 | 96.6 (1.5) |
| AP (%) | Egg | 78.9 | 60.0 | 82.6 | 73.8 (9.9) |
| AP (%) | Ciliate | 28.5 | 68.0 | 32.8 | 43.1 (17.7) |
| AP (%) | Mean | 68.3 | 74.2 | 71.0 | 71.2 (24.9) |
| Accuracy (%) | Non-egg | 90.8 | 84.9 | 90.4 | 88.7 (2.7) |
| Accuracy (%) | Egg | 77.2 | 49.7 | 78.2 | 68.4 (13.2) |
| Accuracy (%) | Ciliate | 28.2 | 52.9 | 35.1 | 38.7 (10.4) |
| Accuracy (%) | Mean | 65.4 | 62.5 | 67.9 | 65.3 (22.7) |

Table 7. Confusion matrices for each class and the total.

| Prediction | Non-egg: actual true | Non-egg: actual false | Egg: actual true | Egg: actual false | Ciliate: actual true | Ciliate: actual false | Total: actual true | Total: actual false |
|---|---|---|---|---|---|---|---|---|
| True | 8,785 | 1,021 | 1,269 | 264 | 5,364 | 3,774 | 15,418 | 5,059 |
| False | 178 | - | 360 | - | 2,914 | - | 3,452 | - |

3.4. Available data

The dataset (videos and annotations), the weights of the trained object detection model, and code to output the video shown in Fig. 6 from the output of the tracking model are available (https://github.com/naotoienaga/rotifer-tool/). As shown in Fig. 6, the code visualizes the trajectory of each rotifer for the last 3 s (this value can be changed), the number of instances in that frame (total, non-egg, and egg), and the number of instances in the entire video.

4. Conclusion

In this study, we proposed a framework for detecting and tracking rotifers. This framework is a fundamental component of an automated culture system for rotifers, which are a very important feed in aquaculture. By incorporating recent successful deep learning methods, highly accurate detection and tracking were achieved.
We believe that our results indicate that the use of deep learning methods could lead to the realization of an automated rotifer culture system. In the future, by collaborating with aquaculture experts, we would like to realize a system that not only predicts quantitative numbers, but also diagnoses the status of the rotifers; for instance, whether it is time to harvest or whether the culture is developing. Because the trained model has been made available, anyone with an interest can reproduce the results of this study. In addition, a newly constructed dataset with over 30K instances has been published to assist those who would like to use it to train or validate their own models.

This study has several limitations. First, the dataset consists of videos that show only a portion of the droplets in a watch dish. Thus, it is difficult to estimate the overall number of rotifers. To estimate this number, one could reduce the magnification or the volume of the droplets. However, reducing the magnification also reduces the size of the rotifers in the recording, which could decrease accuracy, and reducing the droplet volume would lead to errors in estimating the total number of rotifers. The optimal magnification and droplet volume will be investigated in the future. Second, the egg-bearing females in this dataset carried only one egg, but in general, many rotifers carry two eggs simultaneously. As mentioned in Section 3.1, the number of instances of the egg-bearing female class itself was also low. This should be improved. Third, the dataset contained a large amount of debris and other materials (Section 3.3). While it is important to include as little debris as possible, it would be difficult to completely remove debris that is smaller than a rotifer. Therefore, it would be better to focus only on moving objects using tracking, which has been confirmed to be very accurate (Section 3.2).

Declarations

Author Contribution

N.I. collected data, worked out almost all of the technical details, wrote the manuscript, and acquired funding. T.T. and K.T. conceived of the presented idea and supervised the project. H.T. annotated data. K.T. acquired funding. All authors discussed the results and reviewed the manuscript.

Acknowledgement

We are grateful to Dr. Masahiko Koiso for insightful comments. This work was supported by JSPS KAKENHI Grant Numbers 22K14932 and 20K15587. We thank Kimberly Moravec, PhD, from Edanz (https://jp.edanz.com/ac) for editing a draft of this manuscript.

Data Availability

Data is provided within the manuscript.

Additional Declarations

No competing interests reported.

References

Yamasaki S, Secor DH, Hirata H (1987) Population growth of two types of rotifer (L and S) brachionus plicatilis at different dissolved oxygen levels. Nippon Suisan Gakkaishi 53(7):1303. https://doi.org/10.2331/suisan.53.1303
Yu J-P, Hirayama K (1986) The effect of un-ionized ammonia on the population growth of the rotifer in mass culture. Nippon Suisan Gakkaishi 52(9):1509–1513. https://doi.org/10.2331/suisan.52.1509
Yu J-P, Hino A, Noguchi T, Wakabayashi H (1990) Toxicity of vibrio alginolyticus on the survival of the rotifer brachionus plicatilis. Nippon Suisan Gakkaishi 56(9):1455–1460. https://doi.org/10.2331/suisan.56.1455
Cheng S-H, Suzaki T, Hino A (1997) Lethality of the heliozoon oxnerella maritima on the rotifer brachionus rotundiformis. Fish Sci 63(4):543–546. https://doi.org/10.2331/fishsci.63.543
Alver MO, Tennøy T, Alfredsen JA, Øie G (2007) Automatic measurement of rotifer Brachionus plicatilis densities in first feeding tanks. Aquacult Eng 36(2):115–121. https://doi.org/10.1016/j.aquaeng.2006.09.002
Stelzer C-P (2009) Automated system for sampling, counting, and biological analysis of rotifer populations: Automated analysis of rotifer populations. Limnol Oceanography: Methods 7(12):856–864. https://doi.org/10.4319/lom.2009.7.856
Saur T, Milferstedt K, Bernet N, Escudié R (2014) An automated method for the quantification of moving predators such as rotifers in biofilms by image analysis. J Microbiol Methods 103:40–43. https://doi.org/10.1016/j.mimet.2014.05.009
Lakshmi S, Siva Kumar R, Rajendran S (2015) Automated system for identifying and recognizing rotifer contamination in spirulina. Indian J Sci Technol 8(8):702. https://doi.org/10.17485/ijst/2015/v8i8/63673
Cerbin S, Nowakowski K, Dach J, Pilarski K, Boniecki P, Przybyl J, Lewicki A (2012) Possibilities of neural image analysis implementation in monitoring of microalgae production as a substrate for biogas plant. Fourth International Conference on Digital Image Processing, 8334, 458–462. https://doi.org/10.1117/12.954164
Redmon J, Farhadi A (2018) Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767. https://doi.org/10.48550/arXiv.1804.02767
Polumpung A, Lim KG, Tan MK, Shaleh M, Chin SRY, Kin RKT (2022) Optimizing high-density aquaculture rotifer detection using deep Learning Algorithm. 2022 IEEE International Conference on Artificial Intelligence in Engineering and Technology, 1–6. https://doi.org/10.1109/IICAIET55139.2022.9936794
Tsai S-M, Chuang M-L, Huang P-S (2022) Detection and counting of algae based on deep learning. 2022 IEEE International Conference on Consumer Electronics - Taiwan, 597–598. https://doi.org/10.1109/ICCE-Taiwan55306.2022.9869225
Bochinski E, Bacha G, Eiselein V, Walles TJW, Nejstgaard JC, Sikora T (2019) Deep active learning for in situ plankton classification. Pattern Recognit Inform Forensics ICPR 2018 11188:5–15. https://doi.org/10.1007/978-3-030-05792-3_1
Yuan A, Wang B, Li J, Lee JHW (2023) A low-cost edge AI-chip-based system for real-time algae species classification and HAB prediction. Water Res 233:119727. https://doi.org/10.1016/j.watres.2023.119727
Lee H, Park M, Kim J (2016) Plankton classification on imbalanced large scale database via convolutional neural networks with transfer learning. 2016 IEEE International Conference on Image Processing, 3713–3717. https://doi.org/10.1109/ICIP.2016.7533053
Yang C-Y, Chou J-J (2000) Classification of rotifers with machine vision by shape moment invariants. Aquacult Eng 24(1):33–57. https://doi.org/10.1016/S0144-8609(00)00065-0
Geng J (2021) Toward automation: Developing machine learning based intelligent vision for automated rotifer brachionus spp. culture systems. Doctoral Dissertation, University of Miami
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. IEEE Conference on Computer Vision and Pattern Recognition, 779–788
Jocher G, Chaurasia A, Qiu J (2023) YOLO by Ultralytics [Computer software]. https://github.com/ultralytics/ultralytics (accessed 17 April 2023)
Bewley A, Ge Z, Ott L, Ramos F, Upcroft B (2016) Simple online and realtime tracking. 2016 IEEE International Conference on Image Processing, 3464–3468. https://doi.org/10.1109/ICIP.2016.7533003
Aharon N, Orfaig R, Bobrovsky BZ (2022) BoT-SORT: Robust associations multi-pedestrian tracking. arXiv preprint arXiv:2206.14651. https://doi.org/10.48550/arXiv.2206.14651
Zhang H, Cisse M, Dauphin YN, Lopez-Paz D (2017) mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412. https://doi.org/10.48550/arXiv.1710.09412
Ghiasi G, Cui Y, Srinivas A, Qian R, Lin TY, Cubuk ED, Le QV, Zoph B (2021) Simple copy-paste is a strong data augmentation method for instance segmentation. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2918–2928
Bochkovskiy A, Wang CY, Liao HYM (2020) Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934. https://doi.org/10.48550/arXiv.2004.10934
Everingham M, Eslami SA, Van Gool L, Williams CK, Winn J, Zisserman A (2015) The pascal visual object classes challenge: A retrospective. Int J Comput Vision 111:98–136. https://doi.org/10.1007/s11263-014-0733-5
Padilla R, Passos WL, Dias TL, Netto SL, Da Silva EA (2021) A comparative analysis of object detection metrics with a companion open-source toolkit. Electronics 10(3):279. https://doi.org/10.3390/electronics10030279
Luiten J, Osep A, Dendorfer P, Torr P, Geiger A, Leal-Taixé L, Leibe B (2021) HOTA: A higher order metric for evaluating multi-object tracking. Int J Comput Vision 129:548–578. https://doi.org/10.1007/s11263-020-01375-2
Luiten J, Hoffhues A (2023) TrackEval. https://github.com/JonathonLuiten/TrackEval (accessed 17 April 2023)
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-4302742","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":296247440,"identity":"f4fb21fb-d813-43c3-9dc3-fd633d58260b","order_by":0,"name":"Naoto Ienaga","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAvklEQVRIiWNgGAWjYBACNghlI8PADmLzEK8ljYeBmVgtUHAYqoUYwMfee/Djl5rzPPzMzM8eMMjcIcJhPOeSpWWO3eaRbGYzN2DgeUaEFokcA2kJtts8BocZzCQYeA4ToUX+jfFviX/neOwPs38jUosEj5nkx7YDPAbMPMTawpNjZs3Yl8wjcZin3CCBGL/It58xvvnjm50cf3v7tgcfe4gIMRBghkdgYs8B4rQw/oAzfxCpZRSMglEwCkYUAAB6CjCWEhwUwwAAAABJRU5ErkJggg==","orcid":"","institution":"University of Tsukuba","correspondingAuthor":true,"prefix":"","firstName":"Naoto","middleName":"","lastName":"Ienaga","suffix":""},{"id":296247441,"identity":"f6b1b8be-cef8-43e0-80f8-fe756e2c4b4b","order_by":1,"name":"Toshinori Takashi","email":"","orcid":"","institution":"Japan Fisheries Research and Education Agency","correspondingAuthor":false,"prefix":"","firstName":"Toshinori","middleName":"","lastName":"Takashi","suffix":""},{"id":296247442,"identity":"6d51488c-5f30-4741-8342-f4434fd80c29","order_by":2,"name":"Hitoko Tamamizu","email":"","orcid":"","institution":"Yokohama City University","correspondingAuthor":false,"prefix":"","firstName":"Hitoko","middleName":"","lastName":"Tamamizu","suffix":""},{"id":296247444,"identity":"69a4e1b0-c670-45c8-a48e-5d70c43a72d9","order_by":3,"name":"Kei Terayama","email":"","orcid":"","institution":"Yokohama City 
University","correspondingAuthor":false,"prefix":"","firstName":"Kei","middleName":"","lastName":"Terayama","suffix":""}],"badges":[],"createdAt":"2024-04-22 03:07:28","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-4302742/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4302742/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":55504555,"identity":"adb93c38-d247-46b0-842a-4d3906e99bd7","added_by":"auto","created_at":"2024-04-29 11:13:39","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":820014,"visible":true,"origin":"","legend":"\u003cp\u003eOverview of the proposed framework and the evaluations.\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-4302742/v1/489c98bc3cee5cacfcf41cd8.png"},{"id":55505017,"identity":"6102752a-b48f-4328-b3db-7b8470973429","added_by":"auto","created_at":"2024-04-29 11:21:39","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":223933,"visible":true,"origin":"","legend":"\u003cp\u003eChanges in the evaluation metrics at different confidence thresholds on the validation dataset. Each metric is the average of the classes and folds. That is, they are the average of 10 values (2 classes × 5 folds).\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-4302742/v1/2a53297df365baa241fc9b43.png"},{"id":55504549,"identity":"5692c3c0-7391-4684-9a5d-298ef8d145b3","added_by":"auto","created_at":"2024-04-29 11:13:39","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":1156716,"visible":true,"origin":"","legend":"\u003cp\u003eExample of rotifer detection. The image is the first frame of the test video for fold R2, which has the most instances. 
Solid lines represent the GT and dashed lines represent prediction results.\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-4302742/v1/feb1b3c40a4b6d8375e6d84a.png"},{"id":55505018,"identity":"7fa54c14-daf8-4e84-8215-9894f8ac954d","added_by":"auto","created_at":"2024-04-29 11:21:39","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":1440669,"visible":true,"origin":"","legend":"\u003cp\u003eExample of rotifer tracking. The position of each bbox in the last frame and the trajectory of its movement are drawn in different colors. The last position is also drawn for the bbox of the instances that left the image field of view in the middle of the video.\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-4302742/v1/67da210af41bf04f84a85011.png"},{"id":55504554,"identity":"9d1c78cd-1481-45d9-b209-421af55b265b","added_by":"auto","created_at":"2024-04-29 11:13:39","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":1200552,"visible":true,"origin":"","legend":"\u003cp\u003eExample of 3-class detection. The image is the first frame of the test video for fold C2, which has the most instances. The solid lines represent GT instances and the dashed lines represent prediction results.\u003c/p\u003e","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-4302742/v1/8f3171d879fc9e8893a073ed.png"},{"id":55504553,"identity":"44e975e5-7469-48fb-9377-c4eca3261bd5","added_by":"auto","created_at":"2024-04-29 11:13:39","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":1360044,"visible":true,"origin":"","legend":"\u003cp\u003eResults of the visualization code on the original 30-s video (Video 2). 
The green numbers indicate the number of \u003cem\u003enon-egg\u003c/em\u003e instances, the red numbers indicate the number of \u003cem\u003eegg\u003c/em\u003e instances, and the blue numbers are the total. The top row is the number in that frame and the bottom row is the number in the whole video.\u003c/p\u003e","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-4302742/v1/ab1a92d3319596658362f903.png"},{"id":56920938,"identity":"c68c301e-34c4-44c7-ad1f-43b09c9daf4e","added_by":"auto","created_at":"2024-05-22 07:32:25","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":8576906,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4302742/v1/f4e47847-6929-431c-abdd-ab2c988dd46e.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Rotifer Detection and Tracking Framework Using Deep Learning for Automatic Culture Systems","fulltext":[{"header":"Highlights","content":"\u003cul\u003e\n \u003cli\u003eA deep learning framework for rotifer detection and tracking is proposed.\u003c/li\u003e\n \u003cli\u003eThe general applicability of deep learning to rotifer data is investigated.\u003c/li\u003e\n \u003cli\u003eThe trained model and code for visualizing the tracking results have been released.\u003c/li\u003e\n \u003cli\u003eAn annotated dataset with over 30K instances is also available.\u003c/li\u003e\n\u003c/ul\u003e"},{"header":"1. Introduction","content":"\u003cp\u003eA stable supply of healthy rotifers (\u003cem\u003eBrachionus plicatilis\u003c/em\u003e sp. complex) is key to the success of artificial seedling production in marine fish aquaculture. The lorica of rotifers used for seedling production ranges from approximately 80 to 320 \u0026micro;m in size, and rotifers lay about 20 eggs during their 7- to14-day life span. 
They are therefore widely used as an initial feed for various marine fish larvae.\u003c/p\u003e \u003cp\u003eIt is necessary to harvest some of the cultivated rotifers periodically before they reach saturation. Otherwise, the rotifers start to die from lack of oxygen (Yamasaki et al., 1987). In addition, when cultivating large numbers of rotifers, problems in the culture frequently occur such as poor rotifer growth or rapid decrease in the culture density due to various factors such as incompatibility in the quality and quantity of culture feed, deterioration in the water quality due to dissolved oxygen and ammonia-forming nitrogen, or the proliferation of ciliate and bacteria (Yu and Hirayama, 1986, Yu et al., 1990, Cheng et al., 1997). It is therefore important to measure the health of the rotifers using metrics such as the number of rotifers, eggs, and contaminated ciliates as well as the movement of rotifers to monitor rotifer density and signs of culture failure. In practice, the measurement process is performed at many aquaculture sites on a daily basis, but it is very time consuming even for experts despite the fact that the basic techniques of culturing rotifers are generally well established. The state of the rotifers is measured based on the experience and intuition of each expert, and as yet, there is no automatic system for measuring the state of rotifers that can be used for practical purposes. Automation of the manual measurement process would not only lead to a significant reduction in time and labor, but would also enable non-specialists to perform objective measurements. Furthermore, this would lead to the automation of the rotifer culture process itself, which would have a significant impact on the fisheries industry.\u003c/p\u003e \u003cp\u003eEssential techniques for the realization of an automated rotifer culture system include rotifer classification, detection, and tracking. In some cases, rotifers are obstacles to be removed. 
Therefore, several systems have been proposed to automatically count or classify rotifers (with or without eggs). Fully automated systems for counting rotifers, measuring their size, and identifying females and males based on shape have been developed (Alver et al., 2007, Stelzer, 2009). The systems also automatically aspirate water from rotifer rearing tanks and capture images. Similarly, a method for detecting rotifers moving in biofilms by calculating image differences (Saur et al., 2014) and a method for detecting rotifers mixed in spirulina by keypoint matching have also been proposed (Lakshmi et al., 2015). These systems heuristically used classical computer vision (CV) techniques that were developed before the breakthrough of deep learning. Therefore, their robustness to new datasets is questionable.\u003c/p\u003e \u003cp\u003eSince about 2012, deep learning, especially convolutional neural networks (CNNs) in the field of CV, has rapidly developed and been applied in various fields. This deep learning, a data-driven supervised learning method, should be used to implement the basic techniques for an automatic rotifer culture system, and the applicability of deep learning should be verified. Prior to that, a study proposed a method that classifies rotifers without eggs, those with one egg, and those with two eggs using a NN (Yang and Chou, 2000). However, the input features to the NN in this method were features such as shape features that were extracted using classical CV techniques (which is not surprising, since this study was performed in 2000). Another study has verified the applicability of a neural network (NN) to detect microalgae (Cerbin et al., 2012). More recently, a general object recognition model, YOLOv3 (Redmon and Farhadi, 2018), has been used to detect rotifers (Polumpung et al., 2022) and the algae that are a food source for the rotifers (Tsai et al., 2022). 
Other studies have used other CNNs to classify four species of zooplankton (Bochinski et al., 2019), 25 species of algae (Yuan et al., 2023), and 103 species of plankton (Lee et al., 2016).\u003c/p\u003e \u003cp\u003eAs mentioned above, many efforts have been made to classify plankton and detect rotifers. However, all of these studies have focused on images. There are two problems with this approach: (\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e) Some of the above studies collect images without killing the rotifers, but in general, it is necessary to kill the rotifers with Lugol's iodine to collect images of them. This wastes rotifers, even though it is a small amount, and adds extra effort. (\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e) The movement of the rotifer is also an important index of its state. Analyzing single images does not tell us anything about the movement of the rotifers. Therefore, a system that uses video as input to predict the rotifer state is more desirable.\u003c/p\u003e \u003cp\u003eTo our best knowledge, only one study (a doctoral dissertation) has used video as input for comprehensive prediction of some rotifer states (Geng, 2021). This study used a CNN to perform rotifer counting, classification (egg-bearing females, non-egg-bearing females, and small particles such as ciliates), and velocity prediction. However, the CNNs were only used for the classification, and deep learning was not used for detection and tracking. Moreover, the code and datasets are not publicly available.\u003c/p\u003e \u003cp\u003eThe goal of this research is to propose a framework that predicts the number of egg-bearing females/non-egg-bearing females and the movement of rotifers and that uses CNNs in all processes. 
The contributions of this research include the following:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eUsing the recent detection and tracking CNN models, we proposed a framework to predict rotifer states. No other study has applied such recent CNN models to rotifer data in this way.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eWe have made our trained model available on the Internet so that anyone can easily use our system. The only thing the user needs to do is to prepare a video of rotifers.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eWe have also released an annotated video dataset with more than 30k instances. Such data have not been available before and has the potential to stimulate research on rotifers. It will also assist readers interested in training and testing their own models.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThe number of ciliates is also an important index of rotifer culture, but they were much smaller than rotifers in our dataset and could not be photographed at a scale that is large enough to capture their visual features. They were also difficult to annotate, making their data collection more difficult than for rotifers. However, we also attempted to count ciliates as a pilot study.\u003c/p\u003e"},{"header":"2. Materials and methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1. Overview of proposed system\u003c/h2\u003e \u003cp\u003eThis section provides a brief review of detection and tracking models. The YOLO series is one of the most popular general object recognition models today because of its accuracy and speed. Many researchers improved YOLO since the first YOLO model was published (Redmon et al., 2016), and YOLOv8 was launched in April 2023 (Jocher et al., 2023). Multi-object tracking (MOT) aims to detect and identify each object in a video, with the goal of keeping track of each object. 
MOT is also one of the most actively researched topics in the machine learning field, and a large number of models have been proposed. There are models that perform both detection and identification in one model, but the tracking-by-detection models, in which identification is performed using the output of object detection models, are highly accurate. The SORT series is one of the best known of these (Bewley et al., 2016). In fact, BOT-SORT, the latest member of the SORT series, achieved the state-of-the-art status at the time of its release (June 2022) (Aharon et al., 2022). We used YOLOv8 for detection and BOT-SORT for tracking.\u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e shows an overview of the proposed system and experiments. When a test video is fed to an object detection model (YOLOv8 version 8.0.54) that has been trained on training data (annotated rotifer images), the model outputs detection results for each frame. Next, a tracking model (BOT-SORT) performs MOT using the object detection model. In Experiment 1, the detection results of two classes (egg-bearing females/non-egg-bearing females) were evaluated. In Experiment 2, the tracking was evaluated. In Experiment 3, detection was evaluated for three classes by adding ciliates to the previous two classes as described in the Introduction. Because Experiment 3 is a pilot experiment, this evaluation was performed last.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2. Dataset\u003c/h2\u003e \u003cp\u003eThe rotifers were cultured in artificial seawater for 3 weeks. Chlorella was fed and the water temperature was maintained at approximately 30\u0026deg;C. The videos in the dataset were recorded at 16\u0026times; magnification using a microscope (Olympus SZX7) and a smartphone (Apple iPhone 11). Each video is in full HD. 
The dataset is available at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/naotoienaga/rotifer-tool/\u003c/span\u003e\u003cspan address=\"https://github.com/naotoienaga/rotifer-tool/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eOnly instances of objects that were moving and recognizable were annotated using CVAT (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.cvat.ai/\u003c/span\u003e\u003cspan address=\"https://www.cvat.ai/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e)\u003c/span\u003e, as shown Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. Therefore, dead rotifers were not annotated. In addition, rotifers at the edge of the image such that more than half of the body was hidden were not annotated.\u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e lists the number of instances of each class in each video. The dataset contains five videos. Each video is about 30 s long. The first 90 frames of each video were extracted and annotated (450 images were annotated). Three of the five videos were annotated with the ciliate class (\u003cem\u003eciliate\u003c/em\u003e) in addition to the non-egg-bearing female class (\u003cem\u003enon-egg\u003c/em\u003e) and the egg-bearing female class (\u003cem\u003eegg\u003c/em\u003e). The number of instances varied, but this is thought to be due to variation in the location of the recording rather than a daily variation in the number of rotifers in the tank (the rotifers dropped into the watch dish were not evenly distributed). Note that the number of rotifers in Video 5 decreased significantly because they had not been fed since the day Video 3 was taken. 
In addition to the detection annotation, Video 1 was also annotated for tracking. The annotation for tracking differed from the annotation for detection in two ways: (\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e) the same ID was kept for the same rotifer as long as the rotifer was in frame because the whole water droplet in the watch dish was not captured; (\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e) the \u003cem\u003eegg\u003c/em\u003e label was not used, and all instances were annotated as \u003cem\u003enon-egg\u003c/em\u003e. This is because even an egg-bearing female can appear to have no eggs depending on the orientation. Moreover, to retain the same ID for the same instance, the labels must be consistent (at least in CVAT). Hence, all instances were labeled as \u003cem\u003enon-egg\u003c/em\u003e. Note that the position of the bounding box (bbox) is consistent in the tracking and detection annotations.\u003c/p\u003e \u003cp\u003eThe average size of the bboxes (in pixels) was 39.3 (\u0026plusmn;\u0026thinsp;7.4) in width and 39.8 (\u0026plusmn;\u0026thinsp;7.9) in height for the \u003cem\u003enon-egg\u003c/em\u003e class; 50.1 (\u0026plusmn;\u0026thinsp;10.8) in width and 50.8 (\u0026plusmn;\u0026thinsp;10.3) in height for the \u003cem\u003eegg\u003c/em\u003e class; and 14.0 (\u0026plusmn;\u0026thinsp;4.2) in width and 13.4 (\u0026plusmn;\u0026thinsp;4.2) in height for the \u003cem\u003eciliate\u003c/em\u003e class in our dataset. 
Note that because the orientation of the instances was not taken into account, the width and height are almost the same.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eNumber of instances of each class in the dataset.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eVideo ID\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eNon-egg\u003c/em\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cem\u003eEgg\u003c/em\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003eCiliate\u003c/em\u003e\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e2,925\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e542\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e7,268\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e435\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e3,057\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e774\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e947\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e4,015\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e492\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e4,381\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e1,891\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e363\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e2,951\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTotal\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e19,156\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2,606\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e8,279\u003c/p\u003e \u003c/td\u003e 
\u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.3. Training Method\u003c/h2\u003e \u003cp\u003eThis section describes how the object detection model was trained. The YOLOv8m model was used because it is a good compromise between accuracy and computational speed. Early stopping was used for training when the validation loss did not improve for 50 epochs. The size of non-egg-bearing females was less than 40 pixels in our dataset, as mentioned in Section \u003cspan refid=\"Sec4\" class=\"InternalRef\"\u003e2.2\u003c/span\u003e, the images were trained without any reduction in size, and no data augmentation regarding image size was performed. In addition, the color of the rotifer was considered to be basically the same, and hence no data augmentation regarding color was applied. By contrast, data augmentation consisting of vertical flipping, mixup (Zhang et al., 2017), copy paste (Ghiasi et al., 2021), and mosaic (Bochkovskiy et al., 2020) were added with a 50% probability. These empirically improved the accuracy of rotifer detection. The batch size was two because the large input images were memory intensive (an NVIDIA GeForce RTX 3090 was used). The other hyperparameters were set to their default values. The conditions above were the same for the two- and three-class classifications.\u003c/p\u003e \u003cp\u003eIn the two-class classification, the accuracy of detection was evaluated by 5-fold cross-validation to reduce randomness. The model was trained and tested five times, as shown in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. To ensure a sufficient number of training data, the video with the fewest instances (except for the test video) was chosen for the validation. 
For the 3-class classification, 3-fold cross-validation was used, and no test data were provided to ensure sufficient training data. In other words, the results shown in Section \u003cspan refid=\"Sec10\" class=\"InternalRef\"\u003e3.3\u003c/span\u003e are the results of training with the optimal number of epochs. As described above, the classification result is not the main contribution of this study. Because this experiment was positioned as a pilot experiment and a sufficient number of data could not be prepared because of the difficulty of annotation, this evaluation was made against validation data.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003e5(\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e)-fold cross-validation details. The numbers in the table are the video IDs (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Those with R in the fold ID are a two-class classification of \u003cem\u003enon-egg\u003c/em\u003e vs. 
\u003cem\u003eegg\u003c/em\u003e, and C indicates three-class classification with \u003cem\u003eciliate\u003c/em\u003e added.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"9\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFold ID\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eR1\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eR2\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eR3\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eR4\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eR5\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eC1\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eC2\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003eC3\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003eTrain\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2, 3, 4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1, 3, 4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1, 2, 4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1, 2, 3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e2, 3, 4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e4, 5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e3, 5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e3, 4\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eValidation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTest\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c4\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eAs described in Section \u003cspan refid=\"Sec3\" class=\"InternalRef\"\u003e2.1\u003c/span\u003e, no additional training was required for tracking. Because tracking annotation was completed only for Video 1, the object detection model trained in fold R1, for which Video 1 is the test video, was used for tracking.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003e2.4. Evaluation Metrics\u003c/h2\u003e \u003cp\u003eWe used four evaluation metrics. (1) Average precision (AP) is one of the most common metrics for evaluating object detection. Several variants of its calculation exist; we used the one from the PASCAL VOC 2012 Challenge (Everingham et al., 2015; Padilla et al., 2021). AP is calculated separately for each class (as is accuracy in this study). (2) Accuracy is the simplest metric: the percentage of predictions that match the ground truth (GT). (3) The F1-score (F1), the harmonic mean of precision and recall, measures how correct the classes of the predictions are. 
F1 is low when there are many incorrect predictions even if most GT instances are predicted (low precision), or conversely when many GT instances are missed even if most predictions are correct (low recall). (4) Higher order tracking accuracy (HOTA) is a metric for MOT. Five to nine metrics are typically used for MOT because tracking is a complex task and the appropriate metric varies with the objective. However, a single metric is much easier to interpret than several. HOTA was proposed to address this problem (Luiten et al., 2021; Luiten and Hoffhues, 2023). HOTA scores have been reported to agree more closely with human judgment of tracking quality than conventional metrics do. HOTA was used in this study to evaluate tracking to give the reader a clear picture of how well the rotifers were tracked.\u003c/p\u003e \u003cp\u003eAll four evaluation metrics range from 0 to 100%, where higher scores indicate better performance. AP and accuracy were used for detection, whereas accuracy, F1, and HOTA were used for tracking. Note that a manually annotated bbox and its class are taken as the GT. When the intersection over union (IoU), which indicates the degree of overlap between the GT and the predicted bbox, is greater than or equal to 0.5, the prediction is considered to match the GT.\u003c/p\u003e \u003c/div\u003e"},{"header":"3. Results and Discussion","content":"\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003e3.1. Rotifer detection\u003c/h2\u003e \u003cp\u003eIn addition to the position of the bboxes and classes, object detection models typically output a confidence value that represents how reliable the prediction is. If the confidence threshold is too low, too many incorrect bboxes are retained, but if it is too high, correct predictions are also discarded. 
To determine the optimal confidence threshold for the dataset constructed in this study, we first evaluated the results on the validation dataset (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eIn Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, the values were calculated using only bboxes whose confidence values exceeded the threshold. Because of the very large number of noisy low-confidence predictions, the minimum threshold was set at 1%. Because of how it is calculated, AP increases as more predictions are included, as confirmed in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. The best values for each metric were 95.0% (at a threshold of 1%) for AP and 84.7% (at a threshold of 50%) for accuracy. The threshold of 50% (0.5), which gave the maximum accuracy, was therefore taken as the best confidence threshold and was used in subsequent experiments. Note that the AP was 89.8% at a 50% confidence threshold.\u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e shows an example of detection. Overall, the predictions were accurate; for example, in the top center of the image, three overlapping rotifers are detected well. However, errors tend to be more frequent when the rotifers overlap or are at the edges of the image. When the rotifers were at the edge of the image and more than half of their bodies were hidden, they were not annotated (no GT bbox), but were still predicted in some cases. In these cases, the prediction was not actually incorrect, but was counted as an error for consistency in the evaluation.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e shows the AP and accuracy for the detection of the \u003cem\u003enon-egg\u003c/em\u003e and \u003cem\u003eegg\u003c/em\u003e classes. 
AP and accuracy showed broadly similar trends. The lower accuracy for the \u003cem\u003eegg\u003c/em\u003e class than for the \u003cem\u003enon-egg\u003c/em\u003e class is likely attributable to the substantially smaller amount of data available for the \u003cem\u003eegg\u003c/em\u003e class. Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e shows the confusion matrices. It appears that, in many cases, debris and other materials were incorrectly predicted as non-egg-bearing females, and egg-bearing females were incorrectly predicted as non-egg-bearing females.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eAP and accuracy for the 2-class detection for each fold.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"8\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c2\" 
namest=\"c1\"\u003e \u003cp\u003eMetric\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eFold R1\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFold R2\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eFold R3\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eFold R4\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eFold R5\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eMean (Std.)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eAP (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eNon-egg\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e93.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e96.9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e98.0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e95.8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e97.8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e96.4 (1.7)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eEgg\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e61.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e90.8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c5\"\u003e \u003cp\u003e76.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e88.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e89.0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e81.1 (11.2)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMean\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e77.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e93.8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e87.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e91.9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e93.4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e88.7 (11.1)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eAccuracy (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eNon-egg\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e84.7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e95.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e89.9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e85.6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e94.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" 
colname=\"c8\"\u003e \u003cp\u003e89.9 (4.3)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eEgg\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e58.7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e82.9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e75.6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e75.8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e87.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e76.0 (9.7)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMean\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e71.7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e89.0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e82.7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e80.7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e90.6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e83.0 (10.2)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eConfusion matrices 
for each class and all instances. Unlike in Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e, each confusion matrix shows the total, not the average. For example, the number of cases in which a non-egg-bearing female was predicted but no non-egg-bearing female was actually there (or an egg-bearing female was there) was 1,587, and the number of cases in which no non-egg-bearing female was predicted but a non-egg-bearing female was actually there was 407. Note that \u0026ldquo;true negative\u0026rdquo; is not defined in object detection (because there are countless such cases).\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"8\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c4\" namest=\"c3\"\u003e \u003cp\u003e\u003cem\u003eNon-egg\u003c/em\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c6\" namest=\"c5\"\u003e \u003cp\u003e\u003cem\u003eEgg\u003c/em\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c8\" namest=\"c7\"\u003e \u003cp\u003eTotal\u003c/p\u003e 
\u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e \u003cp\u003eActual\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eTrue\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFalse\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eTrue\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eFalse\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eTrue\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eFalse\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003ePrediction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTrue\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e18,749\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1,587\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e2,093\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e185\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e20,842\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e1,772\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFalse\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e407\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e513\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e920\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e3.2. Rotifer tracking\u003c/h2\u003e \u003cp\u003eFor tracking, the trained model of fold R1 described in Section \u003cspan refid=\"Sec8\" class=\"InternalRef\"\u003e3.1\u003c/span\u003e was used (this is also the published trained model). Figure\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e illustrates that the tracking was highly accurate; a HOTA of 88.7% was achieved.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eAs mentioned in Section \u003cspan refid=\"Sec4\" class=\"InternalRef\"\u003e2.2\u003c/span\u003e, all labels in the manually annotated data are \u003cem\u003enon-egg\u003c/em\u003e. Hence, we rechecked the annotated instances to determine which were actually egg-bearing females. In the predictions, by contrast, each instance was assigned either \u003cem\u003enon-egg\u003c/em\u003e or \u003cem\u003eegg\u003c/em\u003e in every frame (the tracking ID was preserved even if the predicted class changed between frames). We therefore classified a rotifer as \u003cem\u003eegg\u003c/em\u003e if it was predicted to be \u003cem\u003eegg\u003c/em\u003e in at least one frame. The tracking IDs of the GT instances and the predictions do not correspond, so we mapped them manually. 
The mapping was made using an IoU of 0.5 as a guide in the last frame of the test video, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eAs shown in Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e, there were 45 rotifers in total. The overall accuracy was passable (70.4%). The F1 for the \u003cem\u003enon-egg\u003c/em\u003e class was 82.2%, and the F1 for the \u003cem\u003eegg\u003c/em\u003e class was 61.5%. The main reason why high accuracy could not be achieved was the low precision for the \u003cem\u003eegg\u003c/em\u003e class (the percentage of instances predicted as \u003cem\u003eegg\u003c/em\u003e that were actually \u003cem\u003eegg\u003c/em\u003e). By contrast, the recall for the \u003cem\u003eegg\u003c/em\u003e class was high (the percentage of instances that were predicted as \u003cem\u003eegg\u003c/em\u003e out of the instances that were actually \u003cem\u003eegg\u003c/em\u003e). This is the opposite of the trend in Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e and is probably due to the classification rule: a rotifer that is predicted as \u003cem\u003eegg\u003c/em\u003e in even one frame is classified as an instance of \u003cem\u003eegg\u003c/em\u003e. How best to assign a class to a tracked instance in video is a topic for future work. Overall, we conclude that for this system the detection was acceptable and the tracking was highly accurate.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eConfusion matrix for the tracking results. \u0026ldquo;None\u0026rdquo; indicates that there was no predicted/GT bbox. 
There were no prediction omissions (the third row).\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colspan=\"2\" morerows=\"1\" nameend=\"c2\" namest=\"c1\" rowspan=\"2\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c5\" namest=\"c3\"\u003e \u003cp\u003eActual\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cem\u003eNon-egg\u003c/em\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003eEgg\u003c/em\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNone\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003ePrediction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eNon-egg\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e30\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e6\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003e\u003cem\u003eEgg\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNone\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003e3.3. Ciliate detection\u003c/h2\u003e \u003cp\u003eAs described in Section \u003cspan refid=\"Sec8\" class=\"InternalRef\"\u003e3.1\u003c/span\u003e, we evaluated the detection performance for three classes, including \u003cem\u003eciliate\u003c/em\u003e. The qualitative result is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e and the quantitative results are listed in Tables\u0026nbsp;\u003cspan refid=\"Tab6\" class=\"InternalRef\"\u003e6\u003c/span\u003e and \u003cspan refid=\"Tab7\" class=\"InternalRef\"\u003e7\u003c/span\u003e. As we can see from Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e, many ciliates were not predicted, and many predictions were wrong. 
Ciliates were very small, and there was a lot of similarly sized debris in this dataset, which probably made the ciliates difficult to detect.\u003c/p\u003e \u003cp\u003eAs can be seen from the tables, the results for the \u003cem\u003enon-egg\u003c/em\u003e class are almost the same as those in Section \u003cspan refid=\"Sec8\" class=\"InternalRef\"\u003e3.1\u003c/span\u003e, but the number of \u003cem\u003eegg\u003c/em\u003e instances has dropped. The best results for the \u003cem\u003eciliate\u003c/em\u003e class were an AP of 68.0% and an accuracy of 52.9%, but the average AP was 43.1%, which is not an acceptable level.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab6\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 6\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eAP and accuracy for 3-class detection for each fold.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e \u003cp\u003eMetric\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eFold C1\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFold 
C2\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eFold C3\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eMean (Std.)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003eAP (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eNon-egg\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e97.6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e94.5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e97.7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e96.6 (1.5)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eEgg\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e78.9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e60.0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e82.6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e73.8 (9.9)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eCiliate\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e28.5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e68.0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e32.8\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e43.1 (17.7)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMean\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e68.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e74.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e71.0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e71.2 (24.9)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003eAccuracy (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eNon-egg\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e90.8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e84.9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e90.4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e88.7 (2.7)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eEgg\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e77.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e49.7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e78.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e68.4 (13.2)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd 
align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eCiliate\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e28.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e52.9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e35.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e38.7 (10.4)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMean\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e65.4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e62.5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e67.9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e65.3 (22.7)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab7\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 7\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eConfusion matrices for each class and the total.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"10\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" 
class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c4\" namest=\"c3\"\u003e \u003cp\u003e\u003cem\u003eNon-egg\u003c/em\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c6\" namest=\"c5\"\u003e \u003cp\u003e\u003cem\u003eEgg\u003c/em\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c8\" namest=\"c7\"\u003e \u003cp\u003e\u003cem\u003eCiliate\u003c/em\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c10\" namest=\"c9\"\u003e \u003cp\u003eTotal\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e \u003cp\u003eActual\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eTrue\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFalse\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eTrue\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eFalse\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eTrue\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eFalse\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eTrue\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003eFalse\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003ePrediction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTrue\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e8,785\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1,021\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1,269\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e264\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e5,364\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e3,774\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e15,418\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003e5,059\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFalse\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e178\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e360\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e2,914\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e3,452\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c10\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003e3.4. Available data\u003c/h2\u003e \u003cp\u003eThe dataset (videos and annotations), the weights of the trained object detection model, and code to output the video as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e from the output of the tracking model are available (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/naotoienaga/rotifer-tool/\u003c/span\u003e\u003cspan address=\"https://github.com/naotoienaga/rotifer-tool/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e).\u003c/span\u003e\u003c/p\u003e \u003cp\u003eAs shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e, the code visualizes the trajectory of each rotifer for the last 3 s (this value can be changed), the number of instances in that frame (total, \u003cem\u003enon-egg\u003c/em\u003e, and \u003cem\u003eegg\u003c/em\u003e), and the number of instances in the entire video.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"4. Conclusion","content":"\u003cp\u003eIn this study, we proposed a framework for detecting and tracking rotifers. This framework is a fundamental component of an automated culture system for rotifers, which are a very important feed in aquaculture. By incorporating recent successful deep learning methods, highly accurate detection and tracking were achieved. We believe that our results indicate that the use of deep learning methods could lead to the realization of an automated rotifer culture system. 
In the future, by collaborating with aquaculture experts, we would like to realize a system that not only predicts quantitative numbers, but also diagnoses the status of the rotifers; for instance, whether it is time to harvest or whether the culture is developing.\u003c/p\u003e \u003cp\u003eBecause the trained model has been made available, anyone with an interest can reproduce the results of this study. In addition, a newly constructed dataset with over 30k instances has been published to assist those who would like to use it to train or validate their respective models.\u003c/p\u003e \u003cp\u003eThis study has several limitations. First, the dataset consists of videos that show only a portion of the droplets in a watch dish. Thus, it is difficult to estimate the overall number of rotifers. To estimate this number, one could reduce the magnification or the volume of the droplets. However, reducing the magnification also reduces the size of rotifers recorded, which could decrease accuracy, and reducing the droplet volume would lead to errors in estimating the total number of rotifers. The optimal magnification and droplet volume will be investigated in the future. Second, the number of eggs held by egg-bearing females in this dataset was only one, but in general, many rotifers carry two eggs simultaneously. As mentioned in Section \u003cspan refid=\"Sec8\" class=\"InternalRef\"\u003e3.1\u003c/span\u003e, the number of instances of the egg-bearing females class itself was also low. This should be improved. Third, the dataset contained a large amount of debris and other materials (Section \u003cspan refid=\"Sec10\" class=\"InternalRef\"\u003e3.3\u003c/span\u003e). While it is important to try to include as little debris as possible, it would be difficult to completely remove debris that is smaller than a rotifer. 
Therefore, it would be better to focus only on moving objects using tracking, which has been confirmed to be very accurate (Section \u003cspan refid=\"Sec9\" class=\"InternalRef\"\u003e3.2\u003c/span\u003e).\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eN.I. collected data, worked out almost all of the technical details, wrote the manuscript, and acquired funding. T.T. and K.T. conceived of the presented idea, and supervised the project. H.T. annotated data. K.T. acquired funding. All authors discussed the results and reviewed the manuscript.\u003c/p\u003e\u003ch3\u003eAcknowledgement\u003c/h3\u003e\n\u003cp\u003eWe are grateful to Dr. Masahiko Koiso for insightful comments. This work was supported by JSPS KAKENHI Grant Numbers 22K14932 and 20K15587. We thank Kimberly Moravec, PhD, from Edanz (https://jp.edanz.com/ac) for editing a draft of this manuscript.\u0026nbsp;\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eData is provided within the manuscript.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eYamasaki S, Secor DH, Hirata H (1987) Population growth of two types of rotifer (L and S) brachionus plicatilis at different dissolved oxygen levels. Nippon Suisan Gakkaishi 53(7):1303. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2331/suisan.53.1303\u003c/span\u003e\u003cspan address=\"10.2331/suisan.53.1303\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYu J-P, Hirayama K (1986) The effect of un-ionized ammonia on the population growth of the rotifer in mass culture. Nippon Suisan Gakkaishi 52(9):1509\u0026ndash;1513. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2331/suisan.52.1509\u003c/span\u003e\u003cspan address=\"10.2331/suisan.52.1509\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYu J-P, Hino A, Noguchi T, Wakabayashi H (1990) Toxicity of vibrio alginolyticus on the survival of the rotifer brachionus plicatilis. Nippon Suisan Gakkaishi 56(9):1455\u0026ndash;1460. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2331/suisan.56.1455\u003c/span\u003e\u003cspan address=\"10.2331/suisan.56.1455\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCheng S-H, Suzaki T, Hino A (1997) Lethality of the heliozoon oxnerella maritima on the rotifer brachionus rotundiformis. Fish Sci 63(4):543\u0026ndash;546. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2331/fishsci.63.543\u003c/span\u003e\u003cspan address=\"10.2331/fishsci.63.543\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlver MO, Tenn\u0026oslash;y T, Alfredsen JA, \u0026Oslash;ie G (2007) Automatic measurement of rotifer Brachionus plicatilis densities in first feeding tanks. Aquacult Eng 36(2):115\u0026ndash;121. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.aquaeng.2006.09.002\u003c/span\u003e\u003cspan address=\"10.1016/j.aquaeng.2006.09.002\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStelzer C-P (2009) Automated system for sampling, counting, and biological analysis of rotifer populations: Automated analysis of rotifer populations. 
Limnol Oceanography: Methods 7(12):856\u0026ndash;864. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.4319/lom.2009.7.856\u003c/span\u003e\u003cspan address=\"10.4319/lom.2009.7.856\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSaur T, Milferstedt K, Bernet N, Escudi\u0026eacute; R (2014) An automated method for the quantification of moving predators such as rotifers in biofilms by image analysis. J Microbiol Methods 103:40\u0026ndash;43. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.mimet.2014.05.009\u003c/span\u003e\u003cspan address=\"10.1016/j.mimet.2014.05.009\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLakshmi S, Siva Kumar R, Rajendran S (2015) Automated system for identifying and recognizing rotifer contamination in spirulina. Indian J Sci Technol 8(8):702. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.17485/ijst/2015/v8i8/63673\u003c/span\u003e\u003cspan address=\"10.17485/ijst/2015/v8i8/63673\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCerbin S, Nowakowski K, Dach J, Pilarski K, Boniecki P, Przybyl J, Lewicki A (2012) Possibilities of neural image analysis implementation in monitoring of microalgae production as a substrate for biogas plant. \u003cem\u003eFourth International Conference on Digital Image Processing\u003c/em\u003e, \u003cem\u003e8334\u003c/em\u003e, 458\u0026ndash;462. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1117/12.954164\u003c/span\u003e\u003cspan address=\"10.1117/12.954164\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRedmon J, Farhadi A (2018) Yolov3: An incremental improvement. \u003cem\u003earXiv preprint arXiv:1804.02767\u003c/em\u003e. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.1804.02767\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.1804.02767\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePolumpung A, Lim KG, Tan MK, Shaleh M, Chin SRY, Kin RKT (2022) K. T. Optimizing high-density aquaculture rotifer detection using deep Learning Algorithm. \u003cem\u003e2022 IEEE International Conference on Artificial Intelligence in Engineering and Technology\u003c/em\u003e, 1\u0026ndash;6. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/IICAIET55139.2022.9936794\u003c/span\u003e\u003cspan address=\"10.1109/IICAIET55139.2022.9936794\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTsai S-M, Chuang M-L, Huang P-S (2022) Detection and counting of algae based on deep learning. \u003cem\u003e2022 IEEE International Conference on Consumer Electronics - Taiwan\u003c/em\u003e, 597\u0026ndash;598. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/ICCE-Taiwan55306.2022.9869225\u003c/span\u003e\u003cspan address=\"10.1109/ICCE-Taiwan55306.2022.9869225\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBochinski E, Bacha G, Eiselein V, Walles TJW, Nejstgaard JC, Sikora T (2019) Deep active learning for in situ plankton classification. Pattern Recognit Inform Forensics ICPR 2018 11188:5\u0026ndash;15. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/978-3-030-05792-3_1\u003c/span\u003e\u003cspan address=\"10.1007/978-3-030-05792-3_1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYuan A, Wang B, Li J, Lee JHW (2023) A low-cost edge AI-chip-based system for real-time algae species classification and HAB prediction. Water Res 233:119727. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.watres.2023.119727\u003c/span\u003e\u003cspan address=\"10.1016/j.watres.2023.119727\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLee H, Park M, Kim J (2016) Plankton classification on imbalanced large scale database via convolutional neural networks with transfer learning. \u003cem\u003e2016 IEEE International Conference on Image Processing\u003c/em\u003e, 3713\u0026ndash;3717. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/ICIP.2016.7533053\u003c/span\u003e\u003cspan address=\"10.1109/ICIP.2016.7533053\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYang C-Y, Chou J-J (2000) Classification of rotifers with machine vision by shape moment invariants. Aquacult Eng 24(1):33\u0026ndash;57. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/S0144-8609(00)00065-0\u003c/span\u003e\u003cspan address=\"10.1016/S0144-8609(00)00065-0\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGeng J (2021) Toward automation: Developing machine learning based intelligent vision for automated rotifer brachionus spp. culture systems. \u003cem\u003eDoctoral Dissertation\u003c/em\u003e, \u003cem\u003eUniversity of Miami\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRedmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. \u003cem\u003eIEEE Conference on Computer Vision and Pattern Recognition\u003c/em\u003e, 779\u0026ndash;788\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJocher G, Chaurasia A, Qiu J (2023) YOLO by Ultralytics [Computer software]. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/ultralytics/ultralytics\u003c/span\u003e\u003cspan address=\"https://github.com/ultralytics/ultralytics\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (accessed 17 April 2023).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBewley A, Ge Z, Ott L, Ramos F, Upcroft B (2016) Simple online and realtime tracking. 
\u003cem\u003e2016 IEEE International Conference on Image Processing\u003c/em\u003e, 3464\u0026ndash;3468. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/ICIP.2016.7533003\u003c/span\u003e\u003cspan address=\"10.1109/ICIP.2016.7533003\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAharon N, Orfaig R, Bobrovsky BZ (2022) BoT-SORT: Robust associations multi-pedestrian tracking. \u003cem\u003earXiv preprint arXiv:2206.14651\u003c/em\u003e. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.2206.14651\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.2206.14651\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang H, Cisse M, Dauphin YN, Lopez-Paz D (2017) mixup: Beyond empirical risk minimization. \u003cem\u003earXiv preprint arXiv:1710.09412\u003c/em\u003e. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.1710.09412\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.1710.09412\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGhiasi G, Cui Y, Srinivas A, Qian R, Lin TY, Cubuk ED, Le QV, Zoph B (2021) Simple copy-paste is a strong data augmentation method for instance segmentation. \u003cem\u003eIEEE/CVF Conference on Computer Vision and Pattern Recognition\u003c/em\u003e, 2918\u0026ndash;2928\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBochkovskiy A, Wang CY, Liao HYM (2020) Yolov4: Optimal speed and accuracy of object detection. \u003cem\u003earXiv preprint arXiv:2004.10934\u003c/em\u003e. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.2004.10934\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.2004.10934\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEveringham M, Eslami SA, Van Gool L, Williams CK, Winn J, Zisserman A (2015) The pascal visual object classes challenge: A retrospective. Int J Comput Vision 111:98\u0026ndash;136. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s11263-014-0733-5\u003c/span\u003e\u003cspan address=\"10.1007/s11263-014-0733-5\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePadilla R, Passos WL, Dias TL, Netto SL, Da Silva EA (2021) A comparative analysis of object detection metrics with a companion open-source toolkit. Electronics 10(3):279. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/electronics10030279\u003c/span\u003e\u003cspan address=\"10.3390/electronics10030279\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLuiten J, Osep A, Dendorfer P, Torr P, Geiger A, Leal-Taix\u0026eacute; L, Leibe B (2021) HOTA: A higher order metric for evaluating multi-object tracking. Int J Comput Vision 129:548\u0026ndash;578. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s11263-020-01375-2\u003c/span\u003e\u003cspan address=\"10.1007/s11263-020-01375-2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLuiten J, Hoffhues A TrackEval. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/JonathonLuiten/TrackEval\u003c/span\u003e\u003cspan address=\"https://github.com/JonathonLuiten/TrackEval\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e (accessed 17 April 2023)\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Rotifer, Automatic measurement, Deep learning, Object detection, Multiple object tracking","lastPublishedDoi":"10.21203/rs.3.rs-4302742/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4302742/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eAlthough rotifers (\u003cem\u003eBrachionus plicatilis\u003c/em\u003e sp. complex) are a very important first feed source in marine fish aquaculture, the management of rotifers is quite time-consuming because their population and movements need to be monitored on a daily basis. This management is still performed manually, and automation is required. 
If we could make good use of recent breakthroughs in deep learning, the automation of a rotifer culture system could be realized. We propose a deep learning framework for detecting and tracking rotifers as a basis for such automation and carefully verified its accuracy. Experimental results show that a mean average precision of 88.5% was achieved for detection and a higher order tracking accuracy of 88.7% was achieved for tracking, indicating the suitability of deep learning methods for predicting the state of rotifers. In addition, this research will contribute to the development of the field by releasing the trained model and code for visualizing the tracking results as well as an annotated dataset with over 30K instances.\u003c/p\u003e","manuscriptTitle":"Rotifer Detection and Tracking Framework Using Deep Learning for Automatic Culture Systems","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-04-29 11:13:34","doi":"10.21203/rs.3.rs-4302742/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"2a280a7d-fdb9-45cc-b391-4b9c67de353e","owner":[],"postedDate":"April 29th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2024-05-22T07:24:14+00:00","versionOfRecord":[],"versionCreatedAt":"2024-04-29 
11:13:34","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-4302742","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4302742","identity":"rs-4302742","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
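
The summary figures in the per-fold AP table and in Table 7 can be re-derived from the raw entries. The sketch below is an illustration only, not the authors' evaluation code. It assumes that the Mean (Std.) column uses the population standard deviation, and it reads Table 7 as prediction-by-actual counts, so that the three filled cells per class are true positives, false positives, and false negatives. The names `FOLD_AP`, `Counts`, `TABLE7`, `mean_std`, and `pooled` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Per-fold AP (%) copied from the table that precedes Table 7.
FOLD_AP = {
    "non-egg": [97.6, 94.5, 97.7],
    "egg": [78.9, 60.0, 82.6],
    "ciliate": [28.5, 68.0, 32.8],
}


def mean_std(values):
    """Return (mean, population std), each rounded to one decimal place."""
    return round(mean(values), 1), round(pstdev(values), 1)


@dataclass(frozen=True)
class Counts:
    """Detection counts for one class (assumed reading of Table 7)."""
    tp: int  # predicted True, actually True
    fp: int  # predicted True, actually False
    fn: int  # predicted False, actually True

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)


# Counts copied from Table 7.
TABLE7 = {
    "non-egg": Counts(tp=8785, fp=1021, fn=178),
    "egg": Counts(tp=1269, fp=264, fn=360),
    "ciliate": Counts(tp=5364, fp=3774, fn=2914),
}


def pooled(counts):
    """Sum per-class counts into a single total."""
    counts = list(counts)
    return Counts(
        tp=sum(c.tp for c in counts),
        fp=sum(c.fp for c in counts),
        fn=sum(c.fn for c in counts),
    )


if __name__ == "__main__":
    for name, folds in FOLD_AP.items():
        print(name, "AP mean (std):", mean_std(folds))
    all_values = [v for folds in FOLD_AP.values() for v in folds]
    print("all classes and folds:", mean_std(all_values))

    for name, c in {**TABLE7, "total": pooled(TABLE7.values())}.items():
        print(f"{name}: precision={c.precision:.3f} recall={c.recall:.3f}")
```

Under these assumptions the per-class AP means and standard deviations match the printed column, and the "Mean" row (71.2, 24.9) is reproduced only when the standard deviation is taken over all nine class-fold values rather than over the three fold means. The per-class counts also sum to the Total column of Table 7.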
