Abstract
High-throughput phenotyping of biological samples is essential for large-scale studies but
is frequently bottlenecked by the need for accurate instance segmentation in crowded
images. While deep learning offers powerful solutions, the high cost of manual annotation
and the requirement for coding expertise often limit adoption in routine laboratory
workflows. Here we present SegBio, a lightweight, open-source pipeline that enables
end-to-end instance segmentation for non-expert users. The protocol features an interactive
annotation GUI that extrapolates full masks from minimal centerline markings, significantly
reducing manual labeling effort. It further integrates a configurable U-Net training module
and a standalone inference application with a ‘human-in-the-loop’ editing workflow for
rapid and intuitive error correction. We employ the pipeline to annotate and train the model
on a novel dataset of crowded C. elegans images. Validated on independent datasets,
SegBio achieves high segmentation performance (Panoptic quality ~0.85) and accurately
quantifies per-animal morphology and fluorescence. By eliminating external dependencies
and streamlining the correction process, SegBio provides a scalable solution for routine
phenotyping that is easily generalized to other crowded biological samples, such as
cellular organelles, cells, and organisms.
bioRxiv preprint doi: https://doi.org/10.64898/2026.04.03.716031; this version posted April 6, 2026. The copyright holder for this preprint (which
was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under a CC-BY-NC 4.0 International license.
Introduction
Deep learning (DL) approaches, particularly U-Net–style architectures and their variants,
have become a standard choice for biomedical semantic segmentation because they are
relatively efficient and can generalize well when trained on sufficiently diverse annotated
data 1–3. At the same time, crowded microscopy often requires instance segmentation, and
a broad landscape of solutions has emerged, including detection-first approaches 4–6,
shape-representation methods tailored to microscopy 7–9, and generalist “off-the-shelf”
tools designed for broad transfer across imaging conditions 10,11. Many practical pipelines
also combine CNN probability maps with classical post-processing (such as
marker-controlled watershed) to split touching objects into individual instances 12–14. Nevertheless,
several practical obstacles continue to limit adoption in many labs.
Training a DL model from scratch requires generating large annotated datasets, which can
be prohibitive if done manually. On the other hand, pre-trained models and public datasets
are not always usable off-the-shelf due to differences in imaging setups between labs
(microscopes, acquisition settings, sample preparation, etc.). Even modest distribution
shifts can degrade performance and force users to fine-tune or retrain models on locally
acquired images 10,15. Additionally, even strong models are not fully autonomous in
practice. Noisy samples, occasional merge/split errors, and other edge-case failures
still require human supervision and targeted manual edits to maintain a high standard of
accuracy, making usability and correction workflows crucial for lowering the barrier for
non-expert users while preserving reproducibility and transparency 16,17.
Caenorhabditis elegans is a widely used model organism for studying development, aging,
disease mechanisms, and gene function, in part because it is small, optically accessible,
and amenable to scalable imaging assays 18–22. Large-scale phenotyping in brightfield and
fluorescence microscopy often depends on extracting per-animal measurements (e.g.,
count, size, posture-derived metrics, or fluorescence intensity) rather than relying on
whole-image summary statistics 23–26. However, obtaining these measurements typically
requires accurate segmentation of individual worms, a step that can become a major
bottleneck when experiments generate a large number of images, or when animals are
tightly clustered.
C. elegans computer vision has long included behavioral-analysis pipelines that do not
rely on dense instance masks, ranging from early machine-vision tracking and phenotype
classification to newer segmentation-free methods for locomotor quantification under
noisy imaging conditions 22,23. However, manual instance segmentation remains common
in worm imaging pipelines because it provides high-quality masks and can be used directly
for downstream morphology and fluorescence quantification 27,28. Yet manual outlining is
slow, labor-intensive, and introduces variability across users and labs. This challenge is
amplified in crowded microscopy fields where adjacent worms are difficult to separate
consistently. There is, therefore, a strong incentive to develop automated or
semi-automated worm segmentation tools, enabling high-throughput image analysis, while
maintaining accuracy through human-in-the-loop editing workflows.
Related challenges also arise in whole-brain C. elegans imaging, where automated
segmentation, correspondence, and tracking of densely packed neurons in moving
animals have motivated pipelines based on registration, semi-synthetic training, targeted
augmentation, and multi-lab data harmonization 29–32.
Here we introduce SegBio, an open-source, lightweight instance segmentation tool, pre-
trained for crowded brightfield microscopy images of adult C. elegans, and designed to
work as a simple, intuitive off-the-shelf solution for routine use. At the same time, SegBio
includes three integrated modules that directly address the practical obstacles above: (i)
a fast annotation module to reduce the cost of generating lab-specific training data, (ii) a
training module that supports straightforward retraining or fine-tuning of the model to novel
imaging conditions, and (iii) a standalone inference-and-editing workflow that enables
efficient human-in-the-loop correction of model outputs.
We first provide an overview of the pipeline, followed by a detailed implementation and
usage guide for each component. We then provide quantitative validation of the
pre-trained model on two independently curated validation sets to estimate out-of-sample
performance and typical correction burden. Finally, we demonstrate common
downstream analyses enabled by the resulting instance masks, including per-animal
morphology measurements and fluorescence quantification.
In summary, SegBio provides an accessible and flexible image analysis pipeline that
lowers the barrier to adopting deep-learning segmentation in everyday experiments,
supporting reproducible, scalable worm phenotyping from raw microscopy images to
analysis-ready measurements.
Results
Software overview
SegBio provides a simple, open-source end-to-end solution for instance segmentation,
from training data creation by manual annotation to deep-learning-based automatic
labeling and morphological feature extraction. The toolset presented here is fine-tuned for
adult C. elegans nematodes but is easily adaptable for a wide range of other use-cases.
The library comprises three software modules, two of which have simple and intuitive
graphical user interfaces (GUIs):
(i) Manual annotation GUI for fast labeling/data curation (Fig 1A)
(ii) CNN-based (U-Net) instance segmentation training library (Fig 1B)
(iii) Automatic inference module and manual editing GUI (Fig 1C)
The first two modules provide open-source data curation and model training pipelines with
which the toolset can be adapted for additional tasks. Module (i) is a standalone, self-
contained, Python-based executable GUI that allows users to create full instance masks
of elongated objects without painstakingly tracing the entire outline of the object. The user
must only trace the centerline, specify the widest point of each object, and supply a
characteristic tapering profile. The software then extrapolates this data into full object
masks. This method is much faster and requires significantly less tracing accuracy than
hand-drawing a full outline.
Module (ii) is a PyTorch-based flexible U-Net training pipeline that can be used to retrain
the model on new data. Users can easily adjust multiple parameters to better fit the model
for their needs, including the architecture of the network (number of layers and filters),
image augmentations, post-processing method and filtering criteria. The resulting model
can then be integrated into module (iii) for inference.
The inference GUI (iii) is delivered as a standalone, self-contained executable with
pre-trained model weights tuned for crowded microscopy images of adult C. elegans. The
package is written in Python (PyTorch), but does not require an existing Python
installation, has no external dependencies, and can employ either GPU or CPU for
inference. As such, the GUI is accessible regardless of available
hardware or coding knowledge, and can be used out-of-the-box for multiple applications
in C. elegans research, including counting individuals, extracting morphological
measurements, and quantifying fluorescence.
Detailed user and developer guides for all modules are available in
https://github.com/zaslab/SegBio.
Figure 1. The segmentation pipeline consists of annotation, model training, and
inference modules
A) The annotation module (i) is an interactive manual marking interface where the
user marks the length and width of the object. Full instance masks are then
extrapolated from the markings.
B) The model training module (ii) consists of a configurable U-Net that performs a 3-
class semantic segmentation on all object pixels (foreground), object outlines
(boundaries), and object centerline (seeds). The separate class masks are
obtained by deconstruction of the object masks from module (i). The U-Net is
constructed with flexible depth and width, allowing users to easily adapt and retrain
the model for their own data.
C) The inference module (iii) provides a simple and intuitive GUI to perform automatic
instance segmentation of images. The 3-class predictions given by the model are
transformed into instance masks via a watershed algorithm that fills in objects
growing outward from the seeds until a boundary is reached. The resulting object
masks are then displayed in the GUI on top of the image for simple manual editing.
The user can apply quick fixes by lightly adjusting model outputs and rerunning the
post-processing segmentation. Alternatively, the user can edit the masks
themselves for finer control.
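
The post-processing step in panel C can be illustrated with a short sketch. It is a simplified stand-in for a watershed, not the SegBio implementation: labelled seeds grow breadth-first through foreground pixels and stop at boundary pixels. The function name `grow_instances` and the list-of-lists image representation are assumptions made for illustration only.

```python
from collections import deque


def grow_instances(foreground, boundary, seeds):
    """Grow labelled seeds through the foreground until a boundary is reached.

    foreground, boundary: 2D lists of 0/1 (thresholded class predictions).
    seeds: 2D list of integer labels, 0 where there is no seed.
    Returns a 2D list of instance labels (0 = background or boundary).
    """
    rows, cols = len(foreground), len(foreground[0])
    labels = [row[:] for row in seeds]
    queue = deque(
        (r, c) for r in range(rows) for c in range(cols) if seeds[r][c]
    )
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            # Skip pixels already claimed, outside the foreground, or on a boundary.
            if labels[nr][nc] or not foreground[nr][nc] or boundary[nr][nc]:
                continue
            labels[nr][nc] = labels[r][c]
            queue.append((nr, nc))
    return labels


# Two touching objects separated by a one-pixel boundary column.
foreground = [[1] * 7, [1] * 7]
boundary = [[0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0]]
seeds = [[0, 1, 0, 0, 0, 2, 0], [0, 0, 0, 0, 0, 0, 0]]
instances = grow_instances(foreground, boundary, seeds)
```

In this simplified version boundary pixels stay unlabelled. Because the step involves no network inference, it can be repeated cheaply after the seed or boundary layers are edited.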
Model validation
To assess the quality of the model’s predictions we used two validation sets created
separately from the training set by two different experimenters. The experimenters used
the model to segment images and then manually adjusted the output with the editing tool.
We then compared the raw predictions of the model to the final user-adjusted masks (Fig
2, Table 1). This provides an estimate of the independent accuracy of the model on new
and variable data, and the amount of required user post-processing.
Across the two test sets, the model performed consistently well in both object detection
and mask matching. From the perspective of object detection, the two main failure modes
of the model are over-segmentation, where a single object is divided into multiple
instances (false positives, fp), and under-segmentation, where multiple objects are
combined into a single instance (false negatives, fn). The model performed well on both
metrics, correctly identifying the vast majority of worms in an image (Fig 2 A-D). The
number of both over- and under-segmented instances (fp and fn, respectively, Fig 2 A,B)
in both test datasets was less than one per image, resulting in precision and recall of ~1
(Fig 2 C,D). For correctly identified objects, quality of segmentation was also high, with an
average IoU of ~0.88 (Fig 2 E) across datasets. This indicates strong overlap between
predicted masks and ground truth. Panoptic quality, a summary metric combining object
detection and mask overlap, was ~0.85 (Fig 2 F) across datasets, indicating strong overall
performance of the model.
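
For readers who want to reproduce these metrics on their own data, the sketch below shows one way to compute them for a single image. It assumes the standard panoptic quality definition (PQ = SQ × RQ, with predictions matched to ground truth at IoU > 0.5) and represents each instance as a set of (row, col) pixels. The function names are hypothetical and are not taken from the SegBio code base.

```python
def iou(a, b):
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(gt_instances, pred_instances, thr=0.5):
    """Match predictions to ground truth at IoU > thr and summarise one image.

    For non-overlapping instances and thr >= 0.5, each ground-truth object
    can match at most one prediction, so a greedy search is sufficient.
    """
    matched_ious = []
    used = set()
    for gt in gt_instances:
        for j, pred in enumerate(pred_instances):
            if j in used:
                continue
            score = iou(gt, pred)
            if score > thr:
                matched_ious.append(score)
                used.add(j)
                break
    tp = len(matched_ious)
    fp = len(pred_instances) - tp   # unmatched predictions
    fn = len(gt_instances) - tp     # unmatched ground-truth objects
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    sq = sum(matched_ious) / tp if tp else 0.0      # segmentation quality
    denom = tp + 0.5 * fp + 0.5 * fn
    rq = tp / denom if denom else 0.0               # recognition quality
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision,
            "recall": recall, "sq": sq, "rq": rq, "pq": sq * rq}


# Two worms; the second is over-segmented into two fragments.
gt = [{(0, c) for c in range(10)}, {(2, c) for c in range(10)}]
pred = [{(0, c) for c in range(1, 10)},
        {(2, c) for c in range(4)},
        {(2, c) for c in range(4, 10)}]
metrics = panoptic_quality(gt, pred)
```

In this toy example the larger fragment still matches its worm, so the over-segmentation yields one false positive and no false negatives.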
Figure 2. The model accurately segments individual worms with minimal manual
editing.
A) False positives per image, matched at a threshold of IoU>0.5. False positives
normally occur when a worm is over-segmented into multiple individuals. Such
fragmentations can be fixed by removing the extra boundaries separating the
fragments.
B) False negatives per image, matched at a threshold of IoU>0.5. False negatives
can occur when a full worm is either not segmented or is filtered out during
post-processing. These can be mitigated by adjusting the filtering criteria or by manually
adding a new mask.
C) Precision per image.
D) Recall per image.
E) Intersection over union for individual worm masks, averaged over each image.
F) Panoptic quality per image, a summary statistic combining object detection and
mask overlap (recognition quality and segmentation quality, respectively, Table 1).
Dataset                 Dataset 1           Dataset 2
N images                60                  250
Metric                  Mean      Std       Mean      Std
N ground truth          12.339    1.108     13.558    0.971
N predicted             12.492    1.104     13.530    1.021
Objects found (%)       0.957     0.140     0.993     0.030
True positive           11.814    2.047     13.458    1.051
False positive          0.678     1.842     0.072     0.274
False negative          0.525     1.685     0.100     0.392
Precision               0.947     0.142     0.995     0.020
Recall                  0.957     0.140     0.993     0.030
Recognition quality     0.951     0.138     0.993     0.022
Segmentation quality    0.872     0.024     0.885     0.020
Panoptic quality        0.830     0.126     0.879     0.030
Matched instance IoU    0.872     0.024     0.885     0.020
Foreground IoU          0.866     0.065     0.891     0.022
Table 1. Model performance metrics on two separate test sets
The toolset supports multiple applications of worm segmentation
Many research applications require accurate morphological measurements of individual
animals. These include studies on development, aging, diseases, etc. Manual tracing of
individuals is slow, labor intensive, and can suffer from experimenter bias. Our model
solves these issues by quickly and accurately creating masks of single animals, and
extracting common and useful metrics from these masks including object length, width,
and area, as well as allowing for quantification of fluorescence intensity in transgenic
animals. Further measurements (e.g. posture, tapering) can be obtained from the masks
as needed.
Figure 3A shows microscopy images obtained with an additional fluorescence channel of
hsp-6::GFP worms that exhibited elevated fluorescence after consuming bacteria exposed
to zinc in a concentration-dependent manner (see Methods). The images were segmented
using the trained model and inference GUI, and the predicted masks were used to quantify
the median fluorescence intensity of all the pixels within each worm. The masks provide
an accurate measurement of fluorescence, capturing the dose-dependent GFP response
(Fig 3B).
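
The per-worm quantification described above amounts to grouping fluorescence pixels by instance label and taking the median of each group. A minimal sketch, assuming the masks are available as a 2D label image (0 for background) aligned pixel-for-pixel with the fluorescence channel; the function name is hypothetical:

```python
from collections import defaultdict
from statistics import median


def median_intensity_per_instance(labels, fluorescence):
    """Return {instance label: median fluorescence of its pixels}.

    labels: 2D list of integer instance labels (0 = background).
    fluorescence: 2D list of intensities with the same shape as labels.
    """
    pixels = defaultdict(list)
    for label_row, fluo_row in zip(labels, fluorescence):
        for label, value in zip(label_row, fluo_row):
            if label:  # ignore background pixels
                pixels[label].append(value)
    return {label: median(values) for label, values in pixels.items()}


labels = [[1, 1, 0, 2],
          [1, 0, 2, 2]]
fluorescence = [[10, 12, 99, 40],
                [50, 99, 42, 44]]
per_worm = median_intensity_per_instance(labels, fluorescence)
```

The median is less sensitive than the mean to a few saturated or misassigned pixels at the mask edge, which is useful when masks are not pixel-perfect.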
The masks also faithfully reproduce morphological differences. Zinc toxicity has
previously been shown to affect the morphology of adult worms 33,34. We used the
predicted masks to extract length measurements of zinc-exposed worms, and found that
worm length decreased with high exposure to zinc (Fig 3C).
Figure 3. SegBio can be employed to measure fluorescence intensity in transgenic
animals
A) Microscope images of hsp-6::GFP worms obtained with brightfield (top), and
fluorescence (middle) channels, and individual worm masks (bottom) segmented
by the model.
B) Median fluorescence intensity of hsp-6::GFP worms increases with exposure to
high concentrations of zinc.
C) Length of hsp-6::GFP worms decreases with exposure to high concentrations of
zinc. Asterisks denote significant differences from the 0 mM control: * p<1e-3; **
p<1e-4; *** p<1e-5, two-sided Welch’s t-test with Holm multiple comparison
correction.
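
The statistical procedure named in this caption can be sketched with the standard library alone. The code below computes the Welch t statistic with its Welch–Satterthwaite degrees of freedom, and applies the Holm step-down adjustment to a list of p-values. It illustrates the textbook formulas rather than the authors' analysis script. Converting t to a p-value needs a t-distribution CDF, which the standard library lacks (`scipy.stats.ttest_ind` with `equal_var=False` is one option). The function names are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


t_stat, dof = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
adjusted = holm_adjust([0.01, 0.04, 0.03])
```

Each treatment group would be compared with the 0 mM control, and the resulting p-values passed together to `holm_adjust`.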
Extending the model to additional functions
The flexible structure of the pipeline allows the user to easily adapt the model for new
tasks. For example, by adding body-part-specific masks to the training input the model
can learn to identify that part in addition to the full worm masks. Depending on the target
body part, this can be achieved with minimal additional manual labeling. To demonstrate
this feature, we constructed head-specific masks by simply extracting the front 15% of
each ground-truth worm mask. This required a single extra click per worm to mark the
head location in the manual annotation GUI (i). The head masks were concatenated with
the ground-truth labels (foreground, boundary, and seed) before training, and the model
was extended with an additional output class and retrained.
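
One plausible way to extract the front 15% of a worm mask is sketched below. It assumes a densely sampled centerline ordered from head to tail, and keeps the mask pixels whose nearest centerline point lies within the first 15% of the arc length. Both the function name `head_mask` and this nearest-point rule are illustrative assumptions; the text does not specify how SegBio performs the cut.

```python
import math


def head_mask(mask, centerline, fraction=0.15):
    """Pixels of `mask` closest to the first `fraction` of the centerline.

    mask: set of (row, col) pixels belonging to one worm.
    centerline: list of (x, y) points ordered head first, densely sampled.
    """
    # Cumulative arc length at each centerline point.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(centerline, centerline[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    head = set()
    for row, col in mask:
        nearest = min(
            range(len(centerline)),
            key=lambda i: (centerline[i][0] - col) ** 2
            + (centerline[i][1] - row) ** 2,
        )
        if total and cum[nearest] / total <= fraction:
            head.add((row, col))
    return head


# A straight worm, 101 pixels long and 5 pixels wide, with the head at x = 0.
centerline = [(x, 5) for x in range(101)]
worm = {(r, c) for r in range(3, 8) for c in range(101)}
head = head_mask(worm, centerline)
```

The brute-force nearest-point search is quadratic in the worst case, which is acceptable for label generation but would be worth replacing with a spatial index for large images.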
With this, the inference module (iii) is able to locate the centroid of the target body part. To
demonstrate the utility of this, and to showcase the generalizability of the pipeline, we
applied the model to a new collection of images obtained by a third experimenter with a
different acquisition protocol, including a smaller magnification. For these images we used
the hsp-4::GFP strain, whose expression pattern is different from the hsp-6::GFP (Fig 4A).
We then used the predicted head location (Fig 4B) to compare the fluorescence profile
along the body of the worms in both strains. Figure 4C shows the difference in
fluorescence intensity distribution. The hsp-4 promoter shows two main foci of expression:
a narrow peak just behind the pharyngeal bulb, and a wider peak in the posterior region,
encompassing about a third of the body length. In contrast, hsp-6 shows a single intensity
peak around the anterior gut region, faithfully recreating the difference in expression
patterns of the two promoters.
Overall, the toolkit’s open-source and flexible architecture makes it highly generalizable,
adaptable, and expandable, offering high utility to any instance segmentation task, even
outside of the included pre-trained model.
Figure 4. Distribution of fluorescence along the body, obtained by expanding the
model to segment the head region
A) Microscope images of hsp-4::GFP worms obtained with brightfield (top), and
fluorescence (middle) channels, and individual worm masks (bottom) segmented
by the extended model.
B) Zoomed-in image of the worm masks showing the centroid of the segmented head
region (red plus sign).
C) Comparison of fluorescence profile of hsp-4::GFP vs hsp-6::GFP worms.
Fluorescence intensity was measured in each worm in 20 bins starting from the
model-identified head. Intensity was normalized within each worm before
averaging.
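
The binning and normalization described in panel C can be sketched as follows. The sketch assumes each worm is represented by a list of intensity samples ordered from head to tail, and that "normalized within each worm" means division by that worm's peak bin. The caption does not state the normalization used, so this choice and the function names are assumptions.

```python
def binned_profile(samples, n_bins=20):
    """Average head-to-tail intensity samples into n_bins consecutive bins."""
    n = len(samples)
    if n < n_bins:
        raise ValueError("need at least one sample per bin")
    profile = []
    for b in range(n_bins):
        chunk = samples[b * n // n_bins:(b + 1) * n // n_bins]
        profile.append(sum(chunk) / len(chunk))
    return profile


def normalise(profile):
    """Scale a profile so that its peak bin equals 1 (assumed normalization)."""
    peak = max(profile)
    return [v / peak for v in profile] if peak > 0 else list(profile)


def mean_profile(worms, n_bins=20):
    """Average the normalized per-worm profiles bin by bin."""
    profiles = [normalise(binned_profile(w, n_bins)) for w in worms]
    return [sum(col) / len(col) for col in zip(*profiles)]


# Two worms with the same shape of profile but a two-fold brightness difference.
worm_a = [float(v) for v in range(1, 41)]
worm_b = [2.0 * v for v in worm_a]
average = mean_profile([worm_a, worm_b])
```

Normalizing before averaging keeps bright animals from dominating the average, so the result reflects where expression is located along the body rather than how strong it is.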
Discussion
Reliable segmentation is foundational for quantitative biological imaging, from
morphological profiling and organism counting to tracking behavior and quantifying
fluorescence. The field has converged on a handful of practical solution families, each
reflecting a different tradeoff between speed, accuracy, and usability. Classical pipelines
(thresholding, morphology, watershed) are fast and interpretable, but often brittle.
Semantic segmentation models, most often U-Net, are generally more robust, but shift the
burden to curating training data, compute, and maintenance, and pre-trained models may
not transfer cleanly across microscopes or acquisition settings. Instance-focused methods
address touching and overlap using detection-first approaches (e.g., Mask R-CNN), but
these can be heavier to deploy, sensitive to domain shift, or still require manual correction
in edge cases. Together, these tradeoffs highlight the need for tools that are lightweight,
robust, and adaptable, while remaining approachable to non-experts.
SegBio is designed to fit in this tradeoff intersection, making instance segmentation of
crowded adult C. elegans microscopy images practical in day-to-day experimental work
and accessible to a wide user base, regardless of prior experience or hardware availability.
It offers a lightweight end-to-end pipeline spanning training-label creation, model training,
inference, and post-hoc correction, organized into three separate but interconnected
modules.
A central focus in the design is to reduce the cost of producing labels for supervised
learning. Rather than requiring painstaking manual tracing of every worm, the annotation
workflow (i) relies on a simple anatomical prior, the tapering profile, to extrapolate object
masks from a minimal, few-click centerline. Though the resulting masks are not always
pixel-perfect, this compromise preserves the benefits of supervised learning while making
it realistic to generate sufficient labeled data. The flexible, heuristic-based implementation
also means that the utility of the annotation tool is not confined to C. elegans nematodes.
As long as the target objects are relatively uniform in proportions, a custom tapering profile
can be provided to adapt the entire pipeline to any object or animal shape.
On the modeling side, the approach is deliberately general. With a flexible model
architecture and a generalizable inference algorithm, the training module (ii) can be
leveraged to accommodate similar segmentation tasks of varying complexity. However, in
the spirit of accessibility, the library comes with a pre-trained plug-and-play model for
brightfield adult C. elegans images which can be immediately utilized for a variety of
experimental purposes. We demonstrate several such applications including
morphological and fluorescence intensity measurements, and we are confident that the C.
elegans community will swiftly come up with additional uses.
While the current model’s performance is generally high, some limitations remain. First,
the provided pre-trained model is inherently domain-bound: performance depends on
imaging modality, magnification, contrast, developmental stage, and how similar worm
appearances are to the training distribution. Deviations from these parameters will require
fine-tuning or retraining. Second, the current architecture does not address overlapping
worms. Other tools are available that address overlaps explicitly 35, and future
versions of SegBio may incorporate such edge cases. For this initial version, the
experimenters will be required to account for worm positioning during image acquisition.
Alternatively, such edge cases can be removed in the final inference step.
While the model itself is relatively simple, a major practical differentiator is the inference
workflow. We are acutely aware that no model is perfect, and occasional errors are
unavoidable. We address this issue by minimizing correction friction. Inference is
performed in a completely standalone intuitive application that does not involve any
complex installation procedures, has no dependencies, and does not even require an
existing Python installation. The editing itself is performed on intermediate representations
of the output (boundary/seed layers), followed by a repeat of post-processing, which is
much faster than repeated inference or manual correction of the masks themselves.
Overall, SegBio’s main impact is shortening the path from microscopy images to clean,
per-animal instance masks and measurements while keeping the workflow accessible to
non-expert users. By treating segmentation as an iterative, correctable process, supported
by rapid label generation, separation-aware predictions, and a packaged editor, the tool
aims to make large-scale animal phenotyping more practical and scalable across
experiments.
Methods
Strains and worm maintenance
MIR249 risIs33 [K03A1.5p::3xFLAG::SV40-NLS::dCas9::SV40-NLS::VP64::HA + unc-
119(+)] 36
SJ4005; zcIs4 [hsp-4::GFP] V 37
SJ4100 (zcIs13[hsp-6::GFP]) 38
Worms were grown at 20°C on NGM plates seeded with E. coli.
Molecular biology and RNAi
L4440-zntA::cco-1 plasmid was made from L4440-cco-1 plasmid (from the C. elegans
RNAi library, gene name F26E4.9 or cox-5b 39), by replacing both flanking T7 promoters
with two 500bp promoters of the E. coli zntA gene from both sides of the cco-1 RNAi. E.
coli HT115 bacteria were transformed with our L4440-zntA::cco-1 plasmid. For each
experiment a fresh colony was picked and grown in LB for 8h before seeding on NGM
Ampicillin plates containing increasing concentrations of ZnSO4. The plates were left to
dry for 24 h before the experiment.
SJ4100 (zcIs13[hsp-6::GFP]) worms were bleached to obtain eggs that were placed on
the HT115 seeded NGM plates. Worms were grown to the young adult stage. zntA in E.
coli responds to zinc in the environment 40. Since we used the zntA promoter to express
RNAi directed to the worm cco-1 gene in the HT115 bacteria, worms were fed with cco-1
RNAi which led to a mitochondrial unfolded protein response stress and expression of
GFP derived from the hsp-6 promoter.
Image acquisition
10-15 young adult worms were transferred onto a clean agar plate and immobilized with
a drop of 10-4 levamisole.
All training images were acquired with a QImaging QIClick 12-bit monochrome camera
attached to an Olympus MVX10 binocular, using an Olympus MV PLAPO 2XC lens and
with 2.5x zoom, controlled by QCapture.
The additional test set was acquired with an Olympus IX-83 inverted microscope equipped
with a Photometrics EMCCD camera and a 4x Olympus objective controlled by
MicroManager.
Manual annotation for training-label generation
To initialize the segmentation pipeline, we created a simple interactive annotation tool that
enables rapid construction of training labels from raw worm images. The goal of this first
stage is to convert a small amount of manual input into consistent, per-worm binary masks
that can be used to train downstream deep-learning models.
User-guided annotations
To create training masks for the model the user must provide the following:
1. Per-instance centerline (a polyline drawn along the worm’s midbody, Fig 5B)
2. Per-instance width measurement (a short cross-body line at the widest point).
3. A tapering profile of the generic object (denoted in fractions relative to max width).
4. (Optional) Per-instance head location
A typical tapering profile of an adult C. elegans is provided in the code, but the pipeline
can be adapted to any elongated object by a custom profile.
These sparse annotations capture the essential worm geometry while keeping annotation
time low.
Mask extrapolation from sparse annotations (centerline + width)
Each worm mask is synthesized from the user-drawn centerline by expanding outward
around the midline to form a filled body region in accordance with the tapering profile (Fig
X A-F). The tapering profile was obtained once by measuring the width of 20 worms at
multiple points along the body, expressing width as percentage of the maximum width of
each worm, and averaging (Fig 5C). The final profile is slightly asymmetric due to the
different widths in the anterior and posterior regions of the worm
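The profile computation can be illustrated with a short Python sketch. The function name and the example widths are hypothetical and are not taken from the SegBio code. The sketch only shows the normalize-then-average step, assuming each worm was measured at the same relative body positions.

```python
import numpy as np

def average_taper_profile(widths_per_worm):
    """Scale each worm's widths by its own maximum, then average across worms."""
    widths = np.asarray(widths_per_worm, dtype=float)  # shape: (n_worms, n_positions)
    fractions = widths / widths.max(axis=1, keepdims=True)
    return fractions.mean(axis=0)

# Hypothetical width measurements (pixels) for 3 worms at 5 positions, head to tail
example_widths = [
    [10, 30, 40, 32, 8],
    [12, 33, 44, 35, 9],
    [9, 27, 36, 30, 7],
]
profile = average_taper_profile(example_widths)
```

In this toy example the head end of the profile is wider than the tail end, which mirrors the kind of anterior/posterior asymmetry described above.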
To form the object masks, the centerline is first densified to obtain a smooth sequence of
points, and at each point the local centerline direction is estimated and a short
cross-section perpendicular to the midline is constructed (Fig 5D,E). The length of these
cross-sections is scaled by the user’s width measurement and modulated by the tapering profile,
so the mask gradually narrows toward the head and tail. The union of all cross-sections is
then converted into a single contiguous binary region and lightly regularized (gap-closing)
to yield the final per-worm pixel mask (Fig 5F). This produces anatomically plausible
masks from a single width measurement while still allowing variation across worms via the
user-provided width.
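The construction described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the SegBio implementation: the function names, the 0.5-pixel sampling step, and the example taper values are hypothetical, and the final gap-closing step is omitted because dense sampling already fills a straight body.

```python
import numpy as np

def densify(points, step=0.5):
    """Resample an (x, y) polyline at ~`step`-pixel spacing; also return body fraction 0..1."""
    pts = np.asarray(points, dtype=float)
    seg_len = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg_len)])
    t = np.linspace(0.0, arc[-1], int(np.ceil(arc[-1] / step)) + 1)
    dense = np.column_stack([np.interp(t, arc, pts[:, 0]), np.interp(t, arc, pts[:, 1])])
    return dense, t / arc[-1]

def mask_from_centerline(points, max_width, taper, shape, step=0.5):
    """Rasterize tapered cross-sections drawn perpendicular to the centerline."""
    dense, frac = densify(points, step)
    tangent = np.gradient(dense, axis=0)            # local centerline direction
    tangent /= np.linalg.norm(tangent, axis=1, keepdims=True)
    normal = np.column_stack([-tangent[:, 1], tangent[:, 0]])
    taper = np.asarray(taper, dtype=float)
    # Half-width at each point: user width modulated by the taper profile
    half = 0.5 * max_width * np.interp(frac, np.linspace(0.0, 1.0, len(taper)), taper)
    mask = np.zeros(shape, dtype=bool)
    for p, n, h in zip(dense, normal, half):
        offsets = np.arange(-h, h + step, step)     # sample along the cross-section
        xy = np.rint(p + offsets[:, None] * n).astype(int)
        inside = (xy[:, 0] >= 0) & (xy[:, 0] < shape[1]) & (xy[:, 1] >= 0) & (xy[:, 1] < shape[0])
        mask[xy[inside, 1], xy[inside, 0]] = True   # union of all cross-sections
    return mask

# Hypothetical straight worm: horizontal centerline, 8-pixel maximum width
example_mask = mask_from_centerline(
    points=[(10, 32), (54, 32)], max_width=8, taper=[0.2, 1.0, 1.0, 0.3], shape=(64, 64)
)
```

For strongly curved centerlines the outer ends of neighboring cross-sections can spread apart, which is presumably where a gap-closing step becomes necessary.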
For each image, the software saves the user annotations (centerline and width) together
with the derived outputs (per-worm binary masks and basic geometry summaries). These
outputs serve as the ground-truth labels for training the segmentation model in subsequent
stages.
The annotation tool was originally written as a MATLAB (MathWorks© Inc.) app, and later
rewritten and expanded in Python. Both versions are available at
https://github.com/zaslab/SegBio.
Figure 5. Full instance masks are extrapolated from minimal manual annotations
A) A brightfield image of adult C. elegans nematodes
B) User-provided few-points centerline (red) and max width (blue) of an individual
nematode
C) Tapering profile as a fraction of max width, obtained by averaging width
measurements along the bodies of 20 worms
D) The mask is constructed by extending lines (blue) orthogonally from the centerline
(red), scaled by the max width and tapering profile
E) The centerline is interpolated to densify the mask and cover the entire worm
F) The full mask is obtained by combining the orthogonal lines and filling in any
remaining gaps.
Model training pipeline
Training target generation from instance masks
To train the network with supervision that supports both object presence and instance
separation, the manually created per-worm instance mask (integer label image;
background = 0, worm IDs = 1..N) is converted into three complementary target maps:
foreground, boundary, and seed (Fig X D-H), to be used as ground truth during model
training.
Foreground target
A binary foreground map is produced directly from the instance labels, providing dense
supervision for “worm vs background” classification.
Boundary target
To encourage separation of touching or nearby worms, a thin boundary band is computed
along the outer rim of each labeled instance (and between neighbors). This rim is
optionally dilated to a user-defined pixel width (boundary_width) to form a thicker
supervision band. The boundary target is restricted to foreground pixels to avoid labeling
background edges unrelated to worms.
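The foreground and boundary targets can be sketched as follows. This is an illustrative NumPy version under assumptions: the function name is hypothetical, a 4-neighborhood is used to detect the rim, pixels outside the image are treated as background, and the dilation is a simple cross-shaped one. Only the boundary_width parameter name comes from the text.

```python
import numpy as np

def foreground_and_boundary(labels, boundary_width=1):
    """Return (fg, boundary) boolean maps from an integer instance-label image."""
    labels = np.asarray(labels)
    fg = labels > 0
    padded = np.pad(labels, 1, mode="constant", constant_values=0)
    center = padded[1:-1, 1:-1]
    rim = np.zeros(labels.shape, dtype=bool)
    # A pixel is on the rim if any 4-neighbor carries a different label
    for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        neighbor = padded[1 + dy : padded.shape[0] - 1 + dy, 1 + dx : padded.shape[1] - 1 + dx]
        rim |= neighbor != center
    boundary = rim & fg  # restrict to foreground pixels
    # Optional thickening: each extra unit of width dilates by one pixel, clipped to foreground
    for _ in range(boundary_width - 1):
        p = np.pad(boundary, 1, mode="constant", constant_values=False)
        boundary = (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2] | p[1:-1, 2:]) & fg
    return fg, boundary

# Two touching rectangular "worms" in a small label image
example_labels = np.zeros((6, 8), dtype=int)
example_labels[1:5, 1:4] = 1
example_labels[1:5, 4:7] = 2
example_fg, example_boundary = foreground_and_boundary(example_labels)
```

Because the rim test compares labels rather than foreground values, the shared edge between the two touching instances is marked on both sides even though the foreground map is continuous there.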
Seed target
In addition to boundaries, a seed target is generated to provide one compact “core” signal
per worm instance, intended to anchor downstream instance reconstruction.
The default seed method derives a per-instance skeleton, prunes a small fraction from
each end to avoid ambiguous head/tail tips, then slightly thickens the skeleton to form a
learnable region. Seed pixels overlapping the boundary band are removed, and small
spurious components are filtered by a minimum-area criterion to preserve only robust
seed blobs.
As an alternative, seeds can be generated from the distance transform of the foreground
mask, producing a soft interior-confidence map that peaks near the center of worms;
boundary-adjacent pixels are suppressed so the seed emphasizes the worm interior.
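The distance-transform alternative can be illustrated with the sketch below. To stay dependency-free it substitutes a city-block distance computed by repeated erosion for a Euclidean distance transform. The function names and the normalization to a peak of 1 are assumptions made for illustration, not details confirmed by the source.

```python
import numpy as np

def cityblock_distance(mask):
    """City-block distance to the nearest background pixel, via repeated 4-neighbor erosion."""
    cur = np.asarray(mask, dtype=bool).copy()
    dist = np.zeros(cur.shape, dtype=int)
    while cur.any():
        dist += cur  # every pixel that survived so far gains one unit of distance
        p = np.pad(cur, 1, mode="constant", constant_values=False)
        cur = cur & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return dist

def soft_seed(fg, boundary):
    """Interior-confidence map in [0, 1] with boundary pixels suppressed."""
    dist = cityblock_distance(fg).astype(float)
    dist[np.asarray(boundary, dtype=bool)] = 0.0
    peak = dist.max()
    return dist / peak if peak > 0 else dist

# A single 5 x 7 rectangular object; its one-pixel rim serves as the boundary band
seed_fg = np.zeros((7, 9), dtype=bool)
seed_fg[1:6, 1:8] = True
seed_boundary = seed_fg & (cityblock_distance(seed_fg) == 1)
seed_map = soft_seed(seed_fg, seed_boundary)
```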
The output is a dictionary containing the three aligned targets:
• fg: foreground worm mask
• boundary: boundary band for separation
• seed: per-instance interior anchors (skeleton-based by default)
Neural network architecture
The model is a parameterized U-Net–style fully convolutional network that predicts the
training targets from input microscopy images. The model is designed to be lightweight,
configurable (depth, channel width, output classes), and stable under small batch sizes to
accommodate less powerful hardware.
Overall structure
The network follows the standard encoder–bottleneck–decoder U-Net topology:
• Encoder (contracting path): repeated blocks of two 3 × 3 convolutions with
normalization and ReLU, followed by 2 × 2 max-pooling for downsampling.
Feature channel count doubles with depth.
• Bottleneck: a double-convolution block at the deepest level with 2× the channels
of the deepest encoder stage.
• Decoder (expanding path): transposed-convolution upsampling, concatenation
with the corresponding encoder feature map via skip connections, then a
double-convolution block to fuse features.
This symmetric design preserves fine spatial detail (via skips) while still learning high-level
context (via downsampling).
Each convolutional block is a DoubleConv module implementing:
(Conv → Norm → ReLU) × 2,
with 3 × 3 convolutions (padding to preserve spatial size) and Group Normalization. The
number of groups is automatically chosen as the largest value ≤ 8 that divides the channel
count, improving robustness when batch size is small.
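The group-count rule stated above is simple enough to write out directly. The helper below is a sketch with a hypothetical name; only the rule itself (largest value ≤ 8 that divides the channel count) comes from the text.

```python
def pick_num_groups(channels, max_groups=8):
    """Largest group count <= max_groups that evenly divides the channel count."""
    for groups in range(min(max_groups, channels), 0, -1):
        if channels % groups == 0:
            return groups
    return 1  # not reached for channels >= 1, since 1 divides every integer

# Group counts for a few channel widths
group_examples = {c: pick_num_groups(c) for c in (32, 48, 12, 10, 7, 3)}
```

For power-of-two widths such as 32 or 64 this always yields 8 groups; the fallback only matters for unusual channel counts such as 10 or 7.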
A final 1 × 1 convolution maps decoder features to the n_classes output channels. During
upsampling, feature maps are padded when needed so that the upsampled tensor
matches the skip-connection tensor size, ensuring correct concatenation even for odd
image dimensions.
Configurable capacity
The training code is designed to allow for quick and easy adaptation of the model to new
datasets. Users can provide their own annotated images, or annotate images themselves
using the annotation module (i), and retrain the model to better fit the new data. Model
capacity can be adjusted to the complexity of the new task by controlling the following
parameters:
• depth: number of encoder/decoder levels (≥2),
• base_filters: number of channels at the first level, with doubling at each deeper
level,
• n_channels: input channels (e.g., brightfield only or multi-channel),
• n_classes: output channels, enabling additional segmentation targets such as
head regions.
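To make the effect of depth and base_filters concrete, the sketch below lists the channel widths implied by the doubling rule and the 2× bottleneck described earlier. The function name is hypothetical, and it assumes that depth counts encoder levels excluding the bottleneck, which the text does not state explicitly.

```python
def channel_plan(depth=4, base_filters=32):
    """Channel widths per level, assuming `depth` counts encoder stages only."""
    if depth < 2:
        raise ValueError("depth must be at least 2")
    encoder = [base_filters * 2**level for level in range(depth)]
    return {
        "encoder": encoder,
        "bottleneck": 2 * encoder[-1],     # 2x the deepest encoder stage
        "decoder": encoder[::-1],          # mirrors the encoder
        "downsampling_factor": 2**depth,   # one 2 x 2 max-pool per encoder stage
    }

# Configuration reported in the Training Hardware section: depth 4, 32 base filters
plan = channel_plan(depth=4, base_filters=32)
```

Under this assumption a 512 × 512 input is divisible by the total downsampling factor of 16, so no padding would be needed at the skip connections for the default resolution.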
Dataset format and preprocessing
Training samples are stored as per-image folders containing a raw input image (in.mat)
and a corresponding manual annotation mask (out.mat). During loading, inputs and masks
are resized to a fixed 512 × 512 resolution (image interpolation preserves intensity, while
masks are interpolated with nearest-neighbor to preserve binary labels), and image
intensities are normalized to [0,1].
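The reason for nearest-neighbor interpolation on masks can be seen in a small sketch: index lookup can only copy existing label values, whereas averaging interpolation would create fractional values at object edges. The helper names are hypothetical, and min-max scaling is an assumption, since the text does not say how the [0,1] normalization is computed.

```python
import numpy as np

def resize_nearest(array, out_shape):
    """Nearest-neighbor resize of a 2D array by index lookup; never creates new values."""
    h, w = array.shape
    rows = np.minimum((np.arange(out_shape[0]) + 0.5) * h / out_shape[0], h - 1).astype(int)
    cols = np.minimum((np.arange(out_shape[1]) + 0.5) * w / out_shape[1], w - 1).astype(int)
    return array[rows[:, None], cols[None, :]]

def normalize01(image):
    """Min-max scale intensities to [0, 1] (assumed normalization scheme)."""
    image = np.asarray(image, dtype=np.float32)
    lo, hi = image.min(), image.max()
    return (image - lo) / (hi - lo) if hi > lo else np.zeros_like(image)

# Hypothetical 4 x 4 label mask and 12-bit image
small_mask = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 0, 0], [2, 2, 0, 0]])
resized_mask = resize_nearest(small_mask, (8, 8))
scaled_image = normalize01(np.array([[0, 1024], [2048, 4095]]))
```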
Data augmentation (supported transforms)
Training uses an augmentation module that applies the same geometric transform to the
input image and all target channels and applies photometric transforms to the image only.
Geometric (image + targets, shared per sample):
• Random zoom-out / scale-down with random placement on a canvas (acts like
zoom + random padding/positioning).
• Horizontal flip and vertical flip (each with 0.5 probability).
• Random affine transform consisting of rotation (±rot_deg) and a small scale jitter
(≈0.95–1.05).
Photometric (image only):
• Brightness jitter (multiplicative intensity scaling).
• Contrast jitter (contrast change around the per-channel mean).
• Gaussian blur applied with probability blur_p.
Multi-augmentation per batch: the training loop can generate multiple independent
augmentations of each minibatch and accumulate gradients across them prior to the
optimizer step.
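The key property of the augmentation module, shared geometry but image-only photometry, can be sketched as follows. This simplified version covers only flips, brightness, and contrast; the zoom-out, affine, and blur transforms are omitted. The function name and the jitter ranges are hypothetical.

```python
import numpy as np

def augment(image, targets, rng, brightness=0.2, contrast=0.2):
    """image: (H, W) floats in [0, 1]; targets: (C, H, W) maps sharing the image geometry."""
    # Geometric: one random decision per sample, applied to image and targets alike
    if rng.random() < 0.5:  # horizontal flip
        image, targets = image[:, ::-1], targets[:, :, ::-1]
    if rng.random() < 0.5:  # vertical flip
        image, targets = image[::-1, :], targets[:, ::-1, :]
    # Photometric: applied to the image only, so target values stay untouched
    image = image * rng.uniform(1.0 - brightness, 1.0 + brightness)
    mean = image.mean()
    image = (image - mean) * rng.uniform(1.0 - contrast, 1.0 + contrast) + mean
    return np.clip(image, 0.0, 1.0), targets.copy()

# One bright pixel in the image, marked at the same position in all three targets
aug_input = np.full((8, 8), 0.1)
aug_input[2, 5] = 0.8
aug_targets = np.zeros((3, 8, 8))
aug_targets[:, 2, 5] = 1.0
aug_image, aug_maps = augment(aug_input, aug_targets, np.random.default_rng(0))
```

Whatever flips are drawn, the bright pixel and its target annotation move together, while the target maps remain strictly binary.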
Loss function
Training optimizes a multi-task objective combining Binary Cross Entropy (BCE) and Dice,
with class weights of [1,6,2] for foreground, boundary, and seed, respectively. The final
loss is the weighted sum 𝐵𝐶𝐸 + 0.5 × 𝐷𝑖𝑐𝑒.
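A NumPy sketch of such an objective is given below. How the class weights enter the two terms is not specified in the text, so the sketch assumes they are normalized and applied as a per-channel weighted average to both BCE and Dice. The function names and the smoothing constant are likewise hypothetical.

```python
import numpy as np

CLASS_WEIGHTS = np.array([1.0, 6.0, 2.0])  # foreground, boundary, seed (values from the text)

def bce_per_channel(prob, target, eps=1e-7):
    """Mean binary cross-entropy for each channel of (C, H, W) probability maps."""
    prob = np.clip(prob, eps, 1.0 - eps)
    bce = -(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob))
    return bce.mean(axis=(1, 2))

def dice_loss_per_channel(prob, target, eps=1e-7):
    """Soft Dice loss (1 - Dice coefficient) for each channel."""
    inter = (prob * target).sum(axis=(1, 2))
    denom = prob.sum(axis=(1, 2)) + target.sum(axis=(1, 2))
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def combined_loss(prob, target, dice_weight=0.5):
    """BCE + 0.5 x Dice, each averaged over channels with normalized class weights (assumed)."""
    w = CLASS_WEIGHTS / CLASS_WEIGHTS.sum()
    bce = (w * bce_per_channel(prob, target)).sum()
    dice = (w * dice_loss_per_channel(prob, target)).sum()
    return bce + dice_weight * dice

# Identical square target in all three channels; flip one channel at a time
loss_target = np.zeros((3, 8, 8))
loss_target[:, 2:6, 2:6] = 1.0
wrong_fg = loss_target.copy()
wrong_fg[0] = 1.0 - loss_target[0]
wrong_boundary = loss_target.copy()
wrong_boundary[1] = 1.0 - loss_target[1]
loss_perfect = combined_loss(loss_target, loss_target)
loss_fg = combined_loss(wrong_fg, loss_target)
loss_boundary = combined_loss(wrong_boundary, loss_target)
```

With these weights, the same pixel-level error costs six times more in the boundary channel than in the foreground channel, consistent with the emphasis on separating touching worms.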
Optimization, scheduling, and validation
Models are trained using the AdamW optimizer with weight decay and a step learning-rate
schedule. Training optionally uses mixed precision (AMP) via gradient scaling, and can
optionally use torch.compile for accelerated execution. A held-out validation split is created
once at the beginning, and training logs loss/Dice metrics to a CSV file while saving
periodic checkpoints.
Post processing
At inference time, the network produces three probability maps (foreground, boundary,
seed). These are binarized by thresholds and then cleaned using connected-component
filtering to suppress small fragments and retain only substantial regions. Boundary maps
are additionally thinned to sharpen separation cues.
Instance labels are obtained with a marker-controlled watershed over a distance-based
(or probability-based) surface, constrained to the cleaned foreground region and
discouraged from merging across the predicted boundaries. This converts the three maps
into a final integer label image where each worm is a distinct instance.
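The role of the three maps in instance reconstruction can be illustrated with a deliberately simplified stand-in for the watershed: breadth-first region growing from labeled seeds. It is not a watershed over a distance or probability surface and is not the SegBio implementation. The function name is hypothetical, and the seeds are assumed to have been labeled into connected components beforehand.

```python
from collections import deque
import numpy as np

def grow_instances(fg, boundary, seed_labels):
    """Grow labeled seeds through the foreground; boundary pixels are claimed last."""
    labels = np.where(fg, seed_labels, 0).astype(int)
    queue = deque(zip(*np.nonzero(labels)))
    # Pass 1 spreads through non-boundary foreground; pass 2 also admits boundary pixels
    for allowed in (fg & ~boundary, fg):
        blocked = deque()
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                in_image = 0 <= ny < fg.shape[0] and 0 <= nx < fg.shape[1]
                if in_image and fg[ny, nx] and labels[ny, nx] == 0:
                    if allowed[ny, nx]:
                        labels[ny, nx] = labels[y, x]
                        queue.append((ny, nx))
                    else:
                        blocked.append((y, x))  # retry this frontier pixel in pass 2
        queue = blocked
    return labels

# Two touching worms separated by a predicted boundary column, one seed each
grow_fg = np.zeros((5, 9), dtype=bool)
grow_fg[1:4, 1:8] = True
grow_boundary = np.zeros_like(grow_fg)
grow_boundary[1:4, 4] = True
grow_seeds = np.zeros((5, 9), dtype=int)
grow_seeds[2, 2] = 1
grow_seeds[2, 6] = 2
grown = grow_instances(grow_fg, grow_boundary, grow_seeds)
```

The example also shows why painting a boundary line or adding a seed is enough to fix a merge or a missed worm: both directly change where the labels can spread.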
After instance generation, labels can be filtered using skeleton-derived morphology:
per-instance skeleton length, width estimate (via medial axis distance), and length/width ratio
are computed and used to remove implausible detections. An optional rule removes
instances touching the image border, avoiding partial objects. Remaining instances are
relabeled into a compact 1..N index set.
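The border rule and the compact relabeling can be sketched with a lookup table. The function name is hypothetical, and the sketch assumes that an instance counts as touching the border if any of its pixels lies in the first or last row or column.

```python
import numpy as np

def drop_border_and_relabel(labels):
    """Remove instances touching the image border, then renumber the rest as 1..N."""
    labels = np.asarray(labels)
    border = np.concatenate([labels[0], labels[-1], labels[:, 0], labels[:, -1]])
    touching = np.unique(border[border > 0])
    kept = np.where(np.isin(labels, touching), 0, labels)
    ids = np.unique(kept[kept > 0])
    lookup = np.zeros(labels.max() + 1, dtype=int)  # old ID -> new ID, 0 stays background
    lookup[ids] = np.arange(1, len(ids) + 1)
    return lookup[kept]

# IDs 3, 5 and 9: instance 3 touches the border, 5 and 9 are fully inside the frame
raw_labels = np.zeros((6, 6), dtype=int)
raw_labels[0:2, 0:2] = 3
raw_labels[2:4, 2:4] = 5
raw_labels[4, 3:5] = 9
compact_labels = drop_border_and_relabel(raw_labels)
```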
Interactive inference and manual correction GUI
Inference is performed with a standalone graphical application that couples the trained
network with an interactive editing workflow. The purpose of this stage is to (i) make
segmentation usable for non-expert users, and (ii) ensure high-quality final labels in
difficult frames by allowing fast, local corrections to either the model output or the final
inference masks.
Model execution and initial segmentation
After opening an image, the application runs the network once to generate three outputs:
a foreground probability map, and two auxiliary maps used for instance separation
(boundary and seed). These are thresholded to initialize editable “boundary” and “seed”
layers, and an initial set of instance labels is generated from the three-map representation.
Human-in-the-loop editing
The GUI exposes the network-derived boundary and seed maps as paintable layers. This
design makes edits intuitive:
• Fix merges by painting boundary lines between worms to force separation.
• Fix splits or missing worms by removing extra boundaries or adding seed regions
to missed worms.
• Delete false positives with a dedicated button by clicking an unwanted instance
label.
• Manually paint instance masks for ultimate control and precision.
Edits are lightweight and local, and do not require re-running the neural network.
After edits, the user triggers “Re-segment,” which recomputes the instance labels using
the original predicted foreground together with the edited boundary and seed layers. This
allows rapid iteration: edit → re-segment → inspect, until the instance labels are
satisfactory. A small threshold control is also provided to adjust sensitivity when foreground
is under- or over-inclusive. The GUI exports the final instance label image along with the
edited boundary/seed layers and basic per-instance size summaries.
A detailed user guide is provided at https://github.com/zaslab/SegBio.
Training Hardware
The entire pipeline was designed to be as lightweight and accessible as possible, requiring
neither extensive programming knowledge nor high-end hardware for either training or
inference.
The model was trained on the following hardware:
• Lenovo Legion 5 laptop
• Intel(R) Core(TM) i7-14650HX (2.20 GHz) processor
• 32GB RAM
• Nvidia GeForce RTX 4060 laptop GPU, 8GB VRAM
With a depth of 4, filter count of 32, a batch size of 6, and 5 augmentations per image,
training engaged most of the available VRAM. Training time was ~7 seconds per batch.
With a total of 175 training images, a full training run of 100 epochs can be completed in
under 5 hours.
Funding. This work was supported by the Israeli Science Foundation (1939/23), the
Jérôme Lejeune Foundation, and the MDBR grant program of the Orphan Disease Center.
AZ is the Greenfield Chair in Neurobiology.
Data availability. All data, analysis scripts, and the entire SegBio package are available at
https://github.com/zaslab/SegBio.
Competing interests. The authors declare no competing interests.
References
1. Ronneberger, O., Fischer, P. & Brox, T. U-Net: Convolutional Networks for
Biomedical Image Segmentation. in Medical Image Computing and Computer-
Assisted Intervention – MICCAI 2015 (eds Navab, N., Hornegger, J., Wells, W. M. &
Frangi, A. F.) vol. 9351 234–241 (Springer International Publishing, Cham, 2015).
2. Falk, T. et al. U-Net: deep learning for cell counting, detection, and morphometry.
Nat. Methods 16, 67–70 (2019).
3. Galimov, E. & Yakimovich, A. A tandem segmentation-classification approach for the
localization of morphological predictors of C. elegans lifespan and motility. Aging 14,
(2022).
4. He, K., Gkioxari, G., Dollár, P. & Girshick, R. Mask R-CNN. Preprint at
https://doi.org/10.48550/arXiv.1703.06870 (2018).
5. Dong, B. & Chen, W. A high precision method of segmenting complex postures in
Caenorhabditis elegans and deep phenotyping to analyze lifespan. Sci. Rep. 15,
8870 (2025).
6. Liu, X. et al. Automated C. elegans behavior analysis via deep learning-based
detection and tracking. PLOS Comput. Biol. 21, e1013707 (2025).
7. Schmidt, U., Weigert, M., Broaddus, C. & Myers, G. Cell Detection with Star-Convex
Polygons. in Medical Image Computing and Computer Assisted Intervention –
MICCAI 2018 (eds Frangi, A. F., Schnabel, J. A., Davatzikos, C., Alberola-López, C.
& Fichtinger, G.) vol. 11071 265–273 (Springer International Publishing, Cham,
2018).
8. Mais, L., Hirsch, P. & Kainmueller, D. PatchPerPix for Instance Segmentation. in
Computer Vision – ECCV 2020 (eds Vedaldi, A., Bischof, H., Brox, T. & Frahm, J.-
M.) vol. 12370 288–304 (Springer International Publishing, Cham, 2020).
9. Deserno, M. & Bozek, K. WormSwin: Instance segmentation of C. elegans using
vision transformer. Sci. Rep. 13, 11021 (2023).
10. Stringer, C., Wang, T., Michaelos, M. & Pachitariu, M. Cellpose: a generalist
algorithm for cellular segmentation. Nat. Methods 18, 100–106 (2021).
11. Cutler, K. J. et al. Omnipose: a high-precision morphology-independent solution for
bacterial cell segmentation. Nat. Methods 19, 1438–1448 (2022).
12. Vincent, L. & Soille, P. Watersheds in digital spaces: an efficient algorithm based on
immersion simulations. IEEE Trans. Pattern Anal. Mach. Intell. 13, 583–598 (1991).
13. Naylor, P., Laé, M., Reyal, F. & Walter, T. Segmentation of Nuclei in Histopathology
Images by Deep Regression of the Distance Map. IEEE Trans. Med. Imaging 38,
448–459 (2019).
14. Xie, L., Qi, J., Pan, L. & Wali, S. Integrating deep convolutional neural networks with
marker-controlled watershed for overlapping nuclei segmentation in histopathology
images. Neurocomputing 376, 166–179 (2020).
15. Shah, P., Bao, Z. & Zaidel-Bar, R. Visualizing and quantifying molecular and cellular
processes in Caenorhabditis elegans using light microscopy. Genetics 221, iyac068
(2022).
16. Wählby, C. et al. An image analysis toolbox for high-throughput C. elegans assays.
Nat. Methods 9, 714–716 (2012).
17. Weheliye, W. H. et al. A neural network model enables worm tracking in challenging
conditions and increases signal-to-noise ratio in phenotypic screens. PLOS Comput.
Biol. 21, e1013345 (2025).
18. Brenner, S. The Genetics of Caenorhabditis elegans. Genetics 77, 71–94 (1974)
doi:10.1093/genetics/77.1.71.
19. Barlow, I. L. et al. Megapixel camera arrays enable high-resolution animal tracking in
multiwell plates. Commun. Biol. 5, 253 (2022).
20. Ambros, V. R. et al. From nematode to Nobel: How community-shared resources
fueled the rise of Caenorhabditis elegans as a research organism. Proc. Natl. Acad.
Sci. 122, e2522808122 (2025).
21. Min, H., Park, G. & Lee, S.-J. V. Brief guide to Caenorhabditis elegans imaging and
quantification. Mol. Cells 48, 100249 (2025).
22. Ji, H., Chen, D. & Fang-Yen, C. Automated multimodal imaging of
Caenorhabditis elegans behavior in multi-well plates. Genetics 228, (2024).
23. Baek, J.-H., Cosman, P., Feng, Z., Silver, J. & Schafer, W. R. Using machine vision
to analyze and classify Caenorhabditis elegans behavioral phenotypes quantitatively.
J. Neurosci. Methods 118, 9–21 (2002).
24. Feng, Z., Cronin, C. J., Wittig, J. H., Sternberg, P. W. & Schafer, W. R. An imaging
system for standardized quantitative analysis of C. elegans behavior. BMC
Bioinformatics 5, 115 (2004).
25. Stephens, G. J., Johnson-Kerner, B., Bialek, W. & Ryu, W. S. Dimensionality and
Dynamics in the Behavior of C. elegans. PLOS Comput. Biol. 4, e1000028 (2008).
26. Itskovits, E., Levine, A., Cohen, E. & Zaslaver, A. A multi-animal tracker for studying
complex behaviors. BMC Biol. 15, 29 (2017).
27. Ljosa, V., Sokolnicki, K. L. & Carpenter, A. E. Annotated high-throughput microscopy
image sets for validation. Nat. Methods 9, 637–637 (2012).
28. Escobar-Benavides, S., García-Garví, A., Layana-Castro, P. E. & Sánchez-
Salmerón, A.-J. Towards generalization for Caenorhabditis elegans detection.
Comput. Struct. Biotechnol. J. 21, 4914–4922 (2023).
29. Venkatachalam, V. et al. Pan-neuronal imaging in roaming Caenorhabditis elegans.
Proc. Natl. Acad. Sci. 113, (2016).
30. Yu, X. et al. Fast deep neural correspondence for tracking and identifying neurons in
C. elegans using semi-synthetic training. eLife 10, e66410 (2021).
31. Park, C. F. et al. Automated neuron tracking inside moving and deforming C. elegans
using deep learning and targeted augmentation. Nat. Methods 21, 142–149 (2024).
32. Sprague, D. Y. et al. Unifying community whole-brain imaging datasets enables
robust neuron identification and reveals determinants of neuron position in C.
elegans. Cell Rep. Methods 5, 100964 (2025).
33. Khare, P. et al. Size dependent toxicity of zinc oxide nano-particles in soil nematode
Caenorhabditis elegans. Nanotoxicology 9, 423–432 (2015).
34. Moyson, S., Town, R. M., Vissenberg, K. & Blust, R. The effect of metal mixture
composition on toxicity to C. elegans at individual and population levels. PLOS ONE
14, e0218929 (2019).
35. Castro, P. E. L. et al. SegElegans: Instance segmentation using dual convolutional
recurrent neural network decoder in Caenorhabditis elegans microscopic images.
Comput. Biol. Med. 190, 110012 (2025).
36. Fischer, F. et al. Ingestion of single guide RNAs induces gene overexpression and
extends lifespan in Caenorhabditis elegans via CRISPR activation. J. Biol. Chem.
298, 102085 (2022).
37. Kapulkin, V., Hiester, B. G. & Link, C. D. Compensatory regulation among ER
chaperones in C. elegans. FEBS Lett. 579, 3063–3068 (2005).
38. Yoneda, T. et al. Compartment-specific perturbation of protein handling activates
genes encoding mitochondrial chaperones. J. Cell Sci. 117, 4055–4066 (2004).
39. Kamath, R. S. et al. Systematic functional analysis of the Caenorhabditis elegans
genome using RNAi. Nature 421, 231–237 (2003).
40. Durieux, J., Wolff, S. & Dillin, A. The Cell-Non-Autonomous Nature of Electron
Transport Chain-Mediated Longevity. Cell 144, 79–91 (2011).