MOSAIC: A Structured Multi-level Framework for Probabilistic and Interpretable Cell-type Annotation

preprint OA: closed
Abstract

Accurate cell-type annotation is a foundational task in single-cell RNA sequencing analysis, yet remains fundamentally challenged by cellular heterogeneity, gradual lineage transitions, and technical noise. As single-cell atlases expand in scale and resolution, most existing annotation approaches operate at a single analytical level and encode cell identity as fixed categorical labels, limiting their ability to represent uncertainty, mixed biological states, and population-level structure. Here we introduce MOSAIC (Multi-level prObabilistic and Structured Adaptive IdentifiCation), a structured multi-level annotation framework that integrates cell-level marker evidence with cluster-level population context within a unified probabilistic system. Rather than treating annotation as an independent per-cell prediction task, MOSAIC formulates cell-type assignment as a coordinated multi-level inference process, in which probabilistic evidence at the single-cell level is aggregated, constrained, and refined by population context. MOSAIC integrates direction-aware marker scoring with dual-layer probabilistic representation and adaptive cross-level refinement, enabling uncertainty to be quantified and propagated across biological scales. This design yields coherent annotations that preserve fine-grained single-cell variation while maintaining population-level consistency, and allows ambiguous or transitional states to be represented explicitly rather than collapsed into hard labels. Across six diverse tissues and under controlled dropout perturbations, MOSAIC consistently matches or outperforms representative marker-based, reference-based, and machine-learning annotation methods. Beyond accuracy, MOSAIC provides structured uncertainty estimates and coherent population-level structure, enabling the identification of stable intermediate cell states that arise from gradual lineage transitions rather than technical noise. Together, MOSAIC advances cell-type annotation from a single-level classification task to a structured multi-level inference problem, and establishes a general, interpretable, and uncertainty-aware computational framework for large-scale single-cell analysis.

Competing Interest Statement

The authors have declared no competing interest.
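
Illustrative sketch (not the authors' implementation). The abstract describes cell-level marker evidence being aggregated to the cluster level and then used to refine per-cell probabilities. The toy Python below shows one way such a two-level scheme could look. Everything in it is an assumption made for illustration: the function names, the "positive minus negative marker mean" score, the softmax, the geometric blend with a fixed `weight`, and the example gene values are hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def marker_score(expr, positive, negative):
    """Assumed direction-aware score: mean positive-marker expression
    minus mean negative-marker expression."""
    pos = sum(expr.get(g, 0.0) for g in positive) / len(positive) if positive else 0.0
    neg = sum(expr.get(g, 0.0) for g in negative) / len(negative) if negative else 0.0
    return pos - neg

def cell_probs(expr, markers):
    """Cell-level layer: per-type probabilities for one cell."""
    types = sorted(markers)
    scores = [marker_score(expr, *markers[t]) for t in types]
    return dict(zip(types, softmax(scores)))

def cluster_probs(cell_level):
    """Cluster-level layer: average of the member cells' probabilities."""
    n = len(cell_level)
    return {t: sum(p[t] for p in cell_level) / n for t in cell_level[0]}

def refine(cell_p, cluster_p, weight=0.5):
    """Assumed refinement: geometric blend of cell and cluster
    probabilities, renormalised to sum to 1."""
    raw = {t: cell_p[t] ** (1 - weight) * cluster_p[t] ** weight for t in cell_p}
    z = sum(raw.values())
    return {t: v / z for t, v in raw.items()}

def entropy(p):
    """Shannon entropy (nats) as a simple uncertainty measure."""
    return -sum(v * math.log(v) for v in p.values() if v > 0)

# Hypothetical marker sets: (positive markers, negative markers) per type.
markers = {
    "T cell": (["CD3E"], ["MS4A1"]),
    "B cell": (["MS4A1"], ["CD3E"]),
}

# One hypothetical cluster of three cells; the third is ambiguous.
cells = [
    {"CD3E": 3.0, "MS4A1": 0.0},
    {"CD3E": 2.5, "MS4A1": 0.2},
    {"CD3E": 1.0, "MS4A1": 0.9},
]

cell_level = [cell_probs(c, markers) for c in cells]
pop = cluster_probs(cell_level)
refined = refine(cell_level[2], pop)

print("cell-level:", cell_level[2], "entropy:", entropy(cell_level[2]))
print("cluster-level:", pop)
print("refined:", refined, "entropy:", entropy(refined))
```

In this toy setup the ambiguous third cell is pulled toward its cluster's dominant type while keeping a soft probability vector, so residual uncertainty stays visible instead of being collapsed into a hard label. The blend weight is fixed here; the abstract calls the refinement adaptive but does not say how that is done.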

Citation neighborhood (no data yet)

We don't have any in-corpus citations linked to this paper yet. This is a recent paper (2026) — citers typically take a year or two to land, and the OpenAlex reference graph may still be filling in.

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00