Discussion
Here, we investigated the role of ACC area 24 in resolving the CPP as freely moving marmosets
engaged in natural vocal communication in a noisy environment. We found that single neurons
selectively responded to partner vocalizations or colony calls, effectively parsing different categories
of conspecific speakers in the acoustic scene (Fig. 1). Critically, neurons selective for partner calls
responded invariantly when calls overlapped acoustically with other sounds, showing that neural
mechanisms to distinguish meaningful voices from other interfering voices are present in marmoset
ACC area 24 (Fig. 2). These findings suggest that ACC may be a keystone neural substrate for resolving
the CPP, highlighting the significance of this process for mediating social interactions in natural scenes,
rather than as solely a challenge for audition.
Because the ACC is a midline forebrain area conserved across all mammals 33 that is both anatomically
connected to fronto-lateral 34 and auditory cortex 35 and modulated by attention 27–29, it is perhaps not
surprising that it supports resolution of the CPP. Although ACC is not classically considered part of the
language network, results here show that ACC neurons are involved in a range of communicative
functions, including representing each of the call types produced or perceived in vocal exchanges (Fig.
1, 3). Recent marmoset fMRI studies have indeed implicated ACC in vocalization processing 9,25,
suggesting strong interactions with the broader communication network. Moreover, if we view the
CPP as a non-literal language comprehension problem, resolving it requires listeners to recognize each
speaker’s voice. Afferent connections to area 24 35 from hippocampus, which robustly encodes identity in
social signals32, could provide this necessary input. Taken together, these findings and the
underlying anatomical connectivity suggest that ACC area 24 plays a more central role in resolving the
CPP than previously thought, and that the CPP may be not merely a challenge for
audition but also one of sociality.
Marmosets engage in vocal turn-taking12,14,36,37. In the noisy colony setting, we observed differences in
the relative timing of the call types produced in these social exchanges in the presence of background
bioRxiv preprint doi: https://doi.org/10.1101/2025.10.17.683014; this version posted October 17, 2025. The copyright holder for this
preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
perpetuity. It is made available under a CC-BY-NC 4.0 International license.
noise. Whereas high-amplitude, long-distance phee calls were produced in synchrony with other phee
calls from the colony, trill calls, a call type used when in close visual contact, were mostly produced during
periods of silence in the colony. This shows that marmosets monitor the acoustic landscape to optimize
signaling efficacy during these communicative exchanges38–41. Moreover, ACC appears to play a role in
this process, as the social context in which these calls were produced and perceived could be reliably
decoded from neural activity alone (Fig. 4). In parallel, fMRI findings in humans indicate that ACC is
activated by listening to conversations rather than monologues 42, and more generally that ACC's key
role in theory of mind is integral to natural language processing43–46 and to language production, as ACC
lesions in humans produce akinetic mutism 47. Finally, ACC functions in social monitoring48 and
exploration49 make it a logical node to be recruited for processing the CPP.
By capturing neural activity during natural vocal exchanges of freely moving primates in rich acoustic
environments, we discovered that the anterior cingulate cortex area 24, a region long overlooked in
this domain, implements core computations necessary for resolving the CPP. Certainly, ACC is not
solely responsible for resolving the CPP, but the results here do suggest a likely central role for this neural
substrate in integrating the processes required for this foundational auditory function.
Moreover, these results emphasize that CPP computations extend beyond sound parsing to
orchestrating social communication in noise by integrating perception, action, and context,
highlighting the role of this area in mediating social communication. This work showcases the
transformative potential of naturalistic neuroscience and points to a broader reevaluation of how and
where the brain solves real-world cognitive challenges13.
References
1. McDermott, J. H. The cocktail party problem. Curr. Biol. CB 19, R1024-1027 (2009).
2. Bee, M. A. & Micheyl, C. The cocktail party problem: what is it? How can it be solved? And why
should animal behaviorists study it? J. Comp. Psychol. Wash. DC 1983 122, 235–251 (2008).
3. Mesgarani, N. & Chang, E. F. Selective cortical representation of attended speaker in multi-talker
speech perception. Nature 485, 233–236 (2012).
4. O’Sullivan, J. et al. Hierarchical Encoding of Attended Auditory Objects in Multi-talker Speech
Perception. Neuron 104, 1195-1209.e3 (2019).
5. Nelken, I., Bizley, J., Shamma, S. A. & Wang, X. Auditory cortical processing in real-world
listening: the auditory system going real. J. Neurosci. Off. J. Soc. Neurosci. 34, 15135–15138
(2014).
6. Joshi, N. et al. Temporal coherence shapes cortical responses to speech mixtures in a ferret
cocktail party. Commun. Biol. 7, 1392 (2024).
7. Klein, J. T., Shepherd, S. V. & Platt, M. L. Social attention and the brain. Curr. Biol. CB 19, R958-
962 (2009).
8. Jürgens, U. Neural pathways underlying vocal control. Neurosci. Biobehav. Rev. 26, 235–258
(2002).
9. Jafari, A. et al. A vocalization-processing network in marmosets. Cell Rep. 42, 112526 (2023).
10. Putnam, P. T. & Chang, S. W. C. Social processing by the primate medial frontal cortex. Int. Rev.
Neurobiol. 158, 213–248 (2021).
11. Nieder, A. & Mooney, R. The neurobiology of innate, volitional and learned vocalizations in
mammals and birds. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 375, 20190054 (2020).
12. Burkart, J. M. et al. A convergent interaction engine: vocal communication among marmoset
monkeys. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 377, 20210098 (2022).
13. Miller, C. T. et al. Natural behavior is the language of the brain. Curr. Biol. CB 32, R482–R493
(2022).
14. Bezerra, B. M. & Souto, A. Structure and Usage of the Vocal Repertoire of Callithrix jacchus. Int. J.
Primatol. 29, 671–701 (2008).
15. Digby, L. J. & Barreto, C. E. Social organization in a wild population of Callithrix jacchus. I. Group
composition and dynamics. Folia Primatol. Int. J. Primatol. 61, 123–134 (1993).
16. Lazaro-Perea, C. Intergroup interactions in wild common marmosets, Callithrix jacchus: territorial
defence and assessment of neighbours. Anim. Behav. 62, 11–21 (2001).
17. Miller, C. T. et al. Marmosets: A Neuroscientific Model of Human Social Behavior. Neuron 90,
219–233 (2016).
18. Burkart, J. M. & van Schaik, C. P. Marmoset prosociality is intentional. Anim. Cogn. 23, 581–594
(2020).
19. Vitale, A., Zanzoni, M., Queyras, A. & Chiarotti, F. Degree of social contact affects the emission of
food calls in the common marmoset (Callithrix jacchus). Am. J. Primatol. 59, 21–28 (2003).
20. Landman, R. et al. Close-range vocal interaction in the common marmoset (Callithrix jacchus).
PloS One 15, e0227392 (2020).
21. Robinson, B. W. Vocalization evoked from forebrain in Macaca mulatta. Physiol. Behav. 2, 345–
354 (1967).
22. Sperli, F., Spinelli, L., Pollo, C. & Seeck, M. Contralateral smile and laughter, but no mirth,
induced by electrical stimulation of the cingulate cortex. Epilepsia 47, 440–443 (2006).
23. Gavrilov, N., Hage, S. R. & Nieder, A. Functional Specialization of the Primate Frontal Lobe during
Cognitive Control of Vocalizations. Cell Rep. 21, 2393–2406 (2017).
24. West, R. A. & Larson, C. R. Neurons of the anterior mesial cortex related to faciovocal activity in
the awake monkey. J. Neurophysiol. 74, 1856–1869 (1995).
25. Dureux, A., Zanini, A., Trapeau, R., Belin, P. & Everling, S. Functional organization of voice patches
in marmosets and cross-species comparisons with macaques and humans. Curr. Biol. CB S0960-
9822(25)00874–7 (2025) doi:10.1016/j.cub.2025.07.008.
26. Bregman, A. S. Auditory scene analysis: Hearing in complex environments. (1993).
27. Petersen, S. E. & Posner, M. I. The attention system of the human brain: 20 years after. Annu.
Rev. Neurosci. 35, 73–89 (2012).
28. Schneider, K. N., Sciarillo, X. A., Nudelman, J. L., Cheer, J. F. & Roesch, M. R. Anterior Cingulate
Cortex Signals Attention in a Social Paradigm that Manipulates Reward and Shock. Curr. Biol. CB
30, 3724-3735.e2 (2020).
29. Benedict, R. H. B. et al. Covert auditory attention generates activation in the rostral/dorsal
anterior cingulate cortex. J. Cogn. Neurosci. 14, 637–645 (2002).
30. Li, J., Aoi, M. C. & Miller, C. T. Representing the dynamics of natural marmoset vocal behaviors in
frontal cortex. Neuron 112, 3542-3550.e3 (2024).
31. Miller, C. T., Mandel, K. & Wang, X. The communicative content of the common marmoset phee
call during antiphonal calling. Am. J. Primatol. 72, 974–980 (2010).
32. Tyree, T. J., Metke, M. & Miller, C. T. Cross-modal representation of identity in the primate
hippocampus. Science 382, 417–423 (2023).
33. Burgos-Robles, A., Gothard, K. M., Monfils, M. H., Morozov, A. & Vicentic, A. Conserved features
of anterior cingulate networks support observational learning across species. Neurosci. Biobehav.
Rev. 107, 215–228 (2019).
34. Ducret, M. et al. Medial to lateral frontal functional connectivity mapping reveals the
organization of cingulate cortex. Cereb. Cortex N. Y. N 1991 34, bhae322 (2024).
35. Vogt, B. A. & Pandya, D. N. Cingulate cortex of the rhesus monkey: II. Cortical afferents. J. Comp.
Neurol. 262, 271–289 (1987).
36. Bosshard, A. B. et al. Beyond bigrams: call sequencing in the common marmoset (Callithrix
jacchus) vocal system. R. Soc. Open Sci. 11, 240218 (2024).
37. Oren, G. et al. Vocal labeling of others by nonhuman primates. Science 385, 996–1003 (2024).
38. Eliades, S. J. & Wang, X. Neural correlates of the lombard effect in primate auditory cortex. J.
Neurosci. Off. J. Soc. Neurosci. 32, 10737–10748 (2012).
39. Tsunada, J. & Eliades, S. J. Frontal-auditory cortical interactions and sensory prediction during
vocal production in marmoset monkeys. Curr. Biol. CB S0960-9822(25)00393–8 (2025)
doi:10.1016/j.cub.2025.03.077.
40. Löschner, J., Pomberger, T. & Hage, S. R. Marmoset monkeys use different avoidance strategies
to cope with ambient noise during vocal behavior. iScience 26, 106219 (2023).
41. Löschner, J. & Hage, S. R. Sound amongst the din: primate strategies against noise. Trends Cogn.
Sci. 29, 111–113 (2025).
42. Olson, H. A., Chen, E. M., Lydic, K. O. & Saxe, R. R. Left-Hemisphere Cortical Language Regions
Respond Equally to Observed Dialogue and Monologue. Neurobiol. Lang. Camb. Mass 4, 575–610
(2023).
43. Fedorenko, E., Ivanova, A. A. & Regev, T. I. The language network as a natural kind within the
broader landscape of the human brain. Nat. Rev. Neurosci. 25, 289–312 (2024).
44. Ferstl, E. C. & von Cramon, D. Y. What does the frontomedian cortex contribute to language
processing: coherence or theory of mind? NeuroImage 17, 1599–1612 (2002).
45. Amodio, D. M. & Frith, C. D. Meeting of minds: the medial frontal cortex and social cognition.
Nat. Rev. Neurosci. 7, 268–277 (2006).
46. Wittmann, M. K., Lockwood, P. L. & Rushworth, M. F. S. Neural Mechanisms of Social Cognition
in Primates. Annu. Rev. Neurosci. 41, 99–118 (2018).
47. Barris, R. W. & Schuman, H. R. [Bilateral anterior cingulate gyrus lesions; syndrome of the
anterior cingulate gyri]. Neurology 3, 44–52 (1953).
48. Clairis, N. & Lopez-Persem, A. Debates on the dorsomedial prefrontal/dorsal anterior cingulate
cortex: insights for future research. Brain J. Neurol. 146, 4826–4844 (2023).
49. Kolling, N., Behrens, T. E., Mars, R. B. & Rushworth, M. F. Neural Mechanisms of Foraging.
Science 336, 95 (2012).
50. Paxinos, G., Watson, C., Petrides, M., Rosa, M. & Tokuno, H. The Marmoset Brain in Stereotaxic
Coordinates. (Elsevier Academic Press, 2012).
51. McMahon, D. B. T., Bondar, I. V., Afuwape, O. A. T., Ide, D. C. & Leopold, D. A. One month in the
life of a neuron: longitudinal single-unit electrophysiology in the monkey visual system. J.
Neurophysiol. 112, 1748–1762 (2014).
52. Oikarinen, T. et al. Deep convolutional network for animal sound classification and source
attribution using dual audio recordings. J. Acoust. Soc. Am. 145, 654 (2019).
53. Pachitariu, M., Steinmetz, N., Kadir, S., Carandini, M. & Harris, K. D. Kilosort: realtime spike-sorting
for extracellular electrophysiology with hundreds of channels. 061481 Preprint at
https://doi.org/10.1101/061481 (2016).
54. Tyree, T. J. Applications of Mathematical Physics to Quantitative Biology. (2023).
55. Chen, T. & Guestrin, C. XGBoost: A Scalable Tree Boosting System. in 785–794 (2016).
doi:10.1145/2939672.2939785.
56. Jovanovic, V., Fishbein, A. R., de la Mothe, L., Lee, K.-F. & Miller, C. T. Behavioral context affects
social signal representations within single primate prefrontal cortex neurons. Neuron S0896-
6273(22)00059–9 (2022) doi:10.1016/j.neuron.2022.01.020.
Figure legends
Figure 1. Segregated ACC neuronal populations encode vocalizations perceived from the partner and
the colony. a) Schema of the experimental paradigm. Top: recordings took place directly in the home
cage and transfer box in the colony room. Center: both subject and partner monkeys were equipped
with a portable microphone, and the implanted subject was also equipped with a neurologger. Bottom:
neurons were recorded with brush arrays or Neuropixels probes implanted in ACC area 24. b) Coronal
plane (AP = +12mm) at which probes were implanted. c) Percentage of neurons responding to
vocalization production and perception. d) Vocalizations produced by the partner or other marmosets
in the colony (as opposed to calls produced by the subject) were analyzed separately. e) Normalized average increase from
neurons with significantly upregulated activity during partner call perception. f) Normalized average
increase from neurons with significantly upregulated activity during colony call perception. g-h)
Percentage accuracy of the neuronal multiclass decoder for partner (g) and colony (h) perceived call
type for real data and data with shuffled labels. i) Example neuron responding specifically to twitter
calls from the partner but not from the colony. j) Example neuron responding specifically to phee calls
from the colony but not from the partner. k) Percentage accuracy of the neuronal SVM decoder for
identity (partner vs colony) for real data and data with shuffled labels. *: p < 0.05, error bars indicate
s.e.m. (standard error of the mean).
Figure 2. ACC neurons respond invariantly to calls with and without overlapping vocalizations. a)
Example spectrograms from the colony and partner microphones showing the occurrence of a call
from the partner (twitter) that is perceived simultaneously with a call from the colony (phee). b)
Example spectrograms from the colony and partner microphones showing the occurrence of a call
from the partner (twitter) that is perceived sequentially to calls from the colony (twitters). c) Example
neurons responding to twitter calls from the partner that are unaffected by the presence (or absence)
of simultaneous calls from the colony. d) Normalized average increase from neurons with significantly
upregulated activity during partner twitter call perception is unaffected by the presence (or absence)
of simultaneous calls from the colony. *: p < 0.05, error bars indicate s.e.m.
Figure 3. Marmosets Monitor Social Scenes to Optimize Vocal Turn Taking in a Cocktail Party. a)
Example spectrograms showing trill to trill conversations between the subject and its partner. b)
Example spectrograms showing phee to phee conversations between the subject and other monkeys
in the colony. c) Normalized partner call rate for trills (purple) and phees (orange) aligned on subject’s
trills and phees onsets. d) Colony call rate for phees aligned on subject’s trills (purple) and phees
(orange).
Figure 4. ACC neurons encode active turn taking with the partner and the colony. a) Normalized
average increase from neurons with significantly upregulated activity during call production. b)
Example neuron responding to the production of several call types. c) Normalized average decrease
from neurons with significantly downregulated activity during call production. d) Percentage accuracy
of the neuronal multiclass decoder for produced call type for real data and data with shuffled labels.
e-f) Receiver operating characteristic (ROC) traces for our neural decoder trained on predictive time
bins to classify whether trills were answered by trills in exchanges with the subject’s partner (e) and whether
phee calls were answered in exchanges with the colony (f). Blue solid lines represent calls initially produced by the subject
while orange solid lines represent calls initially perceived by the subject. Random chance is
represented by the red dashed line as a guide. The area under the curve (AUC) quantifies the success
of the neural decoder as our outcome measure. *: p < 0.05, error bars indicate s.e.m.
Methods
Animals
Four adult marmosets (Callithrix jacchus) living in bonded pairs (1 female and 1 male per cage) for at
least 3 months were used for this experiment. All pairs were housed in a room with other marmoset
cages including two adjacent cages with visual separators (between 41 and 47 marmosets in total).
Three of the marmosets, monkey S (female), G (female) and L (male) were implanted with electrodes
while monkey K (male) was only used for behavior. Animals had unrestricted access to food and water.
All experiments were approved by the UCSD Institutional Animal Care and Use Committee and were
performed in the Cortical Systems and Behavior Laboratory at University of California San Diego
(UCSD).
Behavioral paradigm
Marmosets were first habituated to be handled and taken out in transport boxes by the experimenters
for a month. Once habituated to handling (measured by assessing the comfort of animals while being
handled and taking treats from the experimenters’ hands), we gradually habituated them over one or
two months to wear the leather harness containing the microphone until they spent less than 20% of
the time chewing on it. Experimental sessions lasted between 1 and 1.5 hours. We recorded only
one pair of monkeys per session. During a session, the pair was taken out of their home cage in
transport boxes and brought to an adjacent experimental room for preparation. We then equipped
both the subject and the partner monkey with the custom-made leather harness containing a
microphone (micro voice recorder, Spycentre Security®). Additionally, we connected a neurologger to
the subject’s head-cap. Both monkeys were brought back to the colony room. The partner was placed
back in its home-cage and the subject was kept in a transparent transport box placed at the entrance
of the home-cage (Fig 1a, Fig S2a) giving the pair visual and auditory access to each other. To record
colony calls, we used an omnidirectional microphone (H3-VR, Zoom®) placed in the middle of the room.
This paradigm allowed us to record acoustic interactions between the subject and its partner, and
between the subject and the colony. In marmosets, which are capable of turn-taking, these
interactions involve the coordinated exchange of vocal signals between individuals, characterized by
social coordination and minimal acoustic overlap, allowing speakers to alternate the timing of their
calls within a shared acoustic space.
Neurophysiological procedures
To target the desired brain regions for electrophysiological recording using external probes, the brain
anatomy of the subjects was imaged using a 3T MRI (Siemens MAGNETOM Prisma) scanner and a
human knee coil (Siemens/QED Transmit/Receive 15 channels). T2 and T1 weighted images of the
whole marmoset brain were obtained in an anaesthetized animal placed in a custom non-magnetic 3D
printed stereotax with earbars and bite plate. We adapted the coordinates from the Paxinos atlas50 of
the marmoset brain based on individual brain size. We targeted the left hemisphere area 24 (Site 1:
AP +12mm, ML +1mm, Site 2: AP +13.5mm, ML +1mm). On the day of surgery, monkeys were intubated
and maintained under isoflurane anesthesia, then placed in a surgical stereotaxic frame (Kopf). We
performed a skin incision and a craniotomy (1.5mm diameter), then inserted a ground electrode
between the dura and the skull before covering the skull with Metabond dental cement (C&B
Metabond®). We employed 2 different strategies. For monkeys S and G, we used microwire brush
arrays (Microprobe Maryland, USA; monkey S: 1x64 channels, monkey G: 2x32 channels with 2mm
space in between) with microdrives (custom-made from 51; for monkey G, an additional hole was
pierced and the drive adapted to hold 2 microwire brush arrays). The electrode guide tube(s) (Microlumen,
High performance medical tubing, 310-I.5 PTFE), containing a 24G needle and attached to the microdrive,
was/were inserted straight into the brain to a depth of 1.5mm. The needle(s) was/were retracted and
the electrode array(s) were lowered into the guide tube(s) until they protruded out of the guide
tube by 1mm. Kwik-Sil (World Precision Instruments) was used to fill the craniotomy and seal the gap
at the top of the guide tube(s). Then the microdrive and the Omnetics connector were cemented to
the Metabond using dental acrylic, after soldering the ground electrode to the array ground wire.
Finally, the skin was glued to the Metabond using Vetbond (3M). For monkey L, we employed the
SpikeGadgets wireless Neuropixels system. The procedure was similar, except that a durotomy was
performed and the 2 probes (Npx 1.0 and Npx NHP 1.0 10mm) were inserted 5mm deep from the cortical
surface before we covered them with Kwik-Sil and fixed them directly to the skull with Metabond.
Probes were then connected to the SpikeGadgets head cap that was built around the electrodes and
cemented to the Metabond. After the surgery, monkeys received analgesic treatment for 3 days.
Recording sessions started 2 weeks after the surgery for monkeys S and G and were performed twice
per week; for monkey L, they started 4 days after surgery and took place every day (monkey S: 21
sessions, monkey G: 19 sessions, monkey L: 6 sessions). Implantation sites were verified by structural
MRI for monkeys G and S (Fig S1b), but this could not be done for monkey L.
To wirelessly record neurons, we used a 64-channel Neurologger (Deuteron Technologies) for monkeys
S and G with a sample rate of 32 kHz, and a Neuropixels Datalogger Headstage (SpikeGadgets) with a
sample rate of 30 kHz for monkey L.
Data preprocessing
Vocalizations: To automatically label subjects’ and partners’ vocalizations, we retrained the algorithm
described in 20 on the data recorded from our 4 monkeys with the wearable microphones. This neural
network52 identifies the call type and determines the source (subject or partner) by comparing the
amplitude in the two microphones. If the amplitude is equal in both microphones, the call is considered to be coming from the
colony. Following this step, all audio files were manually corrected to remove any mislabeling by
the algorithm and to determine precisely (<20ms) each call onset and offset. We considered only 4 call
types: Trill, Twitter, Phee and Chirp. Other calls (Tsik, Ek, Chatter, TrillPhee, Whistle) had too few
occurrences to be analyzed. Additionally, to investigate the cocktail party effect, Twitter calls coming
from the partner were labeled as “simultaneous” or “sequential” depending on whether or not a call from the colony
overlapped with them.
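The “simultaneous” versus “sequential” labeling described above can be sketched as an interval-overlap test. This is a minimal illustration, not the authors' code: the function names, the (onset, offset) tuple representation in seconds, and the strict-overlap rule (calls that merely touch at one instant count as sequential) are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical representation: each call is (onset_s, offset_s) on a common clock.
Interval = Tuple[float, float]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True when the two intervals share a non-zero stretch of time."""
    return a[0] < b[1] and b[0] < a[1]


def label_partner_calls(partner_calls: List[Interval],
                        colony_calls: List[Interval]) -> List[str]:
    """Label each partner call by whether any colony call overlaps it."""
    labels = []
    for call in partner_calls:
        hit = any(overlaps(call, colony) for colony in colony_calls)
        labels.append("simultaneous" if hit else "sequential")
    return labels
```

For example, a partner call spanning 0 to 1 s would be labeled “simultaneous” if a colony call spans 0.5 to 2 s, whereas a partner call ending exactly when a colony call begins would be labeled “sequential” under this assumed rule.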
For calls coming from the colony (recorded with the omnidirectional microphone), a first pass was
done with the ACDC neural network (https://github.com/mineraldragon/ACDC_2022) and then corrected
manually. We only labeled Twitter and Phee calls from the colony, as other call types were either not
loud enough to be recorded consistently (e.g. Trill and Chirp) or had too few occurrences (e.g. Tsik,
Chatter, etc.). Note that for technical reasons, colony calls were not always recorded (colony recordings
were available for 34 out of 46 sessions).
We removed all calls that occurred less than 2 seconds after another call from the same source,
to ensure that the neuronal responses analyzed were not influenced by a recent prior call. This was the
case for all analyses except for the PSTH of phee calls perceived from the partner (Fig 1f), due to the low
number of calls. Thus, Fig 1f phee call data may contain residual activity from successive calls.
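The 2-second exclusion rule above could be implemented along the following lines. This is a sketch under stated assumptions: the text does not say whether the gap is measured from the previous call's onset or offset, nor whether it is measured against any previous call or only against retained ones. Here the gap runs from the latest preceding offset (of any call from the same source, retained or not) to the current onset; `drop_calls_after_recent_call` is a hypothetical name.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (onset_s, offset_s), assumed representation


def drop_calls_after_recent_call(calls: List[Interval],
                                 min_gap_s: float = 2.0) -> List[Interval]:
    """Keep only calls starting at least min_gap_s after the preceding call ended.

    `calls` must all come from the same source (subject, partner, or colony).
    """
    kept = []
    last_offset = None  # latest offset seen so far, including dropped calls
    for onset, offset in sorted(calls):
        if last_offset is None or onset - last_offset >= min_gap_s:
            kept.append((onset, offset))
        last_offset = offset if last_offset is None else max(last_offset, offset)
    return kept
```

Measuring against dropped calls as well is the more conservative choice, since a dropped call can still leave residual activity in the window of the next one.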
Finally, we removed calls that occurred when the neural signal was poor due to artifacts, defined as
calls for which 20% of the neural signal (from 1sec before to 1 sec after the call onset) was labeled as
artifact (see neural data).
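As an illustration of this exclusion criterion, the sketch below computes the fraction of artifact-labeled samples in the window from 1 s before to 1 s after call onset and rejects the call when that fraction reaches 20%. The per-sample boolean mask, the function names, and the choice of "at least 20%" as the cutoff are assumptions for the example; the text does not specify how the threshold is compared.

```python
from typing import Sequence


def artifact_fraction(artifact_mask: Sequence[bool], fs_hz: float,
                      onset_s: float, half_window_s: float = 1.0) -> float:
    """Fraction of samples flagged as artifact within onset +/- half_window_s."""
    start = max(0, int(round((onset_s - half_window_s) * fs_hz)))
    stop = min(len(artifact_mask), int(round((onset_s + half_window_s) * fs_hz)))
    if stop <= start:
        return 1.0  # no usable signal around this call: treat as fully contaminated
    window = artifact_mask[start:stop]
    return sum(window) / len(window)


def keep_call(artifact_mask: Sequence[bool], fs_hz: float, onset_s: float,
              max_fraction: float = 0.2) -> bool:
    """Assumed rule: keep the call only if the artifact fraction is below 20%."""
    return artifact_fraction(artifact_mask, fs_hz, onset_s) < max_fraction
```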
Neural data: All neural data were band-pass filtered (300-7000Hz). Signal from brush arrays (but not
Neuropixels) contained artifacts, which were removed by setting the signal value to 0 if the standard
deviation of one channel was above a threshold (arbitrarily set to 100 after visual inspection of signal).
If more than 20% of a session was removed, the session was not included in our analyses. We then
used Kilosort 2 53 to automatically perform spike sorting for both micro brush array and
Neuropixels data. Each identified unit was manually checked and validated, and labeled as single-unit or
multi-unit or rejected based on the waveform shape and auto-correlogram, using Phy (open-source
Python library for spike visualization and curation). Only units with a minimum firing rate of 0.5Hz were
considered. Across the 3 monkeys (Monkey S: units, Monkey G: 574 units, Monkey L: 794 units) a total
of 1599 units were identified, of which 932 were well isolated single units (using the PCA). However,
because we did not see differences between single and multi-units in later analyses, we decided to pool
them together and to refer to them (single and multi-units) as neurons throughout this article. All
PSTHs in this article show results from well isolated single units.
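The artifact-blanking step described at the start of this paragraph can be sketched as follows for a single channel. The text does not state the window over which the standard deviation was computed, so the fixed-length chunking (`chunk_len`) is an assumption, as are the function name and the use of the population standard deviation. Only the threshold of 100 and the 20% session-exclusion rule come from the text.

```python
import statistics
from typing import List, Sequence, Tuple


def blank_artifacts(signal: Sequence[float], chunk_len: int,
                    std_threshold: float = 100.0) -> Tuple[List[float], float]:
    """Zero out chunks whose standard deviation exceeds the threshold.

    Returns the cleaned signal and the fraction of samples that were blanked.
    """
    cleaned = list(signal)
    blanked = 0
    for start in range(0, len(signal), chunk_len):
        chunk = signal[start:start + chunk_len]
        if len(chunk) > 1 and statistics.pstdev(chunk) > std_threshold:
            cleaned[start:start + len(chunk)] = [0.0] * len(chunk)
            blanked += len(chunk)
    fraction = blanked / len(signal) if len(signal) else 0.0
    return cleaned, fraction


def session_is_usable(blanked_fraction: float, max_fraction: float = 0.2) -> bool:
    """Sessions with more than 20% of the signal removed are excluded."""
    return blanked_fraction <= max_fraction
```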
Synchronization: All data streams (3 microphones and neurologger) were synchronized using 250ms
audio tone pulses played via a buzzer generated by an ESP32 microcontroller programmed using
Arduino IDE.
All .wav files were manually aligned with Audacity® (v 3.4). Two tones (one at the start of the session
and another at the end) were used to bookend the whole session, which allowed us to identify and
correct the temporal drift of each microphone in Audacity using the “change speed” functionality.
Audacity was able to correct the temporal drift in each microphone to less than 20ms. Following this,
a MATLAB script was used to randomly remove data points until the temporal drift was reduced
to <1ms. This did not influence the labeling of calls by the artificial neural network.
A 100 mV TTL pulse (100 ms) was sent to the Neurologger using the same ESP32 microcontroller
at the beginning of the session. The time difference between the Neurologger TTL and the first acoustic
pulse was used to align the timing of vocalization onsets and offsets to the neurologger time. Using this
approach, we were able to synchronize different systems with varying sampling frequencies to a
common master clock.
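The logic of this synchronization can be illustrated with timestamps alone. Note that this sketch swaps in a different technique from the one described: instead of resampling the audio (Audacity “change speed” followed by random sample removal), it linearly rescales timestamps between the two bookend tones, which assumes the drift is constant over the session. All function and parameter names are hypothetical.

```python
from typing import Callable


def make_drift_corrector(dev_start_s: float, dev_end_s: float,
                         ref_start_s: float, ref_end_s: float
                         ) -> Callable[[float], float]:
    """Map a device's timestamps onto a reference clock using two sync tones.

    dev_*: times of the start and end tones as measured on the drifting device.
    ref_*: times of the same two tones on the reference recording.
    """
    scale = (ref_end_s - ref_start_s) / (dev_end_s - dev_start_s)

    def to_reference(t_dev_s: float) -> float:
        return ref_start_s + (t_dev_s - dev_start_s) * scale

    return to_reference


def audio_to_neurologger_time(t_audio_s: float, first_tone_audio_s: float,
                              ttl_neuro_s: float) -> float:
    """Shift an audio timestamp by the offset between the first tone and the TTL."""
    return t_audio_s + (ttl_neuro_s - first_tone_audio_s)
```

With both steps applied in sequence, a call onset measured on any microphone would first be mapped onto the reference audio clock and then shifted into neurologger time.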
Statistical analyses
PSTH responses: Firing rate was computed over 100 ms bins. To test neurons’ responses to produced or
perceived calls, we first employed a peak detection approach. For each call type with at least
5 calls, we calculated a baseline as the mean firing rate between -5 s and -2 s before call onset for
produced calls and between -5 s and 0 s before call onset for perceived calls. We then searched for the
peak response between -2 s and +1 s around call onset for produced calls and between 0 s and 3 s
after call onset for perceived calls. To ensure a minimum peak width, we then tested whether the bins around
the peak bin reached at least 50% of the peak value (threshold = baseline ± 0.5*(peak-baseline)), requiring
at least 3 consecutive bins (300 ms) beyond this threshold. Finally, we tested whether the peak values
were significantly higher than the baseline values using the non-parametric paired Wilcoxon signed-
rank test. We used this approach to allow for time flexibility because we observed that neurons
responded at various time windows (Fig S4, 6). Note that both maximum and minimum peak responses
were computed to look for excited and inhibited neurons. Each neuron was tested for each produced
and perceived call type.
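The baseline, peak search and width criterion above can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function name, the half-open windows and the choice to confine the consecutive-bin run to the search window are assumptions, and the final Wilcoxon signed-rank test is omitted.

```python
from statistics import mean


def detect_peak(rates, times, baseline_win, search_win, sign=1, min_bins=3):
    """Find a peak (sign=1) or trough (sign=-1) and check its width.

    rates: mean firing rate per 100 ms bin (Hz); times: bin start times (s, call onset = 0).
    Windows are half-open intervals [start, stop) in seconds (an assumption).
    """
    baseline = mean(r for r, t in zip(rates, times) if baseline_win[0] <= t < baseline_win[1])
    search = [i for i, t in enumerate(times) if search_win[0] <= t < search_win[1]]
    peak_i = max(search, key=lambda i: sign * rates[i])
    peak = rates[peak_i]
    threshold = baseline + 0.5 * (peak - baseline)

    def beyond(i):
        return sign * (rates[i] - threshold) >= 0

    # Grow the run of consecutive bins beyond threshold outward from the peak bin.
    lo = hi = peak_i
    while lo - 1 >= search[0] and beyond(lo - 1):
        lo -= 1
    while hi + 1 <= search[-1] and beyond(hi + 1):
        hi += 1
    width = hi - lo + 1
    return {
        "baseline": baseline,
        "peak": peak,
        "threshold": threshold,
        "width_bins": width,
        "passes_width": sign * (peak - baseline) > 0 and width >= min_bins,
    }


# Hypothetical produced-call PSTH: flat 5 Hz with a 400 ms bump starting at call onset.
times = [i / 10 for i in range(-50, 40)]
rates = [5.0] * len(times)
rates[50:54] = [8.0, 10.0, 9.0, 8.0]
result = detect_peak(rates, times, baseline_win=(-5, -2), search_win=(-2, 1))
```

For perceived calls the same function would be called with `baseline_win=(-5, 0)` and `search_win=(0, 3)`; passing `sign=-1` looks for inhibited responses.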
Finally, the illustrative PSTH traces reported in Figure S7 were first computed at a bin width of
0.3 seconds, then up-sampled to a bin width of 0.1 seconds using linear interpolation that assigned
knots to the mean time of any given bin, and finally smoothed with a sliding window with a duration
of 2 seconds using a Savitzky-Golay filter. The uncertainty of these illustrative PSTH traces is a
95% confidence interval estimated via bootstrap and is shown by the shaded regions around each trace
in Figure S7.
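The up-sampling step can be illustrated with a short Python sketch that interpolates linearly between knots placed at the centre of each coarse bin. The function name and the edge handling (holding the first and last values constant outside the outermost knots) are assumptions; the Savitzky-Golay smoothing and bootstrap steps are not shown.

```python
def upsample_linear(coarse_rates, coarse_start, coarse_width, fine_width):
    """Linearly interpolate binned rates onto finer bins; knots sit at coarse-bin centres."""
    knots = [coarse_start + (i + 0.5) * coarse_width for i in range(len(coarse_rates))]
    n_fine = round(len(coarse_rates) * coarse_width / fine_width)
    fine_rates = []
    for j in range(n_fine):
        t = coarse_start + (j + 0.5) * fine_width  # centre of the fine bin
        if t <= knots[0]:
            fine_rates.append(coarse_rates[0])  # assumption: hold the edge value
        elif t >= knots[-1]:
            fine_rates.append(coarse_rates[-1])  # assumption: hold the edge value
        else:
            k = min(int((t - knots[0]) // coarse_width), len(knots) - 2)
            frac = (t - knots[k]) / coarse_width
            fine_rates.append(coarse_rates[k] + frac * (coarse_rates[k + 1] - coarse_rates[k]))
    return fine_rates


# Three 0.3 s bins up-sampled to nine 0.1 s bins (hypothetical rates in Hz).
fine = upsample_linear([0.0, 3.0, 6.0], coarse_start=0.0, coarse_width=0.3, fine_width=0.1)
```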
Decoding: To decode call type or caller category, we employed a multiclass decoder (error-correcting
output codes model, MATLAB function fitcecoc) with a one-versus-all coding design. For each session,
the input was the average firing rate of each neuron between -2 s and +1 s around call onset for
produced calls and between 0 s and 3 s after call onset for perceived calls. The number of calls of
each type (Trill, Twitter, Phee or Chirp) or caller (partner or colony) was equalized by subsampling
categories with more calls down to the size of the category that had the fewest. Only call categories with at least 10
occurrences were considered, leading to a different number of categories, and therefore a different chance level,
for each session (25%, 33% or 50%). The decoder was trained on 80% of the data and tested on the
remaining 20%. To control for random effects, we also decoded neural data with shuffled labels.
This procedure was repeated 100 times to obtain a representative average decoding value for the data
and the shuffle. Finally, we used a Wilcoxon signed-rank test to compare the average decoding accuracies
obtained on the real and the shuffled data.
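The resampling protocol around the decoder (category equalization, 80/20 split, shuffled-label control, 100 repeats) can be sketched in Python. This sketch deliberately swaps in a nearest-centroid classifier as a stand-in for the MATLAB `fitcecoc` model named in the text, so it shows only the evaluation scheme; all function names and the synthetic session are hypothetical, and the split is not stratified because the text does not specify it.

```python
import random
from collections import defaultdict


def equalize(features, labels, rng, min_count=10):
    """Drop categories with fewer than min_count calls, then subsample to the smallest one."""
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    kept = {y: xs for y, xs in by_class.items() if len(xs) >= min_count}
    n_min = min(len(xs) for xs in kept.values())
    X, Y = [], []
    for y, xs in kept.items():
        for x in rng.sample(xs, n_min):
            X.append(x)
            Y.append(y)
    return X, Y


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Stand-in classifier (NOT the ECOC model of the text): nearest class centroid."""
    grouped = defaultdict(list)
    for x, y in zip(X_train, y_train):
        grouped[y].append(x)
    centroids = {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in grouped.items()}

    def predict(x):
        return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

    return sum(predict(x) == y for x, y in zip(X_test, y_test)) / len(y_test)


def decode_once(features, labels, rng, shuffle=False):
    X, Y = equalize(features, labels, rng)
    if shuffle:
        rng.shuffle(Y)  # break the link between neural activity and call label
    order = list(range(len(Y)))
    rng.shuffle(order)
    n_train = int(0.8 * len(order))  # 80% train, 20% test
    train, test = order[:n_train], order[n_train:]
    return nearest_centroid_accuracy(
        [X[i] for i in train], [Y[i] for i in train],
        [X[i] for i in test], [Y[i] for i in test],
    )


def decode(features, labels, n_repeats=100, seed=0):
    """Average accuracy over repeats for real labels and for shuffled labels."""
    rng = random.Random(seed)
    real = sum(decode_once(features, labels, rng) for _ in range(n_repeats)) / n_repeats
    shuffled = sum(decode_once(features, labels, rng, shuffle=True) for _ in range(n_repeats)) / n_repeats
    return real, shuffled


# Hypothetical session: 2 neurons, 30 "Trill" and 20 "Phee" calls with well-separated rates.
_rng = random.Random(1)
features = [[_rng.gauss(10, 1), _rng.gauss(0, 1)] for _ in range(30)]
features += [[_rng.gauss(0, 1), _rng.gauss(10, 1)] for _ in range(20)]
labels = ["Trill"] * 30 + ["Phee"] * 20
real_acc, shuffled_acc = decode(features, labels)
chance = 1 / len(set(labels))  # 50% with two categories, 33% with three, 25% with four
```

Equalizing the categories before each split is what makes chance equal to one over the number of categories, which is why the chance level differs between sessions.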
Time bins: To quantify the neural representations that identified whether a Trill with the subject’s
partner was answered or whether a Phee with the subject’s colony was answered, we used a predictive
time bin analysis similar to previous work in the lab 32,54. In this study, we applied the same four-fold
stratified cross-validation both to detect predictive time bins and to decode their apparent firing rates
using an ensemble of gradient-boosted decision trees 55. Recording sessions were selected for these
analyses if they had at least ten repeatedly spiking neurons recorded, at least four recorded calls that
were answered, and at least four recorded calls that were not answered. A Trill call was considered an
answer if it happened within 5 seconds of a Trill call from the other monkey, and a Phee call was
considered an answer if it happened within 10 seconds of a Phee call from other colony monkeys.
These timings were based on previous literature 20,56. As a control, the same analysis was repeated
with pseudo-randomly shuffled labels, and decoding results are reported in the main text and in Figure
4. All time bins considered only the times from five seconds before to five seconds after call onset and
were constrained to be no briefer than four hundred milliseconds in duration, resulting in a median
activation time of t = -0.17 s (IQR: -2.34 to +2.14 s; N_bins = 4326 predictive time bins) relative
to call onset at t = 0. Predictive time bins were also constrained to be non-overlapping for any given
neuron such that no action potential was considered twice for a given type of neural representation.
Reported tree-based decoder accuracies were computed using a threshold parameter value that
maximized apparent accuracy. The area under the ROC curve provides an alternative outcome measure
that is parameterless and is reported in Fig. 4d-g.
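The answer windows and the session selection criteria above can be expressed as a short Python sketch. The function names are hypothetical, and measuring the delay from call onset to reply onset is an assumption, since the text does not state whether onsets or offsets were used.

```python
from bisect import bisect_right

# Answer windows in seconds, as stated in the text.
ANSWER_WINDOW_S = {"Trill": 5.0, "Phee": 10.0}


def label_answered(call_onsets, reply_onsets, call_type):
    """Return True for each call followed, within the window, by a same-type call from the other caller(s)."""
    window = ANSWER_WINDOW_S[call_type]
    replies = sorted(reply_onsets)
    labels = []
    for t in call_onsets:
        j = bisect_right(replies, t)  # index of the first reply strictly after t
        labels.append(j < len(replies) and replies[j] - t <= window)
    return labels


def session_is_eligible(labels, n_neurons, min_neurons=10, min_per_class=4):
    """Apply the minimal session selection criteria described in the text."""
    n_answered = sum(labels)
    n_unanswered = len(labels) - n_answered
    return n_neurons >= min_neurons and n_answered >= min_per_class and n_unanswered >= min_per_class


# Hypothetical onset times in seconds.
subject_trills = [0.0, 20.0, 40.0]
partner_trills = [3.0, 27.0, 44.5]
trill_labels = label_answered(subject_trills, partner_trills, "Trill")
```

Swapping the two onset lists labels calls perceived by the subject instead of calls it produced.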
We found 2139 predictive time bins across 948 neurons (Extended Data Fig. 7a, c) that were
significantly predictive of whether a Trill call was part of a turn-taking bout in terms of median firing rate
(two-sided Mann-Whitney U test, p < 0.05). Only 1098 neurons were considered in these Trill call analyses
under the minimal session selection criteria, which were applied equally in all predictive time bin analyses.
We found 2187 predictive time bins across 560 neurons (Extended Data Fig. 7b, d) that were
significantly predictive of whether a Phee call was part of a turn-taking bout (two-sided Mann-Whitney U test,
p < 0.05). Only 763 neurons were considered in these Phee call analyses under the same minimal session
selection criteria.
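For a single time bin, the Mann-Whitney U statistic comparing firing rates on answered and unanswered calls is directly related to an ROC area: dividing U by the product of the two sample sizes gives the probability that a randomly chosen answered call has a higher rate than a randomly chosen unanswered one. The Python sketch below illustrates that relation with hypothetical rates and function names. It does not compute the p-value (one option would be `scipy.stats.mannwhitneyu` with `alternative="two-sided"`), and it is separate from the tree-based decoder AUC reported in the figures.

```python
def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: number of (a, b) pairs with a > b, ties counted as 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def auc_from_u(sample_a, sample_b):
    """Area under the ROC curve implied by the U statistic."""
    return mann_whitney_u(sample_a, sample_b) / (len(sample_a) * len(sample_b))


# Hypothetical firing rates (Hz) of one neuron in one time bin.
answered_rates = [5.0, 6.0, 7.0, 8.0]
unanswered_rates = [1.0, 2.0, 3.0, 6.0]
u_stat = mann_whitney_u(answered_rates, unanswered_rates)
auc = auc_from_u(answered_rates, unanswered_rates)
```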
Acknowledgments
This work is supported by grants to AL (Marie Sklodowska-Curie fellowship 101018877 and ERC Starting
grant 101116110) and to CTM (NIH R01 DC 012087).
Competing interests
The authors declare no competing interests.
Corresponding Author
Correspondence to Arthur Lefevre (arthur.lefevre{at}isc.cnrs.fr).
Supplementary Figure legends
Figure S1. a) Coronal section from the Paxinos atlas showing the location of the brush array cannula
insertion (top) and the Neuropixel implantation site (bottom). b) Coronal MRI planes of monkey S before the
first recording session (top) and after the last one (bottom), showing the trajectory of the 64-channel
brush array through area 24. c) Sagittal MRI plane of monkey G showing the implantation sites (white arrows)
of the two 32-channel brush arrays. d) Number of units (total = 2355) recorded per day with the
2 Neuropixel probes implanted in monkey L. Only sessions d4 to d9 were used for data analysis.
Figure S2. a) Picture taken during a recording session. b) Table summarizing the number of each call
type per caller identity. c) Pie charts representing call type proportions for the subject (top) and the
partner (bottom). Note that most of the perceived Tsiks and Eks came from a few specific sessions
and could not be analyzed. d) Examples of spectrograms of recorded vocalizations.
Figure S3. a) Normalized activity of neurons with significant increased response to vocalization
production, per call type. b) Normalized activity of neurons with significant decreased response to
vocalization production, per call type. c) Normalized activity of neurons with significant increased
response to partner vocalization perception, per call type. d) Pie chart representing the neurons (n =
1599) responses: 217 neurons with increased activity for 1 type of vocalization production, 47 neurons
with increased activity for several types of vocalization production, 482 neurons with decreased
activity for 1 type of vocalization production, 170 neurons with decreased activity for several types of
vocalization production, 116 neurons with increased activity for 1 type of partner vocalization
perception, 15 neurons with increased activity for several types of partner vocalization perception, 93
neurons with increased activity for 1 type of colony vocalization perception, 51 neurons with increased
activity for both vocalization production and partner vocalization perception, 408 neurons with no
significant responses.
Figure S4. a) Accuracy of partner call type decoding for each session (gray lines) as a function of the number
of neurons removed; the red line is the average. b) Percentage accuracy of the neuronal multiclass decoder
for perceived call type for real data and data with shuffled labels, after removing neurons with a
significant activity increase around call onset (Wilcoxon signed-rank test, z = 3.65, p < 0.05). Error bars indicate s.e.m. (standard error of the mean).
Figure S5. a) Decoding accuracy expressed in fold above chance for produced and perceived
vocalizations. * indicates significant decoding compared to chance (Wilcoxon signed-rank test, p < 0.05,
Bonferroni corrected). b) Correlation between the number of neurons and the decoding quality. c)
Accuracy of produced call type decoding for each session (gray lines) as a function of the number of
neurons removed; the red line is the average. d) Percentage accuracy of the neuronal multiclass decoder
for produced call type for real data and data with shuffled labels, after removing neurons with a
significant activity increase around call onset (Wilcoxon signed-rank test, z = 4.95, p < 0.05). Error bars indicate s.e.m. (standard error of the mean).
Figure S6. a) PSTH examples and spike rasters of significant time bins predicting whether a trill call was
part of an exchange (within 5 sec of a trill from the other monkey). b) PSTH examples and spike rasters
of significant time bins predicting whether a phee call was part of an exchange (within 5 sec of a phee
from the other monkey). c-d) Receiver Operating Characteristic (ROC) traces for our neural decoder
trained on predictive time bins and tested on data with shuffled labels, representing whether trills
answered trills with the subject’s partner (panel c) and whether phee calls were answered with the colony
(panel d). Blue solid lines represent calls initially produced by the subject while orange solid lines
represent calls initially perceived by the subject. Random chance is represented by the red dashed line
as a guide. The area under the curve (AUC) quantifies the success of the neural decoder as our outcome
measure.
Figure 1. Segregated ACC neuronal populations encode vocalizations perceived from the partner and the colony. a) Schema of the
experimental paradigm. Top: recordings took place directly in the home cage and transfer box in the colony room. Center: both subject
and partner monkeys were equipped with a portable microphone, and the implanted subject was also equipped with a neurologger.
Bottom: neurons were recorded with brush arrays or Neuropixel probes implanted in ACC area 24. b) Coronal plane (AP = +12 mm) at
which probes were implanted. c) Percentage of neurons responding to vocalization production and perception. d) Vocalizations produced
by the partner or other marmosets in the colony (vs produced ones) were analyzed separately. e) Normalized average increase from
neurons with significantly upregulated activity during partner call perception. f) Normalized average increase from neurons with
significantly upregulated activity during colony call perception. g-h) Percentage accuracy of the neuronal multiclass decoder for partner (g)
and colony (h) perceived call type for real data and data with shuffled labels. i) Example unit responding specifically to twitter calls from the
partner but not from the colony. j) Example unit responding specifically to phee calls from the colony but not from the partner. k)
Percentage accuracy of the neuronal SVM decoder for identity (partner vs colony) for real data and data with shuffled labels. *: p < 0.05,
error bars indicate s.e.m. (standard error of the mean).
[Figure 1 image not reproduced. Recovered panel labels: Colony microphone and camera, Partner, Subject (n = 3), Perceiving calls;
Wireless brush arrays or Neuropixels, A24, 1599 units; Modulated units: 57%, 17%, 26% (Production, Perception, None); Population
average, Normalized FR vs. time from call onset (-4 to 4 s) for Trill, Twitter, Phee and Chirp; Example neuron, Firing rate (Hz) vs. time
from call onset (-4 to 4 s), Partner vs. Colony; Partner call type, Colony call type and Social category, % Accuracy, Data vs. Shuffled.]
Figure 2. ACC units respond invariantly to calls with and without overlapping vocalizations. a) Example
spectrograms from the colony and partner microphones showing the occurrence of a call from the partner (twitter)
that is perceived simultaneously with a call from the colony (phee). b) Example spectrograms from the colony and
partner microphones showing the occurrence of a call from the partner (twitter) that is perceived sequentially to calls
from the colony (twitters). c) Example neurons responding to twitter calls from the partner that are unaffected by the
presence (or absence) of simultaneous calls from the colony. d) Normalized average increase from neurons with
significantly upregulated activity during partner twitter call perception is unaffected by the presence (or absence) of
simultaneous calls from the colony. *: p < 0.05, error bars indicate s.e.m.
[Figure 2 image not reproduced. Recovered panel labels: Simultaneous (Phee, Twitter) and Sequential (Twitter) examples; Colony mic,
Partner mic, Colony calls, Partner calls; spectrogram frequency ticks at 2 kHz and 20 kHz; Example neurons and Population average
for All Tw, Sequential Tw and Simultaneous Tw; Firing rate (Hz) and Normalized FR vs. time (-4 to 4 s).]
Figure 3. Marmosets Monitor Social Scenes to Optimize Vocal Turn Taking in a Cocktail Party. a) Example
spectrograms showing trill to trill conversations between the subject and its partner. b) Example spectrograms showing
phee to phee conversations between the subject and other monkeys in the colony. c) Normalized partner call rate for
trills (purple) and phees (orange) aligned on the onsets of the subject’s trills and phees. d) Colony call rate for phees
aligned on the onsets of the subject’s trills (purple) and phees (orange).
[Figure 3 image not reproduced. Recovered panel labels: Trill interactions with partner, Phee interactions with colony; Subject mic,
Partner mic, Colony mic; Subject calls, Partner calls, Colony calls; spectrogram frequency ticks at 2 kHz and 20 kHz; Normalized
partner call rate and Normalized colony call rate vs. time from subject call onset (-15 to 15 s), Trill and Phee, with * markers.]
Figure 4. ACC neurons encode active turn taking with the partner and the colony. a) Normalized average increase from neurons with significantly
upregulated activity during call production. b) Example neuron responding to the production of several call types. c) Normalized average decrease from
neurons with significantly downregulated activity during call production. d) Percentage accuracy of the neuronal multiclass decoder for produced call
type for real data and data with shuffled labels. e-f) Receiver Operating Characteristic (ROC) traces for our neural decoder trained on predictive time bins,
representing whether trills answered trills with the subject’s partner (panel e) and whether phee calls were answered with the colony (panel f). Blue solid lines
represent calls initially produced by the subject while orange solid lines represent calls initially perceived by the subject. Random chance is represented
by the red dashed line as a guide. The area under the curve (AUC) quantifies the success of the neural decoder as our outcome measure. *: p < 0.05, error
bars indicate s.e.m.
[Figure 4 image not reproduced. Recovered panel labels: a) and c) Population average, Normalized FR vs. time from call onset (-4 to 4 s) for
Trill, Twitter, Phee and Chirp; b) Example unit, Firing rate (Hz) vs. time from call onset (-4 to 4 s); d) Subject call type, % Accuracy, Data vs.
Shuffled; e) Predicting Tr answering Tr, True positive rate vs. False positive rate, legend Produced, Perceived, Random, AUC = 0.8465 and
AUC = 0.8472; f) Predicting Ph answering Ph, True positive rate vs. False positive rate, legend Produced, Perceived, Random, AUC = 0.7011
and AUC = 0.9626.]
[Figure S1 image not reproduced; its legend appears under Supplementary Figure legends. Recovered panel d labels: Number of units
(0 to 300) vs. Days after surgery (0 to 30); series Npx mouse 1.0 and Npx NHP.]
Figure S2
Voc type Subject Partner Colony
Tr 5386 3201
Ph 1774 778 12293
Tw 877 959 9350
Chi 1677 643
Ts 243 452
Ek 126 802
[Figure S2 image not reproduced; its legend appears under Supplementary Figure legends. Recovered panel labels: c) Perceived, Produced;
Trill, Twitter, Phee, Chirp, Tsik, Ek; d) Trill, Phee, Twitter, Chirp, spectrogram frequency ticks at 2 kHz and 20 kHz.]
[Figure S3 image not reproduced; its legend appears under Supplementary Figure legends. Recovered panel labels: a) to c) Tr, Tw, Ph, Chi,
time from call onset (-2 to 3 s); d) pie chart categories 1 Prod, >1 Prod, 1 Dec, >1 Dec, 1 Perc, >1 Perc, Col, Both, NR.]
[Figure S4 image not reproduced; its legend appears under Supplementary Figure legends. Recovered panel labels: a) Decoding while
removing random neurons, Accuracy (fold above chance) vs. Nb Neurons left (180 to 0); b) Partner call type, % Accuracy, Data vs. Shuffled.]
[Figure S5 image not reproduced; its legend appears under Supplementary Figure legends. Recovered panel labels: a) Fold above chance vs.
Time in sec (-5 to 5 s) relative to call onset, Produced and Perceived; c) Decoding while removing random neurons, Accuracy (fold above
chance) vs. Nb Neurons left (140 to 0); d) Subject call type, % Accuracy, Data vs. Shuffled.]
[Figure S6 image not reproduced; its legend appears under Supplementary Figure legends. Recovered panel labels: c) Predicting Tr answering
Tr (shuffled), True positive rate vs. False positive rate, legend Produced, Perceived, Random, AUC = 0.5290 and AUC = 0.5940; d) Predicting
answering Ph (shuffled), True positive rate vs. False positive rate, legend Produced, Perceived, Random, AUC = 0.4519 and AUC = 0.5463.]