Primate ACC encodes natural vocal interactions in a ‘cocktail party’

doi:10.1101/2025.10.17.683014

2025 · doi:10.1101/2025.10.17.683014

preprint OA: closed CC-BY-NC-4.0

Discussion

Here, we investigated the role of ACC area 24 in resolving the cocktail party problem (CPP) as freely moving marmosets engaged in natural vocal communication in a noisy environment. We found that single neurons selectively responded to partner vocalizations or colony calls, effectively parsing different categories of conspecific speakers in the acoustic scene (Fig. 1). Critically, neurons selective for partner calls responded invariantly when calls overlapped acoustically with other sounds, showing that neural mechanisms to distinguish meaningful voices from other interfering voices are present in marmoset ACC area 24 (Fig. 2). These findings suggest that ACC may be a keystone neural substrate for resolving the CPP, highlighting the significance of this process for mediating social interactions in natural scenes, rather than solely a challenge for audition. Given that ACC is a midline forebrain area conserved across all mammals33 that is both anatomically connected to fronto-lateral34 and auditory cortex35 and modulated by attention27–29, it is perhaps not surprising that it supports resolution of the CPP. Although ACC is not classically considered part of the language network, results here show that ACC neurons are involved in a range of communicative functions, including representing each of the call types produced or perceived in vocal exchanges (Fig. 1, 3). Recent marmoset fMRI studies have indeed implicated ACC in vocalization processing9,25, suggesting strong interactions with the broader communication network. Moreover, if we view the CPP as a non-literal language comprehension problem, resolving it requires listeners to recognize each speaker’s voice. Afferent connections from hippocampus, which robustly encodes identity in social signals32, to area 24 (ref. 35) could provide this necessary input.
Taken together, these findings and the underlying anatomical connectivity suggest that ACC area 24 plays a more central role in resolving the CPP than previously thought, potentially emphasizing that the CPP is not merely a challenge for audition but one of sociality. Marmosets engage in vocal turn-taking12,14,36,37. In the noisy colony setting, we observed differences in the relative timing of the call types produced in these social exchanges in the presence of background noise. Whereas high-amplitude, long-distance phee calls were produced in synchrony with other phee calls from the colony, trill calls, which are used when in close visual contact, were mostly produced during periods of silence in the colony. This shows that marmosets monitor the acoustic landscape to optimize signaling efficacy during these communicative exchanges38–41. Moreover, ACC appears to play a role in this process, as the social context in which these calls were produced and perceived could be reliably decoded from neural activity alone (Fig. 4). In parallel, fMRI findings in humans indicate that ACC is activated by listening to conversations rather than monologues42 and, more generally, that the key role of ACC in theory of mind is integral to natural language processing43–46 and to language production, as ACC lesions in humans produce akinetic mutism47. Finally, the functions of ACC in social monitoring48 and exploration49 make it a logical node to be recruited for processing the CPP.
bioRxiv preprint doi: https://doi.org/10.1101/2025.10.17.683014; this version posted October 17, 2025. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
By capturing neural activity during natural vocal exchanges of freely moving primates in rich acoustic environments, we discovered that the anterior cingulate cortex area 24, a region long overlooked in this domain, implements core computations necessary for resolving the CPP. Certainly, ACC is not solely responsible for resolving the CPP, but results here do suggest a likely central role of this neural substrate in integrating the requisite processes for this foundational auditory process. Moreover, these results emphasize that CPP computations extend beyond sound parsing to orchestrating social communication in noise by integrating perception, action, and context, highlighting the role of this area in mediating social communication. This work showcases the transformative potential of naturalistic neuroscience and points to a broader reevaluation of how and where the brain solves real-world cognitive challenges13.

References

1. McDermott, J. H. The cocktail party problem. Curr. Biol. CB 19, R1024–1027 (2009).
2. Bee, M. A. & Micheyl, C. The cocktail party problem: what is it? How can it be solved? And why should animal behaviorists study it? J. Comp. Psychol. Wash. DC 1983 122, 235–251 (2008).
3. Mesgarani, N. & Chang, E. F. Selective cortical representation of attended speaker in multi-talker speech perception. Nature 485, 233–236 (2012).
4. O’Sullivan, J. et al. Hierarchical Encoding of Attended Auditory Objects in Multi-talker Speech Perception. Neuron 104, 1195-1209.e3 (2019).
5. Nelken, I., Bizley, J., Shamma, S. A. & Wang, X. Auditory cortical processing in real-world listening: the auditory system going real. J. Neurosci. Off. J. Soc. Neurosci. 34, 15135–15138 (2014).
6. Joshi, N. et al. Temporal coherence shapes cortical responses to speech mixtures in a ferret cocktail party. Commun. Biol. 7, 1392 (2024).
7. Klein, J. T., Shepherd, S. V. & Platt, M. L. Social attention and the brain. Curr. Biol. CB 19, R958–962 (2009).
8. Jürgens, U. Neural pathways underlying vocal control. Neurosci. Biobehav. Rev. 26, 235–258 (2002).
9. Jafari, A. et al. A vocalization-processing network in marmosets. Cell Rep. 42, 112526 (2023).
10. Putnam, P. T. & Chang, S. W. C. Social processing by the primate medial frontal cortex. Int. Rev. Neurobiol. 158, 213–248 (2021).
11. Nieder, A. & Mooney, R. The neurobiology of innate, volitional and learned vocalizations in mammals and birds. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 375, 20190054 (2020).
12. Burkart, J. M. et al. A convergent interaction engine: vocal communication among marmoset monkeys. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 377, 20210098 (2022).
13. Miller, C. T. et al. Natural behavior is the language of the brain. Curr. Biol. CB 32, R482–R493 (2022).
14. Bezerra, B. M. & Souto, A. Structure and Usage of the Vocal Repertoire of Callithrix jacchus. Int. J. Primatol. 29, 671–701 (2008).
15. Digby, L. J. & Barreto, C. E. Social organization in a wild population of Callithrix jacchus. I. Group composition and dynamics. Folia Primatol. Int. J. Primatol. 61, 123–134 (1993).
16. Lazaro-Perea, C. Intergroup interactions in wild common marmosets, Callithrix jacchus: territorial defence and assessment of neighbours. Anim. Behav. 62, 11–21 (2001).
17. Miller, C. T. et al. Marmosets: A Neuroscientific Model of Human Social Behavior. Neuron 90, 219–233 (2016).
18. Burkart, J. M. & van Schaik, C. P. Marmoset prosociality is intentional. Anim. Cogn. 23, 581–594 (2020).
19. Vitale, A., Zanzoni, M., Queyras, A. & Chiarotti, F. Degree of social contact affects the emission of food calls in the common marmoset (Callithrix jacchus). Am. J. Primatol. 59, 21–28 (2003).
20. Landman, R. et al. Close-range vocal interaction in the common marmoset (Callithrix jacchus). PloS One 15, e0227392 (2020).
21. Robinson, B. W. Vocalization evoked from forebrain in Macaca mulatta. Physiol. Behav. 2, 345–354 (1967).
22. Sperli, F., Spinelli, L., Pollo, C. & Seeck, M. Contralateral smile and laughter, but no mirth, induced by electrical stimulation of the cingulate cortex. Epilepsia 47, 440–443 (2006).
23. Gavrilov, N., Hage, S. R. & Nieder, A. Functional Specialization of the Primate Frontal Lobe during Cognitive Control of Vocalizations. Cell Rep. 21, 2393–2406 (2017).
24. West, R. A. & Larson, C. R. Neurons of the anterior mesial cortex related to faciovocal activity in the awake monkey. J. Neurophysiol. 74, 1856–1869 (1995).
25. Dureux, A., Zanini, A., Trapeau, R., Belin, P. & Everling, S. Functional organization of voice patches in marmosets and cross-species comparisons with macaques and humans. Curr. Biol. CB S0960-9822(25)00874-7 (2025) doi:10.1016/j.cub.2025.07.008.
26. Bregman, A. S. Auditory scene analysis: Hearing in complex environments. (1993).
27. Petersen, S. E. & Posner, M. I. The attention system of the human brain: 20 years after. Annu. Rev. Neurosci. 35, 73–89 (2012).
28. Schneider, K. N., Sciarillo, X. A., Nudelman, J. L., Cheer, J. F. & Roesch, M. R. Anterior Cingulate Cortex Signals Attention in a Social Paradigm that Manipulates Reward and Shock. Curr. Biol. CB 30, 3724-3735.e2 (2020).
29. Benedict, R. H. B. et al. Covert auditory attention generates activation in the rostral/dorsal anterior cingulate cortex. J. Cogn. Neurosci. 14, 637–645 (2002).
30. Li, J., Aoi, M. C. & Miller, C. T. Representing the dynamics of natural marmoset vocal behaviors in frontal cortex. Neuron 112, 3542-3550.e3 (2024).
31. Miller, C. T., Mandel, K. & Wang, X. The communicative content of the common marmoset phee call during antiphonal calling. Am. J. Primatol. 72, 974–980 (2010).
32. Tyree, T. J., Metke, M. & Miller, C. T. Cross-modal representation of identity in the primate hippocampus. Science 382, 417–423 (2023).
33. Burgos-Robles, A., Gothard, K. M., Monfils, M. H., Morozov, A. & Vicentic, A. Conserved features of anterior cingulate networks support observational learning across species. Neurosci. Biobehav. Rev. 107, 215–228 (2019).
34. Ducret, M. et al. Medial to lateral frontal functional connectivity mapping reveals the organization of cingulate cortex. Cereb. Cortex N. Y. N 1991 34, bhae322 (2024).
35. Vogt, B. A. & Pandya, D. N. Cingulate cortex of the rhesus monkey: II. Cortical afferents. J. Comp. Neurol. 262, 271–289 (1987).
36. Bosshard, A. B. et al. Beyond bigrams: call sequencing in the common marmoset (Callithrix jacchus) vocal system. R. Soc. Open Sci. 11, 240218 (2024).
37. Oren, G. et al. Vocal labeling of others by nonhuman primates. Science 385, 996–1003 (2024).
38. Eliades, S. J. & Wang, X. Neural correlates of the lombard effect in primate auditory cortex. J. Neurosci. Off. J. Soc. Neurosci. 32, 10737–10748 (2012).
39. Tsunada, J. & Eliades, S. J. Frontal-auditory cortical interactions and sensory prediction during vocal production in marmoset monkeys. Curr. Biol. CB S0960-9822(25)00393-8 (2025) doi:10.1016/j.cub.2025.03.077.
40. Löschner, J., Pomberger, T. & Hage, S. R. Marmoset monkeys use different avoidance strategies to cope with ambient noise during vocal behavior. iScience 26, 106219 (2023).
41. Löschner, J. & Hage, S. R. Sound amongst the din: primate strategies against noise. Trends Cogn. Sci. 29, 111–113 (2025).
42. Olson, H. A., Chen, E. M., Lydic, K. O. & Saxe, R. R. Left-Hemisphere Cortical Language Regions Respond Equally to Observed Dialogue and Monologue. Neurobiol. Lang. Camb. Mass 4, 575–610 (2023).
43. Fedorenko, E., Ivanova, A. A. & Regev, T. I. The language network as a natural kind within the broader landscape of the human brain. Nat. Rev. Neurosci. 25, 289–312 (2024).
44. Ferstl, E. C. & von Cramon, D. Y. What does the frontomedian cortex contribute to language processing: coherence or theory of mind? NeuroImage 17, 1599–1612 (2002).
45. Amodio, D. M. & Frith, C. D. Meeting of minds: the medial frontal cortex and social cognition. Nat. Rev. Neurosci. 7, 268–277 (2006).
46. Wittmann, M. K., Lockwood, P. L. & Rushworth, M. F. S. Neural Mechanisms of Social Cognition in Primates. Annu. Rev. Neurosci. 41, 99–118 (2018).
47. Barris, R. W. & Schuman, H. R. [Bilateral anterior cingulate gyrus lesions; syndrome of the anterior cingulate gyri]. Neurology 3, 44–52 (1953).
48. Clairis, N. & Lopez-Persem, A. Debates on the dorsomedial prefrontal/dorsal anterior cingulate cortex: insights for future research. Brain J. Neurol. 146, 4826–4844 (2023).
49. Kolling, N., Behrens, T. E., Mars, R. B. & Rushworth, M. F. Neural Mechanisms of Foraging. Science 336, 95 (2012).
50. Paxinos, G., Watson, C., Petrides, M., Rosa, M. & Tokuno, H. The Marmoset Brain in Stereotaxic Coordinates. (Elsevier Academic Press, 2012).
51. McMahon, D. B. T., Bondar, I. V., Afuwape, O. A. T., Ide, D. C. & Leopold, D. A. One month in the life of a neuron: longitudinal single-unit electrophysiology in the monkey visual system. J. Neurophysiol. 112, 1748–1762 (2014).
52. Oikarinen, T. et al. Deep convolutional network for animal sound classification and source attribution using dual audio recordings. J. Acoust. Soc. Am. 145, 654 (2019).
53. Pachitariu, M., Steinmetz, N., Kadir, S., Carandini, M. & Harris, K. D. Kilosort: realtime spike-sorting for extracellular electrophysiology with hundreds of channels. 061481 Preprint at https://doi.org/10.1101/061481 (2016).
54. Tyree, T. J. Applications of Mathematical Physics to Quantitative Biology. (2023).
55. Chen, T. & Guestrin, C. XGBoost: A Scalable Tree Boosting System. in 785–794 (2016). doi:10.1145/2939672.2939785.
56. Jovanovic, V., Fishbein, A. R., de la Mothe, L., Lee, K.-F. & Miller, C. T. Behavioral context affects social signal representations within single primate prefrontal cortex neurons. Neuron S0896-6273(22)00059-9 (2022) doi:10.1016/j.neuron.2022.01.020.

Figure legends

Figure 1. Segregated ACC neuronal populations encode vocalizations perceived from the partner and the colony. a) Schema of the experimental paradigm. Top: recordings took place directly in the home cage and transfer box in the colony room. Center: both subject and partner monkeys were equipped with a portable microphone, and the implanted subject was also equipped with a neurologger. Bottom: neurons were recorded with brush arrays or Neuropixels probes implanted in ACC area 24. b) Coronal plane (AP = +12mm) at which probes were implanted. c) Percentage of neurons responding to vocalization production and perception. d) Vocalizations produced by the partner or other marmosets in the colony (vs produced ones) were analyzed separately. e) Normalized average increase from neurons with significantly upregulated activity during partner call perception. f) Normalized average increase from neurons with significantly upregulated activity during colony call perception. g-h) Percentage accuracy of the neuronal multiclass decoder for partner (g) and colony (h) perceived call type for real data and data with shuffled labels. i) Example neuron responding specifically to twitter calls from the partner but not from the colony. j) Example neuron responding specifically to phee calls from the colony but not from the partner. k) Percentage accuracy of the neuronal SVM decoder for identity (partner vs colony) for real data and data with shuffled labels. *: p < 0.05, error bars indicate s.e.m. (standard error of the mean).
Figure 2. ACC neurons respond invariantly to calls with and without overlapping vocalizations. a) Example spectrograms from the colony and partner microphones showing the occurrence of a call from the partner (twitter) that is perceived simultaneously with a call from the colony (phee). b) Example spectrograms from the colony and partner microphones showing the occurrence of a call from the partner (twitter) that is perceived sequentially to calls from the colony (twitters). c) Example neurons responding to twitter calls from the partner that are unaffected by the presence (or absence) of simultaneous calls from the colony. d) Normalized average increase from neurons with significantly upregulated activity during partner twitter call perception is unaffected by the presence (or absence) of simultaneous calls from the colony. *: p < 0.05, error bars indicate s.e.m.
Figure 3. Marmosets Monitor Social Scenes to Optimize Vocal Turn Taking in a Cocktail Party. a) Example spectrograms showing trill to trill conversations between the subject and its partner. b) Example spectrograms showing phee to phee conversations between the subject and other monkeys in the colony. c) Normalized partner call rate for trills (purple) and phees (orange) aligned on subject’s trills and phees onsets. d) Colony call rate for phees aligned on subject’s trills (purple) and phees (orange).
Figure 4. ACC neurons encode active turn taking with the partner and the colony. a) Normalized average increase from neurons with significantly upregulated activity during call production. b) Example neuron responding to the production of several call types. c) Normalized average decrease from neurons with significantly downregulated activity during call production. d) Percentage accuracy of the neuronal multiclass decoder for produced call type for real data and data with shuffled labels. e-f) Receiver Operating Characteristic (ROC) traces for our neural decoder trained on predictive time bins representing whether trills answered trills with the subject’s partner (e) and the answering of phee calls with the colony (f). Blue solid lines represent calls initially produced by the subject while orange solid lines represent calls initially perceived by the subject. Random chance is represented by the red dashed line as a guide. The area under the curve (AUC) quantifies the success of the neural decoder as our outcome measure. *: p < 0.05, error bars indicate s.e.m.

Methods

Animals

Four adult marmosets (Callithrix jacchus) living in bonded pairs (1 female and 1 male per cage) for at least 3 months were used for this experiment. All pairs were housed in a room with other marmoset cages, including two adjacent cages with visual separators (between 41 and 47 marmosets in total). Three of the marmosets, monkeys S (female), G (female) and L (male), were implanted with electrodes, while monkey K (male) was only used for behavior. Animals had unrestricted access to food and water. All experiments were approved by the UCSD Institutional Animal Care and Use Committee and were performed in the Cortical Systems and Behavior Laboratory at University of California San Diego (UCSD).

Behavioral paradigm

Marmosets were first habituated to being handled and taken out in transport boxes by the experimenters for a month. Once they were habituated to handling (measured by assessing the comfort of animals while being handled and taking treats from the experimenters’ hands), we gradually habituated them over one or two months to wear the leather harness containing the microphone until they spent less than 20% of the time chewing on it. Experimental sessions lasted between 1 and 1.5 hours. We recorded only one pair of monkeys per session. During a session, the pair was taken out of their home cage in transport boxes and brought to an adjacent experimental room for preparation. We then equipped both the subject and the partner monkey with the custom-made leather harness containing a microphone (micro voice recorder, Spycentre Security®). Additionally, we connected a neurologger to the subject’s head-cap.
Both monkeys were brought back to the colony room. The partner was placed back in its home-cage and the subject was kept in a transparent transport box placed at the entrance of the home-cage (Fig 1a, Fig S2a), giving the pair visual and auditory access to each other. To record colony calls, we used an omnidirectional microphone (H3-VR, Zoom®) placed in the middle of the room. This paradigm allowed us to record acoustic interactions between the subject and its partner, and between the subject and the colony. In marmosets, which are capable of turn-taking, these interactions involve the coordinated exchange of vocal signals between individuals, characterized by social coordination and minimal acoustic overlap, allowing speakers to alternate the timing of their calls within a shared acoustic space.

Neurophysiological procedures

To target the desired brain regions for electrophysiological recording using external probes, the brain anatomy of the subjects was imaged using a 3T MRI scanner (Siemens MAGNETOM Prisma) and a human knee coil (Siemens/QED Transmit/Receive 15 channels). T2 and T1 weighted images of the whole marmoset brain were obtained in an anaesthetized animal placed in a custom non-magnetic 3D printed stereotax with earbars and bite plate. We adapted the coordinates from the Paxinos atlas50 of the marmoset brain based on individual brain size. We targeted the left hemisphere area 24 (Site 1: AP +12mm, ML +1mm, Site 2: AP +13.5mm, ML +1mm). On the day of surgery, monkeys were intubated and maintained under isoflurane anesthesia, then placed in a surgical stereotaxic frame (Kopf). We performed a skin incision and a craniotomy (1.5mm diameter), then inserted a ground electrode between the dura and the skull before covering the skull with Metabond dental cement (C&B Metabond®). We employed 2 different strategies. For monkeys S and G, we used micro wire brush arrays (Microprobe Maryland, USA, Monkey S: 1x64 channels, Monkey G: 2x32 channels with 2mm space in between) with microdrives (custom made from ref. 51; for Monkey G, an additional hole was pierced and the drive adapted to hold 2 micro brush arrays). The electrode guide tube(s) (Microlumen, High performance medical tubing, 310-I.5 PTFE), containing a 24G needle and attached to the microdrive, was/were inserted straight into the brain at a depth of 1.5mm. The needle(s) was/were retracted and the electrode array(s) were lowered into the guide tube(s) until they protruded out of the guide tube by 1mm. Kwik-Sil (World Precision Instruments) was used to fill the craniotomy and seal the gap at the top of the guide tube(s). Then the microdrive and the Omnetics connector were cemented to the Metabond using dental acrylic, after soldering the ground electrode to the array ground wire. Finally, the skin was glued to the Metabond using Vetbond (3M). For monkey L, we employed the SpikeGadgets wireless Neuropixels system. The procedure was similar, except that a durotomy was performed and the 2 probes (Npx 1.0 and Npx NHP 1.0 10mm) were inserted 5mm deep from the cortical surface before we covered them with Kwik-Sil and fixed them directly to the skull with Metabond. Probes were then connected to the SpikeGadgets head cap that was built around the electrodes and cemented to the Metabond. After the surgery, monkeys received analgesic treatment for 3 days.
Recording sessions started 2 weeks after the surgery for monkeys S and G and were performed twice per week; for monkey L, they started 4 days after surgery and took place every day (monkey S: 21 sessions, monkey G: 19 sessions, Monkey L: 6 sessions). Implantation sites were verified by structural MRI for monkeys G and S (Fig S1b), but this could not be done for monkey L. To wirelessly record neurons, we used a 64-channel Neurologger (Deuteron Technologies) for monkeys S and G with a sample rate of 32 kHz, and a Neuropixels Datalogger Headstage (SpikeGadgets) with a sample rate of 30 kHz for monkey L.

Data preprocessing

Vocalizations: To automatically label subjects’ and partners’ vocalizations, we retrained the algorithm described in ref. 20 on the data recorded from our 4 monkeys with the wearable microphones. This neural network52 identifies the call type and determines the source (subject or partner) by comparing the amplitude. If the amplitude is equal in both microphones, the call is considered to be coming from the colony. Following this step, all audio files were manually corrected to remove any mislabeling by the algorithm and to determine precisely (<20ms) each call onset and offset. We considered only 4 call types: Trill, Twitter, Phee and Chirp. Other calls (Tsik, Ek, Chatter, TrillPhee, Whistle) had too few occurrences to be analyzed. Additionally, to investigate the cocktail party effect, Twitter calls coming from the partner were labeled as “simultaneous” or “sequential” depending on whether or not a call from the colony overlapped with them.
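The “simultaneous” versus “sequential” labeling described above can be sketched as an interval-overlap test. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the representation of a call as an (onset, offset) pair in seconds, and the rule that any non-zero overlap counts as “simultaneous” are all assumptions.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (onset_s, offset_s); hypothetical representation


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two time intervals share any non-zero stretch of time."""
    return a[0] < b[1] and b[0] < a[1]


def label_partner_calls(partner_calls: List[Interval],
                        colony_calls: List[Interval]) -> List[str]:
    """Label each partner call by whether any colony call overlaps it."""
    labels = []
    for call in partner_calls:
        hit = any(overlaps(call, other) for other in colony_calls)
        labels.append("simultaneous" if hit else "sequential")
    return labels


# Toy example: the first partner call overlaps a colony call, the second does not.
partner = [(1.0, 1.8), (5.0, 5.6)]
colony = [(1.5, 3.0), (6.0, 7.2)]
print(label_partner_calls(partner, colony))
```

A stricter criterion (for example, a minimum overlap duration) would only change the `overlaps` helper.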
For calls coming from the colony (recorded with the omnidirectional microphone), a first pass was done with the ACDC neural network (https://github.com/mineraldragon/ACDC_2022) and then corrected manually. We only labeled Twitter and Phee calls from the colony, as other call types were either not loud enough to be recorded consistently (Trill and Chirp) or had too few occurrences (e.g., Tsik, Chatter). Note that for technical reasons, colony calls were not always recorded (colony recordings were available for 34 out of 46 sessions). We removed all calls that happened less than 2 seconds after another call from the same source, to ensure that the neuronal responses analyzed were not influenced by a recent prior call. This was the case for all analyses except for the PSTH of phee calls perceived from the partner (Fig 1f), due to the low number of calls. Thus, Fig 1f phee call data may contain residual activity from successive calls. Finally, we removed calls that occurred when the neural signal was poor due to artifacts, defined as calls for which 20% of the neural signal (from 1 s before to 1 s after the call onset) was labeled as artifact (see neural data).
Neural data: All neural data were band-pass filtered (300–7000 Hz). Signal from brush arrays (but not Neuropixels) contained artifacts, which were removed by setting the signal value to 0 if the standard deviation of one channel was above a threshold (arbitrarily set to 100 after visual inspection of the signal). If more than 20% of a session was removed, the session was not included in our analyses.
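The 2-second exclusion rule for calls described above could be implemented along the following lines. This is only a sketch: the text does not say whether the 2 s gap is measured from the previous call's onset or offset, or whether an excluded call still counts as a “recent prior call”. The version below assumes offset-to-onset gaps and that every call, kept or not, resets the clock for its source. The `Call` type and function name are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Call(NamedTuple):
    source: str    # e.g. "subject", "partner" or "colony"
    onset: float   # seconds
    offset: float  # seconds


def drop_closely_following(calls: List[Call], min_gap_s: float = 2.0) -> List[Call]:
    """Keep calls starting at least min_gap_s after the previous call
    from the same source ended (assumed reading of the rule)."""
    last_offset: Dict[str, float] = {}
    kept = []
    for call in sorted(calls, key=lambda c: c.onset):
        previous = last_offset.get(call.source)
        if previous is None or call.onset - previous >= min_gap_s:
            kept.append(call)
        # Assumption: an excluded call still counts as a recent prior call.
        last_offset[call.source] = (
            call.offset if previous is None else max(previous, call.offset)
        )
    return kept


calls = [
    Call("partner", 0.0, 0.5),
    Call("colony", 1.0, 2.0),
    Call("partner", 1.5, 2.0),   # 1.0 s after the previous partner call: dropped
    Call("partner", 4.5, 5.0),   # 2.5 s after the previous partner call: kept
]
print([c.onset for c in drop_closely_following(calls)])
```

Calls from different sources do not exclude each other here, which follows the “same source” wording of the rule.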
We then used Kilosort 2 (ref. 53) to automatically perform the spike sorting for both micro brush array and Neuropixels data. Each identified unit was manually checked and either validated and labeled as single or multi-unit, or rejected, based on the waveform shape and auto-correlogram using Phy (open-source Python library for spike visualization and curation). Only units with a minimum firing rate of 0.5Hz were considered. Across the 3 monkeys (Monkey S: units, Monkey G: 574 units, Monkey L: 794 units) a total of 1599 units were identified, of which 932 were well isolated single units (using the PCA). However, because we did not see differences between single and multi-units in later analyses, we decided to pool them together and to refer to them (single and multi-units) as neurons throughout this article. All PSTHs in this article show results from well isolated single units.
Synchronization: All data streams (3 microphones and neurologger) were synchronized using 250ms audio tone pulses played via a buzzer and generated by an ESP32 microcontroller programmed using Arduino IDE. All .wave files were manually aligned with Audacity® (v 3.4). Two tones (one at the start of the session and another at the end) were used to bookend the whole session, which allowed us to identify and correct the temporal drift of each microphone in Audacity using the “change speed” functionality. Audacity was able to correct the temporal drift in each microphone to less than 20ms. Following this, a MATLAB script was used to randomly remove data points until the temporal drift was reduced to <1ms. This did not influence the labeling of calls by the artificial neural network. A 100 mV TTL pulse (100 ms) was sent to the Neurologger using the same ESP32 microcontroller at the beginning of the session. The time difference between the Neurologger TTL and the first acoustic pulse was used to align the timing of vocalization onsets and offsets to the neurologger time.
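The logic of the two-tone drift correction and the TTL alignment can be illustrated on timestamps alone. Note that this is a different technique from the one described above: the authors resampled the audio itself (Audacity “change speed”, then sample removal in MATLAB), whereas the sketch below linearly remaps event times. It assumes constant clock drift between the two bookend tones and that the TTL and the first tone mark the same instant; all function and parameter names are hypothetical.

```python
from typing import Callable


def make_time_mapper(mic_start: float, mic_end: float,
                     ref_start: float, ref_end: float) -> Callable[[float], float]:
    """Return a function mapping microphone time to reference time, given the
    times of the start and end tones on both clocks (linear drift assumed)."""
    if mic_end <= mic_start:
        raise ValueError("end tone must come after start tone")
    scale = (ref_end - ref_start) / (mic_end - mic_start)

    def to_ref(t_mic: float) -> float:
        return ref_start + (t_mic - mic_start) * scale

    return to_ref


def to_logger_time(t_ref: float, first_tone_ref: float, ttl_logger: float) -> float:
    """Shift a reference-clock time onto the neural logger clock, assuming the
    TTL pulse and the first tone were emitted at the same instant."""
    return t_ref + (ttl_logger - first_tone_ref)


# Toy example: the microphone clock gains 36 ms over one hour.
to_ref = make_time_mapper(mic_start=2.0, mic_end=3602.036,
                          ref_start=2.0, ref_end=3602.0)
call_onset_ref = to_ref(1802.018)              # halfway through the session
call_onset_logger = to_logger_time(call_onset_ref, first_tone_ref=2.0,
                                   ttl_logger=0.5)
print(round(call_onset_ref, 6), round(call_onset_logger, 6))
```

If drift were not constant over a session, additional intermediate tones and piecewise-linear interpolation would be needed.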
Using this approach, we were able to synchronize different systems with varying sampling frequencies to a common master clock.

Statistical analyses

PSTH responses: Firing rate was computed over 100ms bins. To test neurons’ responses to produced or perceived calls, we first employed a peak detection approach. For each call type, if there were at least 5 calls, we calculated as a baseline the mean firing rate between -5s and -2s before call onset for produced calls and between -5s and 0s before call onset for perceived calls. We then searched for the peak response between -2s and +1s around call onset for produced calls and between 0s and 3s after call onset for perceived calls. To ensure a minimum peak width, we then tested whether the bins around the peak bin reached at least 50% of the peak value (threshold = baseline +/- 0.5*(peak-baseline)), requiring at least 3 consecutive bins (300ms) beyond this threshold. Finally, we tested whether the peak values were significantly higher than the baseline values using the non-parametric paired Wilcoxon signed rank test. We used this approach to allow for time flexibility because we observed that neurons responded at various time windows (Fig S4, 6). Note that both maximum and minimum peak responses were computed to look for excited and inhibited neurons. Each neuron was tested for each produced and perceived call type.
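The peak-detection step described above can be sketched for the excitatory case as follows. This is a minimal illustration, assuming a trial-averaged firing-rate vector in 100 ms bins, bin times given as bin starts relative to call onset, and a width criterion evaluated as the contiguous run of bins around the peak within the search window. The Wilcoxon test is omitted, inhibited responses would mirror the logic with a minimum, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def detect_peak(rates: List[float], times: List[float],
                baseline_win: Tuple[float, float],
                search_win: Tuple[float, float],
                min_bins: int = 3) -> Optional[Dict[str, float]]:
    """Return peak statistics if an excitatory peak passes the width criterion."""
    base_idx = [i for i, t in enumerate(times) if baseline_win[0] <= t < baseline_win[1]]
    search_idx = [i for i, t in enumerate(times) if search_win[0] <= t < search_win[1]]
    if not base_idx or not search_idx:
        return None
    baseline = sum(rates[i] for i in base_idx) / len(base_idx)
    peak_i = max(search_idx, key=lambda i: rates[i])
    peak = rates[peak_i]
    if peak <= baseline:
        return None
    # Half-height threshold between baseline and peak.
    threshold = baseline + 0.5 * (peak - baseline)
    # Grow a contiguous run of supra-threshold bins around the peak bin.
    lo = hi = peak_i
    while lo - 1 >= search_idx[0] and rates[lo - 1] >= threshold:
        lo -= 1
    while hi + 1 <= search_idx[-1] and rates[hi + 1] >= threshold:
        hi += 1
    width = hi - lo + 1
    if width < min_bins:
        return None
    return {"baseline": baseline, "peak": peak,
            "peak_time": times[peak_i], "width_bins": width}


# Perceived-call windows from the text: baseline -5 to 0 s, search 0 to 3 s.
times = [i / 10 for i in range(-50, 30)]      # bin starts, 100 ms bins
rates = [2.0] * len(times)
rates[55:59] = [6.0, 10.0, 8.0, 3.0]          # response at 0.5 to 0.8 s
print(detect_peak(rates, times, (-5.0, 0.0), (0.0, 3.0)))
```

For produced calls, the same function would be called with a baseline window of (-5.0, -2.0) and a search window of (-2.0, 1.0).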
Finally, the demonstrational PSTH traces reported in Figure S7 were first computed at a bin width of 0.3 seconds, then up-sampled to a bin width of 0.1 seconds using linear interpolation that assigned knots to the mean time of each bin, and finally smoothed with a Savitzky-Golay filter using a 2-second sliding window. The uncertainty of these demonstrational PSTH traces is a 95% confidence interval estimated via bootstrap, shown as the shaded regions in Figure S7.

Decoding: To decode call type or caller category, we employed a multiclass decoder (error-correcting output codes model, MATLAB function fitcecoc) with a one-versus-all coding design. For each session, the input was the average firing rate of each neuron between -2 s and +1 s around call onset for produced calls and between 0 s and 3 s after call onset for perceived calls. The number of calls of each type (Trill, Twitter, Phee or Chirp) or caller (partner or colony) was equalized by subsampling categories with more calls down to the size of the category that had the fewest. Only call categories with at least 10 occurrences were considered, leading to a different number of categories, and therefore a different chance level, for each session (25%, 33% or 50%). The decoder was trained on 80% of the data and tested on the remaining 20%. To control for random effects, we also decoded neural data with shuffled labels. This procedure was repeated 100 times to obtain a representative average decoding value for the data and the shuffle.
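The category equalization, the 10-call minimum, and the resulting chance level can be illustrated with a short Python sketch. It covers only the bookkeeping around the decoder, not the fitcecoc model itself; the function name is hypothetical, and splitting 80/20 within each category is an assumption, since the text only states the overall proportions.

```python
import random
from collections import defaultdict


def balance_and_split(labels, min_count=10, train_frac=0.8, rng=None):
    """Equalize categories by subsampling, then split into train and test.

    labels: one category label per call (e.g. "Trill", "Phee").
    Returns (train_indices, test_indices, chance_level).
    """
    rng = rng or random.Random(0)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)

    # Keep only categories with enough calls in this session.
    kept = {lab: idx for lab, idx in by_class.items() if len(idx) >= min_count}
    if len(kept) < 2:
        raise ValueError("need at least two categories to decode")

    # Subsample every category down to the size of the smallest one.
    n = min(len(idx) for idx in kept.values())
    cut = int(round(train_frac * n))
    train, test = [], []
    for idx in kept.values():
        chosen = rng.sample(idx, n)
        train.extend(chosen[:cut])
        test.extend(chosen[cut:])

    # With balanced categories, chance is one over the number of categories.
    return train, test, 1.0 / len(kept)


# Hypothetical session: Chirp is dropped because it has fewer than 10 calls.
labels = ["Trill"] * 40 + ["Phee"] * 25 + ["Twitter"] * 10 + ["Chirp"] * 4
train, test, chance = balance_and_split(labels)
```

Repeating the call with different random seeds, once with the real labels and once with shuffled labels, would mirror the 100 repetitions described above.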
Finally, we used the Wilcoxon signed-rank test to compare the average decoding accuracies obtained on the real and the shuffled data.

Time bins: To quantify the neural representations that identified whether a Trill with the subject’s partner was answered or whether a Phee with the subject’s colony was answered, we used a predictive time bin analysis similar to previous work in the lab 32,54. In this study, we applied the same four-fold stratified cross-validation both to detect predictive time bins and to decode their apparent firing rates using an ensemble of gradient-boosted decision trees 55. Recording sessions were selected for these analyses if they had at least ten repeatedly spiking neurons recorded, at least four recorded calls that were answered, and at least four recorded calls that were not answered. A Trill call was considered an answer if it happened within 5 seconds of a Trill call from the other monkey, and a Phee call was considered an answer if it happened within 10 seconds of a Phee call from other colony monkeys. These timings were based on previous literature 20,56. As a control, the same analysis was repeated with pseudo-randomly shuffled labels, and decoding results are reported in the main text and in Figure 4. All time bins were restricted to the window from five seconds before to five seconds after call onset and were constrained to last at least four hundred milliseconds, resulting in a median activation time of t = -0.17 seconds (IQR: -2.34 through +2.14 seconds; N = 4326 predictive time bins) relative to call onset at t = 0. Predictive time bins were also constrained to be non-overlapping for any given neuron, such that no action potential was considered twice for a given type of neural representation. Reported tree-based decoder accuracies were computed using a threshold parameter value that maximized apparent accuracy.
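The answered versus not-answered labels used in this analysis follow from the 5-second (Trill) and 10-second (Phee) windows given above. A minimal Python sketch of that labeling is shown below; the function and constant names are hypothetical, and measuring the delay from call onset to the onset of the other animal's call is an assumption.

```python
import bisect

# Answer windows in seconds, as stated in the text.
ANSWER_WINDOW_S = {"Trill": 5.0, "Phee": 10.0}


def label_answered(call_onsets, other_onsets, call_type):
    """Return True for each call followed by a same-type call in the window.

    call_onsets: onset times (s) of calls from one party.
    other_onsets: onset times (s) of same-type calls from the other party.
    """
    window = ANSWER_WINDOW_S[call_type]
    other = sorted(other_onsets)
    labels = []
    for t in call_onsets:
        # Index of the first call from the other party strictly after t.
        j = bisect.bisect_right(other, t)
        labels.append(j < len(other) and other[j] - t <= window)
    return labels


# Hypothetical onsets: the second call gets a reply after 6.5 s.
trill_labels = label_answered([0.0, 20.0, 40.0], [3.0, 26.5, 41.0], "Trill")
phee_labels = label_answered([0.0, 20.0, 40.0], [3.0, 26.5, 41.0], "Phee")
```

The same onsets give different labels for the two call types because a 6.5-second reply falls outside the Trill window but inside the Phee window.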
The area under the ROC curve provides an alternative, parameter-free outcome measure and is reported in Fig 4d-g.

We found 2139 significant predictive time bins across 948 neurons (Extended Data Fig. 7a, c) that were significantly predictive of whether a Trill call was part of a turn-taking bout in terms of median firing rate (two-sided Mann-Whitney U-test, p < 0.05). Only 1098 neurons were considered in these Trill call analyses under the minimal session selection criteria, which were applied equally in all predictive time bin analyses. We found 2187 predictive time bins across 560 neurons (Extended Data Fig. 7b, d) that were significantly predictive of whether a Phee call was part of a turn-taking bout (two-sided Mann-Whitney U-test, p < 0.05). Only 763 neurons were considered in these Phee call analyses under the same minimal session selection criteria.

Acknowledgments

This work is supported by grants to AL (Marie Sklodowska-Curie fellowship 101018877 and ERC Starting grant 101116110) and to CTM (NIH R01 DC 012087).

Competing interests

The authors declare no competing interests.

Corresponding Author

Correspondence to Arthur Lefevre (arthur.lefevre{at}isc.cnrs.fr).

Supplementary Figure legends

Figure S1. a) Coronal section from the Paxinos atlas showing the locations of the brush array cannula insertion (top) and the Neuropixel implantation site (bottom). b) Coronal MRI planes of monkey S before the first recording session (top) and after the last one (bottom), showing the trajectory of the 64-channel brush array through area 24. c) Sagittal MRI plane of monkey G showing (white arrows) the implantation sites of the two 32-channel brush arrays. d) Number of units (total = 2355) recorded per day with the 2 Neuropixel probes implanted in monkey L. Only sessions d4 to d9 were used for data analysis.

Figure S2. a) Picture taken during a recording session. b) Table summarizing the number of each call type per caller identity. c) Pie charts representing call type proportions for the subject (top) and the partner (bottom). Note that most of the perceived Tsiks and Eks came from a few specific sessions and could not be analyzed. d) Examples of spectrograms of recorded vocalizations.

Figure S3. a) Normalized activity of neurons with a significantly increased response to vocalization production, per call type. b) Normalized activity of neurons with a significantly decreased response to vocalization production, per call type. c) Normalized activity of neurons with a significantly increased response to partner vocalization perception, per call type. d) Pie chart representing the responses of the neurons (n = 1599): 217 neurons with increased activity for 1 type of vocalization production, 47 neurons with increased activity for several types of vocalization production, 482 neurons with decreased activity for 1 type of vocalization production, 170 neurons with decreased activity for several types of vocalization production, 116 neurons with increased activity for 1 type of partner vocalization perception, 15 neurons with increased activity for several types of partner vocalization perception, 93 neurons with increased activity for 1 type of colony vocalization perception, 51 neurons with increased activity for both vocalization production and partner vocalization perception, 408 neurons with no significant responses.

Figure S4. a) Accuracy of partner call type decoding for each session (gray lines) as a function of the number of neurons removed; the red line is the average. b) Percentage accuracy of the neuronal multiclass decoder for perceived call type for real data and data with shuffled labels, after removing neurons with a significant activity increase around call onset (Wilcoxon signed-rank test, z = 3.65, p < 0.05). Error bars indicate s.e.m. (standard error of the mean).

Figure S5. a) Decoding accuracy expressed in fold above chance for produced and perceived vocalizations. * indicates significant decoding compared to chance (Wilcoxon signed-rank test, p < 0.05, Bonferroni corrected). b) Correlation between the number of neurons and the decoding quality. c) Performance of produced call type decoding for each session (gray lines) as a function of the number of neurons removed; the red line is the average. d) Percentage accuracy of the neuronal multiclass decoder for produced call type for real data and data with shuffled labels, after removing neurons with a significant activity increase around call onset (Wilcoxon signed-rank test, z = 4.95, p < 0.05). Error bars indicate s.e.m. (standard error of the mean).

Figure S6. a) PSTH examples and spike rasters of significant time bins predicting whether a trill call was part of an exchange (within 5 sec of a trill from the other monkey). b) PSTH examples and spike rasters of significant time bins predicting whether a phee call was part of an exchange (within 5 sec of a phee from the other monkey). c-d) Receiver Operator Characteristic (ROC) traces for our neural decoder trained on predictive time bins and tested on data with shuffled labels, representing whether trills answered trills with the subject’s partner (panel c) and the answering of phee calls with the colony (panel d). Blue solid lines represent calls initially produced by the subject while orange solid lines represent calls initially perceived by the subject. Random chance is represented by the red dashed line as a guide. The area under the curve (AUC) quantifies the success of the neural decoder as our outcome measure.

Figure 1. Segregated ACC neuronal populations encode vocalizations perceived from the partner and the colony. a) Schema of the experimental paradigm. Top: recordings took place directly in the home cage and transfer box in the colony room. Center: both subject and partner monkeys were equipped with a portable microphone, and the implanted subject was also equipped with a neurologger. Bottom: neurons were recorded with brush arrays or Neuropixel probes implanted in ACC area 24. b) Coronal plane (AP = +12 mm) at which probes were implanted. c) Percentage of neurons responding to vocalization production and perception. d) Vocalizations produced by the partner or other marmosets in the colony (vs produced ones) were analyzed separately. e) Normalized average increase from neurons with significantly upregulated activity during partner call perception. f) Normalized average increase from neurons with significantly upregulated activity during colony call perception. g-h) Percentage accuracy of the neuronal multiclass decoder for partner (g) and colony (h) perceived call type for real data and data with shuffled labels. i) Example unit responding specifically to twitter calls from the partner but not from the colony. j) Example unit responding specifically to phee calls from the colony but not from the partner. k) Percentage accuracy of the neuronal SVM decoder for identity (partner vs colony) for real data and data with shuffled labels. *: p < 0.05, error bars indicate s.e.m. (standard error of the mean).

[Figure 1 graphic omitted. Recoverable panel labels: subject (n = 3), partner, colony microphone and camera; wireless brush arrays or Neuropixels in A24, 1599 units; panel c, modulated units: Production 57%, Perception 17%, None 26%; panels e, f, i, j plot firing rate (Hz) or normalized firing rate against time from call onset (-4 to 4 s); panels g, h, k plot % accuracy for data vs. shuffled labels (partner call type, colony call type, social category).]

Figure 2. ACC units respond invariantly to calls with and without overlapping vocalizations. a) Example spectrograms from the colony and partner microphones showing the occurrence of a call from the partner (twitter) that is perceived simultaneously with a call from the colony (phee). b) Example spectrograms from the colony and partner microphones showing the occurrence of a call from the partner (twitter) that is perceived sequentially to calls from the colony (twitters). c) Example neurons responding to twitter calls from the partner that are unaffected by the presence (or absence) of simultaneous calls from the colony. d) Normalized average increase from neurons with significantly upregulated activity during partner twitter call perception is unaffected by the presence (or absence) of simultaneous calls from the colony. *: p < 0.05, error bars indicate s.e.m.

[Figure 2 graphic omitted. Recoverable panel labels: spectrograms (2 kHz to 20 kHz) from the colony and partner microphones for simultaneous (phee, twitter) and sequential (twitter) calls; panels c and d plot firing rate (Hz) or normalized firing rate against time from call onset (-4 to 4 s) for all, sequential, and simultaneous twitters.]

Figure 3. Marmosets Monitor Social Scenes to Optimize Vocal Turn Taking in a Cocktail Party. a) Example spectrograms showing trill to trill conversations between the subject and its partner. b) Example spectrograms showing phee to phee conversations between the subject and other monkeys in the colony. c) Normalized partner call rate for trills (purple) and phees (orange) aligned on the subject’s trill and phee onsets. d) Colony call rate for phees aligned on the subject’s trills (purple) and phees (orange).

[Figure 3 graphic omitted. Recoverable panel labels: spectrograms (2 kHz to 20 kHz) from the subject, partner, and colony microphones for trill interactions with the partner and phee interactions with the colony; panels c and d plot normalized partner and colony call rate against time from subject call onset (-15 to 15 s).]

Figure 4. ACC neurons encode active turn taking with the partner and the colony. a) Normalized average increase from neurons with significantly upregulated activity during call production. b) Example neuron responding to the production of several call types. c) Normalized average decrease from neurons with significantly downregulated activity during call production. d) Percentage accuracy of the neuronal multiclass decoder for produced call type for real data and data with shuffled labels. e-f) Receiver Operator Characteristic (ROC) traces for our neural decoder trained on predictive time bins, representing whether trills answered trills with the subject’s partner (panel e) and the answering of phee calls with the colony (panel f). Blue solid lines represent calls initially produced by the subject while orange solid lines represent calls initially perceived by the subject. Random chance is represented by the red dashed line as a guide. The area under the curve (AUC) quantifies the success of the neural decoder as our outcome measure. *: p < 0.05, error bars indicate s.e.m.

[Figure 4 graphic, panel e (“Predicting Tr answering Tr”): ROC curves of true positive rate against false positive rate; legend entries Produced, Perceived, Random; reported AUC = 0.8465 and AUC = 0.8472.]

[Figure 4 graphic, remaining panels: panels a to c plot firing rate (Hz) or normalized firing rate against time from call onset (-4 to 4 s) for Trill, Twitter, Phee, and Chirp; panel d plots % accuracy for data vs. shuffled labels (subject call type); panel f (“Predicting Ph answering Ph”) shows ROC curves with legend entries Produced, Perceived, Random and reported AUC = 0.7011 and AUC = 0.9626.]

[Figure S1 graphic omitted. Panel d plots the number of units against days after surgery; legend entries: Npx mouse 1.0, Npx NHP.]

[Figure S2 graphic omitted. Panel c legend: Perceived, Produced; call types Trill, Twitter, Phee, Chirp, Tsik, Ek. Panel d spectrograms span 2 kHz to 20 kHz.]

Figure S2b, number of calls per call type and caller identity:

Voc type   Subject   Partner   Colony
Tr         5386      3201      -
Ph         1774      778       12293
Tw         877       959       9350
Chi        1677      643       -
Ts         243       452       -
Ek         126       802       -

[Figure S3 graphic omitted. Panels a to c show neurons against time from call onset (-2 to 3 s) for Tr, Tw, Ph, and Chi; panel d pie chart legend: 1 Prod, >1 Prod, 1 Dec, >1 Dec, 1 Perc, >1 Perc, Col, Both, NR.]

[Figure S4 graphic omitted. Panel a (“Decoding while removing random neurons”) plots accuracy (fold above chance) against the number of neurons left; panel b plots % accuracy for data vs. shuffled labels (partner call type).]

[Figure S5 graphic omitted. Panel a plots fold above chance against time from call onset (-5 to 5 s) for produced and perceived calls; panel c (“Decoding while removing random neurons”) plots accuracy (fold above chance) against the number of neurons left; panel d plots % accuracy for data vs. shuffled labels (subject call type).]

[Figure S6 graphic omitted. Panel c (“Predicting Tr answering Tr (shuffled)”): ROC curves with legend entries Produced, Perceived, Random; reported AUC = 0.5290 and AUC = 0.5940. Panel d (“Predicting answering Ph (shuffled)”): ROC curves with the same legend; reported AUC = 0.4519 and AUC = 0.5463.]


License: CC-BY-NC-4.0