Action or Stimulus: Individual beliefs about learned associations influence the processing of immediate and delayed feedback

Authors: Christine Albrecht (ORCID 0000-0002-8388-2533), Marta Ghio, and Christian Bellebaum
European Journal of Neuroscience. This is a preprint and has not been peer reviewed. Data may be preliminary.
1 December 2025, V1. https://doi.org/10.22541/au.176463133.30560512/v1

Abstract

Feedback learning seems to involve two systems, the striatal reward system and the medial temporal lobe (MTL), which have both been linked to event-related potential (ERP) components such as the feedback-related negativity (FRN) or reward positivity (RewP) and the N170, respectively.
In this study we tested the hypothesis that the former system is more involved in associating the feedback with previous actions and the latter in associating the feedback with previous stimuli. More specifically, we hypothesized that the engagement of these systems depends on individual beliefs in credit assignment, i.e., whether participants linked the feedback they received to actions or stimuli, possibly modulated by feedback timing. Electroencephalography (EEG) data were recorded from 43 participants performing an ambiguous feedback-learning task, in which feedback could be attributed to either a performed action or a selected stimulus, according to the instruction. As revealed by an Action Index derived from behavioural data, the focus on stimulus-feedback associations was generally stronger than that on action-feedback associations. We found that both FRN/RewP and N170 were influenced by individual beliefs about learned associations, with the FRN/RewP showing stronger feedback valence coding across feedback delays when participants took action-feedback associations into account. Prediction error coding in the N170 was also more pronounced for stronger action-feedback association learning. The results seem to suggest that both learning systems are recruited, at least to some extent, when action-feedback and stimulus-feedback associations are considered simultaneously.

Christine Albrecht, Marta Ghio, Christian Bellebaum
Heinrich Heine University Düsseldorf, Faculty of Mathematics and Natural Sciences, Institute of Experimental Psychology, Universitätsstraße 1, 40225, Düsseldorf, Germany

Running Head: Individual Beliefs in Feedback Processing
Word Count (main text): 9245
Word Count (total): 12154
Figure Count: 8

Acknowledgements

This work was funded by the German Research Association (Grant 467460456).
The authors thank René Mergenthaler for his help in data acquisition.
Keywords: FRN/RewP, N170, Feedback-Association Types, Feedback Delay, Prediction Error

List of Abbreviations
EEG electroencephalography
ERP event-related potential
FRN feedback-related negativity
GLME generalized linear mixed effect
LME linear mixed effect
MTL medial temporal lobe
PE prediction error
RewP reward positivity

Introduction

The ability to learn from feedback has primarily been linked to the dopaminergic reward system (Holroyd & Coles, 2002) and the striatum (Marco-Pallarés et al., 2007; Vassiliadis et al., 2024; see Daniel & Pollmann, 2014 for a review). The striatum projects to (and receives projections from) the anterior cingulate cortex (Chau et al., 2018; Hauser et al., 2014; see Holroyd & Coles, 2002), which has been identified as the neural generator of an event-related potential (ERP) component in response to feedback. The component appeared as a frontocentral negative deflection occurring about 250 ms after feedback presentation (Miltner et al., 1997; Yeung et al., 2005). It was thus termed feedback-related negativity (FRN; Gehring & Willoughby, 2002; Hauser et al., 2014; Nieuwenhuis et al., 2004). However, its exact nature remains unclear: While the more negative deflection after negative compared to positive feedback (Hajcak et al., 2006; Miltner et al., 1997) was initially interpreted as an indicator of negative feedback processing, more recent research suggests that the difference is rather caused by a positive deflection following positive feedback, termed Reward Positivity (RewP; Proudfit, 2015). In line with this, principal component analysis (Foti et al., 2011) revealed an N200-like component that was identical for rewards and non-rewards, and a positivity that occurred only for rewards and reduced the amplitude of the negative component in the combined signal.
Both striatal activity (Carvalheiro et al., 2021; Diederen et al., 2017; Schönberg et al., 2007) and the ERP amplitude in the N200 time window (Burnside et al., 2019; Fischer & Ullsperger, 2013; Röhlinger, Albrecht, & Bellebaum, 2025; Röhlinger, Albrecht, Ghio, et al., 2025; Sambrook & Goslin, 2015; Weber & Bellebaum, 2024) reflect a reward prediction error (PE), highlighting the role of the dopamine system in feedback processing, as single dopamine neurons also show PE-related activity (Schultz et al., 1997; Zaghloul et al., 2009). Some studies show that such expectation/PE effects in the N200 time window are strongly pronounced only for positive feedback (Kirsch et al., 2022; Weber & Bellebaum, 2024), with more positive amplitudes the more unexpected the feedback, supporting the notion that the ERP signal is driven by a RewP. However, although expectation effects for negative feedback are less consistent, they have been reported (Hoy et al., 2021; Röhlinger, Albrecht, & Bellebaum, 2025), with more negative amplitudes the more unexpected the feedback. This argues against the assumption that ERP amplitude modulations in the N200 time window are only caused by a RewP, and rather suggests that the N200 can be modulated by feedback in two directions. As the components driving the ERP amplitude in the N200 time window remain unclear, we refer to the signal as FRN/RewP throughout this manuscript. This does, however, not imply that FRN and RewP are the same component, but rather that they determine the amplitude in combination. Evidence suggests that feedback learning can also involve the medial temporal lobe (MTL), specifically the hippocampus (Dickerson & Delgado, 2015; Dickerson et al., 2011). 
This involvement seems to depend on feedback timing (Foerde et al., 2013; Foerde & Shohamy, 2011): the striatum is more active in learning from immediate feedback (up to 1 s after the event the feedback refers to), while the MTL is more active in learning from delayed feedback (e.g., about 6 s after the event the feedback refers to). In this context, the MTL may help bridge the temporal gap between event and feedback (DuBrow & Davachi, 2016; Staresina et al., 2009). Recent studies suggest that this process may be reflected in the N170 ERP component (Albrecht et al., 2023; Arbel et al., 2017; Höltje & Mecklinger, 2020; Kim & Arbel, 2019; Röhlinger, Albrecht, & Bellebaum, 2025; Röhlinger, Albrecht, Ghio, et al., 2025), a negative deflection at temporoparietal sites occurring approximately 170 ms after the presentation of feedback. The N170 has traditionally been associated with higher-order visual processing (Rossion, 2014; Rossion et al., 2003) and originates in the fusiform gyrus (Deffke et al., 2007; Gao et al., 2019). In the context of feedback processing, the increased N170 for delayed compared to immediate feedback (e.g., Arbel et al., 2017; Höltje & Mecklinger, 2020) may reflect hippocampal activity (as suggested by Arbel et al., 2017). This is in line with recent evidence suggesting that, like the FRN/RewP, also the N170 reflects a PE during feedback processing (Röhlinger, Albrecht, & Bellebaum, 2025; Röhlinger, Albrecht, Ghio, et al., 2025), as PE-dependent activation patterns have been found in the hippocampus as well (Bein et al., 2020; Dickerson et al., 2011; Sinclair et al., 2021). Feedback delay influences how strongly the striatum and MTL contribute to feedback processing, but an open question is whether the type of association learned via feedback also shapes the recruitment of these systems, possibly in interaction with feedback timing. 
On the one hand, the striatum might be particularly suited for learning associations between actions (as opposed to stimuli) and feedback. This hypothesis rests on two lines of evidence. First, the striatum is involved in action selection and inhibition (Aubert et al., 2000; Calabresi et al., 2014) and specifically in motor learning (Nikooyan & Ahmed, 2015; Vassiliadis et al., 2024; Wessel et al., 2023). Second, parts of the dopaminergic reward system are more strongly involved in feedback processing when feedback is given for an own choice action (Bellebaum & Colosio, 2014; Bellebaum et al., 2012; Cooper et al., 2012; Kobza et al., 2012; O’Doherty et al., 2004; Yeung et al., 2005). On the other hand, given that the N170 reflects higher-order visual processing (Rossion, 2014; Rossion et al., 2003), it may reflect a reactivation of visual areas in the fusiform gyrus representing the stimulus associated with reward (Röhlinger, Albrecht, & Bellebaum, 2025), possibly modulated by the hippocampus (Foerde et al., 2013; Foerde & Shohamy, 2011; Lighthall et al., 2018). Such a mechanism could help bind reward outcomes to preceding stimuli (Singer & Frank, 2009). For instance, reactivation of the primary somatosensory cortex was observed at reward delivery in a somatosensory discrimination task (Pleger et al., 2008; Pleger et al., 2009), and stimulus-specific visual representations were reactivated upon feedback presentation in a task with visual stimuli (Schiffer et al., 2014). We first addressed the research question about the potential effects of the type of association that is learned on feedback learning and processing in two recent ERP studies (Röhlinger, Albrecht, & Bellebaum, 2025; Röhlinger, Albrecht, Ghio, et al., 2025). 
Specifically, based on the assumptions that the striatum is more involved in linking actions to feedback, while the MTL is more involved in linking (visual) stimuli to feedback, we tested the hypotheses that the FRN plays a more important role in feedback processing when action-feedback associations are learned (Röhlinger, Albrecht, Ghio, et al., 2025) and the N170 is more important for learning associations between visual stimuli and feedback (Röhlinger, Albrecht, & Bellebaum, 2025; Röhlinger, Albrecht, Ghio, et al., 2025). Indeed, we found initial evidence that feedback processing in the FRN and the N170 depends on the type of learned association, although the results were only partially in line with our hypotheses. We found that the PE effect on the N170 was more pronounced for stimulus-feedback associations than for action-feedback associations (Röhlinger, Albrecht, Ghio, et al., 2025), and more pronounced for associations between visual stimuli and feedback than between auditory stimuli and feedback, especially for delayed feedback and over the right hemisphere (Röhlinger, Albrecht, & Bellebaum, 2025). For the FRN/RewP, PE processing was more pronounced for action-feedback associations compared to a condition with stimulus-feedback associations without response requirement, as expected. PE processing in the FRN/RewP was strongest, however, when feedback was associated with stimuli. A potential explanation for this finding is that in these previous studies the stimuli had to be actively chosen, which may have led participants to believe that feedback also depended on their chosen action. Indeed, it can be ambiguous whether feedback refers to stimulus or response, a phenomenon related to the credit assignment problem (see, e.g., Bogacz et al., 2007; Dam et al., 2013; Schiffer et al., 2014; for a review, see Rubin et al., 2021).
In the present study we investigated to what extent feedback processing depends on participants’ individual beliefs about whether a feedback stimulus refers to an action or a visual stimulus. We also examined whether feedback timing modulates this effect. The current study thus aims to expand upon previous findings by having all participants perform the same feedback-learning task, with instructions designed to modulate their tendency to learn either action-feedback or stimulus-feedback associations. We hypothesized the difference between positive and negative feedback as well as PE processing in the FRN/RewP to be most pronounced for immediate feedback and for participants who attribute feedback to a preceding action. In contrast, we expected the N170 amplitude as well as PE processing in the N170 to be most pronounced for participants who attribute feedback to a preceding stimulus, especially for delayed feedback.

Method

The study was preregistered at https://doi.org/10.17605/OSF.IO/Z9MPA. Changes compared to the procedure described in the preregistration will be pointed out in the Method section.

2.1 Participants

Previous studies showed that 20 participants per group sufficed to reveal feedback and PE processing differences in feedback-related ERPs (Bellebaum & Colosio, 2014; Burnside et al., 2019). Therefore, we aimed for 40 participants overall, distributed across two groups (i.e., the stimulus-instruction group and the action-instruction group, see below). Assuming a dropout rate of 20%, we recruited 50 participants, as determined in the preregistration. We excluded 7 participants overall: one because the instructions were not followed correctly; three because of accuracy rates below 55% in the feedback learning task; one because of more than 5% of missed responses; two because of alpha activity in the EEG data. The remaining 43 participants were between 19 and 37 years old (M = 25.1 years, SD = 4.7 years; 29 women, 14 men).
All had normal or corrected-to-normal vision and normal hearing abilities, no neurological or psychological illnesses and did not take any medication affecting the central nervous system. Two participants were ambidextrous, 33 right-handed and 8 left-handed. The groups consisted of 22 participants in the stimulus-instruction group (19-37 years, M = 26.2 years, SD = 5.5 years; 13 women, 9 men) and 21 participants in the action-instruction group (19-30 years, M = 23.9 years, SD = 3.4 years; 16 women, 5 men; see below for details concerning the groups). The study complied with the Declaration of Helsinki and was approved by the ethics committee of the Faculty of Mathematics and Natural Sciences at Heinrich-Heine-University, Düsseldorf.

2.2 Experimental task

The experimental task was an adapted version of the task used by Röhlinger, Albrecht, Ghio, et al. (2025) and included 6 learning sessions, each with 4 blocks. One block always contained one learning part of 20 trials, followed by a test part of 20 trials. In each trial of the learning parts, participants were asked to choose between two actions/stimuli and received either positive or negative monetary feedback, that is, +4 ct vs. -2 ct in Euro (€) currency. Participants received a baseline amount of 7 € and could maximize their reward during the task, which would be paid to them after the completion of the study. Participants were instructed that they could realistically achieve a reward between 17 and 25 €. After completing the task, they learned that their reward was rounded up to 25 €, ensuring equal reimbursement for all. We manipulated Feedback Timing within-subject in different sessions: participants received either immediate feedback (1 s delay) or delayed feedback (7 s delay) in two learning sessions each.
In two other learning sessions, feedback was also delayed, but six regular tones were presented each second of the delay to enable similar temporal predictability of delayed feedback as for immediate feedback (Kimura & Kimura, 2016). As in Röhlinger, Albrecht, Ghio, et al., this resulted in three Feedback Timing conditions: immediate, delayed without tone, and delayed with tone. The Feedback Timing was always constant within one learning session, and the order of the conditions across learning sessions was the same for the first three and last three sessions (e.g., ABCABC) and counterbalanced across participants. Participants were instructed that learning started anew in each learning session. In our previous study (Röhlinger, Albrecht, Ghio, et al., 2025), one group of participants learned associations between actions and feedback and another group associations between stimuli and feedback. In this study, however, all participants engaged in the same learning task that involved both stimuli and actions. The associations between stimuli and actions, on the one hand, and feedback, on the other hand, were ambiguous. We manipulated the instructions to have participants focus on either action-feedback or stimulus-feedback associations. To achieve ambiguity, reward probabilities depending on the stimuli and on actions were the same. We used two stimuli in each learning session that could either be presented such that stimulus A was on the left and stimulus B on the right, or the other way around. For one constellation (e.g., stimulus A on the left), a specific action (e.g., a left button press) led to reward in 80% of the trials and to punishment in 20% (probabilities were reversed for the other button press). For the other constellation (e.g., stimulus A on the right), however, reward and punishment probabilities were 50% for both actions. 
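The contingencies described above can be made concrete with a short calculation. The sketch below is illustrative only and assumes that both constellations occur equally often; the table layout and the function names are hypothetical and not taken from the study's materials.

```python
# Reward probabilities per constellation and button press, following the example in the text.
# "A_left": stimulus A on the left, B on the right; "A_right": the reverse arrangement.
REWARD_PROB = {
    ("A_left", "left"): 0.8,
    ("A_left", "right"): 0.2,
    ("A_right", "left"): 0.5,
    ("A_right", "right"): 0.5,
}


def chosen_stimulus(constellation, press):
    """Return the stimulus selected by a button press in a given constellation."""
    a_side = "left" if constellation == "A_left" else "right"
    return "A" if press == a_side else "B"


def _mean(values):
    return sum(values) / len(values)


def marginal_reward_probabilities():
    """Average reward probability per action and per stimulus.

    Assumption (illustrative): both constellations are equally frequent.
    """
    by_action = {"left": [], "right": []}
    by_stimulus = {"A": [], "B": []}
    for (constellation, press), p in REWARD_PROB.items():
        by_action[press].append(p)
        by_stimulus[chosen_stimulus(constellation, press)].append(p)
    return (
        {k: _mean(v) for k, v in by_action.items()},
        {k: _mean(v) for k, v in by_stimulus.items()},
    )


action_probs, stimulus_probs = marginal_reward_probabilities()
print(action_probs, stimulus_probs)
```

Under this assumption, the better action and the better stimulus have the same marginal reward probability, so reward rates alone cannot tell the two readings of the task apart.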
This design created ambiguous contingencies: participants could learn that either a particular action or a particular stimulus led to reward more often than the alternative choice. The respective probabilities for reward and punishment then were 65% vs. 35% for both stimulus-dependent and action-dependent learning (as in Röhlinger, Albrecht, Ghio, et al., 2025). For the example provided above, participants could either learn that a left button press was more often rewarded than a right button press or that stimulus A was more often rewarded than stimulus B. Participants in the action-instruction group were instructed that rewards and punishments were dependent on their action (right or left button press), while participants in the stimulus-instruction group were instructed that rewards and punishments were dependent on the stimulus they had chosen (A or B). Action-instruction participants were told that the stimuli were irrelevant for learning. In contrast, stimulus-instruction participants were instructed that the side on which the stimuli appeared, and thus the chosen action, was of no relevance. We used 12 Hiragana characters as stimuli in the experiment (see Röhlinger, Albrecht, & Bellebaum, 2025). These were randomly chosen from a set of 12 stimuli for each participant, so that two stimuli were assigned to each learning session. It was also randomly determined which of the stimuli was more often rewarded than the other. In three random learning sessions, the correct stimulus was rewarded more often when it was presented on the right, and in the other three learning sessions, when it was presented on the left. Figure 1 shows the trial sequence. Each experimental trial in the learning parts started with a 500 ms fixation cross that we instructed participants to focus on. Subsequently, two stimuli appeared on the screen, one on each side of a centrally presented fixation cross (screen side was counterbalanced). 
Participants had a maximum of 3000 ms to press a right or left button as a response. After the button press, the respective stimulus was highlighted for 700 ms before disappearing, leaving only the fixation cross visible on the screen. Depending on the Feedback Timing condition, the fixation cross stayed on screen for 300 ms (immediate feedback condition) or 6300 ms (two delayed feedback conditions). In the delayed feedback without tone condition, participants saw only the fixation cross during the delay. In the delayed feedback with tone condition, participants additionally heard six 800 Hz tones that were 700 ms long. The first tone started exactly one second after the choice (or 300 ms after the highlighted stimulus disappeared), and the following tones each one second after the previous tone onset. After the delay, feedback (+4 ct or -2 ct in either blue or red, respectively) was presented for 1000 ms. If participants failed to answer within 3000 ms, they immediately received a note to answer more quickly, and the trial was discarded from the data analyses. Trials in the test part matched trials in the learning part, but participants received no feedback in these trials. Instead, the next trial started immediately after the highlighting of the chosen stimulus and the display of a 300 ms fixation cross.

2.3 EEG Recording

EEG data were collected using a BrainAmp DC amplifier and Brain Vision Recorder software (BrainProducts, Germany) with 60 scalp electrodes affixed with an actiCap textile softcap (BrainProducts, Germany). The electrodes were distributed according to the extended 10-20 system, and were placed at the following scalp locations: AF3, AF4, AF7, AF8, C1, C2, C3, C4, C5, C6, CP1, CP2, CP3, CP4, CP5, CP6, CPz, Cz, F1, F2, F3, F4, F5, F6, F7, F8, FC1, FC2, FC3, FC4, FC5, FC6, FT10, FT7, FT8, FT9, Fz, O1, O2, Oz, P1, P2, P3, P4, P5, P6, P7, P8, PO10, PO3, PO4, PO7, PO8, PO9, POz, Pz, T7, T8, TP7, and TP8.
The online reference was recorded at FCz, while the ground electrode was placed at AFz. Two additional electrodes were positioned on the left and right mastoids. Vertical eye movements were recorded with two electrodes (vertical electrooculogram) below and above the left eye. The EEG was recorded at a sampling rate of 1000 Hz using an online low-pass filter of 100 Hz. Impedances were kept below 15 kΩ.

2.4 Procedure

Upon arrival, participants gave written informed consent to participate in the study and completed a demographic questionnaire. EEG electrodes were then attached to the scalp. The subsequent computer experiment was presented on a 1920 × 1080 px desktop monitor and responses were made on a Cedrus response pad (RB-740, Cedrus Corporation, San Pedro, CA, USA). The computer experiment itself lasted about 75 minutes, and, together with EEG preparation, the session lasted about 2.5 hours. The experiment was controlled by Presentation Software (Version 22.0, Neurobehavioral Systems, Albany, CA, USA). After completing the experiment, participants received 25 € reimbursement in the week following their participation.

2.5 Data Analysis

2.5.1 Behavioral Data Analyses

2.5.1.1 Accuracy.

For the behavioral data analyses, we investigated accuracy, and thus learning, based on those trials in which the correct response for stimulus-based learning was the same as for action-based learning (e.g., when the stimulus appearing on the left was more often rewarded than the stimulus on the right). Although we preregistered analyses of the test parts only, we decided to also analyze the learning parts for the following reasons: first, participants might respond differently when feedback is available; second, they might already have learned associations in the first learning block, making effects harder to detect in the test blocks alone. Separate analyses were conducted for the learning parts and test parts.
Specifically, we conducted two generalized linear mixed effect (GLME) analyses using the lme4 package in R (Bates et al., 2015): Accuracy in the learning parts was the dependent variable in the model for the first analysis, whereas accuracy in the test parts was the dependent variable in the model for the second analysis. In both models, the fixed effect predictors were Instruction (categorical between-subject; action-instruction group [-0.5], stimulus-instruction group [0.5]) and Feedback Timing (categorical within-subject; immediate feedback, delayed feedback without tone, and delayed feedback with tone [using a simple coding contrast matrix with immediate feedback as baseline]). To investigate learning effects over time, we also included the factor Block (continuous within-subject; blocks 1-4 for each session, scaled to lie between -0.5 and 0.5). We included random intercepts and slopes per participant. Following guidelines for best practice (Meteyard & Davies, 2020), we specified the most complex model possible, as long as the inclusion of each random slope did not lead to over- or underfitting, using the function buildmer (version 2.11). This resulted in the following model for both analyses:
\begin{equation}
\text{Accuracy} \sim \text{Instruction} * \text{Feedback Timing} * \text{Block} + \left(1 + \text{Feedback Timing} * \text{Block} \mid \text{Subject}\right) \nonumber
\end{equation}

2.5.1.2 Action Index.

We calculated an Action Index for each participant in each block, reflecting the participant’s belief about whether a specific action or a specific stimulus predicts reward. To calculate the Action Index, we analyzed choices from those test trials in which both responses had led to reward in half of the trials during learning (e.g., when stimulus A was presented on the right side in the example provided above). In these trials, the “better action” and the “better stimulus” were dissociated.
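One way such an index could be computed is sketched below. The text does not give the exact formula, so this rests on an assumption: the index is taken as the proportion of these diagnostic test trials in which the pressed button matched the better action rather than the side of the better stimulus. The function name and the trial representation are hypothetical.

```python
def action_index(presses, better_action):
    """Proportion of diagnostic trials answered in line with action-based learning.

    presses: button presses ("left" or "right") from test trials in which the
        better stimulus appeared on the side opposite to the better action.
    better_action: the button that was rewarded more often during learning.
    Assumption (not from the source): index = share of presses equal to better_action.
    """
    if not presses:
        raise ValueError("no diagnostic trials available")
    matches = sum(1 for press in presses if press == better_action)
    return matches / len(presses)


# Hypothetical block: stimulus A shown on the right, left press is the better action.
example_presses = ["left", "right", "right", "left", "right"]
example_index = action_index(example_presses, better_action="left")
print(example_index)
```

With this definition, values near 1 would indicate choices driven by the action, and values near 0 choices driven by the stimulus.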
For example, if a participant had learned that stimulus A predicted reward more often than B, they would press the right button when stimulus A was presented on the right side. If, however, a participant had learned that left button presses were rewarded more often than right button presses, they would press the left button also in trials in which stimulus A was shown on the right. A higher Action Index indicated that participants attributed feedback more to their action, while a lower Action Index indicated that participants attributed feedback more to the chosen stimulus. The Action Index thus reflected to what extent the participants’ beliefs concerning associations between actions and feedback, on the one hand, and stimulus and feedback, on the other hand, corresponded to the instruction. According to the preregistration, we planned to use one Action Index per participant, computed across all blocks. However, because we observed differences in the Action Index between Feedback Timing conditions and because the Action Index depends on learning success (high and low Action Index values for high learning success, medium values for low learning success), we decided to calculate the Action Index for each block. To check for differences in Action Index between the Instruction groups, but also between Feedback Timing conditions and across the Blocks, we specified a model containing the Action Index as dependent variable and the same fixed effects as in the analyses of accuracy (see above). The random effect structure was determined according to the procedure described above. This resulted in the model:
\begin{equation}
\text{Action Index} \sim \text{Instruction} * \text{Feedback Timing} * \text{Block} + \left(1 \mid \text{Subject}\right) \nonumber
\end{equation}

2.5.1.3 PE Modelling.
We modeled PEs based on learning trial responses using a reinforcement learning model (Rescorla & Wagner, 1972; Sutton & Barto, 2018) to calculate the PE:
\begin{equation}
Q_{c,t+1} = Q_{c,t} + \alpha * \delta_{c,t} \nonumber
\end{equation}
\(Q_{c,t}\) is the value of the chosen action/stimulus in a given trial, and \(\alpha\) is the participant’s individual learning rate. The PE (\(\delta_{c,t}\)) was calculated as:
\begin{equation}
\delta_{c,t} = r_{t} - Q_{c,t} \nonumber
\end{equation}
The reward \(r_{t}\) is 1 for positive and 0 for negative feedback. Defining the best fitting model involves some degrees of freedom (see Burnside et al., 2019; Röhlinger, Albrecht, & Bellebaum, 2025; Weber & Bellebaum, 2024). To account for that, we tested 16 different model constraints and chose the one with the lowest Akaike Information Criterion (for a detailed description of all models and their comparisons, see section S1 of the Supplementary Material). The final model was based on both chosen actions and chosen stimuli: For each participant, independent reinforcement learning models were fitted for action-feedback and stimulus-feedback associations. Each of the six action pairs and six stimulus pairs (for six learning sessions) started with a value (\(Q_{c,t}\)) of 0.5. Values were updated (using the respective reinforcement learning model) in each trial based on the participants’ choice and the respective feedback. The value of the unchosen stimulus or action was always \(1 - Q_{c,t}\). The probability \(p\) of the combined action and stimulus models selecting the same action/stimulus as the participant was calculated for each trial using a softmax function depending on the prior values of both available actions and stimuli combined.
The prior values were calculated as a combination of action and stimulus expectations, weighted by the Action Index across all trials per participant (\(AI\) in the formula; the overall Action Index weight resulted in better model fit than Action Index per block, see Supplementary Material, Section S1):
\begin{equation}
Q_{c,t\ \mathrm{combined}} = Q_{c,t\ \mathrm{action}} * AI + Q_{c,t\ \mathrm{stimulus}} * (1 - AI) \nonumber
\end{equation}
\begin{equation}
Q_{u,t\ \mathrm{combined}} = 1 - Q_{c,t\ \mathrm{combined}} \nonumber
\end{equation}
The exploration parameter \(\beta\) reflects the degree to which the prior values affect the participant’s choices.
\begin{equation}
p_{c,t} = \frac{e^{Q_{c,t\ \mathrm{combined}} * \beta}}{e^{Q_{c,t\ \mathrm{combined}} * \beta} + e^{Q_{u,t\ \mathrm{combined}} * \beta}} \nonumber
\end{equation}
The determined probabilities \(p\) were used to calculate the negative summed log-likelihood \((-LL)\), which was used as a measure for the model’s goodness of fit:
\begin{equation}
-LL = -\sum_{t=1}^{n_{\text{trials}}} \log(p_{c,t}) \nonumber
\end{equation}
The final model included two different learning rates for positive and negative feedback, meaning that the model allowed participants to differ in how well they learned from positive compared to negative feedback. For trials with positive feedback, the learning rate \(\alpha_{\text{pos}}\) was used; for trials with negative feedback, the learning rate \(\alpha_{\text{neg}}\) was used. We used MATLAB’s fmincon function to estimate the free parameter values (\(\alpha_{\text{pos}}\), \(\alpha_{\text{neg}}\), \(\beta\)) for the reinforcement learning model that best matched the model’s predictions to participant behavior by minimizing \(-LL\). Parameters were constrained to [0; 1] for \(\alpha_{\text{pos}}\) and \(\alpha_{\text{neg}}\) and [0; 100] for \(\beta\). To avoid local minima, we ran the fitting process 50 times with random starting values (within their constraints) for the three free parameters.
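The objective that is minimized during fitting can be sketched in a few lines. The example below is a minimal Python illustration rather than the MATLAB code used in the study: it only evaluates \(-LL\) for one learning session and a given parameter set, and leaves the bounded, multi-start optimization to whatever optimizer is available. The function name, the trial format, and the use of the same pair of learning rates for the action and the stimulus model are assumptions made for illustration.

```python
import math


def negative_log_likelihood(alpha_pos, alpha_neg, beta, trials, ai):
    """Negative summed log-likelihood of the observed choices in one learning session.

    trials: list of (action, stimulus, reward) with action in {"left", "right"},
        stimulus in {"A", "B"} and reward 1 (positive) or 0 (negative feedback).
    ai: Action Index weight between 0 and 1.
    """
    # One value per pair is tracked; the alternative option is always 1 - value.
    q_left, q_a = 0.5, 0.5
    nll = 0.0
    for action, stimulus, reward in trials:
        # Values of the chosen action and the chosen stimulus before feedback.
        q_act = q_left if action == "left" else 1.0 - q_left
        q_stim = q_a if stimulus == "A" else 1.0 - q_a
        # Combined value of the chosen and the unchosen option.
        q_c = q_act * ai + q_stim * (1.0 - ai)
        q_u = 1.0 - q_c
        # Softmax probability that the model makes the participant's choice.
        p_c = math.exp(q_c * beta) / (math.exp(q_c * beta) + math.exp(q_u * beta))
        nll -= math.log(p_c)
        # Separate prediction errors update the action and the stimulus value.
        alpha = alpha_pos if reward == 1 else alpha_neg
        q_act += alpha * (reward - q_act)
        q_stim += alpha * (reward - q_stim)
        q_left = q_act if action == "left" else 1.0 - q_act
        q_a = q_stim if stimulus == "A" else 1.0 - q_stim
    return nll


# Hypothetical data: ten rewarded left presses that always select stimulus A.
example_trials = [("left", "A", 1)] * 10
example_nll = negative_log_likelihood(0.5, 0.5, 5.0, example_trials, ai=0.4)
print(example_nll)
```

A fitting routine would call this function repeatedly with candidate values of the three free parameters inside their bounds and keep the set with the smallest result across random restarts.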
After determining the best-fitting free parameters per participant, the parameters were used to calculate action and stimulus values in each trial for the respective participant, using independent reinforcement learning models, and combined PE values dependent on the action and stimulus values: \begin{equation} \delta_{c,t}\ =\ r_{t}-(Q_{c,t\ action}\ *\ AI\ +\ Q_{c,t\ stimulus}\ *\ (1-AI))\nonumber \end{equation} Please note that separate PE values for stimulus and action were used to update action and stimulus values in each trial, but the combined PE was used as a measure of expectancy for the FRN/RewP and N170 analyses. The model yields PE values between -1 (unexpected negative feedback) and 1 (unexpected positive feedback). To dissociate feedback valence and expectancy, we used the unsigned PE (absolute values) and feedback valence as separate predictors in the later EEG analyses (see Röhlinger, Albrecht, & Bellebaum, 2025; Röhlinger, Albrecht, Ghio, et al., 2025; Weber & Bellebaum, 2024). For an investigation of learning rates (α values) between instruction groups, and an exploration of expected values and PE across the trials of each learning session, see section S2 in the Supplementary Material. 2.5.2 EEG Data Analyses 2.5.2.1 Preprocessing. EEG data preprocessing was performed using BrainVision Analyzer 2.2 (Brain Products GmbH, 2018). The data were re-referenced to the average of all scalp electrodes (leaving out the mastoids because of poor data quality at the left mastoid electrode), then filtered with a 30 Hz low-pass and 0.1 Hz high-pass filter. We used independent component analysis (ICA) and reverse ICA to remove horizontal eye movement artifacts by identifying and removing a blink-related component.
Segments were created from 200 ms before to 800 ms after feedback onset, baseline-corrected on the basis of the 200 ms before the feedback, and those with artifacts were automatically removed (i.e., all segments with voltage steps > 50 µV/ms, differences between values of > 100 µV, or amplitudes > 80 µV or < -80 µV). We computed averages across segments for each feedback condition (negative or positive for immediate, delayed without tone, delayed with tone), resulting in six averages per participant. Averages and single-trial segments were exported for further analyses in MATLAB R2021a (The MathWorks, Inc., 2021). Although we originally planned to derive single-trial ERP amplitudes based on the negative feedback-positive feedback difference wave, similar to our previous studies (Röhlinger, Albrecht, & Bellebaum, 2025; Röhlinger, Albrecht, Ghio, et al., 2025), visual inspection of the frontocentral ERPs showed that differences between conditions already occurred at and before the P2 component and thus before the FRN/RewP (see Figure 4A). Both the N200 and RewP have been identified to occur after the P2 (see Foti et al., 2011), and are thus influenced by this component. FRN/RewP calculation based on the difference wave thus seemed inappropriate. Instead, we opted for a peak-to-peak measure to assess modulations of the ERP by the FRN/RewP to account for differences of the preceding P2 (e.g., Holroyd et al., 2003; Höltje & Mecklinger, 2020; Peterburs et al., 2016). For the FRN/RewP analysis, we used data from a frontocentral cluster of electrodes (Fz, FC1, FCz, FC2, Cz; Weber & Bellebaum, 2024). For each feedback timing condition and each participant, we used the average signal to find the maximum negative peak between 200 and 400 ms after feedback onset and the preceding maximum positive peak between 100 ms after feedback onset and the latency of the negative peak (corresponding to the P2). Importantly, we derived these latencies individually by participant and condition.
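The latency search described above can be sketched in Python as follows. This is a simplified illustration, not the authors' MATLAB code: it takes the most negative sample in the 200 to 400 ms window and the most positive sample between 100 ms and that latency, without checking that either is a true local extremum. The function name `find_peak_latencies` and the example waveform are hypothetical.

```python
def find_peak_latencies(times_ms, avg_uv, neg_window=(200, 400), pos_start=100):
    """Return (positive_peak_latency, negative_peak_latency) in ms.

    times_ms: sample time points relative to feedback onset
    avg_uv:   averaged amplitude (in microvolts) at each time point
    """
    # Most negative sample within the negative-peak search window
    neg_candidates = [i for i, t in enumerate(times_ms)
                      if neg_window[0] <= t <= neg_window[1]]
    neg_idx = min(neg_candidates, key=lambda i: avg_uv[i])

    # Most positive sample between pos_start and the negative peak latency
    pos_candidates = [i for i, t in enumerate(times_ms)
                      if pos_start <= t <= times_ms[neg_idx]]
    pos_idx = max(pos_candidates, key=lambda i: avg_uv[i])

    return times_ms[pos_idx], times_ms[neg_idx]


# Hypothetical averaged waveform sampled every 50 ms
example_times = [0, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500]
example_avg = [0.0, 1.0, 2.0, 5.0, 3.0, -4.0, -1.0, 0.0, 1.0, 2.0, 9.0]
p2_latency, frn_latency = find_peak_latencies(example_times, example_avg)
```

Because the positive-peak window ends at the negative peak latency, the large positive value at 500 ms in the example is ignored, which is the point of anchoring the P2 search to the subsequent negativity.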
An overview of the mean and distribution of latencies of the FRN/RewP is given in the Supplementary Material, Table S1. We used these latencies to extract the amplitude values for single-trial segments: for each segment, we extracted the averaged amplitudes from 10 ms before to 10 ms after the respective negative and positive peak latency for the respective participant and condition. We then subtracted the amplitude corresponding to the positive peak from the amplitude corresponding to the negative peak in each segment. For the N170 analysis, we used data from electrodes P7 and P8 (as in Röhlinger, Albrecht, & Bellebaum, 2025; Röhlinger, Albrecht, Ghio, et al., 2025). Negative peak latencies between 140 ms and 250 ms post-feedback were determined in the averages for each participant, feedback condition and electrode. An overview of the mean and distribution of latencies of the N170 is given in the Supplementary Material, Table S2. Single-trial amplitudes were again calculated as the mean signal 10 ms before to 10 ms after the identified participant-, condition- and electrode-specific peak latency. 2.5.2.2 Statistical Analysis of the ERP data. The ERP data were analyzed using linear mixed effect (LME) analyses with the lme4-package in R (Bates et al., 2015). For the FRN/RewP, we created a model with single-trial amplitudes at the pooled signal of Fz, FCz, Cz, FC1 and FC2 as dependent variable (see above). As fixed-effect predictors, we modelled the Action Index (continuous between-subject factor, scaled and mean-centered), Feedback Timing (categorical within-subject factor with levels immediate feedback, delayed feedback without tones and delayed feedback with tones [using a simple coding matrix with immediate feedback as baseline]), Feedback Valence (categorical within-subject factor with levels negative feedback [-0.5] and positive feedback [0.5]), as well as unsigned PE (continuous within-subject factor, scaled and mean-centered). 
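The predictor coding described above can be made concrete with a short Python sketch. It rests on two assumptions that the text does not spell out: that "scaled and mean-centered" means z-standardization with the sample standard deviation (as R's `scale()` does by default), and that the "simple coding matrix" follows the common definition in which each contrast column equals dummy coding minus 1/k for k levels. The function names and level labels are hypothetical.

```python
def zscore(values):
    """Mean-center and scale to unit sample standard deviation (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]


# Feedback Valence coding as stated in the text
VALENCE_CODE = {"negative": -0.5, "positive": 0.5}


def simple_coding(levels):
    """Simple contrast coding with the first level as baseline.

    Each of the k - 1 columns compares one non-baseline level with the
    baseline, while the intercept stays at the grand mean across levels.
    """
    k = len(levels)
    return {
        level: [(1.0 if i == j else 0.0) - 1.0 / k for j in range(1, k)]
        for i, level in enumerate(levels)
    }


timing_contrasts = simple_coding(
    ["immediate", "delayed_without_tones", "delayed_with_tones"]
)
```

Under this coding the baseline level (immediate feedback) receives -1/3 in both columns, and each delayed condition receives 2/3 in its own column and -1/3 in the other, so every column sums to zero across levels.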
Because Action Index values differed both between and within subjects, adding random slopes by participant might lead to endogeneity effects (Antonakis et al., 2021). Therefore, we only included random intercepts by participant, resulting in the model: \begin{equation} FRN/RewP\ \sim\ Action\ Index*Feedback\ Timing*Feedback\ Valence*\text{PE}_{\text{absolute}}+(1\ |\ Subject)\nonumber \end{equation} For the N170, we created a model with single-trial N170 amplitudes at P7 and P8 as dependent variable and the same fixed-effect predictors as for the FRN/RewP analysis. Again, we did not include random slopes by participant due to possible endogeneity, but random intercepts by participant. Additionally, we added random intercepts per electrode (P7, P8) to account for potential amplitude differences between the hemispheres, resulting in the model: \(N170\ \sim\ Action\ Index*Feedback\ Timing*Feedback\ Valence*\text{PE}_{\text{absolute}}+(1\ |\ Subject)\ +\ (1\ |\ Electrode)\) To determine outliers for both analyses, we first calculated the models based on all data. Subsequently, we calculated residuals and removed all datapoints for which the residual deviated from the mean by more than 2 standard deviations. The model was then refitted to the cleaned data. The number of included datapoints in the final model is reported for both analyses in the Supplementary Material (Tables S7 and S9). As we had hypotheses concerning four-way interactions for both models and it might be questionable whether our sample size was large enough for finding such an effect, we added Bayes statistics for the four-way interaction effects, calculated with the brms package in R (Bürkner, 2017). Results 3.1 Behavioral Results For descriptive results concerning accuracy in the learning trials see Figure 2A. There was a significant main effect of Block, z = 2.86, p = .004, b = 0.34. Accuracy increased in later compared to earlier blocks.
All other effects were not significant (all p ≥ .139). For additional statistical results for the GLME analysis of accuracy in the learning trials, including estimates, z-values and confidence intervals, see Table S3 in the Supplementary Material. For descriptive results concerning accuracy in the test trials, see Figure 2B. For accuracy in the test trials, we found a three-way interaction between Instruction, the contrast between immediate feedback and delayed feedback without tones, and Block, z = 2.31, p = .021. Resolving this interaction, there was no significant two-way interaction between Block and the contrast between immediate feedback and delayed feedback without tones, z = -0.86, p = .388, for the action-instruction group, but there was one for the stimulus-instruction group, z = 2.30, p = .021. For delayed feedback without tones in the stimulus-instruction group, there was a trend effect for Block, z = 1.76, p = .079, b = 0.80, and no effect emerged for immediate feedback ( p = .120). No other main or interaction effects were significant ( p ≥ .230). Even though no increase in the number of correct responses over the blocks was found, accuracy was well above chance level in all conditions, indicating that participants learned to choose the more rewarding options, which is consistent with the results of the analysis of accuracy in the learning trials (see above). For additional statistical results for the GLME analysis of accuracy in the test trials, including estimates, z-values and confidence intervals, see Table S4 in the Supplementary Material. For descriptive results concerning the Action Index, see Figure 3. The Action Index differed significantly between the action-instruction and the stimulus-instruction group, F (1,41.00) = 26.08, p < .001, b = -0.32, with larger Action Index values for the action-instruction group ( M = 0.59, SD = 0.24) than for the stimulus-instruction group ( M = 0.28, SD = 0.16).
Additionally, the Action Index was dependent on Feedback Timing, F (2,979.00) = 3.23, p = .040, but with no difference either between immediate feedback and delayed feedback without tones ( p = .364), or between immediate feedback and delayed feedback with tones ( p = .109). In an additional contrast, we found a significant difference between the delayed feedback conditions, t = -2.51, p = .012, b = -0.06, with lower Action Index values for delayed feedback with tones. All other results were not significant (all p ≥ .078). For additional statistical results for the GLME analysis of the Action Index, see Table S5 in the Supplementary Material. To further explore Action Index values, we conducted two additional analyses: First, we checked whether variation within subjects (between blocks) and between subjects (within groups) differed between the action-instruction and stimulus-instruction groups. Second, we analyzed to what extent Action Index values, or rather, their deviation from 0.5, depended on learning success. These checks are reported in the Supplementary Material, section S3. Action Index variation within and between subjects was high, but did not differ between Instruction groups. While Action Index values were modulated by learning success, the analysis showed that even for less successful learners, Action Index values differed from chance, making it a valid predictor of individual beliefs. 3.2 FRN/RewP Results The grand averages and topographies of the FRN/RewP are displayed in Figure 4. For descriptive model statistics, see Figure 5. Results concerning the effects of Feedback Valence, Feedback Timing and PE on the FRN/RewP replicated findings of previous studies (Arbel et al., 2017; Burnside et al., 2019; Kim & Arbel, 2019; Peterburs et al., 2016; Weber & Bellebaum, 2024; Weinberg et al., 2012): we found a significant main effect of Feedback Valence, F (1,19306.58) = 29.87, p < .001, b = 0.52, with more negative amplitudes after negative feedback.
Further, an interaction between Feedback Timing and Feedback Valence was found, F (2,19304.92) = 6.28, p = .002, with a Feedback Valence effect for immediate feedback, F (1,19313.61) = 36.27, p < .001, b = 0.99, but no effect for delayed feedback without tones ( p = .135) and delayed feedback with tones ( p = .055). The interaction was present for both contrasts, immediate versus delayed without tone, t (19304.75) = 3.18, p = .001, and immediate versus delayed with tone, t (19304.98) = -2.95, p = .003. We found a significant Feedback Timing main effect, F (2,19307.09) = 48.56, p < .001, with more negative amplitudes for the two delayed feedback conditions compared to immediate ( t = -7.41, p < .001, b = -0.86 for delayed feedback without tones, t = -9.34, p < .001, b = -1.07 for delayed feedback with tones), in accordance with Peterburs et al. (2016). Finally, a PE main effect, F (1,19306.19) = 4.00, p = .046, b = 0.36, could be further explained by a significant PE by Feedback Valence interaction, F (1,19344.90) = 23.52, p < .001. Manual simple slopes analyses revealed that for positive feedback, F (1,19339.80) = 24.08, p < .001, b = 1.27, amplitudes were significantly more positive for high compared to low PE values. For negative feedback, amplitudes were significantly more negative for high compared to low PE values, F (1,19334.34) = 4.61, p = .032, b = -0.56. Resolving the interaction by PE, we found no significant Feedback Valence effect for low PE values ( p = .777), but for high PE values, F (1,19336.03) = 59.24, p < .001, b = 0.99, with more negative values for negative compared to positive feedback. With respect to our hypotheses for the FRN/RewP, we expected the Feedback Timing by Feedback Valence interaction described above to be further modulated by the Action Index. Indeed, we found an interaction between Feedback Timing, Feedback Valence and Action Index, F (2,19304.86) = 3.71, p = .024, that is depicted in Figure 8A. 
However, contrary to our expectations, simple slope analyses showed that a Feedback Timing by Feedback Valence interaction emerged only for lower Action Index values (i.e., for participants who attributed the feedback more to the preceding stimulus than action), F (2,19304.56) = 9.03, p < .001, and not for higher Action Index values ( p = .764). Further resolving by Feedback Timing, a Feedback Valence effect emerged for lower Action Index values only in immediate feedback, F (1,19313.02) = 25.43, p < .001, b = 1.27, with more negative amplitudes for negative feedback, and not in the two delayed feedback conditions ( p ≥ .847). To investigate the pattern for higher Action Index values, we performed further simple slope analyses and found that for higher Action Index values, peaks were more negative for negative than positive feedback in all Feedback Timing conditions (immediate: p = .002, b = 0.71, delayed feedback without tones: p = .030, b = 0.49, delayed feedback with tones: p = .006, b = 0.66). We expected a four-way interaction between Action Index, Feedback Timing, Feedback Valence and PE, which was not significant ( p = .694, Bayes factor < 0.01). We also found no other significant interactions that included Action Index and PE, and no other main and interaction effects (all p ≥ .089). For additional statistical results of the FRN/RewP model, including b -values and confidence intervals, see Table S6 in the Supplementary Material. For information about the included datapoints per subject and condition, see Table S7 in the Supplementary Material. 3.3 N170 Results The grand averages and topographies of the N170 are displayed in Figure 6. For descriptive model statistics, see Figure 7.
As for the FRN/RewP, main and interaction results including the factors Feedback Valence, Feedback Timing and PE replicated results of previous studies (Arbel et al., 2017; Höltje & Mecklinger, 2020; Kim & Arbel, 2019; Röhlinger, Albrecht, & Bellebaum, 2025; Röhlinger, Albrecht, Ghio, et al., 2025): we found a significant main effect of Feedback Valence, F (1,38631.46) = 114.73, p < .001, b = 0.88, with more negative amplitudes for negative compared to positive feedback, and a significant main effect of Feedback Timing, F (2,38631.56) = 97.52, p < .001, with larger (more negative) amplitudes for the two delayed feedback conditions compared to the immediate feedback condition, t = -9.11, p < .001, b = -0.92 for delayed feedback without tones, t = -13.73, p < .001, b = -1.37 for delayed feedback with tones. Replicating very recent findings from our group (Röhlinger, Albrecht, & Bellebaum, 2025; Röhlinger, Albrecht, Ghio, et al., 2025), we found a significant interaction between Feedback Valence and PE, F (1,38657.84) = 34.66, p < .001. Simple slope analyses revealed that a PE effect emerged for negative feedback, F (1,38645.89) = 15.72, p < .001, b = 0.90, with more negative amplitudes for lower compared to higher PE values, and for positive feedback, F (1,38650.08) = 21.15, p < .001, b = -1.03, with more positive amplitudes for lower compared to higher PE values. Resolving the interaction by PE, amplitudes were more negative for negative than positive feedback for low PE values, F (1,38648.23) = 123.22, p < .001, b = 1.38, and high PE values, F (1,38647.33) = 11.31, p = .001, b = 0.38, with a larger effect for low PE values. Concerning our hypotheses for the N170, we expected the Feedback Valence and Feedback Timing effects to be further modulated by Action Index, and expected a four-way interaction between Feedback Valence, Feedback Timing, PE, and Action Index.
However, this interaction was not significant ( p = .954, Bayes factor < 0.01) and we only found a significant three-way interaction between Action Index, Feedback Valence and PE, F (1,38654.51) = 4.55, p = .033. The two-way interaction between Feedback Valence and PE was stronger for higher Action Index values, F (1,38655.52) = 33.53, p < .001, than lower Action Index values, F (1,38658.83) = 7.33, p = .007. For higher Action Index values, a significant effect of PE emerged for negative feedback, F (1,38641.40) = 15.39, p < .001, b = 1.23 (low PE = more negative) and positive feedback, F (1,38650.39) = 19.80, p < .001, b = -1.37 (low PE = more positive). For lower Action Index values, a PE effect emerged only for positive feedback, F (1,38650.50) = 4.60, p = .032, b = -0.70 (low PE = more positive), but not for negative feedback ( p = .077). The three-way interaction is depicted in Figure 8B. We found no further significant results (all p ≥ .068). For additional statistical results of the N170 model, including b -values and confidence intervals, see Table S8 in the Supplementary Material. For information about the included datapoints per subject and condition, see Table S9 in the Supplementary Material. Discussion The present study investigated whether participants engaged different learning systems based on credit assignment for the feedback they received. More specifically, we analyzed to what extent feedback processing depended on participants’ beliefs about the type of association they were learning, namely action-feedback or stimulus-feedback associations. Participants completed a feedback-learning task with either immediate or delayed feedback, in which they were instructed to either focus on pressing the preferable (i.e., more rewarding) button (action-instruction group) or selecting the preferable stimulus (stimulus-instruction group). Both groups underwent identical learning trials while EEG was recorded to assess feedback processing.
We used ambiguous trials in subsequent test blocks to measure individual beliefs concerning the type of the learned association. We expected that participants with higher Action Index values, indicating a stronger belief in action-feedback associations, would primarily engage the striatal learning system. This would be reflected in pronounced coding of feedback valence and PE in the FRN/RewP, particularly for immediate feedback. Conversely, participants with a lower Action Index would rely more on the MTL learning system, as reflected in pronounced coding of feedback valence and PE in the N170, particularly for delayed feedback. Bayes analysis of the highest-order interaction of both EEG models revealed very strong evidence against our hypotheses (Lee & Wagenmakers, 2014). These distinctive Bayes factor values suggest that the negative findings were not caused by an insufficient sample size. Analyses of the Action Index derived from the participants’ choice data confirmed that the instructions shaped participants’ beliefs concerning credit assignment: Action Index values differed significantly between groups, with higher values in those participants who were told that feedback depended on their chosen action. The stimulus-feedback instruction was slightly more effective, producing Action Index values that deviated more clearly from chance than in the action-feedback instruction group, indicating stronger beliefs. It is important to note that since the Action Index is performance-dependent – being more extreme in better learners – it is less reliable for interpreting beliefs when learning is poor. However, accuracy in the learning trials was comparable between the two instruction groups and increased across blocks, indicating that learning occurred and did not depend on the belief concerning credit assignment. 
Also, while additional analyses showed that Action Index was dependent on learning success, the model fit suggested above-chance level Action Index values also for less successful learners. Although a trend effect of block in the test trials emerged only for the stimulus-feedback instruction in delayed feedback without tones, overall accuracy rates in the test trials were well above chance level and did not differ between conditions. It is likely that associations were formed already after the first learning block in most conditions, explaining the lack of block effects in the test trials. In the delayed feedback without tone condition and stimulus-feedback instructions, learning may have been slower initially, but accuracy descriptively caught up in later blocks. Therefore, differences in Action Index values are best explained by belief differences, not learning performance. Regarding neural processes linked to learning, we could replicate previous findings that both the FRN/RewP and the N170 reflect a PE. Interestingly, and in accordance with other recent findings from our group (Röhlinger, Albrecht, Ghio, et al., 2025), the PE by valence interaction for the N170 was driven by stronger valence effects for expected than unexpected events. The FRN/RewP showed a valence effect only for unexpected events, as was also shown before (Kirsch et al., 2022). Cautiously, in line with classical reinforcement learning theories (Holroyd & Coles, 2002), the FRN/RewP might signal a need to adapt actions and expectations, which arises when events are unexpected. The N170, in contrast, might code the strengthening of existing associations, possibly reflecting stronger reactivations of the events associated with feedback (e.g., Schiffer et al., 2014). Our first hypothesis was that the difference between positive and negative feedback in the FRN/RewP would be most pronounced for participants who attribute immediate feedback to a preceding action. 
In line with the hypothesis, we found a respective three-way interaction between Feedback Valence, Feedback Timing and the Action Index. But contrary to expectations, valence effects in the FRN/RewP were not particularly pronounced for higher Action Index values in the immediate feedback condition. Instead, a feedback valence effect emerged across all feedback timing conditions for participants who associated feedback with a preceding action (higher Action Index values). Lower Action Index values (when participants attributed feedback to a preceding stimulus) resulted in the typical pattern of reduced valence effects with increasing feedback delay often observed in the FRN/RewP (Albrecht et al., 2023; Höltje & Mecklinger, 2020; Peterburs et al., 2016; Weinberg et al., 2012). This pattern of the three-way interaction, although different from our hypotheses, still aligns with our underlying assumptions: when participants attribute feedback to a preceding action, the tendency to learn via the dopaminergic reward learning system may be stronger (Arbel et al., 2017; Foerde et al., 2013; Foerde & Shohamy, 2011; Peterburs et al., 2016; Weinberg et al., 2012), so that participants might have engaged this system irrespective of feedback delay, as reflected in the feedback valence effect across feedback delays. For participants attributing feedback to a stimulus, valence effects were only found for immediate feedback, suggesting that immediate feedback engages the dopaminergic reward system even if the feedback is not linked to a preceding action. Even though drawing direct inferences from ERP components to brain structures is difficult, the data suggest that the proposed role of the striatum for associating feedback with recent events (Jocham et al., 2016; Yagishita et al., 2014) may thus not be limited to recent actions. 
Our second hypothesis was that, for the FRN/RewP, strongest PE effects, as indicated by the interaction of absolute PE and feedback valence, would emerge if participants attribute feedback to a preceding action for immediate feedback. However, we found no evidence supporting this hypothesis. A significant PE by Valence interaction emerged, showing that more unexpected events elicited more positive amplitudes for positive feedback (see Kirsch et al., 2022; Weber & Bellebaum, 2024), but also more negative amplitudes for negative feedback (see Hoy et al., 2021; Röhlinger, Albrecht, & Bellebaum, 2025; Röhlinger, Albrecht, Ghio, et al., 2025). This interaction was not further modulated by either Action Index or feedback timing. This is in line with our previous studies showing no timing effect on PE coding in the FRN/RewP (Röhlinger, Albrecht, & Bellebaum, 2025; Weber & Bellebaum, 2024). While null effects are difficult to interpret, Bayesian statistics revealed strong evidence against a four-way interaction. Together with the previously-described three-way interaction between feedback valence, feedback timing and Action Index, this pattern of results suggests that valence coding in the FRN/RewP is dependent on participants’ belief about the learned associations, but the processing of PEs is not. Our third hypothesis predicted that the N170 would be most pronounced when participants attributed feedback to a preceding stimulus, particularly for delayed feedback. We found no evidence that supports this hypothesis. Finally, our fourth hypothesis proposed that the N170 would show the strongest PE effect when participants attributed feedback to a preceding stimulus, especially for delayed feedback. Even though we did not find a four-way-interaction between all involved factors, we did find a significant interaction between Feedback Valence, PE, and Action Index. 
This three-way interaction further explained an (absolute) PE by valence interaction (indicating PE coding), replicating previous findings from our group (Röhlinger, Albrecht, & Bellebaum, 2025; Röhlinger, Albrecht, Ghio, et al., 2025). The resolution revealed that the PE by valence interaction was more pronounced for higher than lower Action Index values, i.e., more pronounced if participants attributed the feedback to an action than to a stimulus, which is the opposite of the hypothesized effect. Combined with our FRN/RewP findings, this suggests that the neural systems linked to both ERP components (presumably striatum- and MTL-based learning systems) were more involved when learning depended more strongly on action-feedback associations, independent of feedback timing. Importantly, Action Index values were less extreme (i.e., less clearly deviating from the middle value of 0.5) in participants in the action instruction group than in the stimulus instruction group, indicating a tendency to learn from both actions and stimuli in the action instruction group. The result pattern for the two ERP components might thus suggest that both underlying learning systems were recruited in participants with higher (vs. lower) Action Index values. In this sense, the finding is in line with our previous study (Röhlinger, Albrecht, Ghio, et al., 2025), in which we found recruitment of both systems when participants actively chose between different stimuli, possibly linking feedback to stimuli and actions. We initially expected that N170 amplitudes reflected reactivations of brain regions involved in visual processing, particularly for stimulus-feedback associations. This expectation was based on findings that the MTL bridges the temporal gap by binding outcomes to preceding experiences (Singer & Frank, 2009) and reactivating associated representations (Pleger et al., 2008; Pleger et al., 2009; Schiffer et al., 2014). This mechanism, however, might also apply to preceding actions.
Possibly, in the latter case, the feedback-locked N170 during learning of action-feedback associations might reflect MTL activity that could drive the reactivation of motor cortices linked to previously performed actions. Evidence for such reactivation at reward presentation has been found for immediate feedback with short action-feedback intervals (1400 ms), then likely modulated by the striatal system (Cohen & Ranganath, 2007). Speculatively, such a reactivation could then allow both learning systems to connect action and feedback. Assuming that the N170 reflects MTL activity that drives the reactivation of motor cortices in action learning with delayed feedback favors the assumption of Arbel et al. (2017) that the N170 directly reflects hippocampal activity. However, our group has found evidence that the N170 encodes PEs more strongly for visual stimulus-feedback associations than for auditory ones, suggesting that the component is linked to the reactivation of visual representations of learned stimuli in the fusiform gyrus (Röhlinger, Albrecht, & Bellebaum, 2025). Given the close anatomical proximity of the fusiform gyrus and the hippocampus/MTL, and EEG’s poor spatial resolution, the N170 may capture overlapping activity from both regions. It is therefore plausible that its neural source differs between stimulus-feedback and action-feedback learning. A magnetoencephalography study combined with MRI could help pinpoint the exact origin(s) of the N170 in feedback learning contexts. While both learning systems may thus be recruited in participants with relatively higher Action Index values, the question remains which neural mechanisms underlie learning in participants with lower Action Index values. For immediate feedback the striatal reward system appeared to be also involved, as indicated by the pronounced feedback valence effect (see also Höltje & Mecklinger, 2020; Peterburs et al., 2016; Weinberg et al., 2012).
With respect to the recruitment of the MTL-based learning system it is important to note that the N170 reflected the PE also for stimulus-feedback associations, with the restriction that the PE was reflected in N170 amplitude only for positive feedback. For immediate feedback, both learning systems may thus have also cooperated when learning was based on stimulus-feedback associations, while for delayed feedback the focus may have been on MTL-based learning, especially for positive feedback. As outlined above, the N170 may reflect the strengthening of existing or expected associations (Schiffer et al., 2014), which is particularly important for positive feedback. As in our previous study (Röhlinger, Albrecht, Ghio, et al., 2025), our findings contradict the account by Kimura and Kimura (2016), which attributes feedback delay effects solely to reduced temporal predictability. We observed significant differences between immediate feedback and both temporally predictable and temporally unpredictable delayed feedback conditions concerning the effect of feedback valence in the FRN/RewP. Since participants in Kimura and Kimura’s task could not learn from feedback, we suspect that temporal predictability plays a less central role in feedback learning tasks, where feedback is informative, than in tasks where it cannot be used for learning. In conclusion, our findings show that individual beliefs about the type of learned associations shape how feedback is processed during learning. The dopaminergic reward system, reflected in the FRN/RewP, was more strongly engaged when participants relied more strongly on action-feedback than stimulus-feedback associations, as indicated by robust valence effects regardless of feedback timing. This is consistent with a prominent role of the striatum in linking actions to feedback. At the same time the N170, possibly reflecting MTL and/or fusiform gyrus activity, showed more prominent PE coding for stronger action-feedback associations. 
PE coding might be particularly reflected in the N170 if striatum- and MTL-based learning systems cooperate. Especially for action-feedback associations, where modulations of both FRN/RewP and N170 were observed, we speculate that the MTL could help credit assignment in the striatal system by reactivating neural action representations. Future studies using combined imaging approaches could help clarify the neural origins of the N170 in feedback learning and how they vary by association type.

Data Availability
The study was preregistered at https://doi.org/10.17605/OSF.IO/Z9MPA. The data that support the findings of this study and all analysis scripts are openly accessible through the Open Science Framework at https://doi.org/10.17605/OSF.IO/F3R42.

Author Contribution
Christine Albrecht: Data Curation, Formal Analysis, Investigation, Methodology, Project Administration, Software, Visualization, Writing – Original Draft Preparation.
Marta Ghio: Conceptualization, Methodology, Project Administration, Resources, Supervision, Validation, Writing – Review & Editing.
Christian Bellebaum: Conceptualization, Funding Acquisition, Methodology, Project Administration, Resources, Supervision, Validation, Writing – Review & Editing.

Text Editing
ChatGPT (OpenAI, 2023, https://chat.openai.com/chat) was used for text editing of the manuscript.

Funding Information
Funded by the Deutsche Forschungsgemeinschaft (DFG; German Research Foundation), project number 467460456.

Conflict of Interest Disclosure
The authors report no conflict of interest.

References
Albrecht, C., van de Vijver, R., & Bellebaum, C. (2023). Learning new words via feedback-Association between feedback-locked ERPs and recall performance-An exploratory study. Psychophysiology, 60(10), e14324. https://doi.org/10.1111/psyp.14324
Antonakis, J., Bastardoz, N., & Rönkkö, M. (2021). On Ignoring the Random Effects Assumption in Multilevel Models: Review, Critique, and Recommendations.
Organizational Research Methods, 24(2), 443–483. https://doi.org/10.1177/1094428119877457
Arbel, Y., Hong, L., Baker, T. E., & Holroyd, C. B. (2017). It’s all about timing: An electrophysiological examination of feedback-based learning with immediate and delayed feedback. Neuropsychologia, 99, 179–186. https://doi.org/10.1016/j.neuropsychologia.2017.03.003
Aubert, I., Ghorayeb, I., Normand, E., & Bloch, B. (2000). Phenotypical characterization of the neurons expressing the D1 and D2 dopamine receptors in the monkey striatum. J Comp Neurol, 418(1), 22–32.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
Bein, O., Duncan, K., & Davachi, L. (2020). Mnemonic prediction errors bias hippocampal states. Nature Communications, 11(1), 3451. https://doi.org/10.1038/s41467-020-17287-1
Bellebaum, C., & Colosio, M. (2014). From feedback- to response-based performance monitoring in active and observational learning. Journal of Cognitive Neuroscience, 26(9), 2111–2127. https://doi.org/10.1162/jocn_a_00612
Bellebaum, C., Jokisch, D., Gizewski, E. R., Forsting, M., & Daum, I. (2012). The neural coding of expected and unexpected monetary performance outcomes: Dissociations between active and observational learning. Behavioural brain research, 227(1), 241–251. https://doi.org/10.1016/j.bbr.2011.10.042
Bogacz, R., McClure, S. M., Li, J., Cohen, J. D., & Montague, P. R. (2007). Short-term memory traces for action bias in human reinforcement learning. Brain research, 1153, 111–121. https://doi.org/10.1016/j.brainres.2007.03.057
Bürkner, P.-C. (2017). brms: An R Package for Bayesian Multilevel Models Using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01
Burnside, R., Fischer, A. G., & Ullsperger, M. (2019).
The feedback-related negativity indexes prediction error in active but not observational learning. Psychophysiology, 56(9), e13389. https://doi.org/10.1111/psyp.13389
Calabresi, P., Picconi, B., Tozzi, A., Ghiglieri, V., & Di Filippo, M. (2014). Direct and indirect pathways of basal ganglia: a critical reappraisal. Nature neuroscience, 17(8), 1022–1030. https://doi.org/10.1038/nn.3743
Carvalheiro, J., Conceição, V. A., Mesquita, A., & Seara-Cardoso, A. (2021). Acute stress blunts prediction error signals in the dorsal striatum during reinforcement learning. Neurobiol Stress, 15, 100412. https://doi.org/10.1016/j.ynstr.2021.100412
Chau, B. K. H., Jarvis, H., Law, C.-K., & Chong, T. T.-J. (2018). Dopamine and reward: a view from the prefrontal cortex. Behavioural pharmacology, 29(7), 569–583. https://doi.org/10.1097/fbp.0000000000000424
Cohen, M. X., & Ranganath, C. (2007). Reinforcement learning signals predict future decisions. The Journal of neuroscience: the official journal of the Society for Neuroscience, 27(2), 371–378. https://doi.org/10.1523/jneurosci.4421-06.2007
Cooper, J. C., Dunne, S., Furey, T., & O’Doherty, J. P. (2012). Human Dorsal Striatum Encodes Prediction Errors during Observational Learning of Instrumental Actions. Journal of Cognitive Neuroscience, 24(1), 106–118. https://doi.org/10.1162/jocn_a_00114
Dam, G., Kording, K., & Wei, K. (2013). Credit Assignment during Movement Reinforcement Learning. PloS one, 8(2), e55352. https://doi.org/10.1371/journal.pone.0055352
Daniel, R., & Pollmann, S. (2014). A universal role of the ventral striatum in reward-based learning: evidence from human studies. Neurobiol Learn Mem, 114, 90–100. https://doi.org/10.1016/j.nlm.2014.05.002
Deffke, I., Sander, T., Heidenreich, J., Sommer, W., Curio, G., Trahms, L., & Lueschow, A. (2007). MEG/EEG sources of the 170-ms response to faces are co-localized in the fusiform gyrus. NeuroImage, 35(4), 1495–1501.
https://doi.org/10.1016/j.neuroimage.2007.01.034
Dickerson, K. C., & Delgado, M. R. (2015). Contributions of the hippocampus to feedback learning. Cognitive, affective & behavioral neuroscience, 15(4), 861–877. https://doi.org/10.3758/s13415-015-0364-5
Dickerson, K. C., Li, J., & Delgado, M. R. (2011). Parallel contributions of distinct human memory systems during probabilistic learning. NeuroImage, 55(1), 266–276. https://doi.org/10.1016/j.neuroimage.2010.10.080
Diederen, K. M. J., Ziauddeen, H., Vestergaard, M. D., Spencer, T., Schultz, W., & Fletcher, P. C. (2017). Dopamine Modulates Adaptive Prediction Error Coding in the Human Midbrain and Striatum. The Journal of Neuroscience, 37(7), 1708. https://doi.org/10.1523/JNEUROSCI.1979-16.2016
DuBrow, S., & Davachi, L. (2016). Temporal binding within and across events. Neurobiology of Learning and Memory, 134(Pt A), 107–114. https://doi.org/10.1016/j.nlm.2016.07.011
Fischer, A. G., & Ullsperger, M. (2013). Real and fictive outcomes are processed differently but converge on a common adaptive mechanism. Neuron, 79(6), 1243–1255. https://doi.org/10.1016/j.neuron.2013.07.006
Foerde, K., Race, E., Verfaellie, M., & Shohamy, D. (2013). A role for the medial temporal lobe in feedback-driven learning: evidence from amnesia. The Journal of neuroscience: the official journal of the Society for Neuroscience, 33(13), 5698–5704. https://doi.org/10.1523/jneurosci.5217-12.2013
Foerde, K., & Shohamy, D. (2011). Feedback timing modulates brain systems for learning in humans. The Journal of neuroscience: the official journal of the Society for Neuroscience, 31(37), 13157–13167. https://doi.org/10.1523/jneurosci.2701-11.2011
Foti, D., Weinberg, A., Dien, J., & Hajcak, G. (2011). Event-related potential activity in the basal ganglia differentiates rewards from nonrewards: temporospatial principal components analysis and source localization of the feedback negativity. Hum Brain Mapp, 32(12), 2207–2216.
https://doi.org/10.1002/hbm.21182
Gao, C., Conte, S., Richards, J. E., Xie, W., & Hanayik, T. (2019). The neural sources of N170: Understanding timing of activation in face-selective areas. Psychophysiology, 56(6), e13336. https://doi.org/10.1111/psyp.13336
Gehring, W. J., & Willoughby, A. R. (2002). The medial frontal cortex and the rapid processing of monetary gains and losses. Science (New York, N.Y.), 295(5563), 2279–2282. https://doi.org/10.1126/science.1066893
Hajcak, G., Moser, J. S., Holroyd, C. B., & Simons, R. F. (2006). The feedback-related negativity reflects the binary evaluation of good versus bad outcomes. Biological psychology, 71(2), 148–154. https://doi.org/10.1016/j.biopsycho.2005.04.001
Hauser, T. U., Iannaccone, R., Stämpfli, P., Drechsler, R., Brandeis, D., Walitza, S., & Brem, S. (2014). The feedback-related negativity (FRN) revisited: New insights into the localization, meaning and network organization. NeuroImage, 84, 159–168. https://doi.org/10.1016/j.neuroimage.2013.08.028
Holroyd, C. B., & Coles, M. G. H. (2002). The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity. Psychological review, 109(4), 679–709. https://doi.org/10.1037/0033-295x.109.4.679
Holroyd, C. B., Nieuwenhuis, S., Yeung, N., & Cohen, J. D. (2003). Errors in reward prediction are reflected in the event-related brain potential. Neuroreport, 14(18). https://doi.org/10.1097/00001756-200312190-00037
Höltje, G., & Mecklinger, A. (2020). Feedback timing modulates interactions between feedback processing and memory encoding: Evidence from event-related potentials. Cognitive, Affective, & Behavioral Neuroscience. https://doi.org/10.3758/s13415-019-00765-5
Hoy, C. W., Steiner, S. C., & Knight, R. T. (2021). Single-trial modeling separates multiple overlapping prediction errors during reward processing in human EEG. Communications Biology, 4(1), 910.
https://doi.org/10.1038/s42003-021-02426-1
Jocham, G., Brodersen, K. H., Constantinescu, A. O., Kahn, M. C., Ianni, A. M., Walton, M. E., Rushworth, M. F., & Behrens, T. E. (2016). Reward-Guided Learning with and without Causal Attribution. Neuron, 90(1), 177–190. https://doi.org/10.1016/j.neuron.2016.02.018
Kim, S., & Arbel, Y. (2019). Immediate and delayed auditory feedback in declarative learning: An examination of the feedback related event related potentials. Neuropsychologia, 129, 255–262. https://doi.org/10.1016/j.neuropsychologia.2019.04.001
Kimura, K., & Kimura, M. (2016). Temporal prediction restores the evaluative processing of delayed action feedback: an electrophysiological study. Neuroreport, 27(14), 1061–1067. https://doi.org/10.1097/wnr.0000000000000657
Kirsch, F., Kirschner, H., Fischer, A. G., Klein, T. A., & Ullsperger, M. (2022). Disentangling performance-monitoring signals encoded in feedback-related EEG dynamics. NeuroImage, 257, 119322. https://doi.org/10.1016/j.neuroimage.2022.119322
Kobza, S., Ferrea, S., Schnitzler, A., Pollok, B., Südmeyer, M., & Bellebaum, C. (2012). Dissociation between active and observational learning from positive and negative feedback in Parkinsonism. PloS one, 7(11), e50250. https://doi.org/10.1371/journal.pone.0050250
Lee, M. D., & Wagenmakers, E.-J. (2014). Bayesian cognitive modeling: A practical course. Cambridge University Press.
Lighthall, N. R., Pearson, J. M., Huettel, S. A., & Cabeza, R. (2018). Feedback-Based Learning in Aging: Contributions and Trajectories of Change in Striatal and Hippocampal Systems. The Journal of neuroscience: the official journal of the Society for Neuroscience, 38(39), 8453–8462. https://doi.org/10.1523/jneurosci.0769-18.2018
Marco-Pallarés, J., Müller, S. V., & Münte, T. F. (2007). Learning by doing: an fMRI study of feedback-related brain activations. Neuroreport, 18(14), 1423–1426.
https://doi.org/10.1097/WNR.0b013e3282e9a58c
Meteyard, L., & Davies, R. A. I. (2020). Best practice guidance for linear mixed-effects models in psychological science. Journal of Memory and Language, 112, 104092. https://doi.org/10.1016/j.jml.2020.104092
Miltner, W. H., Braun, C. H., & Coles, M. G. (1997). Event-related brain potentials following incorrect feedback in a time-estimation task: evidence for a “generic” neural system for error detection. Journal of Cognitive Neuroscience, 9(6), 788–798. https://doi.org/10.1162/jocn.1997.9.6.788
Nieuwenhuis, S., Holroyd, C. B., Mol, N., & Coles, M. G. (2004). Reinforcement-related brain potentials from medial frontal cortex: origins and functional significance. Neuroscience and biobehavioral reviews, 28(4), 441–448. https://doi.org/10.1016/j.neubiorev.2004.05.003
Nikooyan, A. A., & Ahmed, A. A. (2015). Reward feedback accelerates motor learning. Journal of neurophysiology, 113(2), 633–646. https://doi.org/10.1152/jn.00032.2014
O’Doherty, J., Dayan, P., Schultz, J., Deichmann, R., Friston, K., & Dolan, R. J. (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science (New York, N.Y.), 304(5669), 452–454. https://doi.org/10.1126/science.1094285
Peterburs, J., Kobza, S., & Bellebaum, C. (2016). Feedback delay gradually affects amplitude and valence specificity of the feedback-related negativity (FRN). Psychophysiology, 53(2), 209–215. https://doi.org/10.1111/psyp.12560
Pleger, B., Blankenburg, F., Ruff, C. C., Driver, J., & Dolan, R. J. (2008). Reward facilitates tactile judgments and modulates hemodynamic responses in human primary somatosensory cortex. The Journal of neuroscience: the official journal of the Society for Neuroscience, 28(33), 8161–8168. https://doi.org/10.1523/jneurosci.1093-08.2008
Pleger, B., Ruff, C. C., Blankenburg, F., Klöppel, S., Driver, J., & Dolan, R. J. (2009). Influence of dopaminergically mediated reward on somatosensory decision-making.
PLoS biology, 7(7), e1000164. https://doi.org/10.1371/journal.pbio.1000164
Proudfit, G. H. (2015). The reward positivity: From basic research on reward to a biomarker for depression. Psychophysiology, 52(4), 449–459. https://doi.org/10.1111/psyp.12370
Rescorla, R., & Wagner, A. (1972). A theory of Pavlovian conditioning: The effectiveness of reinforcement and non-reinforcement. Classical Conditioning: Current Research and Theory.
Röhlinger, M., Albrecht, C., & Bellebaum, C. (2025). The Role of the N170 in Linking Stimuli to Feedback—Effects of Stimulus Modality and Feedback Delay. Psychophysiology, 62(4), e70050. https://doi.org/10.1111/psyp.70050
Röhlinger, M., Albrecht, C., Ghio, M., & Bellebaum, C. (2025). Neural Processing of Immediate versus Delayed Feedback in Action-Feedback and Stimulus-Feedback Associations. Journal of Cognitive Neuroscience, 1–35. https://doi.org/10.1162/jocn.a.49
Rossion, B. (2014). Understanding face perception by means of human electrophysiology. Trends in cognitive sciences, 18(6), 310–318. https://doi.org/10.1016/j.tics.2014.02.013
Rossion, B., Joyce, C. A., Cottrell, G. W., & Tarr, M. J. (2003). Early lateralization and orientation tuning for face, word, and object processing in the visual cortex. NeuroImage, 20(3), 1609–1624. https://doi.org/10.1016/j.neuroimage.2003.07.010
Rubin, J. E., Vich, C., Clapp, M., Noneman, K., & Verstynen, T. (2021). The credit assignment problem in cortico-basal ganglia-thalamic networks: A review, a problem and a possible solution. European Journal of Neuroscience, 53(7), 2234–2253. https://doi.org/10.1111/ejn.14745
Sambrook, T. D., & Goslin, J. (2015). A neural reward prediction error revealed by a meta-analysis of ERPs using great grand averages. Psychological bulletin, 141(1), 213–235. https://doi.org/10.1037/bul0000006
Schiffer, A.-M., Muller, T., Yeung, N., & Waszak, F. (2014).
Reward activates stimulus-specific and task-dependent representations in visual association cortices. The Journal of neuroscience: the official journal of the Society for Neuroscience, 34(47), 15610–15620. https://doi.org/10.1523/jneurosci.1640-14.2014
Schönberg, T., Daw, N. D., Joel, D., & O’Doherty, J. P. (2007). Reinforcement Learning Signals in the Human Striatum Distinguish Learners from Nonlearners during Reward-Based Decision Making. The Journal of Neuroscience, 27(47), 12860–12867. https://doi.org/10.1523/jneurosci.2496-07.2007
Schultz, W., Dayan, P., & Montague, P. R. (1997). A Neural Substrate of Prediction and Reward. Science (New York, N.Y.), 275(5306), 1593–1599. https://doi.org/10.1126/science.275.5306.1593
Sinclair, A. H., Manalili, G. M., Brunec, I. K., Adcock, R. A., & Barense, M. D. (2021). Prediction errors disrupt hippocampal representations and update episodic memories. Proceedings of the National Academy of Sciences, 118(51), e2117625118. https://doi.org/10.1073/pnas.2117625118
Singer, A. C., & Frank, L. M. (2009). Rewarded outcomes enhance reactivation of experience in the hippocampus. Neuron, 64(6), 910–921. https://doi.org/10.1016/j.neuron.2009.11.016
Staresina, B. P., Gray, J. C., & Davachi, L. (2009). Event congruency enhances episodic memory encoding through semantic elaboration and relational binding. Cereb Cortex, 19(5), 1198–1207. https://doi.org/10.1093/cercor/bhn165
Sutton, R. S., & Barto, A. (2018). Reinforcement learning: An introduction. A Bradford Book.
Vassiliadis, P., Beanato, E., Popa, T., Windel, F., Morishita, T., Neufeld, E., Duque, J., Derosiere, G., Wessel, M. J., & Hummel, F. C. (2024). Non-invasive stimulation of the human striatum disrupts reinforcement learning of motor skills. Nature human behaviour. https://doi.org/10.1038/s41562-024-01901-z
Weber, C., & Bellebaum, C. (2024). Prediction-error-dependent processing of immediate and delayed positive feedback.
Scientific reports, 14(1), 9674. https://doi.org/10.1038/s41598-024-60328-8
Weinberg, A., Luhmann, C. C., Bress, J. N., & Hajcak, G. (2012). Better late than never? The effect of feedback delay on ERP indices of reward processing. Cognitive, affective & behavioral neuroscience, 12(4), 671–677. https://doi.org/10.3758/s13415-012-0104-z
Wessel, M. J., Beanato, E., Popa, T., Windel, F., Vassiliadis, P., Menoud, P., Beliaeva, V., Violante, I. R., Abderrahmane, H., Dzialecka, P., Park, C.-H., Maceira-Elvira, P., Morishita, T., Cassara, A. M., Steiner, M., Grossman, N., Neufeld, E., & Hummel, F. C. (2023). Noninvasive theta-burst stimulation of the human striatum enhances striatal activity and motor skill learning. Nature neuroscience, 26(11), 2005–2016. https://doi.org/10.1038/s41593-023-01457-7
Yagishita, S., Hayashi-Takagi, A., Ellis-Davies, G. C., Urakubo, H., Ishii, S., & Kasai, H. (2014). A critical time window for dopamine actions on the structural plasticity of dendritic spines. Science, 345(6204), 1616–1620. https://doi.org/10.1126/science.1255514
Yeung, N., Holroyd, C. B., & Cohen, J. D. (2005). ERP correlates of feedback and reward processing in the presence and absence of response choice. Cerebral cortex (New York, N.Y.: 1991), 15(5), 535–544. https://doi.org/10.1093/cercor/bhh153
Zaghloul, K. A., Blanco, J. A., Weidemann, C. T., McGill, K., Jaggi, J. L., Baltuch, G. H., & Kahana, M. J. (2009). Human substantia nigra neurons encode unexpected financial rewards. Science (New York, N.Y.), 323(5920), 1496–1499. https://doi.org/10.1126/science.1167342

Figure captions and notes
Figure 1: Trial structure in the learning parts.
Figure 2: Accuracy in the learning and in the test trials. Note. Error margins represent 95% confidence intervals.
Figure 3: Descriptive Action Index values by Instruction, Block and Feedback Timing. Note.
The Action Index codes how much participants attribute feedback to a preceding action: 0 = feedback is attributed to a preceding stimulus; 1 = feedback is attributed to a preceding action. Error margins represent 95% confidence intervals.
Figure 4: Grand averages and topographies of the FRN/RewP. Note. A. Error margins in the event-related potentials represent standard errors. The black line represents the difference wave. The dark grey rectangle indicates the time window for the extraction of the negative peak; the light grey rectangle indicates the additional time window in which the positive peak was extracted (starting at 100 ms and ending at the latency of the negative peak). B. The topographies represent the relative negative peak, i.e., the negative peak minus the preceding positive peak. For optimal visibility of frontocentral electrodes the top view of the head is displayed.
Figure 5: Model predictions of the FRN/RewP model including Action Index. Note. Error margins represent 95% confidence intervals. PE = prediction error.
Figure 6: Grand averages and topographies of the N170. Note. A. Error margins in the event-related potentials represent standard errors. The dark grey rectangle indicates the time window for the extraction of the negative peak. B. For optimal visibility of temporoparietal electrodes the back view of the head is displayed.
Figure 7: Model predictions of the N170 model including Action Index. Note. Error margins represent 95% confidence intervals. PE = prediction error.
Figure 8: Model predictions of all significant effects involving the Action Index. Note. Error margins represent 95% confidence intervals. AI = Action Index, PE = prediction error.
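The peak-to-peak scoring described in the note to Figure 4 (negative peak minus the preceding positive peak, with the positive peak searched from 100 ms up to the latency of the negative peak) can be illustrated with a minimal sketch. This is not taken from the study's analysis scripts: the function name, the 200 to 350 ms negative-peak window, and the toy waveform are hypothetical choices made only for illustration.

```python
import numpy as np

def relative_negative_peak(times_ms, erp_uv, neg_window=(200.0, 350.0), pos_start_ms=100.0):
    """Return (relative amplitude, negative-peak latency, positive-peak latency).

    neg_window is an ASSUMED search window for the negative peak; only the
    100 ms start of the positive-peak window comes from the figure note.
    """
    times_ms = np.asarray(times_ms, dtype=float)
    erp_uv = np.asarray(erp_uv, dtype=float)

    # Most negative sample inside the negative-peak window.
    neg_mask = (times_ms >= neg_window[0]) & (times_ms <= neg_window[1])
    neg_idx = np.flatnonzero(neg_mask)[np.argmin(erp_uv[neg_mask])]

    # Most positive sample between pos_start_ms and the negative-peak latency.
    pos_mask = (times_ms >= pos_start_ms) & (times_ms <= times_ms[neg_idx])
    pos_idx = np.flatnonzero(pos_mask)[np.argmax(erp_uv[pos_mask])]

    # Relative negative peak = negative peak minus preceding positive peak.
    return erp_uv[neg_idx] - erp_uv[pos_idx], times_ms[neg_idx], times_ms[pos_idx]

# Toy waveform (hypothetical): positive deflection near 180 ms, negative near 270 ms.
times = np.arange(0.0, 500.0, 2.0)
wave = 4.0 * np.exp(-((times - 180.0) / 30.0) ** 2) - 3.0 * np.exp(-((times - 270.0) / 30.0) ** 2)
amp, neg_lat, pos_lat = relative_negative_peak(times, wave)
```

Scoring the negative peak relative to the preceding positivity, rather than against the baseline, makes the measure less sensitive to slow drifts that shift both peaks together.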
Graphical Abstract
Graphical Abstract Text
Individual beliefs about whether feedback refers to a previous action or a previous stimulus affect feedback processing: when participants focused on action-feedback associations, the feedback-related negativity, possibly indicating striatal activity, showed stronger feedback valence coding across feedback delays, while the N170, possibly indicating medial temporal lobe activity, showed stronger prediction error by valence coding. Besides feedback delay, beliefs about association type might thus influence which learning system is recruited in feedback learning.

Supplementary Material
figure3.tif (6.17 MB); figure5.tif (10.28 MB); figure7.tif (14.66 MB); manuscript_feedbackambiguousfeedbackassociation_revision_final_clean.docx (160.18 KB)

Information
Version history: Version 1, 01 December 2025.
Peer review timeline: Published, European Journal of Neuroscience, Version of Record, 6 Mar 2026.
Copyright: This work is licensed under a Non Exclusive No Reuse License.
Keywords: feedback delay, feedback-association types, FRN/RewP, N170, prediction error

Citation
Christine Albrecht, Marta Ghio, Christian Bellebaum. Action or Stimulus: Individual beliefs about learned associations influence the processing of immediate and delayed feedback. Authorea. 01 December 2025.
DOI: https://doi.org/10.22541/au.176463133.30560512/v1