Keywords
population coding, reinforcement learning, resource allocation, attention, working
memory
bioRxiv preprint doi: https://doi.org/10.1101/2025.04.25.650663; this version posted April 27, 2025. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Introduction
To support adaptive behaviour and ensure survival, the brain has evolved to prioritise environmental cues that signal potential rewards [1, 2]. Selectively attending to reward-predicting stimuli facilitates efficient navigation of complex environments, helping organisms move towards more rewarding states [3, 4]. This selection process is crucial given the brain's limited processing capacity, as it enhances internal representations of valuable stimuli and promotes the formation of stimulus-reward associations [5]. Whereas the bias towards processing stimuli associated with tangible rewards is well established, the influence of intrinsic rewards – positive motivational states associated with feelings of satisfaction and competence [6] – on sensory processing remains less understood.
Experiments using points-based and monetary incentives have found that associating stimuli with a higher probability, or greater magnitude, of external reward facilitates voluntary, or top-down, attention [7–9]. Additionally, in visual search tasks, which primarily engage bottom-up processes, search times are faster for pop-out targets associated with higher rewards than for stimuli predicting less or no reward [10]. Notably, the prioritisation of reward-associated stimuli persists in subsequent tasks even when reward contingencies are removed and previously rewarded features cease to be salient or task-relevant [11–13]. Consistent with this, studies have shown that eye movements are biased towards objects and spatial locations previously associated with rewards [14–16]. This continued prioritisation of previously rewarded stimuli, even when it no longer aligns with immediate task goals, suggests that reward learning has a lasting effect that can involuntarily bias attention towards these stimuli [17, 18].
The influence of external rewards on behaviour extends to visual working memory (VWM) [19], which is known for its ability to flexibly store and maintain features of multiple objects within a limited capacity [20–28]. The precision of representations increases as a function of the associated reward, indicating that VWM allocation also tracks reward values when multiple objects provide different rewards ([29, 30]; see [31] for a review). Objects that were previously associated with reward are also better remembered, even when they are currently task-irrelevant [32]. Crucially, however, total VWM capacity does not show flexibility with reward [33, 34], as further evidenced by findings that improved performance for high-reward items is accompanied by a corresponding decline in performance for low-reward items [35]. These results demonstrate that stimuli can be strategically prioritised for encoding in VWM through selective attention, leading to flexible allocation of limited capacity between items based on their assigned subjective values [36, 37].
Neuroimaging studies suggest that intrinsic rewards can have effects on the neural system similar to those of external rewards. Successful information retrieval in cognitive tasks has been argued to be psychologically rewarding [38], and studies have shown elevated activation in the striatum – a region traditionally associated with the motivational significance of actions [39–42] – in response to correct responses, even in the absence of explicit rewards [38, 43, 44]. This activation is driven not by the successful retrieval of information itself, but rather by the satisfaction of the observer's internal goals [38, 43]. Similarly, changes in confidence levels, which reflect subjective evaluations of correctness, have also been shown to modulate striatal activation [45–47]. Building on evidence of subjective confidence signals in the brain's reward circuits, it has been argued that the brain reinforces behaviours linked to high-confidence states while diminishing those associated with low confidence [48]. Together, this growing evidence suggests that internally generated signals, particularly those related to perceived accuracy and performance evaluation, are represented in the brain similarly to explicit, externally administered rewards, raising the possibility that they may likewise bias sensory processing.
In the present study, we combined psychophysical measurement and computational modelling to investigate how different intrinsic and extrinsic factors affect the competition between visual stimuli for processing resources. We used a modified analogue report task [49, 50] in which observers were instructed to reproduce the direction of one of a pair of motion stimuli that differed in their associated history of reward. Across a series of experiments, we found that performance was consistently better for stimuli previously associated with larger extrinsic reward, and also for those associated with lower uncertainty or with improved performance feedback. To provide a mechanistic explanation of the observed behaviour, we developed a computational model that relates the accumulation of past rewards, both intrinsic and extrinsic, to the allocation of neural resources between stimuli, which in turn influences estimation performance.
Results
Differential rewards bias resource allocation
Building on existing evidence that external rewards can bias information processing, we began our investigation by quantifying their effects on representational fidelity in a motion reproduction task. In Experiment 1, observers viewed two coloured motion stimuli. After a brief delay and the presentation of a colour cue, they were asked to reproduce the motion direction of the cued stimulus (Fig. 1A & B). Critically, in this experiment, we associated the colours of the stimuli with different external rewards. Accurate recall (< 50° absolute error) earned 15 points when items of one colour were tested and 5 points when items of the other colour were tested. Accumulated points were converted into a bonus payment to the observer. At the end of the experiment, all observers correctly identified which stimulus colour had provided the larger rewards. To determine whether the difference in external rewards influenced reproduction precision, we compared the mean absolute deviation (MAD) of responses between stimuli of different colours. We found strong evidence that response errors were smaller for items of the colour associated with the larger reward (BF10 = 18.7, median of the posterior over effect size δ = 0.575, 95% credible interval = [0.195, 0.966]) (Fig. 1C & D).
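Because motion directions are circular, response errors must be wrapped before their absolute values are averaged. The sketch below shows one way to compute a circular MAD, assuming errors in radians wrapped to [-π, π). The function names are hypothetical and this is not the authors' analysis code. It also shows where a chance-level reference near 1.57 rad comes from: uniformly random responses give absolute errors uniform on [0, π], whose mean is π/2.

```python
import math


def wrap_angle(x):
    """Wrap an angular difference (radians) into [-pi, pi)."""
    return (x + math.pi) % (2 * math.pi) - math.pi


def circular_mad(responses, targets):
    """Mean absolute deviation of circular response errors, in radians."""
    errors = [wrap_angle(r - t) for r, t in zip(responses, targets)]
    return sum(abs(e) for e in errors) / len(errors)


# Uniformly random responses give |error| ~ Uniform(0, pi), so chance MAD = pi / 2.
CHANCE_MAD = math.pi / 2

print(round(circular_mad([0.1, 6.2], [0.0, 0.0]), 4))  # 0.0916
print(round(CHANCE_MAD, 3))  # 1.571
```

In the usage line, a response of 6.2 rad to a target at 0 counts as an error of about -0.083 rad rather than 6.2 rad. Without wrapping, the MAD would be badly inflated.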
[Figure 1 graphic. Panels: A) task sequence (motion stimuli, delay, cue, response); B) reward zone (5 or 15 points) versus no points; C) response-error densities for low- and high-reward items; D) MAD by reward; E) allocation α, observed versus reward maximization, with an inset of total normalized reward as a function of α.]
Figure 1: External reward manipulation in Experiment 1. A) Schematic of the task. B) Illustration of the experimental manipulation. Responses within 50 degrees of the target motion direction were rewarded with either 15 or 5 points, depending on the colour of the cued object. C) Distribution of response errors and corresponding fits of the Neural resource model. Histograms represent the data, while coloured curves and shaded areas depict model predictions (M ± SE). D) Mean absolute deviation (MAD) of response errors. The coloured circles with error bars represent the mean ± SE. Dashed line indicates chance-level performance. E) Observed (i.e., freely estimated) resource allocation compared to the optimal allocation aimed at maximizing the total points in the task. For visualisation purposes, allocation towards the low-reward item is shown. Dashed line indicates equal allocation. Allocation smaller than 1 indicates that more resource was allocated towards the high-reward item (1:0.695 for high- vs low-reward item). The inset shows individual reward functions relating resource allocation to expected point totals, with each curve's peak indicating the allocation that maximizes expected reward. For ease of visualization, only a subset of observers is shown, and all curves are normalized to the same total reward.
Neural resource allocation
The results of Experiment 1 indicate that observers prioritised encoding the stimulus associated with the larger reward. Importantly, recall for the low-reward item remained reliably better than chance, suggesting that prioritisation was graded rather than all-or-none. To quantify the share of resources allocated to each item, we applied a normalization-based population coding model [22, 51] to the data from Experiment 1. In this model, neural firing rate takes the role of a limited resource, which, in the simplest scenario, would be equally distributed between stimuli. Here, we extended this model by freely fitting a gain modulation parameter, α, which increased the activity encoding one of the two
stimuli while keeping the total gain (i.e., mean activity) of the population constant (see the Neural resource model section for more detail). Consistent with the observed difference in response error, we found strong evidence for unequal resource allocation favouring items of the highly rewarded colour (low/high ratio 0.695; difference from equal allocation, BF10 = 23.8, δ = 0.594, 95% CI = [0.211, 0.987]) (Fig. 1E).
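The sketch below illustrates the idea of a shared gain budget in a normalization-based population code. It is a minimal illustration in the spirit of [22, 51], not the fitted Neural resource model, and it rests on our own assumptions:

- α is defined as the ratio of the two items' gains, with the mean gain held fixed.
- Neurons have von Mises tuning and emit Poisson spike counts.
- Direction is read out with a population-vector decoder rather than maximum likelihood.

The names `split_gain` and `simulate_errors`, and all parameter values, are hypothetical.

```python
import math
import random


def split_gain(mean_gain, alpha):
    """Split gain so that g_a / g_b == alpha while (g_a + g_b) / 2 == mean_gain
    (hypothetical parameterisation of the constant-total-gain constraint)."""
    g_b = 2 * mean_gain / (1 + alpha)
    return alpha * g_b, g_b


def poisson(lam, rng):
    """Draw a Poisson spike count with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_errors(gain, n_neurons=32, kappa=2.0, n_trials=1000, seed=0):
    """Return decoding errors (radians) for one item encoded with `gain`.

    Von Mises tuning curves, Poisson spike counts and a population-vector
    readout (a simplification, not necessarily the decoder used in the model)."""
    rng = random.Random(seed)
    prefs = [2 * math.pi * i / n_neurons for i in range(n_neurons)]
    errors = []
    for _ in range(n_trials):
        theta = rng.uniform(-math.pi, math.pi)
        sx = sy = 0.0
        for phi in prefs:
            rate = gain * math.exp(kappa * (math.cos(theta - phi) - 1))
            count = poisson(rate, rng)
            sx += count * math.cos(phi)
            sy += count * math.sin(phi)
        # With no spikes at all the observer can only guess.
        estimate = math.atan2(sy, sx) if (sx or sy) else rng.uniform(-math.pi, math.pi)
        errors.append((estimate - theta + math.pi) % (2 * math.pi) - math.pi)
    return errors


def mad(errors):
    """Mean absolute (already wrapped) error."""
    return sum(abs(e) for e in errors) / len(errors)


g_weak, g_strong = split_gain(mean_gain=1.0, alpha=0.25)
print(round(g_weak, 3), round(g_strong, 3))  # 0.4 1.6
print(mad(simulate_errors(g_strong, seed=1)), mad(simulate_errors(g_weak, seed=1)))
```

Shifting gain towards one item lowers that item's error at the expense of the other, while total activity is unchanged. This trade-off is the one that α captures in the fitted model.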
We next investigated whether observers distributed resources in a way that would maximize the total number of collected points, which we considered the optimal allocation strategy for this task. To test this, we calculated the expected number of points awarded for a range of different allocation weights (see Optimal resource allocation for more detail). The values of α that maximized the reward are shown in Figure 1E (optimal allocation). Comparing the observed and optimal weights revealed strong evidence for a difference between the two (BF10 = 92.9, δ = 0.694, 95% CI = [0.297, 1.102]). Observers distributed resources more equally than would be required to maximize the total number of points (optimal low/high ratio 0.34).
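The logic of the reward-maximizing benchmark can be illustrated with a grid search over the allocation ratio. This is a hedged sketch, not the authors' procedure, and it makes the following assumptions:

- Each item's error is approximated as Gaussian with variance `noise / gain`, ignoring wrap-around and guessing.
- A hit is an absolute error below 50°.
- Each colour is cued on half of the trials.
- α = g_low / g_high, with the mean gain fixed.

The `noise` value and all function names are hypothetical.

```python
import math


def split_gain(mean_gain, alpha):
    """alpha = g_low / g_high; mean gain held fixed (hypothetical parameterisation)."""
    g_high = 2 * mean_gain / (1 + alpha)
    return alpha * g_high, g_high


def p_hit(gain, noise=0.5, window=math.radians(50)):
    """P(|error| < window) if error ~ Normal(0, noise / gain).

    Simplifying assumption: ignores wrap-around and random guessing."""
    sd = math.sqrt(noise / gain)
    return math.erf(window / (sd * math.sqrt(2)))


def expected_points(alpha, high_pts=15, low_pts=5, mean_gain=1.0):
    """Expected points per trial for a given low/high allocation ratio."""
    g_low, g_high = split_gain(mean_gain, alpha)
    # Each colour is cued on half of the trials.
    return 0.5 * (high_pts * p_hit(g_high) + low_pts * p_hit(g_low))


def optimal_alpha(**kwargs):
    """Grid search over the low/high allocation ratio."""
    grid = [i / 100 for i in range(1, 301)]
    return max(grid, key=lambda a: expected_points(a, **kwargs))


print(optimal_alpha())                         # a value below 1
print(optimal_alpha(high_pts=10, low_pts=10))  # 1.0
```

Under this approximation the hit probability is a concave function of gain. The optimum is therefore interior, and it shifts towards the 15-point colour but not all the way. With equal stakes it falls at exactly equal allocation.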
A reward-maximization strategy would come at the cost of higher error for the low-reward item. This could indicate that, in addition to maximizing external rewards, observers also aim to achieve a certain level of accuracy across all items, potentially because they find accuracy intrinsically rewarding (see also [25]).
Perceived accuracy biases resource allocation
Having confirmed that external rewards modulated allocation in the motion reproduction task, we next investigated the effects of perceived accuracy, a possible form of intrinsic reward, on the same task. In Experiment 2a, we presented manipulated feedback at the end of each trial to influence observers' perception of their reproduction accuracy. Observers were again presented with two coloured stimuli and reproduced the one indicated by a colour cue. We magnified the error presented at feedback when one colour was cued and minified it when the other colour was cued (Fig. 2A & B). A post-experimental questionnaire revealed that 84% of observers judged stimuli of the colour associated with error-magnified feedback as more difficult to remember. This indicates that we successfully associated stimulus identity (i.e., colour) with perceived difficulty.
To assess the effects of perceived difficulty on response precision, we compared the MAD between the response and the true target direction (rather than the one shown as feedback) for stimuli of the two colours (Fig. 2C & D). We found responses to be more precise for the stimulus with reduced feedback error, i.e., the one perceived as easier to remember (BF10 = 29.4, δ = 0.672, 95% CI = [0.24, 1.12]). This finding indicates that the perception of better performance for stimuli of one colour, induced by feedback, led to improved actual performance for those stimuli.
The observed effect could be attributed either to capture of visual attention by the "easier" item (i.e., competition for visual processing resources) or to mnemonic prioritisation of that item (i.e., competition for memory resources). To differentiate between these possibilities, we conducted a follow-up experiment. Experiment 2b replicated the conditions of Experiment 2a but presented the stimuli sequentially. This reduced encoding competition between the two objects and minimized the influence of attentional selection on resource allocation. As in Experiment 2a, 89% of observers judged the colour associated with magnified feedback errors as more difficult to remember. However, in contrast to Experiment 2a, comparing response errors across the two stimuli (Fig. S1) revealed that the observed data were about nine times more likely under the null hypothesis (1/BF10 ≈ 9). This provides moderate evidence for a lack of difference in response precision between the two colours (BF10 = 0.11, δ = 0.009, 95% CI = [-0.183, 0.202]). The finding suggests that the effect observed in Experiment 2a is likely due to attentional competition during encoding: when that competition is mitigated, observers do not show preferential encoding based on perceived difficulty.
Neural resource allocation
The results of Experiment 2a show that observers prioritised encoding of the error-minified stimulus, i.e., the one signalling better performance. Crucially, the error-magnified stimulus was still recalled with above-chance precision, consistent with a graded rather than all-or-none allocation of resources. To quantify resource distribution between the two objects, we again applied the Neural resource model to the data, with results illustrated in Figure 2C & E. On average, observers allocated 1.18 times more resources towards the error-minified stimulus (difference from equal allocation, BF10 = 2.63, δ = 0.45, 95% CI = [0.05, 0.86]; 19 out of 25 observers had α_observed > 1; Fig. 2E). This is consistent with the observed difference in response error between the stimuli of the two colours.

[Figure 2 graphic. Panels: A) task sequence (motion stimuli, cue, delay, response); B) feedback, error minifying versus error magnifying; C) response-error densities (rad) for magnified and minified feedback error; D) MAD by feedback error; E) allocation α, observed versus feedback error minimization, with an inset of total feedback variance as a function of α.]

Figure 2: Perceived accuracy manipulation in Experiment 2a. A) Schematic of the task. B) Illustration of the experimental manipulation. Feedback error was magnified for one stimulus and minified for the other, depending on the colour of the cued object. C) Distribution of response errors and corresponding fits of the Neural resource model. Histograms represent the data, while coloured curves and shaded areas depict model predictions (M ± SE). D) Mean absolute deviation of response errors. The coloured circles with error bars show the mean ± SE. Dashed line indicates chance-level performance. E) Observed resource allocation and optimal allocation aimed at minimizing overall feedback variance in the task. Dashed line indicates equal allocation. Allocation larger than 1 indicates that more resource was allocated towards the error-minified item (minified vs magnified: 1.18:1). The inset illustrates individual variability in feedback error as a function of resource allocation, with each curve's trough indicating the allocation level that minimizes feedback error. For ease of visualization, only a subset of observers is shown, and all curves are normalized to the same range of feedback variance.
Next, we investigated whether the observed allocation matched the predictions of an ideal observer who optimally weights neural activity to minimize overall feedback error in the task. We calculated the expected variance of feedback error across both items for a range of allocation weights. Figure 2E shows the optimal allocation weights that minimize this variance. The optimal strategy would require shifting roughly twice as many resources towards the error-magnified item (α_optimal = 0.52). Importantly, we found extremely strong evidence that this was inconsistent with the observed allocation, which favoured the error-minified item (BF10 = 3.77 × 10^6, δ = 1.72, 95% CI = [1.09, 2.39]). Overall, these results indicate that observers did not adopt an allocation strategy that would minimize the variability of their feedback error (α = 0.52). Instead, they did the opposite, allocating more neural resources to the item for which we systematically minified the error in feedback.
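Why does minimizing feedback variance favour the error-magnified item? A simplified calculation makes this concrete. We make three assumptions, none of which are taken from the paper:

- The displayed error is the true error multiplied by a factor m.
- True error variance is `noise / gain`.
- α = g_minified / g_magnified, with the mean gain fixed.

Under these assumptions, total feedback variance is proportional to (1 + α)(m_mag² + m_min²/α). This expression is minimized at α* = m_min / m_mag. The multipliers below are hypothetical, because the section does not report the actual magnification factors.

```python
def feedback_variance(alpha, m_min=0.75, m_mag=1.5, noise=0.5, mean_gain=1.0):
    """Expected variance of the displayed feedback error, averaged over colours.

    Hypothetical assumptions: displayed error = m * true error; true error
    variance = noise / gain; alpha = g_minified / g_magnified, mean gain fixed."""
    g_mag = 2 * mean_gain / (1 + alpha)
    g_min = alpha * g_mag
    return 0.5 * (m_mag ** 2 * noise / g_mag + m_min ** 2 * noise / g_min)


def optimal_alpha(**kwargs):
    """Grid search for the allocation ratio that minimizes feedback variance."""
    grid = [i / 100 for i in range(1, 301)]
    return min(grid, key=lambda a: feedback_variance(a, **kwargs))


print(optimal_alpha())  # 0.5, i.e. m_min / m_mag
```

In this simplified form, any feedback manipulation that magnifies one colour's errors more than the other's pushes the variance-minimizing α below 1. That qualitatively matches the direction of α_optimal = 0.52, and it is the opposite of the allocation observers actually adopted.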
In Experiment 2b, fitting the same Neural resource model to the data revealed that the observed allocation parameter was numerically close to 1 (α_mean = 1.07; BF10 = 0.62, δ = 0.18, 95% CI = [-0.01, 0.38]). This aligns with the observed similarity in reproduction precision between the two stimuli, and further supports the conclusion that the effect observed in Experiment 2a depended on attentional competition during encoding.
Estimation difficulty biases resource allocation
Following Experiment 2, we aimed to determine whether preferential allocation and encoding would persist when varying objective stimulus difficulty rather than perceived performance. Previous findings have shown a positive correlation in humans between subjective confidence and the motion strength of random-dot kinematogram (RDK) stimuli [52] (see also [53]). Drawing on these findings, we hypothesised that variations in the objective difficulty of stimuli would modulate internally generated confidence signals, driving the prioritisation of specific stimuli as in Experiment 2. In Experiment 3a, we presented two coloured RDK stimuli with different coherence levels on the majority of trials, to create differences in objective difficulty
and associated confidence. We then assessed response precision on the remaining trials, during which both stimuli were presented with equal coherence (i.e., equal difficulty) (Fig. 3A & B).
[Figure 3 graphic. Panels: A) task sequence (motion stimuli, cue, delay, response); B) variable coherence trials (85% vs 45%) and equal coherence trials (65% vs 65%), with an inset of additive perceptual noise (variance) as a function of coherence (45%, 65%, 85%); C, D) response-error densities for the high- and low-coherence colours; E) MAD by coherence (High, Low, Inter. (High), Inter. (Low)); F) allocation α, observed versus error minimization, with an inset of total response variance as a function of α. G to J) the same for Experiment 3b.]
Figure 3: Estimation difficulty manipulation in Experiments 3a and 3b (simultaneous presentation). A) Schematic of the task. B) Illustration of the experimental manipulation. In most trials, the two colours were associated with different levels of motion estimation difficulty (i.e., variable coherence); in the remaining trials, both objects had the same level of difficulty (i.e., equal coherence). Motion with different coherence levels produces varying degrees of perceptual noise, with higher coherence reducing noise. This perceptual noise was incorporated into the Neural resource model as an additive component, alongside memory noise. C) & D) Distributions of response errors and corresponding fits of the Neural resource model. Histograms represent the data, while coloured curves and shaded areas depict model predictions (M ± SE). Panel C depicts variable coherence trials, and panel D depicts equal coherence trials. E) Mean absolute deviation of response errors. Dashed lines indicate chance-level performance. F) Observed resource allocation and optimal allocation aimed at minimizing overall response variance in the task. Dashed line indicates equal allocation. Allocation larger than 1 indicates that more resource was allocated towards the easier item (high vs low coherence: 1.76:1). Panels G-J are the same as C-F, but for the simultaneous presentation condition of Experiment 3b. J) Allocation larger than 1 indicates that more resource was allocated towards the easier item (high vs low coherence: 1.61:1). The insets illustrate individual variability in response variance as a function of resource allocation, with each curve's trough indicating the allocation level that minimizes overall response error. The coloured circles with error bars show the mean ± SE. For ease of visualization, all curves are normalized to the same range of recall variance.
In Experiment 3a, all observers reported that stimuli of the colour associated with low coherence were more difficult to remember. This confirms that the coherence manipulation produced a clear difference in perceived difficulty, despite the absence of performance feedback in this experiment. Results of Experiment 3a are shown in Figure 3C-E. As expected, observers were more precise in reproducing the motion direction of the high-coherence stimulus on trials where the stimuli objectively differed in difficulty (BF10 = 1.83 × 10^5, δ = 1.6, 95% CI = [0.95, 2.29]). More importantly, on trials where the stimuli had equal coherence, reproduction was also more precise for the stimulus with the colour associated with high coherence (i.e., the "easier" colour) (BF10 = 11.4, δ = 0.63, 95% CI = [0.18, 1.10]). This finding suggests that observers associated colour with difficulty when the two objects were presented with different levels of coherence, and subsequently allocated more resources to the stimulus they had learned was easier.
This result was replicated in the laboratory setting of Experiment 3b, where observers were required to maintain fixation at the centre of the screen during stimulus presentation (Fig. 3G-I). Although fixation prevented observers from overtly shifting their attention towards one stimulus during encoding, 74% of observers correctly identified one colour as more difficult. As expected, responses were more precise for the high-coherence stimulus on trials where the stimuli differed in coherence (BF10 = 4414, δ = 1.366, 95% CI = [0.718, 2.046]). Additionally, responses were more precise for the colour associated with high coherence on trials where both stimuli were presented with equal coherence (BF10 = 4.2, δ = 0.56, 95% CI = [0.092, 1.052]). However, this difference could again be explained by attentional demands at encoding. When objects were presented sequentially (Fig. S2), response precision was comparable across colours when both stimuli had the same coherence (BF10 = 0.51, δ = 0.267, 95% CI = [-0.157, 0.709]). This held even though observers could judge which item was more difficult (84%), and even though the high-coherence stimulus showed a noticeable precision advantage on trials when coherence levels varied between objects (BF10 = 9.22 × 10^4, δ = 1.76, 95% CI = [1.015, 2.550]). Compared to other experiments, response distributions in this experiment exhibit more pronounced peaks around the direction opposite to the target (i.e., elevated tails). The tendency of our sensory system to encode the orientation of a motion path (i.e., the line along which movement occurs) partly independently of its direction is well documented [54, 55]. This tendency may be especially pronounced when motion stimuli are presented in the periphery rather than at fixation.
Neural resource allocation
Consistent with the findings from Experiment 2, Experiment 3 demonstrated that observers, when presented with objects associated with different levels of performance, prioritised the encoding of the stimuli perceived as easier. Also consistent with previous experiments, observers performed above chance for the more difficult item, supporting the interpretation that resource allocation was graded rather than all-or-none. To quantify the distribution of resources across the two items, we again applied our population coding model to the data.
In Experiment 3a, the allocation estimates from the model indicated that observers allocated nearly twice as much resource (1.76:1) to the high-coherence stimuli (Fig. 3F), and this allocation deviated from equal allocation (BF10 = 2.68 × 10^4, δ = 1.4, 95% CI = [0.792, 2.031]). We next investigated whether the observed allocation was consistent with an optimal allocation strategy aimed at minimizing overall response variance in the task. To this end, we simulated performance on the variable coherence trials using a range of allocation weights. The optimal strategy for most observers was equal allocation (Fig. 3F). Comparing the observed and optimal weights revealed strong evidence that the observed weights were, on average, larger than the optimal weights (BF10 = 9580, δ = 1.341, 95% CI = [0.733, 1.977]).
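One intuition for why equal allocation is close to optimal in Experiment 3 comes from a simplified variance model. Suppose each item's response variance is an additive perceptual term (smaller at high coherence) plus a memory term `noise / gain`. The perceptual terms then do not depend on α, and only the memory terms are affected by the split. With a fixed total gain, the sum of the memory terms is smallest when the gain is split evenly.

This is a hedged sketch under those assumptions. The parameter values and names are hypothetical. The paper's fitted model need not give exactly 1: its reported optimum varied across observers and averaged 1.03 in Experiment 3b.

```python
def response_variance(alpha, perc_high=0.05, perc_low=0.30, noise=0.5, mean_gain=1.0):
    """Total response variance over both items on variable-coherence trials.

    Hypothetical assumptions: variance = additive perceptual noise (smaller for
    high coherence) + memory noise / gain; alpha = g_high / g_low, mean gain fixed."""
    g_low = 2 * mean_gain / (1 + alpha)
    g_high = alpha * g_low
    return 0.5 * ((perc_high + noise / g_high) + (perc_low + noise / g_low))


def optimal_alpha(**kwargs):
    """Grid search for the allocation ratio that minimizes total response variance."""
    grid = [i / 100 for i in range(1, 301)]
    return min(grid, key=lambda a: response_variance(a, **kwargs))


print(optimal_alpha())                              # 1.0
print(optimal_alpha(perc_high=0.01, perc_low=0.8))  # 1.0
```

Changing the perceptual noise levels only shifts the curve up or down, so the minimum stays at equal allocation. Under this simplification, the observed 1.76:1 ratio therefore raises total variance.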
These findings were replicated in Experiment 3b. When objects were presented simultaneously, the model estimated that observers allocated resources at a ratio of 1.61:1 in favour of the high-coherence stimulus (Fig. 3J). This allocation again differed from equal allocation (BF10 = 830.6, δ = 1.166, 95% CI = [0.566, 1.794]). It also differed from the optimal allocation, which was again close to equal (mean α_optimal = 1.03; BF10 = 789, δ = 1.16, 95% CI = [0.561, 1.786]). Finally, we fitted a free allocation parameter to the data from the equal coherence condition with sequential presentation (Exp. 3b). This revealed a ratio of 1.2:1 in favour of the colour associated with high coherence; however, we did not find evidence that this differed from equal allocation (BF10 = 0.82, δ = 0.343, 95% CI = [-0.09, 0.797]).
Interim conclusion
In Experiment 1, we observed a clear effect of external rewards on representational fidelity in a motion reproduction task, with observers allocating more cognitive resources to high-reward items. In Experiments 2 and 3, we found a similar effect using novel manipulations: observers allocated more resources to the stimulus that was perceived as easier. This perception was based either on manipulated error feedback (Experiment 2) or on internal confidence in estimation (Experiment 3). In both of these experiments, the observed allocation deviated from the predictions of an optimal strategy aimed at minimizing overall feedback or response error. Additionally, we demonstrated that differences in estimation performance were abolished when competition during encoding was removed by presenting stimuli sequentially. This suggests that they arise from unequal allocation of attentional resources at the encoding stage.
Building on these results and existing literature [38, 43, 44], we argue that observers in our study found higher accuracy and confidence in their performance intrinsically rewarding. They learned to associate this intrinsic reward with a stimulus feature (i.e., one of the two colours), and this association biased resource allocation towards subsequent stimuli with the same feature. Importantly, although reward-driven, this biased allocation was not a strategy that would maximize intrinsic reward on these tasks, because observers had no influence over which stimulus was cued for report on a given trial. Indeed, the direction of the biases induced by implicit rewards in Experiments 2 and 3 meant that they were counterproductive, increasing overall error variability relative to a strategy of equal allocation. Therefore, instead of evaluating these data from the perspective of optimal performance, we propose a neural model inspired by reinforcement learning to explain these findings.
Reinforcement learning model of resource allocation
To further explore the dynamics of resource allocation, we developed a computational model that integrates principles of neural coding and reinforcement learning. The proposed Reinforcement learning account of resource allocation extends the Neural resource model [22, 51] by incorporating a value-updating mechanism that allows extrinsic and intrinsic rewards to influence the future distribution of neural resources (Fig. 4). A key contribution of our model is the concept that rewards, both intrinsic and extrinsic, obtained from reproducing a stimulus become associated with the identifying features of that stimulus. This association affects the features' subjective value and biases the allocation of resources in subsequent encounters. This approach accurately predicted the resource allocations estimated by freely fitted allocation weights, indicating that behavioural estimation performance could be successfully inferred from an analysis of accumulated rewards.
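The model's exact value-update rule (including Equation 9) is specified in a section not included here. As a stand-in, the sketch below uses a generic leaky value accumulator per colour with a softmax readout of the resource split. Both the update rule and the readout are our assumptions and are not claimed to be the paper's equations. The names `reward_weight`, `leak` and `beta` are hypothetical. Hit rates are fixed at the Experiment 1 averages reported below (80% and 66.5%), so allocation does not feed back on accuracy, and intrinsic (confidence) rewards are omitted.

```python
import math
import random


def simulate_allocation(n_trials=200, reward_weight=0.05, leak=0.1, beta=1.0, seed=0):
    """Hypothetical leaky value accumulation per colour with a softmax readout.

    Returns the fraction of resource given to the high-reward colour on each trial.
    Hit rates are held fixed (a simplification: in a full model, allocation would
    itself change accuracy), and intrinsic/confidence rewards are omitted."""
    rng = random.Random(seed)
    points = {"high": 15, "low": 5}
    hit_rate = {"high": 0.80, "low": 0.665}
    value = {"high": 0.0, "low": 0.0}
    fractions = []
    for _ in range(n_trials):
        # Softmax over learned colour values sets the resource split for this trial.
        w_high = math.exp(beta * value["high"])
        w_low = math.exp(beta * value["low"])
        fractions.append(w_high / (w_high + w_low))
        cued = rng.choice(["high", "low"])
        reward = points[cued] if rng.random() < hit_rate[cued] else 0
        for colour in value:  # leak: both values decay towards zero every trial
            value[colour] *= 1 - leak
        value[cued] += reward_weight * reward
    return fractions


trace = simulate_allocation()
print(trace[0])  # 0.5
print(round(sum(trace[-100:]) / 100, 2))
```

The expected per-trial reward is 0.5 × 0.8 × 15 = 6 points for the high-reward colour and 0.5 × 0.665 × 5 ≈ 1.66 for the low-reward colour. With these parameters, the values therefore settle around 0.05 × 6 / 0.1 = 3.0 and about 0.83. The high-reward share rises over the first few dozen trials and then fluctuates around a plateau, qualitatively like the early shift and plateau described for the fitted trajectories.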
External reward243
In Experiment 1, the stimulus colour associated with a high reward (15 points) was expected to accumulate greater value than the colour associated with a low reward (5 points). On average, observers earned points on 80% of trials when the high-reward stimulus was probed and on 66.5% of trials when the low-reward stimulus was probed, leading to an average accumulation of 602 and 166 points, respectively. To apply the proposed RL model to each observer's data, we combined individual trial-by-trial external rewards with estimates of internal confidence (Equation 9).
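These totals are consistent with a simple expectation. Assuming (our assumption; the split is not stated here) that each colour was probed on about half of the 100 experimental trials, the expected points per colour are the number of probes × the proportion of rewarded responses × the points per rewarded response:

```python
def expected_points(n_probes: int, hit_rate: float, points_per_hit: int) -> float:
    """Expected points for one colour: probes x proportion rewarded x points per reward."""
    return n_probes * hit_rate * points_per_hit


high = expected_points(50, 0.80, 15)   # high-reward colour, 80% of responses rewarded
low = expected_points(50, 0.665, 5)    # low-reward colour, 66.5% of responses rewarded
print(round(high, 2), round(low, 2))   # 600.0 166.25
```

The results (about 600 and 166 points) are close to the reported averages; exact agreement is not expected, because the reported hit rates and totals are each averaged across observers.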
Figure 5A shows the average trajectory of resource allocation across trials (see Fig. S3A for individual trajectories). This trajectory shows an early shift in resource allocation towards the preferred item, followed by a stable plateau. For ease of visualisation, trajectories are presented as directed towards the preferred object, defined as the object receiving the greater average resource allocation across all trials.
Crucially, because our RL account is grounded in the same Neural resource model previously employed to fit the psychophysical data and quantify resource allocation (Fig. 1A & C), we can directly compare estimates across the two models. Here we focus on the comparison of estimated resource allocations; ML estimates and comparisons for the other parameters are shown in Supplementary Information (Fig. S4A). Importantly, the freely estimated resource allocation (observed allocation in Fig. 1C) is based on behavioural errors only, with no information about rewards, and so can serve as a benchmark for evaluating the performance of the RL model. As shown in Figure 5B, we observed a strong positive correlation between the freely estimated allocation parameter and the mean allocation derived from the history of accumulated rewards (r = 0.976, 95% CI = [0.941, 0.988], BF₁₀ = 3.34 × 10¹⁶).
[Figure 4 graphic. A) Schematic: uncertainty, error feedback and awarded points (e.g., +15 points) are weighted and summed into a composite reward Δt on each trial; Δt updates the relative value νt, which sets the gain factor α controlling neural resource allocation and stochastic spiking. B) Relative value (νt) vs trial number for the high- and low-reward stimulus. C) Resource fraction vs trial number. D) MAD for the high- and low-reward stimulus. E) Varying reward weight (c) and F) varying leak (y): resource fraction vs trials.]
Figure 4: The neural resource allocation account applied to the motion estimation task. A) On
each trial, motion directions of the two stimuli are encoded in the spiking activity of populations of
neurons, with mean activity determined by the relative allocation of resources to stimuli. Based on
the cue colour, one of the populations is decoded to yield an estimated direction with an associated
uncertainty that varies from trial to trial. The uncertainty of the estimate, the accuracy feedback
(if present) and any points awarded represent different forms of intrinsic and external reward, which
are combined as a weighted sum into a composite reward (∆t). This composite reward is then used
to update the relative value (ν) associated with the stimulus colours. Finally, this relative value is
transformed via an exponential mapping into a neural gain factor (α), which controls the fraction of
resources allocated to each stimulus on the subsequent trial. In this framework, resource allocation
is entirely driven by the history of accumulated rewards. B) Throughout the reported experiments,
the two colours of stimuli are systematically related to different intrinsic or external rewards, so the
relative value assigned to each colour progressively diverges over the sequence of trials. C) Fraction
of total resources allocated to the high-reward stimulus over trials, based on relative value shown in
B. The dashed line represents the mean allocation across all trials (∼65%). The remaining resources
(∼35%) are allocated to the low-reward stimulus. D) Unequal resource allocation is reflected in
differences in the mean absolute error across trials when the high- or low-reward stimulus is cued for
report. E) Larger reward weight (c), with a constant leak factor, results in a stronger preference for
one stimulus over the other in terms of resource allocation. F) Larger leak factor (y), with a constant
reward weight, leads to a weaker preference.
The close alignment of these two distinct methods suggests that the history of accumulated rewards can effectively account for resource allocation in this task.
Intrinsic reward: Perceived accuracy
In Experiment 2, we manipulated the response error presented in feedback to influence the perceived difficulty of reproducing stimuli of each colour. This manipulation resulted in participants experiencing systematically larger feedback errors for stimuli of one colour (magnified feedback MAD = 0.837) than for the other (minified feedback MAD = 0.233). To model these data within our RL account, we assume that the feedback on each trial provided an intrinsic reward that was associated with the corresponding stimulus colour. This assumption is supported by evidence that observers' subjective evaluations tend to favour smaller feedback errors over larger ones, because they suggest higher accuracy [56].
In the model, rewards derived from feedback were integrated with those derived from internal confidence. Figure 5D illustrates the mean trajectory of resource allocation across trials (Fig. S3B shows individual trajectories for example observers). The model fits again indicate that resources were unequally allocated between stimuli, although the bias is smaller than that observed in the experiment with external rewards.
Comparing the allocation derived from the RL account with the freely fitted allocation parameter of the Neural resource model (Fig. 5E), we found a strong positive correlation (r = 0.911, 95% CI = [0.777, 0.958], BF₁₀ = 3.44 × 10⁷). Consistent with the findings from Experiment 1, the correspondence between these two distinct approaches indicates that the history of accumulated intrinsic rewards provides an explanation for resource allocation in the task with manipulated feedback. ML estimates and comparisons for the other parameters are shown in Supplementary Information (Fig. S4B).
Intrinsic reward: Estimation difficulty
Experiment 3 investigated the role of objective difficulty in the representation of motion information. On most trials, two stimuli with different coherence levels (85% and 45%) were presented. We hypothesised that internal confidence in each item's motion direction, reflecting a metacognitive estimate of accuracy, functions as an intrinsic reward that observers associate with each item's identity (i.e., colour) [48]. To model the psychophysical data in the simultaneous presentation condition, we estimated internal confidence by exploiting the close coupling between uncertainty and trial-to-trial variability in error within the Neural resource model. Informed by the observed response error on each trial, we derived the posterior probability distribution of likelihood precision and used the most probable precision as a basis for intrinsic reward (Eq. 12). While internal confidence was also incorporated in this way when modelling data from the previous two experiments, in Experiment 3 it was the sole source of reward influencing resource allocation.
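The idea of using the most probable precision as a confidence signal can be illustrated with a generic grid-based computation. This is a simplified stand-in for Eq. 12, not a reproduction of it: we assume the response error follows a von Mises distribution with unknown precision κ, evaluate the posterior over a grid of κ values (uniform prior unless one is supplied), and return its mode. In the Neural resource model the distribution of precision arises from stochastic spike counts, so the prior used there would differ. The function names are hypothetical.

```python
import math


def log_i0(kappa: float, max_terms: int = 500) -> float:
    """Natural log of the modified Bessel function I0(kappa), from its power series.

    Adequate for kappa up to a few hundred; larger values would overflow a float.
    """
    q = (kappa / 2.0) ** 2
    term, total = 1.0, 1.0
    for k in range(1, max_terms):
        term *= q / (k * k)          # term_k = (kappa/2)^(2k) / (k!)^2
        total += term
        if term < 1e-17 * total:     # remaining terms are negligible
            break
    return math.log(total)


def map_precision(error_rad, kappa_grid, log_prior=None):
    """Most probable von Mises precision given one response error (grid-based MAP)."""
    best_kappa, best_score = None, -math.inf
    for i, kappa in enumerate(kappa_grid):
        # von Mises log-likelihood of the error, dropping the constant -log(2*pi)
        score = kappa * math.cos(error_rad) - log_i0(kappa)
        if log_prior is not None:
            score += log_prior[i]
        if score > best_score:
            best_kappa, best_score = kappa, score
    return best_kappa


grid = [0.5 * k for k in range(1, 101)]   # hypothetical precision grid, 0.5 to 50
```

With this sketch, a small error (e.g., 0.1 rad) yields a high most-probable precision (high confidence), whereas an error larger than 90° pushes the estimate to the lowest precision on the grid.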
Figure 5G & J show the mean trajectories from Experiments 3a and 3b, respectively. Again, we visualised individual trajectories for example participants (Fig. S3C & D). In both experiments, all parameter estimates obtained with the Neural resource model and the RL account strongly covaried (Fig. 5H & K; Fig. S4C & D). Importantly, this was also true for estimates of resource allocation. Across the two experiments, we found consistently strong positive correlations between the freely estimated allocation parameter and the mean allocation derived from the history of accumulated rewards (Exp 3a: r = 0.833, 95% CI = [0.589, 0.923], BF₁₀ = 1.29 × 10⁴; Exp 3b: r = 0.843, 95% CI = [0.575, 0.933], BF₁₀ = 3.67 × 10³). Consistent with the findings from the first two experiments, this strong correspondence indicates that the history of accumulated intrinsic rewards based on internal confidence effectively accounts for resource allocation in this task.
Changes in resource allocation predict response precision
Our finding that freely estimated resource allocation strongly correlates across participants with resource allocation based on the history of rewards supports the conclusion that human resource allocation is guided by a reward-driven assignment of value to objects in the visual environment. To further substantiate this claim, we investigated whether trial-to-trial variability in resource allocation within individual participants, derived from the RL model, predicts the magnitude of their response errors.
[Figure 5 graphic. Rows: External reward (Exp 1), Perceived accuracy (Exp 2a), Estimation difficulty (Exp 3a), Estimation difficulty (Exp 3b). A, D, G, J) Resource fraction vs trial number. B, E, H, K) Reward-predicted vs freely estimated resource allocation (r = .976, .911, .833, .843), with axis ends labelled "Favour high reward"/"Favour low reward" (Exp 1), "Favour error-minified"/"Favour error-magnified" (Exp 2a) and "Favour high coherence"/"Favour low coherence" (Exp 3a, 3b). C, F, I, L) MAD composite scores.]
Figure 5: Modelling results. A) Resource allocation across trials inferred by the RL account in the
external reward experiment (Experiment 1). Circles represent the mean fraction of resources across
observers allocated on each trial towards the overall preferred object. B) Correlation between mean
allocations inferred by the RL account and freely estimated allocations. The red line shows predictions
of the fitted linear regression model, and the shaded area indicates the 95% CI. C) Difference in MAD
between trials on which the probed item had below- and above-median resources allocated to it, as
estimated by the RL account. On average, MAD was larger when less resource was allocated to the
probed stimulus. D–F) Same as above, but for the perceived accuracy experiment (Experiment 2).
G–I) Online estimation difficulty experiment (Experiment 3a). J–L) Lab-based estimation difficulty
experiment (Experiment 3b, simultaneous condition).
To investigate this, we performed a median-split analysis for each observer based on the estimated fraction of resources allocated to the item associated with the larger reward (i.e., individual trajectories similar to those shown in Fig. 5, left column). Specifically, we calculated the MAD of response errors for trials with above- and below-median resource allocation, separately for trials on which the high- or low-reward item was tested. We hypothesised that MAD would be greater on trials where the RL model indicated that below-average resource was allocated to the probed item, i.e., below-median trials when the high-reward object was probed and above-median trials when the low-reward object was probed. To test this, we computed a composite score for each observer equal to the sum of the signed difference in MAD between below- and above-median trials when the high-reward item was probed and the signed difference in MAD between above- and below-median trials when the low-reward item was probed. In all four experiments, the composite scores indicated that lower predicted resource allocation, based on the history of rewards, corresponded on average to larger MAD of response errors (Fig. 5C, F, I & L). This was confirmed with one-sided t-tests against zero, which provided moderate to extreme evidence for a difference in the hypothesised direction: Experiment 1 (BF₁₀ = 5.55 × 10⁴, δ = 1.096, 95% CI = [0.635, 1.570]); Experiment 2 (BF₁₀ = 564, δ = 0.866, 95% CI = [0.402, 1.345]); Experiment 3a (BF₁₀ = 3.74, δ = 0.441, 95% CI = [0.073, 0.874]); Experiment 3b (BF₁₀ = 192.4, δ = 0.919, 95% CI = [0.377, 1.487]).
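The composite score can be sketched as follows for a single observer. Several details are our assumptions rather than statements from the text: MAD is taken as the mean absolute response error, the median is computed once over all of the observer's trials, and trials exactly at the median count as below-median. Function names are hypothetical. A positive score indicates larger errors when the RL model predicts fewer resources for the probed item.

```python
from statistics import mean, median


def mad(errors):
    """Mean absolute response error (assumed definition of MAD)."""
    return mean(abs(e) for e in errors)


def composite_score(allocation, probed_high, errors):
    """Composite median-split score for one observer.

    allocation  -- RL-predicted fraction of resources to the high-reward item, per trial
    probed_high -- True where the high-reward item was cued, per trial
    errors      -- response errors, per trial
    """
    cut = median(allocation)  # assumption: one median over all trials; ties count as below

    def subset(high, above):
        return [e for a, h, e in zip(allocation, probed_high, errors)
                if h == high and (a > cut) == above]

    # High-reward item cued: below-median allocation to it should mean larger errors.
    high_term = mad(subset(True, False)) - mad(subset(True, True))
    # Low-reward item cued: above-median allocation (towards the other item) should mean larger errors.
    low_term = mad(subset(False, True)) - mad(subset(False, False))
    return high_term + low_term
```

Each of the four subsets must contain at least one trial; otherwise `statistics.mean` raises `StatisticsError`. The per-observer scores would then enter a one-sided test against zero.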
Discussion
In the present study, we investigated how human observers represent stimuli associated with varying levels of external and intrinsic reward. Across three psychophysical experiments, we paired object identities with different rewards and found that observers developed higher estimation accuracy for the items associated with larger rewards. In two additional experiments, we demonstrated that this effect was driven by competition for attentional, rather than mnemonic, resources. To provide a mechanistic explanation of this behaviour, we developed a neural model incorporating a reinforcement learning rule that directs resource allocation towards more rewarding stimuli. Our key finding is that a resource allocation mechanism based solely on the history of accumulated rewards is sufficient to explain differences in estimation performance driven by intrinsic as well as external rewards.
In the first experiment, we investigated the effects of external rewards on representational fidelity in a motion reproduction task. Both the psychophysical results and computational modelling provided compelling evidence that observers allocated more processing resources to objects associated with a higher reward, resulting in more precise reproduction of high-reward stimuli than of low-reward ones. This finding aligns with a broad body of research demonstrating that external rewards, such as points or money, influence various aspects of information processing, including the allocation of attentional resources [17, 18] and working memory [31], while also facilitating motor responses, such as hand movements and saccades, towards rewarding stimuli [56–58].
In contrast to external rewards, the influence of intrinsic rewards on representational fidelity has received comparatively little attention. Building on the premise that accuracy itself is rewarding [38, 43, 44], we conducted two experiments that manipulated perceived accuracy (via feedback) and objective estimation difficulty (via signal strength) in a motion reproduction task. We found converging evidence at both the behavioural and computational levels indicating that observers allocate more neural resources towards, and consequently have a more precise internal representation of, objects associated with better estimation performance, whether induced by artificially manipulated feedback (Experiment 2) or by objective differences in stimulus discriminability (Experiment 3).
We argue that observers derived intrinsic reward from confidence in their responses and from feedback on their accuracy. In our tasks, the association of these rewards with the distinguishing feature of the presented objects (i.e., colour) leads to a bias in resource allocation that favours subsequent stimuli sharing the same feature. This proposal aligns with the notion that perceptual features linked to rewards are prioritised in sensory processing owing to their incentive salience (e.g., [59, 60]). Moreover, neural evidence supports this notion by demonstrating that sensory representations are modulated by the history of rewards, underscoring the impact of reward associations on perceptual processing [61]. To make our proposal concrete, we developed a mechanistic model grounded in the principles of population coding and reinforcement learning. Specifically, our reinforcement learning account operates by analyzing accumulated rewards and allocating proportionally more resources to objects previously associated with higher rewards. We found that this model closely replicated resource allocation
estimates obtained from freely fitted parameters, suggesting that the history of accumulated intrinsic and extrinsic rewards is sufficient to account for the observed patterns of resource allocation.
A key novel finding from the proposed model is that both internally generated and externally manipulated (via feedback) estimates of accuracy, when associated with an object's distinguishing feature, can bias the subsequent processing of objects that share that feature. While the role of external feedback in reinforcing behaviour has been widely acknowledged (e.g., [62, 63]), recent research demonstrates that internal confidence can similarly reinforce behaviour even in the absence of explicit feedback. For instance, improvements in sensitivity in a perceptual learning task have been observed without external feedback [64]. Guggenmos et al. [48] proposed that such learning is driven by confidence prediction errors, i.e., discrepancies between an individual's current confidence and their expected confidence level. Notably, the neural substrate for these prediction errors has been identified in the striatum (see also [65]), a brain region traditionally linked to reward processing. Our findings contribute to a growing body of literature highlighting metacognition [53] and self-reinforcement [66] as critical processes in the pursuit of rewards.
Using a range of tasks similar to the one used in this study, previous research has demonstrated that humans possess knowledge about the uncertainty with which individual items are reported (e.g., [52, 67, 68]). Population coding models [69, 70] have been particularly effective in capturing subjective confidence [71], as well as proxies such as response latency [72]. Within the population coding framework, an ideal observer of spiking activity would derive their confidence estimate, whether internal or explicitly reported, from the precision of the posterior distribution, which represents the probability of the stimulus value given the observed neural activity. In the Neural resource model [22, 51], the precision of the posterior (or likelihood, assuming a uniform prior over stimulus space) varies from trial to trial as a result of stochastic variation in the number of spikes available for decoding. We calculated the most probable estimate of posterior precision on each trial to serve as an indicator of internal confidence. On this basis, the model successfully recreated the freely estimated resource allocations from our data.
An important insight from our modelling is that the observed resource allocation deviated from the pattern required to minimize overall response or feedback error variability, resulting in poorer overall performance. In a similar vein, a recent study [73] provided theoretical and empirical evidence suggesting that sensory processing is optimised to maximize fitness (i.e., rewards) rather than to ensure perceptual accuracy. Supporting this idea, neurophysiological studies have demonstrated that early sensory systems encode both sensory information about a stimulus and non-sensory information regarding the behavioural relevance of stimuli [3, 74]. Embedding stimulus-reward contingencies within the sensory representation of a stimulus facilitates the prioritisation of behaviourally relevant information during encoding. These previous findings may help explain why our observers' allocation strategies were not optimized for accuracy in the task; however, the strategies were also not optimized to maximize rewards. In the experiment with external rewards, we found that observers allocated resources more equally between items than would be predicted by a reward-maximizing strategy. The RL model captured the observed allocation strategy based on a weighted combination of points-based external rewards and confidence-based intrinsic rewards; this combination of factors could lead observers to maintain a certain level of performance even for stimuli associated with low external reward. When considered across all experiments, our results point to a reward-driven allocation of resources that, while prioritising reward-related stimuli, is not optimized to obtain rewards in the specific tasks we investigated.
Our results also contribute to prominent theories in neuroscience, psychology, and economics [75–78] that consider how humans and other animals link the mental effort required for a task with the value of its outcome (i.e., the reward). Behavioural studies demonstrate that, when faced with tasks offering equal rewards but varying in effort, humans tend to avoid those perceived as more difficult [79, 80]. On this basis, it has been argued that cognitive effort is experienced as carrying disutility, i.e., as acting as a discount factor on expected rewards [78, 81]. This hypothesis has been substantiated by the observation that cognitive effort reduces neural responses to rewards following an effortful task [82]. In the present results, perceived (Experiment 2) or objective (Experiment 3) difficulty in estimation similarly appears to have discounted the subjective value of a stimulus, leading observers to prioritise easier (and thus, in principle, more rewarding) items for encoding. However, because observers had no control over which stimuli were selected for test, this allocation strategy did not result in more reward in our tasks and could even be counterproductive. This raises
the wider question of whether humans may similarly allocate effort suboptimally, driven by intrinsic reward, in other situations where they have limited control over what information will subsequently become relevant.
Materials and methods
Apparatus
In the online experiments, tasks were presented via web browsers on observers' personal computers and were coded in JavaScript and HTML Canvas. In the laboratory experiment, stimuli were displayed on a 69 cm gamma-corrected LCD monitor with a refresh rate of 60 Hz. Observers were seated in a dark room and viewed the monitor from a distance of 60 cm, with their heads supported by a forehead and chin rest. Eye position was monitored online at 1000 Hz using an infrared eye tracker (SR Research). Stimulus presentation and response registration were controlled by a script written with Psychtoolbox [83, 84] and executed in Matlab (The MathWorks Inc.). Responses were collected using a computer mouse.
Participants
A total of one hundred ninety-six naive observers (110 females, 80 males, 6 preferred not to say; mean age = 27.6 years, SD = 5.0) took part in the study after giving written informed consent in accordance with the Declaration of Helsinki. All observers reported normal colour vision and normal or corrected-to-normal visual acuity. For the online experiments, observers were recruited using Prolific (https://www.prolific.co) and were remunerated £6 per hour for their participation. For the laboratory experiment, observers were recruited through the Cambridge Psychology research sign-up system and were remunerated £10 per hour.
For the online experiments, we used a Bayesian stopping rule to determine the sample size. A stopping rule specifies when enough evidence has been gathered to support a decision, thus optimizing the sample size. In particular, we continued testing observers until we obtained strong evidence, as estimated by the Bayes factor, in favour of either H₀ (BF₁₀ ≤ 0.1, indicating evidence for no difference between the two conditions of interest) or H₁ (BF₁₀ ≥ 10, indicating evidence for a difference between the two conditions). If neither hypothesis was supported, data collection ceased after reaching 100 observers. In Experiment 1, the conditions of interest for the stopping rule were the stimuli associated with high and low reward, compared on mean absolute reproduction error in the analogue report task. In Experiment 2, we tested for differences in mean absolute reproduction error between error-minified and error-magnified stimuli. In Experiment 3a, we compared mean absolute reproduction errors on trials where stimuli were presented in different colours but with equal coherence. For the laboratory experiment (Experiment 3b), we aimed to collect a number of participants similar to that in Experiment 3a. In total, thirty observers participated in Experiment 1. Twenty-five observers participated in Experiment 2a, and one hundred participated in Experiment 2b. Finally, twenty-two and nineteen observers participated in Experiments 3a and 3b, respectively.
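The stopping rule amounts to a simple sequential check, sketched below. How BF₁₀ is computed after each new observer (for example, which Bayesian test and prior) is left to the caller, and any minimum sample size before the first check is not modelled, since neither is specified here; the function name is hypothetical.

```python
from typing import Iterable, Tuple


def stopping_point(bf10_sequence: Iterable[float], upper: float = 10.0,
                   lower: float = 0.1, n_max: int = 100) -> Tuple[int, str]:
    """Return (observers tested, outcome) for a sequential Bayes-factor stopping rule.

    The i-th value of bf10_sequence is BF10 recomputed after observer i + 1 was tested.
    """
    n = 0
    for n, bf10 in enumerate(bf10_sequence, start=1):
        if bf10 >= upper:
            return n, "H1"        # strong evidence for a difference
        if bf10 <= lower:
            return n, "H0"        # strong evidence for no difference
        if n >= n_max:
            return n, "max"       # cap reached without strong evidence
    return n, "incomplete"        # sequence ended before any criterion was met
```

For instance, a sequence of Bayes factors that never leaves the interval (0.1, 10) stops at the 100-observer cap.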
Stimuli
The stimuli in this study were random dot kinematograms (RDKs). On each trial, two RDK stimuli, each consisting of 40 dots, were presented, each within a circular aperture. A percentage of the dots (specified below) moved in a coherent direction, while the remaining dots moved in random but consistent directions within the aperture [85]. When a dot exited the aperture, it was replaced by a new dot at the aperture's edge, maintaining a constant dot density. In all experiments, one stimulus was always green (RGB colour values; online: 47, 195, 129; lab: 0, 199, 128) and the other was always blue (online: 24, 199, 233; lab: 0, 187, 241). In Experiment 3b, the same observers completed two identical tasks, with stimuli presented either simultaneously or sequentially. In this experiment, stimuli were either green and blue or orange (237, 154, 0) and magenta (255, 79, 208), balanced across observers and presentation conditions. Across all tasks, stimuli were presented against a mid-grey background.
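A minimal sketch of the dot update described above, using the online parameters (40 dots, a 105-pixel aperture and a 3-pixel step per frame). The re-entry rule is our assumption: the text says only that exiting dots are replaced at the aperture's edge, so here a replaced dot is placed at a random point on the edge from which its own direction of motion leads back into the aperture. Function names are hypothetical.

```python
import math
import random


def make_rdk(n_dots=40, coherence=0.85, direction_deg=0.0, radius=52.5, rng=None):
    """Create dots as [x, y, theta] relative to the aperture centre (pixels, radians)."""
    rng = rng or random.Random()
    n_coherent = round(coherence * n_dots)
    signal = math.radians(direction_deg)
    dots = []
    for i in range(n_dots):
        r = radius * math.sqrt(rng.random())  # uniform density within the disc
        phi = rng.uniform(0.0, 2.0 * math.pi)
        # Coherent dots share the signal direction; the rest keep their own random direction.
        theta = signal if i < n_coherent else rng.uniform(0.0, 2.0 * math.pi)
        dots.append([r * math.cos(phi), r * math.sin(phi), theta])
    return dots


def step_rdk(dots, speed=3.0, radius=52.5, rng=None):
    """Advance every dot by one frame, replacing dots that leave the aperture."""
    rng = rng or random.Random()
    for dot in dots:
        ux, uy = math.cos(dot[2]), math.sin(dot[2])
        dot[0] += speed * ux
        dot[1] += speed * uy
        if math.hypot(dot[0], dot[1]) > radius:
            # Assumed re-entry rule: a random point on the trailing half of the edge,
            # offset along the perpendicular (-uy, ux) and set back along -u.
            offset = rng.uniform(-radius, radius)
            back = math.sqrt(radius * radius - offset * offset)
            dot[0] = -offset * uy - back * ux
            dot[1] = offset * ux - back * uy
```

Because each dot keeps its direction after re-entry, coherence stays fixed at the specified percentage throughout the presentation.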
For the online experiments, all measures in pixels are reported for a 1920 × 1080 resolution and a 60 Hz refresh rate. When a different resolution or refresh rate was detected, all measurements of size, position and speed were automatically adjusted to keep stimulus presentation consistent across display settings. The stimulus aperture was 105 pixels in diameter, and each dot had a radius of 3 pixels. The two apertures were positioned 220 pixels to the left and right of the screen centre. On each frame, the dots were shifted by 3 pixels along their direction of motion. In the laboratory experiment, two
apertures (1.4 dva radius) were presented horizontally aligned with the fixation annulus, positioned at 5 dva to the left and right. Each dot was 0.15 dva in diameter and travelled at a speed of 4 dva/s.
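The automatic adjustment for other display settings is not specified in detail. One plausible scheme, sketched here purely as an assumption, scales all lengths by screen height relative to 1080 pixels and rescales the per-frame displacement by 60 Hz divided by the refresh rate, so that dots take the same time to cross the aperture on any display. Names are hypothetical.

```python
from dataclasses import dataclass

REF_HEIGHT_PX = 1080    # reported sizes refer to a 1920 x 1080 display
REF_REFRESH_HZ = 60.0   # and to a 60 Hz refresh rate


@dataclass
class OnlineGeometry:
    aperture_diameter: float = 105.0  # px
    dot_radius: float = 3.0           # px
    aperture_offset: float = 220.0    # px from screen centre (left/right)
    step_per_frame: float = 3.0       # px of dot displacement per frame


def scale_geometry(height_px: int, refresh_hz: float) -> OnlineGeometry:
    """Hypothetical adjustment of the reference geometry to another display."""
    ref = OnlineGeometry()
    size = height_px / REF_HEIGHT_PX        # assumption: lengths scale with screen height
    frames = REF_REFRESH_HZ / refresh_hz    # shorter frames -> proportionally smaller step
    return OnlineGeometry(
        aperture_diameter=ref.aperture_diameter * size,
        dot_radius=ref.dot_radius * size,
        aperture_offset=ref.aperture_offset * size,
        step_per_frame=ref.step_per_frame * size * frames,
    )
```

On a 3840 × 2160 display at 120 Hz, for example, this scheme doubles all sizes but keeps the step at 3 pixels per frame, because twice as many frames are shown per second.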
Procedure and task
In all experiments, observers completed an analogue report task [50]. Each trial began with the presentation of a central fixation annulus. In the laboratory experiment, gaze direction was monitored using an eye-tracking camera, and observers were required to maintain fixation within a radius of 2° around the central annulus for 500 ms before the trial could proceed. After stable fixation was achieved, the fixation annulus changed appearance (i.e., became thinner) to signal that the sample array would be presented in 500 ms. In the online experiments, the appearance of the fixation annulus changed after a fixed interval of 500 ms. The sample array, consisting of two RDK stimuli, was then shown for 750 ms, followed by a 1000 ms delay period. A centrally presented colour cue then indicated which of the previously presented stimuli, distinguished by colour, was the target whose motion direction observers should recall and report.
Once observers were ready to respond, they could begin moving the cursor with a mouse or trackpad, which triggered the appearance of a randomly oriented white arrow within the central annulus. Observers were instructed to align the arrow with the previously presented motion direction of the cued stimulus. In the online experiments, responses were confirmed by pressing the spacebar, while in the laboratory experiment they were confirmed by pressing the right mouse button.
Experiment 1: External reward
In Experiment 1, we investigated how extrinsic rewards influence motion reproduction precision. To this end, observers received 15 points for reporting a motion direction within 50° of the target direction when the target was of one colour (e.g., green), and 5 points when it was of the other colour (e.g., blue). Responses more than 50° from the target direction received no points. The colour associated with high versus low reward was chosen randomly for each observer at the beginning of the experiment. Both stimuli were presented with the same coherence (85%), and no error feedback was provided. At the beginning of the experiment, observers were informed that accumulated points would be converted to a bonus payment at the end. Overall, they could collect a maximum of one thousand points, equivalent to a bonus payment of £1.50. Observers completed twenty practice trials and one hundred experimental trials. The task took approximately 20 minutes to complete. The trials were divided into two equal blocks with a break of at least one minute in between, and the complete testing session lasted approximately 15 min.
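The Experiment 1 scoring rule can be written as a short Python sketch. It assumes that the 50° criterion is applied to the circular (wrapped) distance between reported and target directions, and that a response at exactly 50° still earns points; the function names are illustrative and are not taken from the experiment code.

```python
import numpy as np

def circular_error_deg(reported, target):
    """Signed angular error (reported - target) in degrees, wrapped to [-180, 180)."""
    return (np.asarray(reported, dtype=float) - target + 180.0) % 360.0 - 180.0

def points_for_response(reported, target, high_reward, threshold=50.0):
    """Points for one Experiment 1 trial: 15 (high-reward colour) or 5
    (low-reward colour) if the response is within `threshold` degrees of the
    target direction, and 0 otherwise."""
    error = abs(float(circular_error_deg(reported, target)))
    if error > threshold:
        return 0
    return 15 if high_reward else 5

# Target at 350 deg, response at 10 deg: the wrapped error is 20 deg
print(points_for_response(10, 350, high_reward=True))    # 15
print(points_for_response(100, 0, high_reward=False))    # 0
```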
Experiment 2: Perceived accuracy
Experiments 2a and 2b were designed to investigate the effect of feedback on the precision of motion reproduction. The two experiments were identical except for the presentation of stimuli. In Experiment 2a, two stimuli were presented simultaneously for 750 ms at two distinct locations. In Experiment 2b, the same two locations were used, but the stimuli were presented sequentially, each for 750 ms, with the order of presentation and the colour cues balanced across conditions.
In both experiments, at the end of each trial, following the response, we presented feedback in the form of the reported and target motion directions. Unbeknownst to participants, we manipulated the feedback by artificially magnifying errors for one stimulus colour. This was done by shifting the presented target motion direction ($\theta^{*}$) away from the reported direction ($\hat{\theta}$), thereby inflating the presented response error for the designated “difficult” item, according to the following equation:
$$\theta^{*} = \theta \pm 50\sin(\hat{\theta} - \theta) \tag{1}$$
where θ is the true motion direction, and all angles are expressed in degrees. Conversely, we systematically minimized the error in the feedback for the other colour, designated as the “easy” item. The magnification and minimization of errors were randomly assigned to one of the two colours (i.e., green or blue) for each observer at the beginning of the experiment. The RDK stimuli were presented with 85% coherence. At the beginning of the experiment, during the instructions, we informed observers that individuals might vary in their ability to perceive the motion of stimuli of different colours.
This was intended to make any perceived differences in difficulty appear plausible. At the end of the experiment, observers were debriefed and the true purpose of the study was revealed. In Experiments 2a and 2b, observers completed twelve practice trials and one hundred experimental trials. The trials were divided into two equal blocks with a break of at least one minute in between, and the complete testing session lasted approximately 15 min.
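The feedback manipulation of Equation 1 can be sketched in Python as follows. The sign convention (subtracting the shift to magnify the error, adding it to minimize the error) is our reading of the ± in Equation 1 together with the description above; the function names are hypothetical.

```python
import math

def wrap_deg(angle):
    """Wrap an angle in degrees to [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def feedback_target(true_dir, reported_dir, magnify):
    """Displayed target direction theta* (degrees) following Eq. (1).

    Assumed sign convention: magnify=True subtracts the shift, moving theta*
    away from the response; magnify=False adds it, moving theta* towards it.
    """
    shift = 50.0 * math.sin(math.radians(reported_dir - true_dir))
    return wrap_deg(true_dir - shift if magnify else true_dir + shift)

def displayed_error(true_dir, reported_dir, magnify):
    """Error shown to the observer: reported direction minus displayed target."""
    return wrap_deg(reported_dir - feedback_target(true_dir, reported_dir, magnify))

# A true error of 10 deg is displayed as about 18.7 deg or 1.3 deg
print(round(displayed_error(0.0, 10.0, magnify=True), 1))   # 18.7
print(round(displayed_error(0.0, 10.0, magnify=False), 1))  # 1.3
```

Because the shift is proportional to the sine of the true error, the manipulation vanishes for a perfect response and for a response exactly opposite the target, and it is largest for errors of ±90°.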
Experiment 3: Estimation difficulty
In Experiments 3a and 3b, we investigated the effect of stimulus discriminability on the fidelity of visual representations. To achieve this, we presented two stimuli with different levels of coherence on 67% and 70% of all trials in Experiments 3a and 3b, respectively. Specifically, the stimulus of one colour was presented with 85% (high) coherence and the stimulus of the other colour with 45% (low) coherence. These variable-coherence trials were randomly interleaved with equal-coherence trials, on which both stimuli had the same intermediate (65%) coherence. The assignment of low and high coherence to specific colours was randomized for each observer at the beginning of the experiments. No feedback was provided during these experiments.
Experiment 3a was conducted online, while Experiment 3b took place in the laboratory. In Experiment 3a, stimuli were always presented simultaneously. In Experiment 3b, the same observers performed the task with both simultaneous and sequential presentation, with the order of these conditions counterbalanced across participants. To prevent transfer effects between conditions, we used different colour combinations: in one condition, stimuli were presented in green and blue, while in the other, they were presented in orange and magenta. The colour combinations were randomly assigned to each presentation condition.
In Experiment 3a, observers completed twenty practice trials and one hundred fifty experimental trials. The trials were divided into two blocks with a mandatory break of at least one minute in between, resulting in a total testing session duration of around 15 minutes. Experiment 3b (i.e., the laboratory experiment) consisted of four hundred trials, divided into eight equal blocks. In half of the blocks, stimuli were presented simultaneously, while in the other half, they were presented sequentially. Half of the observers completed the simultaneous blocks first, followed by the sequential blocks, and vice versa for the other half. At the beginning of each block sequence (i.e., simultaneous or sequential task), observers performed twenty practice trials to familiarize themselves with the task. In Experiment 3b, observers were required to maintain central fixation throughout the stimulus presentation. If gaze deviated by more than 2°, a warning message appeared on the screen, and the trial was aborted and restarted with newly randomized stimuli. Completing Experiment 3b took approximately 90 minutes.
Analysis
All stimulus values were analysed and are reported with respect to the circular parameter space of possible motion directions, [−π, π) radians. Response error for each trial was measured as the angular difference between the reported and target motion directions. To quantify the dispersion of response errors, we calculated the mean absolute deviation (MAD) across trials for each condition and observer. Higher MAD values indicate greater average reproduction error.
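A minimal Python sketch of the error and MAD computation follows. It assumes that MAD denotes the mean of the absolute wrapped errors (deviation from the target rather than from the mean error); the function names are illustrative.

```python
import numpy as np

def response_errors(reported, target):
    """Angular error (reported - target) in radians, wrapped to [-pi, pi)."""
    diff = np.asarray(reported, dtype=float) - np.asarray(target, dtype=float)
    return (diff + np.pi) % (2.0 * np.pi) - np.pi

def mean_absolute_deviation(reported, target):
    """MAD of response errors, taken here as the mean absolute error from the target."""
    return float(np.mean(np.abs(response_errors(reported, target))))

# The third response is only about 0.08 rad from its target once wrap-around is respected
reported = np.array([0.1, -0.1, 3.1])
target = np.array([0.0, 0.0, -3.1])
print(round(mean_absolute_deviation(reported, target), 3))  # 0.094
```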
To compare performance across conditions, we used Bayesian hypothesis tests, implemented in JASP [86] with the default Jeffreys-Zellner-Siow prior on effect sizes [87]. We report Bayes factors, which compare the relative predictive adequacy of two competing hypotheses (e.g., alternative and null) and quantify the change in belief that the data bring about for the hypotheses under consideration [88]. For example, $\mathrm{BF}_{10} = 10$ indicates that the data are ten times more likely to occur under the alternative hypothesis (i.e., there is a difference) than under the null hypothesis (i.e., there is no difference). Evidence for the null hypothesis is indicated by $\mathrm{BF}_{10} < 1$, in which case the strength of evidence is given by $1/\mathrm{BF}_{10}$. The Bayes factor is a ratio-scaled quantity ranging from 0 to infinity. For clarity in communication, we also use an interpretative framework for Bayes factor values, following the classification scheme outlined by Lee and Wagenmakers [89]: BF = 1 as no evidence; 1 < BF < 3 as weak or anecdotal evidence; 3 ≤ BF < 10 as moderate evidence; 10 ≤ BF < 30 as strong evidence; 30 ≤ BF < 100 as very strong evidence; BF ≥ 100 as extreme evidence. These discrete categories are arbitrary, however, and should serve only as rough guidelines. Along with the Bayes factor,
we report the median of the posterior distribution over the effect size (δ) and the accompanying 95% credible interval (95% CI).
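For reference, the verbal labels above can be encoded as a small Python helper. This is an illustrative convenience function (its name is hypothetical), not part of JASP.

```python
def bf_evidence_category(bf10):
    """Verbal label for a Bayes factor using the Lee and Wagenmakers scheme.

    Returns (label, favoured), where favoured is "H1", "H0", or None.
    For BF10 < 1 the strength of evidence is read from 1 / BF10.
    """
    if bf10 <= 0:
        raise ValueError("Bayes factors are strictly positive")
    if bf10 == 1:
        return "no evidence", None
    favoured = "H1" if bf10 > 1 else "H0"
    bf = bf10 if bf10 > 1 else 1.0 / bf10
    if bf < 3:
        label = "anecdotal"
    elif bf < 10:
        label = "moderate"
    elif bf < 30:
        label = "strong"
    elif bf < 100:
        label = "very strong"
    else:
        label = "extreme"
    return label, favoured

print(bf_evidence_category(10))    # ('strong', 'H1')
print(bf_evidence_category(0.2))   # ('moderate', 'H0')
```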
Neural resource model
We analysed observers’ response errors with an established model based on the principles of population coding [22, 51, 90, 91]. In this framework, a visual stimulus (θ) is encoded by an idealized population of neurons whose activity is determined by their individual tuning functions. All neurons are assumed to share the same bell-shaped von Mises tuning function,
$$f_i(\theta) = \exp\big(\kappa(\cos(\theta \ominus \varphi_i) - 1)\big) \tag{2}$$
where κ determines the tuning concentration, and ⊖ is subtraction on a circle. These tuning functions are translated through the feature space to peak at each neuron’s preferred value ($\varphi_i$), such that they provide dense uniform coverage of the entire feature space. In a population of M neurons, the average response of the i-th neuron to a stimulus value θ is obtained by scaling the output of the tuning function with the population’s mean total firing activity (γ),
$$\bar{n}_i(\theta) = \frac{\gamma}{M} f_i(\theta) \tag{3}$$
If the activity associated with multiple stimuli is combined or normalized [92] at the population level, so that the total activity remains γ, Equation 3 implements a form of limited resource [22]. The spike count produced by each neuron is drawn from a Poisson distribution,
$$n_i(\theta) \sim \mathrm{Poiss}\big(\bar{n}_i(\theta)\big) \tag{4}$$
and the decoded motion direction estimate is obtained by maximum likelihood estimation from the population spiking activity, $\mathbf{n}$:
$$\hat{\theta} = \arg\max_{\theta}\, p(\mathbf{n} \mid \theta) \tag{5}$$
The resulting distribution of decoding errors, for a given total number of spikes $m = \sum_i n_i \sim \mathrm{Poiss}(\gamma)$, is described as a mixture of von Mises ($\phi$) distributions,
$$p(\hat{\theta} \mid \theta, m) = \int p(r \mid m, \kappa)\, \phi(\hat{\theta}; \theta, r\kappa)\, dr \tag{6}$$
with
$$p(r \mid m, \kappa) = \frac{I_0(\kappa r)}{I_0(\kappa)^{m}}\, r\psi_m(r) \tag{7}$$
where $I_0$ is the modified Bessel function of the first kind of order zero and $r\psi_m(r)$ is the probability density function for the resultant length r of a uniform random walk of m steps. The full distribution of response errors predicted by the model is a mixture of the distributions $p(\hat{\theta} \mid \theta, m)$, weighted by the probability of obtaining m spikes. For a complete derivation of the distribution of response errors, see Bays [22] and Bays [71].
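The generative and decoding steps of Equations 2 to 5 can be simulated directly. The Python sketch below is a minimal illustration, not the authors' MATLAB toolbox: the population size, the grid of candidate directions, and the random guess on trials with no spikes are assumptions made for the example, and the function name is hypothetical.

```python
import numpy as np

def simulate_population_decoding(theta, gamma, kappa, n_trials=2000, n_neurons=100,
                                 grid_size=360, seed=0):
    """Monte Carlo decoding errors (radians) for a stimulus theta under Eqs. (2)-(5)."""
    rng = np.random.default_rng(seed)
    # Preferred values evenly tile the circle (dense uniform coverage)
    phi = np.linspace(-np.pi, np.pi, n_neurons, endpoint=False)
    # Candidate directions at which the likelihood is evaluated
    grid = np.linspace(-np.pi, np.pi, grid_size, endpoint=False)

    # Eqs. (2)-(3): mean counts (gamma / M) * exp(kappa * (cos(theta - phi_i) - 1))
    mean_counts = gamma / n_neurons * np.exp(kappa * (np.cos(theta - phi) - 1.0))
    # Eq. (4): independent Poisson spike counts, shape (n_trials, n_neurons)
    counts = rng.poisson(mean_counts, size=(n_trials, n_neurons))

    # Eq. (5): Poisson log-likelihood of each candidate (terms constant in theta dropped)
    log_rates = np.log(gamma / n_neurons) + kappa * (np.cos(grid[:, None] - phi) - 1.0)
    loglik = counts @ log_rates.T - np.exp(log_rates).sum(axis=1)
    theta_hat = grid[np.argmax(loglik, axis=1)]

    # With no spikes the likelihood is essentially flat, so the estimate is a random guess
    silent = counts.sum(axis=1) == 0
    theta_hat[silent] = rng.uniform(-np.pi, np.pi, size=int(silent.sum()))
    return (theta_hat - theta + np.pi) % (2.0 * np.pi) - np.pi

# More activity (larger gamma) should yield smaller decoding errors
errors_high = simulate_population_decoding(0.0, gamma=100.0, kappa=2.0)
errors_low = simulate_population_decoding(0.0, gamma=2.0, kappa=2.0)
print(np.mean(np.abs(errors_high)), np.mean(np.abs(errors_low)))
```

Because the preferred values tile the circle uniformly, the grid-based maximum-likelihood estimate closely approximates the direction of the vector sum of the spiking neurons' preferred values, which is why the resultant length r appears in Equations 6 and 7.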
The model has two free parameters: the population’s mean total firing activity (γ) and the concentration of the tuning function (κ). When multiple objects (N) need to be represented, the total resource γ is typically divided equally among objects (i.e., γ/N). Here we extend this basic approach by incorporating an allocation parameter, or gain factor α, which controls the neural activity allocated to one object (see also [22]). Without loss of generality, we fixed the gain factor for one object at 1, while treating the gain factor for object j (see below for details of each experiment) as a free parameter when fitting the model to the data. The neural activity allocated to object j can be expressed as $p_\alpha \gamma$, where $p_\alpha = \alpha/(1 + \alpha)$ represents its proportion of the total neural activity. The remaining activity (proportion $1 - p_\alpha$) is allocated to the other object.
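The division of activity implied by the gain factor is simple arithmetic; a Python sketch (the function name is hypothetical) makes the two limiting cases explicit.

```python
def allocate_activity(gamma, alpha):
    """Split the mean total activity gamma between two objects.

    Object j, with gain alpha, receives p_alpha * gamma, where
    p_alpha = alpha / (1 + alpha); the reference object (gain fixed at 1)
    receives the remaining (1 - p_alpha) * gamma.
    """
    if alpha < 0:
        raise ValueError("the gain factor must be non-negative")
    p_alpha = alpha / (1.0 + alpha)
    return p_alpha * gamma, (1.0 - p_alpha) * gamma

print(allocate_activity(20.0, 1.0))   # (10.0, 10.0): equal split, i.e. gamma / N with N = 2
print(allocate_activity(30.0, 2.0))   # object j receives two thirds of the activity
```

With α = 1 the model reduces to the equal division γ/N for N = 2; as α approaches 0, almost all activity goes to the reference object.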
In Experiment 1, which involved the manipulation of external reward, the allocation weight for the high-reward item was fixed at 1, while the allocation weight for the low-reward item was freely estimated. In Experiment 2, in which we manipulated perceived accuracy, the allocation weight for the error-magnified item was fixed at 1, and the allocation weight for the error-minified item was freely estimated. In Experiment 3, which involved a manipulation of estimation difficulty, we simultaneously fitted responses on variable- and equal-coherence trials. Building on our previous work [93], we assumed that the strength of the motion signal is controlled by the coherence level of the RDK stimuli, such that the value encoded into the neural population is given by
$$\bar{\theta} \sim \mathcal{WN}\big(\theta, \sigma^2_{\text{coherence}}\big) \tag{8}$$
where $\mathcal{WN}$ is a wrapped normal distribution with mean θ and variance $\sigma^2_{\text{coherence}}$, accounting for additive Gaussian noise. For simplicity, we considered 85% coherence (high coherence) as perceptually noiseless and further assumed that $\sigma^2_{45\%} > \sigma^2_{65\%}$, where 45% was the low-coherence level and 65% was the intermediate-coherence level used in the equal-coherence trials. Additionally, the allocation weight for the low-coherence colour (its 45% stimuli on variable-coherence trials and its 65% stimuli on equal-coherence trials) was fixed at 1, while the allocation weight for the high-coherence colour (its 85% stimuli and its 65% stimuli, respectively) was freely estimated across variable- and equal-coherence trials. In other words, on equal-coherence trials, differences in response precision were explained solely by the allocation weight, whereas on variable-coherence trials, the allocation weight and perceptual noise jointly accounted for variations in response precision.
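Equation 8 can be sampled by adding Gaussian noise to the stimulus direction and wrapping the result onto the circle. In the Python sketch below, the noise variances are hypothetical values chosen only to satisfy $\sigma^2_{45\%} > \sigma^2_{65\%}$ (in the model they are fitted per observer), and the function name is illustrative. The printed check uses the standard property that a wrapped normal with variance σ² has mean resultant length exp(−σ²/2).

```python
import numpy as np

def encode_with_coherence_noise(theta, sigma2, rng):
    """Sample the encoded direction theta_bar ~ WN(theta, sigma2), Eq. (8).

    Adding zero-mean Gaussian noise and wrapping the result to [-pi, pi)
    yields a wrapped-normal sample; sigma2 = 0 leaves theta unchanged, which
    is how the 85% (noiseless) coherence level is treated.
    """
    theta = np.asarray(theta, dtype=float)
    noisy = theta + rng.normal(0.0, np.sqrt(sigma2), size=theta.shape)
    return (noisy + np.pi) % (2.0 * np.pi) - np.pi

# Hypothetical noise variances respecting sigma2_45 > sigma2_65 > 0
sigma2 = {"85%": 0.0, "65%": 0.1, "45%": 0.4}
rng = np.random.default_rng(1)
for level, s2 in sigma2.items():
    samples = encode_with_coherence_noise(np.zeros(20000), s2, rng)
    # Empirical versus theoretical mean resultant length, exp(-sigma2 / 2)
    print(level, round(float(np.abs(np.mean(np.exp(1j * samples)))), 3),
          round(float(np.exp(-s2 / 2)), 3))
```

The noisy value $\bar{\theta}$, rather than θ itself, is then the input to the population code of Equations 2 to 4.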
Optimal resource allocation
To identify the optimal levels of resource allocation, we conducted a simulation study. For each observer, we simulated model predictions using the best-fitting parameters of the Neural resource model, specifically the mean total firing activity (γ) and the tuning concentration (κ), along with a grid of potential allocation weights.
For Experiment 1, we analytically determined the expected number of points based on the model-predicted response distributions under different allocation weights. The allocation weights were tested across a grid ranging from 0.001 to 2 in increments of 0.01, resulting in 200 distinct values. The optimal allocation weight was identified as the value that maximized the total reward across both high- and low-reward items.
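The grid search for Experiment 1 can be sketched as follows, with several assumptions stated plainly. Expected points are estimated by Monte Carlo simulation rather than analytically. Decoding uses the large-population limit of the model: the total spike count is Poisson with mean equal to the allocated activity, each spike's preferred value follows a von Mises distribution centred on the stimulus with concentration κ, and the estimate is the direction of their vector sum. High- and low-reward items are assumed to be probed equally often. Function names and parameter values are hypothetical.

```python
import numpy as np

def decoded_errors(gamma_alloc, kappa, n_trials, rng):
    """Decoding errors (radians) for one item in the large-population limit.

    Assumes m ~ Poisson(gamma_alloc) spikes whose preferred values are drawn
    from von Mises(0, kappa); the estimate is the direction of their vector sum.
    Trials with no spikes yield a uniformly random estimate.
    """
    m = rng.poisson(gamma_alloc, size=n_trials)
    errors = rng.uniform(-np.pi, np.pi, size=n_trials)
    for k in np.unique(m[m > 0]):
        idx = np.flatnonzero(m == k)
        phi = rng.vonmises(0.0, kappa, size=(idx.size, int(k)))
        errors[idx] = np.angle(np.exp(1j * phi).sum(axis=1))
    return errors

def expected_points(alpha, gamma, kappa, n_trials=4000, seed=0):
    """Mean points per pair of trials (one high-, one low-reward probe) when the
    low-reward item has gain alpha and the high-reward item has gain 1."""
    rng = np.random.default_rng(seed)             # same seed for every alpha
    p_low = alpha / (1.0 + alpha)
    window = np.deg2rad(50.0)
    hit_high = np.abs(decoded_errors((1.0 - p_low) * gamma, kappa, n_trials, rng)) <= window
    hit_low = np.abs(decoded_errors(p_low * gamma, kappa, n_trials, rng)) <= window
    return 15.0 * hit_high.mean() + 5.0 * hit_low.mean()

def optimal_alpha(gamma, kappa, alphas=None, **kwargs):
    """Grid search for the low-reward gain that maximizes expected points."""
    if alphas is None:
        alphas = 0.001 + 0.01 * np.arange(200)    # 200 values from 0.001 in steps of 0.01
    alphas = np.asarray(alphas, dtype=float)
    rewards = np.array([expected_points(a, gamma, kappa, **kwargs) for a in alphas])
    return float(alphas[np.argmax(rewards)]), rewards

best, rewards = optimal_alpha(gamma=10.0, kappa=2.0, n_trials=1000)
print(best)
```

Reusing the same random seed for every grid value (common random numbers) keeps the Monte Carlo noise from dominating differences between neighbouring allocation weights.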
For Experiment 2, we numerically simulated the variance (i.e., squared circular SD) of feedback errors using the same grid of allocation weights employed in Experiment 1. This analysis was based on $10^7$ simulated trials drawn from the error distribution predicted by the model. The optimal allocation weight was determined as the value that minimized the total variance of feedback errors across both error-minified and error-magnified items.
For Experiment 3, we analytically modelled the response variance on the variable-coherence trials (i.e., for the high- and low-coherence items) across a range of allocation weights. We employed a grid of 200 values, spanning from 0.01 to 6 in increments of 0.03. The optimal allocation weight was identified as the value that minimized the total response variance for both high- and low-coherence items (i.e., 85% and 45% coherence). In Experiment 3a, the simulation yielded values around $\alpha_{\text{optimal}} = 1$ for all but one outlier observer, for whom the estimate reached the endpoint of the examined grid ($\alpha_{\text{optimal}} = 6$). This occurred because the model estimated high levels of perceptual noise for this observer's medium- and low-coherence stimuli, implying that overall error would be minimized by allocating all resources to the high-coherence object. We exclude this data point in Figure 3D, and the comparison of observed and optimal allocations is based on the remaining observers. Including this observer’s data and performing a non-parametric test did not change our conclusions.
Reinforcement learning account of resource allocation
We developed a quantitative model to describe how the history of accumulated rewards from multiple objects influences subsequent resource allocation towards those objects. The proposed model extends the Neural resource model by incorporating a simple reinforcement learning (RL) rule, which directs behaviour towards more rewarding stimuli. Importantly, our model applies the same RL rule to both external and intrinsic rewards. In the standard RL framework, analysis typically focuses on external rewards, such as points or money, which are provided by the environment as a direct response to the agent’s actions. Our model broadens this scope to include intrinsic rewards (those that are inherently pleasurable and drive behaviour), such as the sense of being accurate in a task.
Applied to the motion direction reproduction task used in our experiments, the account can be summarized as follows: on a given trial, the received points or money (extrinsic reward), perceived accuracy due to feedback (intrinsic reward), and an individual’s internal confidence-based estimate of precision (intrinsic reward) collectively update the value (ν) of the object associated with these rewards. This value influences the allocation of cognitive resources to that object in subsequent encounters, thereby modulating the precision with which the object is represented.
Formally, in the simplest scenario involving only two objects, rather than defining the accumulated reward for each object separately, we can define the relative accumulated reward on trial t as:
$$\nu_t = (1 - y)\,\nu_{t-1} + \Delta_t \tag{9}$$
$$\Delta_t = I_{\pm}\left(c_1 r^{\text{ext}}_t + c_2 \exp\left(-\lvert\epsilon^{\text{fb}}_t\rvert\right) + c_3 \kappa_{\hat{r}_t}\right) \tag{10}$$
where y is a leak component, $r^{\text{ext}}_t$ is the number of received points, $\epsilon^{\text{fb}}_t$ is the feedback error, $\kappa_{\hat{r}_t}$ is an estimate of internal confidence, and $\{c_1, c_2, c_3\}$ are weights accounting for the different scales of the rewards and the types of reward prioritised by observers. The variable $I_{\pm}$ takes the value +1 or −1 depending on the object identity, i.e., whether the green or the blue item was reproduced, with the assignment being arbitrary. Positive values of ν indicate a higher relative value for one item (I = +1), while negative values of ν indicate a higher relative value for the other item (I = −1). Because ν can range from −∞ to +∞, we transform it into the gain parameter α using the following equation:
$$\alpha(\nu) = e^{2\nu} \tag{11}$$
This allows us to compute the proportion of spiking activity allocated to the item with I = +1, given by α/(1 + α), with the remaining spiking activity allocated to the other item. When ν = 0, such as at the beginning of the task, both items are perceived as having equal value, resulting in an equal distribution of neural resources between them.
The leak component (y) functions as a temporal filter, modulating the influence of past rewards on resource allocation. When y = 1, the system entirely ignores accumulated past values, so that the value of an object, and thus resource allocation on the next trial, relies exclusively on the reward from the most recent trial. Conversely, when y = 0, the accumulated value is fully retained and integrated with the most recent reward. The leak component is particularly necessary in scenarios where rewards are discontinued: a non-zero leak gradually equalizes the relative value and resource allocation across objects, returning them to a state of equilibrium.
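The update of Equations 9 to 11 can be written as a short loop. In this Python sketch we assume that the allocation on trial t depends on the value accumulated up to the end of trial t − 1, that feedback errors are in radians, and that all inputs are supplied per trial; the function name and example inputs are hypothetical. The allocation α/(1 + α) with α = e^{2ν} is computed through the algebraically identical expression (1 + tanh ν)/2, which cannot overflow.

```python
import numpy as np

def rl_allocation(identity, r_ext, fb_err, conf, y, c1, c2, c3):
    """Share of activity allocated to the I = +1 item on each trial (Eqs. 9-11).

    identity : +1 or -1, the item probed on each trial (I_pm)
    r_ext    : external reward per trial (e.g. 0.15, 0.05 or 0)
    fb_err   : feedback error per trial, assumed to be in radians
    conf     : internal-confidence estimate kappa_rhat per trial
    Assumes allocation on trial t uses the value accumulated up to trial t - 1.
    """
    nu = 0.0                                   # equal value at the start of the task
    p_plus = np.empty(len(identity))
    for t in range(len(identity)):
        # Eq. (11): alpha / (1 + alpha) with alpha = exp(2 nu) equals (1 + tanh(nu)) / 2
        p_plus[t] = 0.5 * (1.0 + np.tanh(nu))
        delta = identity[t] * (c1 * r_ext[t] + c2 * np.exp(-abs(fb_err[t])) + c3 * conf[t])  # Eq. (10)
        nu = (1.0 - y) * nu + delta           # Eq. (9)
    return p_plus

# Hypothetical run: green (+1) and blue (-1) alternate; only green earns points
n = 20
identity = np.where(np.arange(n) % 2 == 0, 1, -1)
r_ext = np.where(identity == 1, 0.15, 0.0)
zeros = np.zeros(n)
p = rl_allocation(identity, r_ext, zeros, zeros, y=0.2, c1=1.0, c2=0.0, c3=0.0)
print(np.round(p[:4], 3))
```

Setting y = 1 makes each allocation depend only on the preceding trial's reward, whereas y = 0 accumulates rewards without decay, matching the description of the leak component above.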
The first reward component of Equation 10, $r^{\text{ext}}$, reflects the experimental manipulation of Experiment 1. In this experiment, observers received 15 points for responses with an error of less than 50° for high-reward objects and 5 points for low-reward objects. Responses with an error greater than 50° received no points. When applying the model to the data, we used values of $r^{\text{ext}} = \{0.15, 0.05, 0\}$ to represent the rewards for high-reward, low-reward, and no-reward trials, respectively.
The feedback component of the model ($\epsilon^{\text{fb}}$) addresses the experimental manipulation of Experiment 2. In this experiment, we systematically manipulated feedback error by reducing it for one stimulus and increasing it for the other. We hypothesized that feedback serves as an intrinsic reward, with stimuli receiving minified feedback errors being perceived as more rewarding than those with magnified feedback errors. In modelling this relationship, the reward conveyed by feedback was assumed to decrease exponentially with the absolute feedback error, so that smaller feedback errors correspond to higher rewards and lead to a greater increase in the object’s value ν. This exponential relationship reflects diminishing sensitivity to large feedback errors: a wide range of larger errors yields relatively small and similar rewards, whereas a narrow range of smaller errors yields substantially higher but more variable rewards.
The final component of our model is the estimate of internal confidence ($\kappa_{\hat{r}}$). While internal confidence can be assessed through self-reported or metacognitive measures, our approach leverages the inherent mechanism of the Neural resource model to quantify uncertainty in the decoded (i.e., reported) value. Our approach relies on the principle that the width of the likelihood function reflects the uncertainty of the estimate. The likelihood function evaluates how well various stimulus values align with the observed neural activity: a broad likelihood function is compatible with many different feature values, suggesting lower precision in the maximum likelihood estimate (the peak of the likelihood function), whereas a narrow likelihood function implies a more precise estimate. Due to the probabilistic generation of spikes across retrievals (Eq. 4), the likelihood has the form of a von Mises function with concentration $\kappa_{\hat{r}}$ proportional to the resultant vector length of the preferred values associated with each of the emitted spikes (m), with higher spike counts producing a narrower likelihood function on average [51]. This formulation has previously been shown to quantitatively reproduce findings from studies in which participants were asked to rate their subjective confidence in each estimate [67, 71]. Consequently, the precision of the likelihood function is a natural candidate for a computational estimate of the observer’s internal confidence.
To measure the internal confidence associated with each response, we determined the most probable resultant vector length given the individual response error and the probability distribution of spiking activity, which is fully characterized by the population parameters γ and κ. Specifically, for each trial, we used Bayes’ rule to find the posterior probability of resultant vector length r given the error on that trial, ϵ, marginalizing over the total spike count m,
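$$p(r \mid \epsilon, \kappa, \gamma) = \frac{p(\epsilon \mid r, \kappa)\, p(r \mid \kappa, \gamma)}{\int p(\epsilon \mid r, \kappa)\, p(r \mid \kappa, \gamma)\, dr} \tag{12}$$
$$p(\epsilon \mid r, \kappa) = \phi(\epsilon; 0, \kappa r) \tag{13}$$
$$p(r \mid \kappa, \gamma) = \sum_{m} p(r \mid m, \kappa)\, p(m \mid \gamma) \tag{14}$$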
where $p(r \mid m, \kappa)$ is given by Eq. 7 and $p(m \mid \gamma)$ is the Poisson probability mass function with mean γ. Applying maximum a posteriori (MAP) estimation to this posterior distribution returns the most probable resultant length for a given response error,
$$\hat{r} = \arg\max_{r}\, p(r \mid \epsilon, \kappa, \gamma) \tag{15}$$
Finally, we use $\kappa_{\hat{r}} = \hat{r}\kappa$ as a measure of internal confidence on the given trial.
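Equations 12 to 15 can be approximated numerically. The Python sketch below replaces the analytical prior of Equation 14 with Monte Carlo samples: it draws m ~ Poisson(γ) spikes with von Mises(0, κ) preferred values, whose vector-sum length has the distribution of Equation 7, weights each sample by the likelihood of Equation 13, and takes the mode of the weighted histogram as $\hat{r}$. The sample size, bin count, the large-argument switch in `log_i0`, and the function names are assumptions for illustration.

```python
import numpy as np

def log_i0(x):
    """log of the modified Bessel function I0, switching to the large-x
    asymptote log I0(x) ~ x - 0.5 log(2 pi x) where np.i0 could overflow."""
    x = np.asarray(x, dtype=float)
    exact = np.log(np.i0(np.minimum(x, 500.0)))
    asymptotic = x - 0.5 * np.log(2.0 * np.pi * np.maximum(x, 1e-12))
    return np.where(x < 500.0, exact, asymptotic)

def internal_confidence(err, gamma, kappa, n_samples=50_000, n_bins=200, seed=0):
    """Monte Carlo sketch of Eqs. (12)-(15): returns kappa_rhat = r_hat * kappa
    for a single response error `err` (radians)."""
    rng = np.random.default_rng(seed)
    # Eq. (14): sample r by drawing m ~ Poisson(gamma) spikes whose preferred
    # values are von Mises(0, kappa) and taking the length of their vector sum
    # (this tilted random walk has the density given in Eq. 7)
    m = rng.poisson(gamma, size=n_samples)
    r = np.zeros(n_samples)
    for k in np.unique(m[m > 0]):
        idx = np.flatnonzero(m == k)
        phi = rng.vonmises(0.0, kappa, size=(idx.size, int(k)))
        r[idx] = np.abs(np.exp(1j * phi).sum(axis=1))
    # Eq. (13): log von Mises density of the error given concentration kappa * r
    conc = kappa * r
    log_lik = conc * np.cos(err) - np.log(2.0 * np.pi) - log_i0(conc)
    # Eqs. (12) and (15): weight prior samples by the likelihood and take the
    # mode of the weighted histogram as the MAP estimate of r
    weights = np.exp(log_lik - log_lik.max())
    hist, edges = np.histogram(r, bins=n_bins, weights=weights)
    centres = 0.5 * (edges[:-1] + edges[1:])
    r_hat = centres[np.argmax(hist)]
    return kappa * r_hat

# Larger errors imply a shorter resultant length, i.e. lower estimated confidence
print(internal_confidence(0.0, gamma=20.0, kappa=2.0),
      internal_confidence(np.pi / 2, gamma=20.0, kappa=2.0))
```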
Model fitting
To model the observed allocation within the Neural resource model [22, 51], which has two free parameters (the mean population activity γ and the precision of the tuning functions κ), we introduced an additional parameter, the gain modulation α [22], resulting in a total of three free parameters in Experiments 1 and 2. In Experiment 3, which involved an estimation difficulty manipulation, the Neural resource model was extended by two additional parameters ($\sigma^2_{45\%}$ and $\sigma^2_{65\%}$) to capture the effects of variable sensory noise introduced by different coherence levels. This brought the total number of free parameters in the Neural resource model for Experiment 3 to five.
The Reinforcement learning account retained all parameters of the Neural resource model except the gain modulation parameter α, while introducing four new parameters, namely the leak parameter (y) and the reward weight parameters ($c_1$, $c_2$, $c_3$). In all three experiments, we modelled the leak parameter (y) and the effect of internal confidence ($c_3$) on resource allocation (see Eqs. 9 and 10); additionally, we modelled the effect of external reward ($c_1$) only in Experiment 1 and the effect of feedback error ($c_2$) only in Experiment 2, setting each to zero in all other experiments. This resulted in the estimation of five free parameters in Experiments 1 and 2, and six in Experiment 3. When fitting the model to the data, the leak parameter was constrained between 0 and 1, and all three weight parameters were limited to a range of −1 to 1.
For all models, we obtained a separate maximum likelihood fit for each individual observer. These fits were derived using the Nelder-Mead simplex method (via the fminsearch function in MATLAB). A MATLAB toolbox implementing the Neural resource model is available for download from https://bayslab.com/toolbox.
Acknowledgment
We thank David Aagten-Murphy and Robert Taylor, who worked on earlier iterations of this project. We thank Neha Abraham, Pepita Alex, Amida Anand, Paul McMeekin, Adam Sabo, Tom Wenban-Smith, Adam Zhu, and Adam Triabhall for assisting with data collection. This work was funded by the Wellcome Trust (grant 106926 to P.M.B.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author contributions
I.T. contributed to conceptualization, methodology, software, data collection, investigation, formal analysis, modelling, visualizations, and writing (original draft and revisions). R.R.R. contributed to methodology, software, and data collection. P.M.B. contributed to conceptualization, funding acquisition, supervision, methodology, formal analysis, modelling, visualizations, and writing (editing and revisions).
Data availability
Data and analysis code will be made publicly available upon publication of this manuscript.
References
[1] Berridge, K. C., Robinson, T. E., and Aldridge, J. W. Dissecting components of reward: ‘liking’, ‘wanting’, and learning. Current Opinion in Pharmacology 9.1 (2009), pp. 65–73. doi: 10.1016/j.coph.2008.12.014.
[2] Berridge, K. C. and Robinson, T. E. Parsing reward. Trends in Neurosciences 26.9 (2003), pp. 507–513. doi: 10.1016/S0166-2236(03)00233-9.
[3] Stănişor, L., Van Der Togt, C., Pennartz, C. M. A., and Roelfsema, P. R. A unified selection signal for attention and reward in primary visual cortex. Proceedings of the National Academy of Sciences 110.22 (2013), pp. 9136–9141. doi: 10.1073/pnas.1300117110.
[4] Schultz, W. Behavioral Theories and the Neurophysiology of Reward. Annual Review of Psychology 57.1 (2006), pp. 87–115. doi: 10.1146/annurev.psych.56.091103.070229.
[5] Maunsell, J. H. Neuronal representations of cognitive state: reward or attention? Trends in Cognitive Sciences 8.6 (2004), pp. 261–265. doi: 10.1016/j.tics.2004.04.003.
[6] Blain, B. and Sharot, T. Intrinsic reward: potential cognitive and neural mechanisms. Current Opinion in Behavioral Sciences 39 (2021), pp. 113–118. doi: 10.1016/j.cobeha.2021.03.008.
[7] Navalpakkam, V., Koch, C., Rangel, A., and Perona, P. Optimal reward harvesting in complex perceptual environments. Proceedings of the National Academy of Sciences 107.11 (2010), pp. 5232–5237. doi: 10.1073/pnas.0911972107.
[8] Lee, J. and Shomstein, S. The Differential Effects of Reward on Space- and Object-Based Attentional Allocation. Journal of Neuroscience 33.26 (2013), pp. 10625–10633. doi: 10.1523/JNEUROSCI.5575-12.2013.
[9] Della Libera, C. and Chelazzi, L. Visual Selective Attention and the Effects of Monetary Rewards. Psychological Science 17.3 (2006), pp. 222–227. doi: 10.1111/j.1467-9280.2006.01689.x.
[10] Kristjansson, A., Sigurjonsdottir, O., and Driver, J. Fortune and reversals of fortune in visual search: Reward contingencies for pop-out targets affect search efficiency and target repetition effects. Attention, Perception & Psychophysics 72.5 (2010), pp. 1229–1236. doi: 10.3758/APP.72.5.1229.
[11] Anderson, B. A., Laurent, P. A., and Yantis, S. Value-driven attentional capture. Proceedings of the National Academy of Sciences 108.25 (2011), pp. 10367–10371. doi: 10.1073/pnas.1104047108.
[12] Della Libera, C. and Chelazzi, L. Learning to Attend and to Ignore Is a Matter of Gains and Losses. Psychological Science 20.6 (2009), pp. 778–784. doi: 10.1111/j.1467-9280.2009.02360.x.
[13] Peck, C. J., Jangraw, D. C., Suzuki, M., Efem, R., and Gottlieb, J. Reward Modulates Attention Independently of Action Value in Posterior Parietal Cortex. The Journal of Neuroscience 29.36 (2009), pp. 11182–11191. doi: 10.1523/JNEUROSCI.1929-09.2009.
[14] Theeuwes, J. and Belopolsky, A. V. Reward grabs the eye: Oculomotor capture by rewarding stimuli. Vision Research 74 (2012), pp. 80–85. doi: 10.1016/j.visres.2012.07.024.
[15] Anderson, B. A. and Kim, H. Mechanisms of value-learning in the guidance of spatial attention. Cognition 178 (2018), pp. 26–36. doi: 10.1016/j.cognition.2018.05.005.
[16] Anderson, B. A. and Kim, H. On the representational nature of value-driven spatial attentional biases. Journal of Neurophysiology 120.5 (2018), pp. 2654–2658. doi: 10.1152/jn.00489.2018.
[17] Awh, E., Belopolsky, A. V., and Theeuwes, J. Top-down versus bottom-up attentional control: a failed theoretical dichotomy. Trends in Cognitive Sciences 16.8 (2012), pp. 437–443. doi: 10.1016/j.tics.2012.06.010.
[18] Anderson, B. A., Kim, H., Kim, A. J., Liao, M.-R., Mrkonja, L., Clement, A., and Grégoire, L. The past, present, and future of selection history. Neuroscience & Biobehavioral Reviews 130 (2021), pp. 326–350. doi: 10.1016/j.neubiorev.2021.09.004.
[19] Bays, P. M., Schneegans, S., Ma, W. J., and Brady, T. F. Representation and computation in visual working memory. Nature Human Behaviour 8.6 (2024), pp. 1016–1034. doi: 10.1038/s41562-024-01871-2.
[20] Bays, P. M., Gorgoraptis, N., Wee, N., Marshall, L., and Husain, M. Temporal dynamics of encoding, storage, and reallocation of visual working memory. Journal of Vision 11.10 (2011), pp. 6–6. doi: 10.1167/11.10.6.
[21] Gorgoraptis, N., Catalao, R. F. G., Bays, P. M., and Husain, M. Dynamic Updating of Working Memory Resources for Visual Objects. Journal of Neuroscience 31.23 (2011), pp. 8502–8511. doi: 10.1523/JNEUROSCI.0208-11.2011.
[22] Bays, P. M. Noise in Neural Populations Accounts for Errors in Working Memory. Journal of Neuroscience 34.10 (2014), pp. 3632–3645. doi: 10.1523/JNEUROSCI.3204-13.2014.
[23] Emrich, S. M., Lockhart, H. A., and Al-Aidroos, N. Attention mediates the flexible allocation of visual working memory resources. Journal of Experimental Psychology: Human Perception and Performance 43.7 (2017), pp. 1454–1465. doi: 10.1037/xhp0000398.
[24] Sprague, T. C., Itthipuripat, S., Vo, V. A., and Serences, J. T. Dissociable signatures of visual salience and behavioral relevance across attentional priority maps in human cortex. Journal of Neurophysiology 119.6 (2018), pp. 2153–2165. doi: 10.1152/jn.00059.2018.
[25] Yoo, A. H., Klyszejko, Z., Curtis, C. E., and Ma, W. J. Strategic allocation of working memory resource (2018). doi: 10.1101/329870.
[26] Taylor, R., Tomić, I., Aagten-Murphy, D., and Bays, P. M. Working memory is updated by reallocation of resources from obsolete to new items. Attention, Perception, & Psychophysics 85.5 (2023), pp. 1437–1451. doi: 10.3758/s13414-022-02584-2.
[27] Griffin, I. C. and Nobre, A. C. Orienting Attention to Locations in Internal Representations. Journal of Cognitive Neuroscience 15.8 (2003), pp. 1176–1194. doi: 10.1162/089892903322598139.
[28] Oberauer, K. Control of the Contents of Working Memory–A Comparison of Two Paradigms and Two Age Groups. Journal of Experimental Psychology: Learning, Memory, and Cognition 31.4 (2005), pp. 714–728. doi: 10.1037/0278-7393.31.4.714.
[29] Klyszejko, Z., Rahmati, M., and Curtis, C. E. Attentional priority determines working memory precision. Vision Research 105 (2014), pp. 70–76. doi: 10.1016/j.visres.2014.09.002.
[30] Atkinson, A. L., Oberauer, K., Allen, R. J., and Souza, A. S. Why does the probe value effect emerge in working memory? Examining the biased attentional refreshing account. Psychonomic Bulletin & Review 29.3 (2022), pp. 891–900. doi: 10.3758/s13423-022-02056-6.
[31] Allen, R. J., Atkinson, A., and Hitch, G. J. Getting value out of working memory through strategic prioritisation; implications for storage and control. Quarterly Journal of Experimental Psychology (2024), p. 17470218241258102. doi: 10.1177/17470218241258102.
[32] Gong, M. and Li, S. Learned reward association improves visual working memory. Journal of Experimental Psychology: Human Perception and Performance 40.2 (2014), pp. 841–856. doi: 10.1037/a0035131.
[33] Brissenden, J. A., Adkins, T. J., Hsu, Y. T., and Lee, T. G. Reward influences the allocation but not the availability of resources in visual working memory. Journal of Experimental Psychology: General (2023). doi: 10.1037/xge0001370.
[34] Van Den Berg, R., Zou, Q., Li, Y., and Ma, W. J. No effect of monetary reward in a visual working memory task. PLOS ONE 18.1 (2023), e0280257. doi: 10.1371/journal.pone.0280257.
[35] Atkinson, A. L., Berry, E. D., Waterman, A. H., Baddeley, A. D., Hitch, G. J., and Allen, R. J. Are there multiple ways to direct attention in working memory? Annals of the New York Academy of Sciences 1424.1 (2018), pp. 115–126. doi: 10.1111/nyas.13634.
[36] Gazzaley, A. and Nobre, A. C. Top-down modulation: bridging selective attention and working memory. Trends in Cognitive Sciences 16.2 (2012), pp. 129–135. doi: 10.1016/j.tics.2011.11.014.
[37] Awh, E., Vogel, E., and Oh, S.-H. Interactions between attention and working memory. Neuroscience 139.1 (2006), pp. 201–208. doi: 10.1016/j.neuroscience.2005.08.023.
[38] Wolf, D. H., Gerraty, R., Satterthwaite, T. D., Loughead, J., Campellone, T., Elliott, M. A., Turetsky, B. I., Gur, R. C., and Gur, R. E. Striatal intrinsic reinforcement signals during recognition memory: relationship to response bias and dysregulation in schizophrenia. Frontiers in Behavioral Neuroscience 5 (2011), p. 81. doi: 10.3389/fnbeh.2011.00081.
[39] Schultz, W., Dayan, P., and Montague, P. R. A Neural Substrate of Prediction and Reward. Science 275.5306 (1997), pp. 1593–1599. doi: 10.1126/science.275.5306.1593.
[40] Knutson, B., Fong, G. W., Adams, C. M., Varner, J. L., and Hommer, D. Dissociation of reward anticipation and outcome with event-related fMRI. Neuroreport 12.17 (2001), pp. 3683–3687. doi: 10.1097/00001756-200112040-00016.
[41] Elliott, R., Friston, K. J., and Dolan, R. J. Dissociable Neural Responses in Human Reward Systems. The Journal of Neuroscience 20.16 (2000), pp. 6159–6165. doi: 10.1523/JNEUROSCI.20-16-06159.2000.
[42] De Martino, B., Kumaran, D., Holt, B., and Dolan, R. J. The Neurobiology of Reference-Dependent Value Computation. The Journal of Neuroscience 29.12 (2009), pp. 3833–3842. doi: 10.1523/JNEUROSCI.4832-08.2009.
[43] Han, S., Huettel, S. A., Raposo, A., Adcock, R. A., and Dobbins, I. G. Functional Significance of Striatal Responses during Episodic Decisions: Recovery or Goal Attainment? The Journal of Neuroscience 30.13 (2010), pp. 4767–4775. doi: 10.1523/JNEUROSCI.3077-09.2010.
[44] Satterthwaite, T. D., Ruparel, K., Loughead, J., Elliott, M. A., Gerraty, R. T., Calkins, M. E., Hakonarson, H., Gur, R. C., Gur, R. E., and Wolf, D. H. Being right is its own reward: Load and performance related ventral striatum activation to correct responses during a working memory task in youth. NeuroImage 61.3 (2012), pp. 723–729. doi: 10.1016/j.neuroimage.2012.03.060.
[45] Daniel, R. and Pollmann, S. Striatal activations signal prediction errors on confidence in the absence of external feedback. NeuroImage 59.4 (2012), pp. 3457–3467. doi: 10.1016/j.neuroimage.2011.11.058.
[46] Hebart, M. N., Schriever, Y., Donner, T. H., and Haynes, J.-D. The Relationship between Perceptual Decision Variables and Confidence in the Human Brain. Cerebral Cortex 26.1 (2016), pp. 118–130. doi: 10.1093/cercor/bhu181.
[47] Schwarze, U., Bingel, U., Badre, D., and Sommer, T. Ventral Striatal Activity Correlates with Memory Confidence for Old- and New-Responses in a Difficult Recognition Test. PLoS ONE 8.3 (2013), e54324. doi: 10.1371/journal.pone.0054324.
[48] Guggenmos, M., Wilbertz, G., Hebart, M. N., and Sterzer, P. Mesolimbic confidence signals guide perceptual learning in the absence of external feedback. eLife 5 (2016), e13388. doi: 10.7554/eLife.13388.
[49] Prinzmetal, W., Amiri, H., Allen, K., and Edwards, T. Phenomenology of attention: I. Color, location, orientation, and spatial frequency. Journal of Experimental Psychology: Human Perception and Performance 24.1 (1998), pp. 261–282. doi: 10.1037/0096-1523.24.1.261.
[50] Tomić, I., Adamcová, D., Fehér, M., and Bays, P. M. Dissecting the components of error in analogue report tasks. Behavior Research Methods (2024). doi: 10.3758/s13428-024-02453-w.
[51] Schneegans, S., Taylor, R., and Bays, P. M. Stochastic sampling provides a unifying account of visual working memory limits. Proceedings of the National Academy of Sciences (2020), p. 202004306. doi: 10.1073/pnas.2004306117.
[52] Kiani, R., Corthell, L., and Shadlen, M. N. Choice Certainty Is Informed by Both Evidence and Decision Time. Neuron 84.6 (2014), pp. 1329–1342. doi: 10.1016/j.neuron.2014.12.015.
[53] Fleming, S. M. Metacognition and Confidence: A Review and Synthesis. Annual Review of Psychology 75.1 (2024), pp. 241–268. doi: 10.1146/annurev-psych-022423-032425.
[54] Chetverikov, A. and Jehee, J. F. M. Motion direction is represented as a bimodal probability distribution in the human visual cortex. Nature Communications 14.1 (2023), p. 7634. doi: 10.1038/s41467-023-43251-w.
[55] Kwak, Y. and Curtis, C. E. Unveiling the abstract format of mnemonic representations. Neuron 110.11 (2022), 1822–1828.e5. doi: 10.1016/j.neuron.2022.03.016.
[56] Shadmehr, R., Reppert, T. R., Summerside, E. M., Yoon, T., and Ahmed, A. A. Movement Vigor as a Reflection of Subjective Economic Utility. Trends in Neurosciences 42.5 (2019), pp. 323–336. doi: 10.1016/j.tins.2019.02.003.
[57] Summerside, E. M., Shadmehr, R., and Ahmed, A. A. Vigor of reaching movements: reward discounts the cost of effort. Journal of Neurophysiology 119.6 (2018), pp. 2347–2357. doi: 10.1152/jn.00872.2017.
[58] Manohar, S. G., Finzi, R. D., Drew, D., and Husain, M. Distinct Motivational Effects of Contingent and Noncontingent Rewards. Psychological Science 28.7 (2017), pp. 1016–1026. doi: 10.1177/0956797617693326.
[59] Berridge, K. C. and Robinson, T. E. What is the role of dopamine in reward: hedonic impact, reward learning, or incentive salience? Brain Research Reviews 28.3 (1998), pp. 309–369. doi: 10.1016/S0165-0173(98)00019-8.
[60] Yoo, A. H. and Collins, A. G. E. How Working Memory and Reinforcement Learning Are Intertwined: A Cognitive, Neural, and Computational Perspective. Journal of Cognitive Neuroscience 34.4 (2022), pp. 551–568. doi: 10.1162/jocn_a_01808.
[61] Serences, J. T. Value-Based Modulations in Human Visual Cortex. Neuron 60.6 (2008), pp. 1169–1181. doi: 10.1016/j.neuron.2008.10.051.
[62] Ashby, F. G. and Maddox, W. T. Human Category Learning. Annual Review of Psychology 56.1 (2005), pp. 149–178. doi: 10.1146/annurev.psych.56.091103.070217.
[63] Hattie, J. and Timperley, H. The Power of Feedback. Review of Educational Research 77.1 (2007), pp. 81–112. doi: 10.3102/003465430298487.
[64] Haddara, N. and Rahnev, D. The Impact of Feedback on Perceptual Decision-Making and Metacognition: Reduction in Bias but No Change in Sensitivity. Psychological Science 33.2 (2022), pp. 259–275. doi: 10.1177/09567976211032887.
[65] Rouault, M. and Fleming, S. M. Formation of global self-beliefs in the human brain. Proceedings of the National Academy of Sciences 117.44 (2020), pp. 27268–27276. doi: 10.1073/pnas.2003094117.
[66] Bröker, F., Holt, L. L., Roads, B. D., Dayan, P., and Love, B. C. Demystifying unsupervised learning: how it helps and hurts. Trends in Cognitive Sciences 28.11 (2024), pp. 974–986. doi: 10.1016/j.tics.2024.09.005.
[67] Berg, R. van den, Yoo, A. H., and Ma, W. J. Fechner’s law in metacognition: A quantitative model of visual working memory confidence. Psychological Review 124.2 (2017), pp. 197–214. doi: 10.1037/rev0000060.
[68] Li, H.-H., Sprague, T. C., Yoo, A. H., Ma, W. J., and Curtis, C. E. Joint representation of working memory and uncertainty in human cortex. Neuron 109.22 (2021), 3699–3712.e6. doi: 10.1016/j.neuron.2021.08.022.
[69] Ma, W. J., Beck, J. M., Latham, P. E., and Pouget, A. Bayesian inference with probabilistic population codes. Nature Neuroscience 9.11 (2006), pp. 1432–1438. doi: 10.1038/nn1790.
[70] Pouget, A., Dayan, P., and Zemel, R. S. Inference and computation with population codes. Annual Review of Neuroscience 26 (2003), pp. 381–410. doi: 10.1146/annurev.neuro.26.041002.131112.
[71] Bays, P. M. A signature of neural coding at human perceptual limits. Journal of Vision 16.11 (2016), p. 4. doi: 10.1167/16.11.4.
[72] Schneegans, S. and Bays, P. M. Drift in Neural Population Activity Causes Working Memory to Deteriorate Over Time. The Journal of Neuroscience 38.21 (2018), pp. 4859–4869. doi: 10.1523/JNEUROSCI.3440-17.2018.
[73] Schaffner, J., Bao, S. D., Tobler, P. N., Hare, T. A., and Polania, R. Sensory perception relies on fitness-maximizing codes. Nature Human Behaviour (2023). doi: 10.1038/s41562-023-01584-y.
[74] Shuler, M. G. and Bear, M. F. Reward Timing in the Primary Visual Cortex. Science 311.5767 (2006), pp. 1606–1609. doi: 10.1126/science.1123513.
[75] Badre, D. Cognitive Control. Annual Review of Psychology (2024). doi: 10.1146/annurev-psych-022024-103901.
[76] Inzlicht, M., Shenhav, A., and Olivola, C. Y. The Effort Paradox: Effort Is Both Costly and Valued. Trends in Cognitive Sciences 22.4 (2018), pp. 337–349. doi: 10.1016/j.tics.2018.01.007.
[77] Westbrook, A. and Braver, T. S. Cognitive effort: A neuroeconomic approach. Cognitive, Affective, & Behavioral Neuroscience 15.2 (2015), pp. 395–415. doi: 10.3758/s13415-015-0334-y.
[78] Shenhav, A., Musslick, S., Lieder, F., Kool, W., Griffiths, T. L., Cohen, J. D., and Botvinick, M. M. Toward a Rational and Mechanistic Account of Mental Effort. Annual Review of Neuroscience 40.1 (2017), pp. 99–124. doi: 10.1146/annurev-neuro-072116-031526.
[79] Kool, W., McGuire, J. T., Rosen, Z. B., and Botvinick, M. M. Decision making and the avoidance of cognitive demand. Journal of Experimental Psychology: General 139.4 (2010), pp. 665–682. doi: 10.1037/a0020198.
[80] Corlazzoli, G., Desender, K., and Gevers, W. Feeling and deciding: Subjective experiences rather than objective factors drive the decision to invest cognitive control. Cognition 240 (2023), p. 105587. doi: 10.1016/j.cognition.2023.105587.
[81] Kool, W. and Botvinick, M. The intrinsic cost of cognitive control. Behavioral and Brain Sciences 36.6 (2013), pp. 697–698. doi: 10.1017/S0140525X1300109X.
[82] Botvinick, M. M., Huffstetler, S., and McGuire, J. T. Effort discounting in human nucleus accumbens. Cognitive, Affective, & Behavioral Neuroscience 9.1 (2009), pp. 16–27. doi: 10.3758/CABN.9.1.16.
[83] Brainard, D. H. The Psychophysics Toolbox. Spatial Vision 10.4 (1997), pp. 433–436. doi: 10.1163/156856897X00357.
[84] Pelli, D. G. The VideoToolbox software for visual psychophysics: transforming numbers into movies. Spatial Vision 10.4 (1997), pp. 437–442.
[85] Scase, M. O., Braddick, O. J., and Raymond, J. E. What is Noise for the Motion System? Vision Research 36.16 (1996), pp. 2579–2586. doi: 10.1016/0042-6989(95)00325-8.
[86] JASP Team. JASP (Version 0.18.3) [Computer software]. 2024.
[87] Liang, F., Paulo, R., Molina, G., Clyde, M. A., and Berger, J. O. Mixtures of g Priors for Bayesian Variable Selection. Journal of the American Statistical Association 103 (2008), pp. 410–423. doi: 10.1198/016214507000001337.
[88] Wagenmakers, E.-J. et al. Bayesian inference for psychology. Part II: Example applications with JASP. Psychonomic Bulletin & Review 25.1 (2018), pp. 58–76. doi: 10.3758/s13423-017-1323-7.
[89] Lee, M. D. and Wagenmakers, E.-J. Bayesian cognitive modeling: a practical course. Cambridge; New York: Cambridge University Press, 2013. 264 pp.
[90] Tomić, I. and Bays, P. M. Perceptual similarity judgments do not predict the distribution of errors in working memory. Journal of Experimental Psychology: Learning, Memory, and Cognition 50.4 (2024), pp. 535–549. doi: 10.1037/xlm0001172.
[91] Tomić, I. and Bays, P. M. A dynamic neural resource model bridges sensory and working memory. eLife 12 (2024), RP91034. doi: 10.7554/eLife.91034.3.
[92] Carandini, M. and Heeger, D. Normalization as a canonical neural computation. Nature Reviews Neuroscience 13.1 (2012), pp. 51–62. doi: 10.1038/nrn3136.
[93] Tomić, I. and Bays, P. M. Internal but not external noise frees working memory resources. PLOS Computational Biology 14.10 (2018), e1006488. doi: 10.1371/journal.pcbi.1006488.
Supplementary Information
Psychophysical data
Experiment 2b: Sequential presentation
[Figure S1 image. Panel A: histograms (density) of response error for magnified and minified feedback error. Panel B: MAD of response error by feedback error condition (magnified, minified).]
Figure S1: Perceived accuracy manipulation in Experiment 2b (sequential presentation). A) Histograms represent distributions of response errors. B) Mean absolute deviation (MAD) of response errors. The coloured circles with error bars represent the mean ± SE.
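The MAD summary in panel B can be computed along the following lines. This is a minimal sketch, assuming that responses and targets are angles in radians on a circular feature space, that "mean absolute deviation" here means the mean absolute response error (deviation from the target value) after wrapping to [−π, π), and that the error bars are standard errors of per-participant MADs; all function names are illustrative.

```python
import math


def wrap_to_pi(x: float) -> float:
    """Wrap an angle (radians) into the interval [-pi, pi)."""
    return (x + math.pi) % (2 * math.pi) - math.pi


def response_errors(responses, targets):
    """Signed circular response errors (response minus target), wrapped to [-pi, pi)."""
    return [wrap_to_pi(r - t) for r, t in zip(responses, targets)]


def mean_absolute_deviation(errors):
    """Mean |error| around the true value (assumed definition of MAD)."""
    if not errors:
        raise ValueError("need at least one error")
    return sum(abs(e) for e in errors) / len(errors)


def mean_and_se(values):
    """Group mean and standard error of per-participant summaries (e.g. one MAD each)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two participants")
    m = sum(values) / n
    var = sum((v - m) ** 2 for v in values) / (n - 1)
    return m, math.sqrt(var / n)


# Usage: two hypothetical trials; the second error wraps across +/-pi
errors = response_errors(responses=[0.3, -3.0], targets=[0.1, 3.0])
print(mean_absolute_deviation(errors))  # approx. 0.2416
```

Wrapping matters: without it the second trial would count as an error of 6 rad rather than about 0.28 rad.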
Experiment 3b: Sequential presentation
[Figure S2 image. Panels A and B: histograms (density) of response error for the high coherence colour and the low coherence colour. Panel C: MAD of response error by coherence condition (High, Low; Inter. (High), Inter. (Low)).]
Figure S2: Estimation difficulty manipulation in Experiment 3b (sequential presentation). A & B) Histograms represent distributions of response errors. Panel A depicts variable coherence trials, and panel B depicts equal coherence trials. C) Mean absolute deviation (MAD) of response errors. The coloured circles with error bars represent the mean ± SE.
Reinforcement learning account
External reward
The average trajectory of resource allocation shown in Figure 5A is based on ML parameter estimates (mean ± SE): mean activity γ = 2.88 ± 0.39; tuning precision κ = 10.29 ± 0.98; leak y = 0.28 ± 0.06; reward weight c₁ = 0.31 ± 0.14; internal confidence weight c₃ = 0.012 ± 0.01. Correlating the parameter estimates from the Reinforcement learning account with those from the Neural resource model with freely estimated resource allocation, we found highly consistent estimates of the population's mean spiking activity (r = 0.997, 95% CI = [0.992, 0.999], BF₁₀ = 1.46 × 10²⁸) and tuning precision (r = 0.971, 95% CI = [0.929, 0.986], BF₁₀ = 3.16 × 10¹⁵) (Fig. S4A).
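The parameter list above names a leak and weights on external reward and internal confidence but does not restate the update rule. Purely to illustrate how such parameters can shape a trial-by-trial allocation trajectory, the sketch below uses a hypothetical leaky accumulator: each item's value decays by the leak y and is incremented by c₁ × reward + c₃ × confidence, and the resource fraction for the first item is a logistic function of the value difference. The additive update, the logistic mapping and the class name are our assumptions for illustration, not the equations of the RL account.

```python
import math
from dataclasses import dataclass, field


@dataclass
class HypotheticalAllocator:
    """Illustrative two-item allocator; NOT the paper's RL account."""
    leak: float      # y: proportion of each stored value lost per trial (assumed form)
    c_reward: float  # c1: weight on external reward (assumed to enter additively)
    c_conf: float    # c3: weight on internal confidence (assumed to enter additively)
    values: list = field(default_factory=lambda: [0.0, 0.0])

    def fraction(self) -> float:
        """Resource fraction for item 0: logistic of the value difference (assumption)."""
        return 1.0 / (1.0 + math.exp(-(self.values[0] - self.values[1])))

    def step(self, rewards, confidences) -> float:
        """Apply one trial's leaky update to both items and return item 0's new fraction."""
        self.values = [
            (1.0 - self.leak) * v + self.c_reward * r + self.c_conf * c
            for v, r, c in zip(self.values, rewards, confidences)
        ]
        return self.fraction()


# Usage: item 0 is always the rewarded one; confidence is equal for both items.
# The reported mean estimates are reused only as plausible magnitudes.
model = HypotheticalAllocator(leak=0.28, c_reward=0.31, c_conf=0.012)
trajectory = [model.step(rewards=(1.0, 0.0), confidences=(0.5, 0.5)) for _ in range(100)]
print(round(trajectory[0], 3), round(trajectory[-1], 3))
```

In this toy version a larger leak makes the allocation track recent trials more closely: the trajectory settles faster (the old value shrinks by a factor 1 − y per trial) and at a less extreme fraction.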
Finally, to assess whether observers prioritised external rewards or internal confidence signals in resource allocation, we calculated each observer's mean contribution of external rewards and internal confidence to the relative value of the two objects. The results provided moderate evidence for no difference between their contributions (BF₁₀ = 0.22; δ = 0.076, 95% CI = [-0.263, 0.418]). However, this finding should be interpreted with caution, as the effect of internal confidence may partly reflect observers' resource allocation favouring the high-reward item (itself an effect of external reward), which in turn enhances confidence for that item.
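The Bayes factors above were obtained in JASP [86]. Assuming the comparison is a Bayesian paired-samples t-test with JASP's default prior (a Cauchy prior on the effect size δ with scale √2/2 ≈ 0.707), a value such as BF₁₀ = 0.22 can be checked roughly with the sketch below, which computes the JZS Bayes factor of Rouder and colleagues (2009) from the paired t statistic and the number of observers by numerically integrating over the prior's mixing variable g. It is our own implementation, not JASP's code, so small numerical differences are to be expected; the data in the usage lines are hypothetical.

```python
import math


def paired_t(x, y):
    """Paired-samples t statistic and number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in d) / (n - 1))
    return mean / (sd / math.sqrt(n)), n


def jzs_bf10(t, n, r=math.sqrt(2) / 2, lo=-30.0, hi=30.0, steps=6000):
    """JZS Bayes factor BF10 for a one-sample or paired t-test, Cauchy(0, r) prior on delta.

    delta | g ~ Normal(0, g * r**2) with g ~ InverseGamma(1/2, 1/2) gives a
    Cauchy(0, r) prior. We integrate over g = exp(u) with the trapezoidal rule,
    which tames the integrand's behaviour near g = 0.
    """
    nu = n - 1

    def log_integrand(u):
        g = math.exp(u)
        a = 1.0 + n * g * r * r
        return (-0.5 * math.log(a)
                - 0.5 * (nu + 1) * math.log(1.0 + t * t / (a * nu))
                - 0.5 * math.log(2 * math.pi) - 1.5 * u - 1.0 / (2.0 * g)
                + u)  # + u is the Jacobian of the substitution g = exp(u)

    h = (hi - lo) / steps
    area = 0.0
    for k in range(steps + 1):
        weight = 0.5 if k in (0, steps) else 1.0
        area += weight * math.exp(log_integrand(lo + k * h))
    area *= h
    null_density = (1.0 + t * t / nu) ** (-0.5 * (nu + 1))
    return area / null_density


# Usage: hypothetical per-observer contributions of two signals to relative value
reward_part = [0.21, 0.18, 0.25, 0.30, 0.12, 0.22, 0.19, 0.27]
conf_part = [0.20, 0.22, 0.19, 0.31, 0.15, 0.18, 0.24, 0.26]
t, n = paired_t(reward_part, conf_part)
print(n, round(t, 3), round(jzs_bf10(t, n), 3))
```

By the conventional classification of Bayes factors (e.g. [89]), 1/10 < BF₁₀ < 1/3 counts as moderate evidence for the null hypothesis, which is how BF₁₀ = 0.22 is read above.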
Intrinsic reward: Perceived accuracy
The mean trajectory of resource allocation across trials shown in Figure 5D is based on ML parameter estimates (mean ± SE): mean activity γ = 5.01 ± 0.78; tuning precision κ = 6.64 ± 0.51; leak y = 0.538 ± 0.076; feedback weight c₂ = 0.238 ± 0.108; internal confidence weight c₃ = 0.007 ± 0.044. Comparing the estimates derived from the Neural resource model and the Reinforcement learning account (Fig. S4B), we again found that the RL account's estimates closely matched those of the Neural resource model for the population's mean spiking activity (r = 0.987, 95% CI = [0.964, 0.994], BF₁₀ = 1.26 × 10¹⁶) and tuning precision (r = 0.956, 95% CI = [0.884, 0.980], BF₁₀ = 4.7 × 10¹⁰).
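The agreement between the two models is summarised by Pearson correlations of per-observer estimates. Since the intervals are reported alongside BF₁₀, they are presumably Bayesian credible intervals from JASP; the sketch below gives the familiar frequentist counterpart, a Fisher z-transform confidence interval, which will generally be close to but not identical with the reported intervals. The per-observer estimates in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Frequentist confidence interval for a correlation via Fisher's z-transform."""
    if not -1.0 < r < 1.0 or n <= 3:
        raise ValueError("need |r| < 1 and n > 3")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    q = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.tanh(z - q * se), math.tanh(z + q * se)


# Usage: hypothetical per-observer mean-activity estimates from the two models
neural_model = [2.1, 3.4, 2.9, 5.0, 4.2, 3.8]
rl_account = [2.0, 3.5, 3.0, 4.8, 4.3, 3.7]
r = pearson_r(neural_model, rl_account)
print(round(r, 3), fisher_ci(r, len(neural_model)))
```

Because the interval is symmetric on the z scale, it becomes strongly asymmetric on the r scale near r = 1, as in the reported intervals for mean activity.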
Finally, we found moderate evidence for no difference between feedback and internal confidence signals in their contribution to the relative value of objects (BF₁₀ = 0.33; δ = 0.174, 95% CI = [-0.196, 0.553]).
Intrinsic reward: Estimation difficulty
Fitting the Reinforcement learning account to psychophysical data from Experiment 3a, we obtained the following ML parameters (mean ± SE): mean activity γ = 3.33 ± 0.39; tuning precision κ = 11.53 ± 0.61; leak y = 0.398 ± 0.086; confidence weight c₃ = 0.062 ± 0.045; intermediate perceptual noise SD₆₅% = 0.143 ± 0.021; high perceptual noise SD₄₅% = 0.338 ± 0.086. In Experiment 3b we observed broadly similar estimates: mean activity γ = 2.13 ± 0.31; tuning precision κ = 13.03 ± 1.70; leak y = 0.285 ± 0.083; confidence weight c₃ = 0.007 ± 0.004; intermediate perceptual noise SD₆₅% = 0.090 ± 0.021; high perceptual noise SD₄₅% = 0.249 ± 0.056. Again, we visualised the individual trajectories of example participants (Fig. S3C & D).
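To build intuition for how mean activity γ, tuning precision κ, the allocated resource fraction and perceptual noise jointly shape recall error, the sketch below simulates a population-coding model in the spirit of the Neural resource model [22]. It assumes a dense, uniform population with von Mises tuning of precision κ and Poisson spiking, so that an item receiving resource fraction p evokes a Poisson number of spikes with mean γp and each spike's preferred value is a von Mises draw around the encoded value. Under these assumptions the maximum-likelihood estimate is the circular mean of the spikes' preferred values, with a random guess when no spike occurs. Perceptual noise is modelled, again as an assumption, as additive Gaussian noise on the stimulus value before encoding; the function names are illustrative.

```python
import math
import random


def wrap(x: float) -> float:
    """Wrap an angle (radians) into [-pi, pi)."""
    return (x + math.pi) % (2 * math.pi) - math.pi


def poisson(lam: float, rng: random.Random) -> int:
    """Poisson draw by Knuth's multiplication method (fine for the small means used here)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def decode_one(theta, gamma, kappa, fraction, perceptual_sd, rng):
    """Simulate encoding and ML decoding of one item (hypothetical sketch)."""
    # Assumed: additive Gaussian perceptual noise on the stimulus before encoding
    percept = wrap(theta + rng.gauss(0.0, perceptual_sd))
    # Assumed: total spike count is Poisson with mean gamma * fraction
    n_spikes = poisson(gamma * fraction, rng)
    if n_spikes == 0:
        return rng.uniform(-math.pi, math.pi)  # no spikes, no information: guess
    # Each spike's source neuron has a preferred value ~ von Mises(percept, kappa)
    prefs = [rng.vonmisesvariate(percept, kappa) for _ in range(n_spikes)]
    # ML estimate under dense uniform von Mises tuning = circular mean of preferred values
    return math.atan2(sum(math.sin(p) for p in prefs), sum(math.cos(p) for p in prefs))


def simulated_mad(gamma, kappa, fraction, perceptual_sd, n_trials=2000, seed=1):
    """Mean absolute recall error over simulated trials with uniformly random targets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        theta = rng.uniform(-math.pi, math.pi)
        estimate = decode_one(theta, gamma, kappa, fraction, perceptual_sd, rng)
        total += abs(wrap(estimate - theta))
    return total / n_trials


# Usage with the Experiment 3a means reported above (illustration only)
for frac in (0.3, 0.5, 0.7):
    print(frac, round(simulated_mad(3.33, 11.53, frac, 0.0), 3),
          round(simulated_mad(3.33, 11.53, frac, 0.338), 3))
```

In this sketch a larger resource fraction lowers error mainly by reducing the chance of zero-spike guesses, whereas perceptual noise adds error that extra resources cannot remove.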
In both experiments, estimates obtained with the Neural resource model and the Reinforcement learning account strongly covaried (Fig. S4C & D). Specifically, we found highly consistent estimates of the population's mean spiking activity (Exp 3a: r = 0.995, 95% CI = [0.986, 0.998], BF₁₀ = 6.15 × 10¹⁷; Exp 3b: r = 0.999, 95% CI = [0.996, 1.000], BF₁₀ = 1.66 × 10¹⁹), tuning precision (Exp 3a: r = 0.912, 95% CI = [0.761, 0.961], BF₁₀ = 2.84 × 10⁶; Exp 3b: r = 0.970, 95% CI = [0.899, 0.989], BF₁₀ = 5.66 × 10⁸), intermediate perceptual noise (Exp 3a: r = 0.922, 95% CI = [0.785, 0.966], BF₁₀ = 8.00 × 10⁶; Exp 3b: r = 0.985, 95% CI = [0.947, 0.994], BF₁₀ = 8.17 × 10¹⁰), and high perceptual noise (Exp 3a: r = 0.659, 95% CI = [0.293, 0.830], BF₁₀ = 48.1; Exp 3b: r = 0.896, 95% CI = [0.696, 0.957], BF₁₀ = 6.80 × 10⁵).
[Figure S3 image. Four panels (A: External reward, Exp 1; B: Perceived accuracy, Exp 2a; C: Estimation difficulty, Exp 3a; D: Estimation difficulty, Exp 3b), each plotting resource fraction (0 to 1) against trial number for Observers 1 to 3.]
Figure S3: A) Trial-by-trial resource allocation estimated by the RL account in the external reward experiment (Experiment 1) for three illustrative participants. Circles represent the fraction of resources allocated to the preferred item on each trial. B) Perceived accuracy experiment (Experiment 2a). C) Estimation difficulty experiment (Experiment 3a). D) Estimation difficulty experiment (Experiment 3b).
[Figure S4 image. Scatter plots of RL account estimates (y axis) against Neural resource model estimates (x axis) of spiking activity (top row) and tuning precision (bottom row), panels A to D.]
Figure S4: Correlation between mean activity (top row) and tuning precision (bottom row) parameters estimated in the Neural resource model and the RL account of resource allocation. A) External reward experiment (Experiment 1). B) Perceived accuracy experiment (Experiment 2a). C) Estimation difficulty experiment (Experiment 3a). D) Estimation difficulty experiment (Experiment 3b).