A new paper in Nature Neuroscience adds something genuinely interesting to our growing understanding of neuroeconomics, but I've got reservations about how the authors describe the significance of their own work. The paper is called 'Reward prediction based on stimulus categorization in primate lateral prefrontal cortex'.
The authors first trained monkeys on two different sequences of three stimuli (see picture below), with reward delivered at the end of each sequence as long as the correct saccades were made between the stimuli. (Saccadic eye movements are a common choice action in these experiments - they don't require head movements, can easily be tracked, and some important early work in the field related the economic values of targets to the activity of neurons involved in making the different saccades.)
In the main experiment, following this initial training, the monkeys first learned that one of the two final stimuli from the original task was associated with a larger reward than the other, and were then offered a novel choice between the first members of the two sequences. They mostly 'correctly' opted for the first stimulus of the sequence that ended with the stimulus now associated with the relatively greater reward.
The behavioral result is cool if not surprising (it's old news in animal learning). The monkeys 'could' (according to a certain simplistic behaviourism, they would) initially have shown no relative preference in the final task, given that the first stimulus in each sequence had not been directly associated with a different relative reward at the end of the sequence. (The magnitudes of the rewards in the first training stage had been equal for both sequences.) And there's more - the authors recorded large numbers of neurons in the lateral prefrontal cortex (LPFC) and found that some were preferentially active for rewards, some for stimuli, and some for stimulus-reward interactions.
But the title of the paper, the abstract, and remarks dotted throughout it suggest that the authors think this bears on the question of whether "LPFC predicts reward by means of temporal-difference learning or by a model-based method", and that it has something to do with "categorization" or "category members that have not been linked directly with any experience of reward."
This seems to suppose a conflict that isn't necessary. As Montague and Berns (2002) showed, predictor-valuation (or temporal-difference) models cope just fine with sequences of predictors where reward only follows the final predictor (their sequence had two stimuli, with varying timing of the second stimulus and the reward). Predictors are predictors of the values of states, and associations establish informational links between states. So making a state have a higher value should raise the values of the predictors on the paths leading to that state. This new paper sheds interesting light on how the brain handles predictor valuation in cases where predictors are chained together, but it isn't a reason to think we've found something that temporal-difference learning can't deal with.
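To make the backward-propagation point concrete, here is a minimal tabular TD(0) sketch. Everything in it is illustrative rather than drawn from the paper: the state names A1-A3 and B1-B3 stand in for the two stimulus sequences, and the learning rate and episode counts are arbitrary. The point is just that once values chain through a sequence, revaluing the final stimulus eventually revalues the first, with no model-based machinery needed.

```python
# Tabular TD(0) value learning over two three-stimulus chains.
# Update rule: V(s) <- V(s) + alpha * (r + V(s') - V(s)), with
# discount gamma = 1 for brevity. All parameters are illustrative.

ALPHA = 0.1  # learning rate (arbitrary choice)

def run_chain(V, chain, reward, episodes):
    """Repeatedly traverse a chain of states, delivering `reward`
    only after the final state, and apply TD(0) updates in place."""
    for _ in range(episodes):
        for i, s in enumerate(chain):
            next_v = V[chain[i + 1]] if i + 1 < len(chain) else 0.0
            r = reward if i == len(chain) - 1 else 0.0
            V[s] += ALPHA * (r + next_v - V[s])
    return V

V = {s: 0.0 for s in ("A1", "A2", "A3", "B1", "B2", "B3")}

# Phase 1: both sequences end in equal reward, so the first
# stimuli of the two chains acquire equal value.
run_chain(V, ["A1", "A2", "A3"], reward=1.0, episodes=500)
run_chain(V, ["B1", "B2", "B3"], reward=1.0, episodes=500)
assert abs(V["A1"] - V["B1"]) < 1e-6

# Phase 2: the A-sequence's final stimulus now predicts a larger
# reward; its value rises, and with continued experience of the
# chain that value propagates back to the first stimulus.
run_chain(V, ["A1", "A2", "A3"], reward=2.0, episodes=500)
run_chain(V, ["B1", "B2", "B3"], reward=1.0, episodes=500)
assert V["A1"] > V["B1"]  # first stimulus of the A-chain now preferred
```

One caveat worth flagging: in this simple TD(0) form the backward propagation requires re-experiencing the chain after the revaluation, which is one place where the contrast with model-based accounts gets its grip.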
And 'categorization' refers to a large and complex literature in cognitive science. It would be better not to invoke it imprecisely, or as though all that were needed to make things clear was a bald contrast with "simple associative theories".