The authors first taught monkeys that reward would be delivered after each of two different sequences of three stimuli (see picture below), as long as the correct saccades were made between the stimuli. (Saccadic eye movements are a common choice of action in these experiments: they don't require head movements, they can easily be tracked, and some important early work in the field related the economic values of targets to the activity of neurons involved in making the different saccades.)

The behavioral result is cool, if not surprising (it's old news in animal learning). The monkeys 'could' (and, according to a certain simplistic behaviourism, would) initially have shown no relative preference in the final task, given that the first stimulus in each sequence had never been directly associated with a different reward at the end of the sequence. (The magnitudes of the rewards in the first training stage had been equal for both sequences.) And there's more: the authors recorded large numbers of neurons in the lateral prefrontal cortex (LPFC) and found that some were preferentially active for rewards, some for stimuli, and some for stimulus-reward interactions.
But the title of the paper, the abstract, and remarks dotted around it suggest that the authors think this has something to do with the question of whether "LPFC predicts reward by means of temporal-difference learning or by a model-based method", and with "categorization" or "category members that have not been linked directly with any experience of reward."
This seems to suppose a conflict that isn't necessary. As Montague and Berns (2002) showed, predictor-valuation (or temporal difference) models cope just fine with sequences of predictors where reward only follows the final predictor (their sequence had two stimuli, with varying timing of the second stimulus and the reward). Predictors are predictors of the values of states, and associations establish informational links between states. So giving a state a higher value should raise the values of the predictors of ways of getting to that state. This new paper sheds interesting light on how the brain handles predictor valuation in cases where predictors are chained together, but it isn't a reason to think that we've found something that temporal difference learning can't deal with.
And 'categorization' refers to a large and complex literature in cognitive science. It would be better not to invoke it imprecisely, or as though all that was needed to make things clear was a bald contrast with "simple associative theories".
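To make the temporal-difference point concrete, here is a minimal sketch of tabular TD(0) on a chain of three stimuli where reward follows only the last one. It is a toy illustration, not the model from the paper or from Montague and Berns (2002): the function name `td0_chain`, the learning rate `alpha`, the discount `gamma`, and the reward magnitudes are all assumptions made up for the example.

```python
def td0_chain(reward, values=None, alpha=0.1, gamma=1.0, episodes=500):
    """Tabular TD(0) over a fixed chain of three stimuli.

    Reward is delivered only after the final stimulus. All parameter
    values are illustrative assumptions, not taken from any paper.
    """
    n = 3
    v = list(values) if values is not None else [0.0] * n
    for _ in range(episodes):
        for s in range(n):
            is_last = s == n - 1
            r = reward if is_last else 0.0          # reward only at the end
            v_next = 0.0 if is_last else v[s + 1]   # value of the next stimulus
            v[s] += alpha * (r + gamma * v_next - v[s])  # TD(0) update
    return v


# Stage 1: both sequences end in the same (hypothetical) reward of 1.0.
seq_a = td0_chain(reward=1.0)
seq_b = td0_chain(reward=1.0)

# Stage 2: the reward ending sequence A is raised to 2.0 and the whole
# chain keeps being experienced, starting from the stage-1 values.
seq_a_after = td0_chain(reward=2.0, values=seq_a)

print("first stimulus, sequence A:", seq_a_after[0])
print("first stimulus, sequence B:", seq_b[0])
```

In this sketch the first stimulus of sequence A ends up valued above the first stimulus of sequence B even though it is never paired directly with reward; the change propagates back through the chain. Note that it does so over repeated passes through the sequence, since a single pass after the reward change only moves the value of the final stimulus.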
2 comments:
Very interesting, if a bit opaque... You should register for BPR3 and put the icon on posts like these: http://bpr3.org/.
Oh, and write something about pseudoscience and submit it (to me) for the upcoming Skeptics' circle (I'm hosting).
Yeah, I admit I didn't really hit this one on the head. I wanted to get a Science related post up so that the mix was more representative, and also for my BPR3 application (pending). I hope to do a better follow up on this one, though - I also want to see who else blogs it.