
Functional components of the Hebbian Recurrent Network
as applied to Recognition Memory

Simon Dennis
Department of Psychology
University of Queensland
mav@psy.uq.edu.au

Target Paper: Dennis, S. (1996). The Effect of the Environment on Memory: A Connectionist Model, Noetica, 2(2), http://psy.uq.edu.au/CogPsych/Noetica.

Introduction:

The cognitive task of the model

In this case study, the focus is on the Hebbian Recurrent Network as applied to the episodic recognition memory paradigm. In this paradigm, a list of words is presented to subjects during a study phase. Subsequently, a series of words is shown, some of which occurred in the study list and some of which did not. Subjects are asked whether they have seen each item within the study context. Human subjects are surprisingly accurate on these sorts of tasks. For instance, Shepard (1967) gave subjects a study list of 540 words and found that they were able to recognise 88% of them in a subsequent forced-choice test.

The data and input/output task

Table 1 outlines how the task was mapped onto the network. On the input, the study items were presented one at a time, followed by a single probe item and then by a "Pause" signal. On the output, the network was required to respond with the "Blank" symbol on all occasions except during the answer phase, at which time it was required to respond with either "Yes" or "No" depending on whether the probe pattern had occurred in the study list. Local encodings were used throughout, and the non-zero entries were set to 0.7 (to avoid the low-gradient tails of the tanh function).

Table 1: An example input/output sequence for the episodic recognition task.

Sequence | First Study Item | Second Study Item | Third Study Item | Probe | Answer
Input    | A                | B                 | C                | B     | Pause
Targets  | Blank            | Blank             | Blank            | Blank | Yes
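The trial in Table 1 can be expressed as a pair of vector sequences. The following Python sketch uses NumPy and assumes a hypothetical vocabulary containing only the symbols of Table 1; the 0.7 activation value comes from the text, but the function names (`local_code`, `encode_trial`) are illustrative, and the sketch omits the extra copy of the input pattern that was included in the teacher signal.

```python
import numpy as np

ON = 0.7  # non-zero entry of each local code (keeps tanh units off their flat tails)

# Hypothetical vocabularies: only the symbols that appear in Table 1.
INPUT_SYMBOLS = ["A", "B", "C", "Pause"]
TARGET_SYMBOLS = ["Blank", "Yes", "No"]


def local_code(symbol, symbols):
    """Local encoding: a single unit set to ON, all others zero."""
    vec = np.zeros(len(symbols))
    vec[symbols.index(symbol)] = ON
    return vec


def encode_trial(study, probe):
    """Return input vectors, target vectors and target labels for one trial."""
    input_labels = list(study) + [probe, "Pause"]
    answer = "Yes" if probe in study else "No"
    # "Blank" for every study item and the probe; the answer comes with "Pause".
    target_labels = ["Blank"] * (len(study) + 1) + [answer]
    x = np.stack([local_code(s, INPUT_SYMBOLS) for s in input_labels])
    t = np.stack([local_code(s, TARGET_SYMBOLS) for s in target_labels])
    return x, t, target_labels


x, t, labels = encode_trial(["A", "B", "C"], "B")
print(labels)  # ['Blank', 'Blank', 'Blank', 'Blank', 'Yes']
```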

The network

Figure 1: The Hebbian Recurrent Network (HRN). The solid arrows are sets of weights that are modified using the backpropagation algorithm. The dashed line represents the feeding of hidden unit activations through a set of Hebbian weights to the context units in preparation for the next timestep. The Hebbian weights are updated after the activations are fed through. In addition, the teacher signal included the input pattern to ensure that the hidden unit states corresponding to different inputs were separated.

The architecture of the HRN is similar to that of the Simple Recurrent Network (SRN, see case study 1), in that the input and context layers are completely connected to the hidden layer and the hidden layer is completely connected to the output layer (see Figure 1). In contrast to the SRN, which copies the contents of the previous hidden layer to the context layer, the hidden layer of the HRN is connected to the context layer by a set of Hebbian weights. It is these weights that form the memory of the system. Each incoming word is encoded on the hidden units. This hidden encoding is stored in the Hebbian weights and any subsequent input pattern that is similar will retrieve this stored trace into the context layer.
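One timestep of the architecture just described might look like the following NumPy sketch. The text does not specify biases, weight initialisation, the Hebbian learning rate, whether the context units are linear, or when the Hebbian weights are cleared, so those choices (no biases, random weights standing in for trained ones, `eta = 1.0`, a linear context layer, a reset at the start of each trial) are assumptions, and the class name `HRNSketch` is hypothetical. The one ordering the source does fix is kept: hidden activations are fed through the Hebbian weights to the context units first, and the Hebbian weights are updated afterwards.

```python
import numpy as np


class HRNSketch:
    """Forward pass of a Hebbian Recurrent Network (illustrative sketch only).

    Assumed, not taken from the source: tanh units, no biases, random
    initial weights, linear context units, Hebbian rule H += eta * outer(h, h).
    """

    def __init__(self, n_in, n_hidden, n_out, eta=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Backpropagation weights (would be trained; random placeholders here).
        self.W_in = rng.normal(scale=0.5, size=(n_hidden, n_in))
        self.W_ctx = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
        self.W_out = rng.normal(scale=0.5, size=(n_out, n_hidden))
        # Hebbian weights: hidden -> context.
        self.H = np.zeros((n_hidden, n_hidden))
        self.context = np.zeros(n_hidden)
        self.eta = eta

    def reset_trial(self):
        """Assumed: clear the Hebbian memory and context before each new list."""
        self.H[:] = 0.0
        self.context[:] = 0.0

    def step(self, x):
        # Input and context jointly determine the hidden representation.
        h = np.tanh(self.W_in @ x + self.W_ctx @ self.context)
        y = np.tanh(self.W_out @ h)
        # Feed the hidden pattern through the Hebbian weights to form the next context...
        self.context = self.H @ h
        # ...and only then store it by autoassociation.
        self.H += self.eta * np.outer(h, h)
        return h, y


net = HRNSketch(n_in=4, n_hidden=8, n_out=3)
net.reset_trial()
h1, y1 = net.step(np.array([0.7, 0.0, 0.0, 0.0]))
print(np.allclose(net.context, 0.0))  # True: nothing was stored before the first item
```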

Describe how the input/output task addresses the cognitive task

The input/output task is designed to be a direct analog of the recognition task. The study vectors represent study words and the test vectors represent test words. Only the sequential structure of the task is preserved, as a new input pattern is presented at each timestep. The instructions describing how to complete the task, which are critical to subjects' performance, are embodied in the outputs provided in the training set and are stored within the network in the backpropagation weights.

Memory:

The information to be stored

In order to complete the recognition task, the network must retain the identity of all of the study items. The order in which the items appear is irrelevant to the decision.

The mechanisms of storage

There are three mechanisms of storage in the HRN:

  1. The activation values of the context units retain the results of the last memory retrieval to be used in making a recognition decision when the "Pause" vector is input. These activations are an analog of Short Term Memory (ranging up to a few minutes).
  2. The Hebbian weights retain the information about which items were present on the study list. They are an analog of Intermediate Term Memory (ranging up to a week) and are intended as a partial functional counterpart of the hippocampal formation.
  3. The Backpropagation weights retain information about the representations of items, the decision criteria and the control functions of memory. They are an analog of Long Term Memory (spanning a lifetime) and are intended to play some of the role of the neocortex.

Describe how the mechanisms achieve the memory

At each timestep, the input pattern and context pattern are combined to form the hidden unit representation. Provided the hidden unit patterns are close to orthogonal, there will be little input from the context units, so the hidden patterns will encode the item being presented. The hidden unit pattern is then stored in the Hebbian weights (by autoassociation), thus retaining a record of the occurrence of that item. At test, either a target item (one that was presented during study) or a distractor is presented. A target item will evoke a hidden unit pattern similar to its study pattern. When this pattern is run through the Hebbian weights, it will make contact with the previously stored trace, and consequently a pattern will be formed on the context units. These context activations persist (short term memory) until the next timestep, when a decision is made. Note that if no matching item was recorded in the Hebbian weights, then no pattern is formed on the context units. The Backpropagation weights, which have been trained over a series of such trials, learn to distinguish between "no pattern" and a pattern representing an item, and map this distinction onto a "No" or "Yes" response respectively.
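The storage and retrieval argument can be checked with a toy example. The sketch below assumes that the hidden patterns for different words are exactly orthogonal (taken here from a QR decomposition of a random matrix), stores a hypothetical three-word study list by autoassociation, and replaces the trained Backpropagation decision weights with a simple threshold on the length of the retrieved context pattern. The threshold rule (`decide`) is a hypothetical stand-in for illustration, not the HRN's learned decision function.

```python
import numpy as np

rng = np.random.default_rng(1)
n_hidden = 16
# Assumption: hidden patterns for different words are orthogonal,
# so take the orthonormal columns of a QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(n_hidden, n_hidden)))
hidden = {f"w{i}": Q[:, i] for i in range(n_hidden)}

study_list = ["w0", "w3", "w7"]  # hypothetical study list
H = np.zeros((n_hidden, n_hidden))
for word in study_list:
    h = hidden[word]
    H += np.outer(h, h)  # autoassociative Hebbian storage


def retrieve(word):
    """Context pattern formed when the probe's hidden pattern meets H."""
    return H @ hidden[word]


def decide(word, threshold=0.5):
    # Hypothetical stand-in for the trained decision weights:
    # "Yes" if a substantial pattern forms on the context units.
    return "Yes" if np.linalg.norm(retrieve(word)) > threshold else "No"


print(decide("w3"), decide("w5"))  # Yes No
```

With near-orthogonal rather than exactly orthogonal patterns, a distractor would retrieve a small amount of noise rather than an exact zero vector, so the decision has to separate weak retrievals from strong ones.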

Time:

As is the case with the SRN, the HRN treats time only as sequence, disregarding the absolute durations of events. Study items are input one at a time, followed by a single test item and a pause signal indicating the time to respond. This sequence assumption simplifies conceptualisation and simulation, but it is inadequate for a number of reasons. One of the most obvious, in the context of human memory, is that an item can be strengthened only by repeating it. In the laboratory it is possible to extend the period of time for which an item is presented without repeating it. Furthermore, the data show that a long presentation of an item is not equivalent to several shorter presentations of the same item that add up to the same total time. Typically, distributing practice in this way improves performance (Greene, 1992). In the long term, the HRN should be expressible as a set of continuous differential equations, possibly including components that display oscillatory behaviour, to better account for the effects of absolute time.

Change:

The most important contribution of the HRN is the way in which it deals with long term change. Through the backpropagation algorithm, the weights are optimised to perform the functions required of the system. Unlike other models of human memory, in which the representations, decision functions and control processes of memory are either hard-wired or considered to fall outside the scope of the model, the HRN provides a specific mechanism for how these components arise. Consequently, the HRN can begin to address learning-to-learn phenomena and, most importantly, the way in which the environment of memory shapes the mechanism.

As an example, I will briefly outline how the HRN accounts for the low frequency advantage in recognition memory (Dennis, 1993).

In recognition memory (as opposed to other lexical processing tasks such as lexical decision and cued and free recall), words that occur with low frequency in the language are better recognised than their high frequency counterparts. Through the analysis of several online corpora, Dennis (1993) discovered that although high frequency words occur more often in the corpus as a whole, low frequency words are, in general, more likely to recur within a given context. That is, once a word has occurred in the context of, for instance, a newspaper article, it is more likely to occur again in that context if it is a low frequency word. The reason is that low frequency words tend to be more context specific and to cluster more. Brightfield (1995) has shown that when word frequency is controlled, word density (the amount of clustering) has a substantial impact on recognition performance.

The HRN accounts for this pattern of results by assuming that the backpropagation weights are altered so as to optimise performance in the environment. In simulations of the word frequency effect (Dennis, 1993), a training set was constructed that mirrored the pattern of environmental statistics. Input patterns corresponding to high frequency words were presented more often, but if a pattern corresponding to a low frequency word was presented in a study phase, it was more likely to be tested. When the HRN was subsequently tested on an unbiased test set, performance was best for the low frequency words.

A priori, there was nothing in the mechanism that would cause it to respond differently to low and high frequency words. As training progressed, however, the decision functions implemented by the backpropagation weights in the HRN adapted to the environment in which they were immersed.
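The kind of biased training environment described above can be sketched as a trial generator. Every number here is a hypothetical assumption chosen only to reproduce the qualitative pattern (high frequency words sampled more often overall, low frequency words more likely to be tested once studied); it does not reproduce the corpus statistics reported by Dennis (1993), and the names `SAMPLE_WEIGHT`, `TEST_PROB` and `make_trial` are illustrative.

```python
import random
from collections import Counter

# Hypothetical vocabulary split into high and low frequency bands.
HIGH = [f"hf{i}" for i in range(10)]
LOW = [f"lf{i}" for i in range(10)]
VOCAB = HIGH + LOW
SAMPLE_WEIGHT = {"high": 4.0, "low": 1.0}  # assumed: HF words appear more often
TEST_PROB = {"high": 0.2, "low": 0.6}      # assumed: studied LF words recur more


def band(word):
    return "high" if word in HIGH else "low"


def weighted_pick(rng, candidates):
    """Pick one word, weighted by its (assumed) environmental frequency."""
    weights = [SAMPLE_WEIGHT[band(w)] for w in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def make_trial(rng, list_len=3):
    """Return (study_list, probe, answer) for one biased training trial."""
    study = []
    while len(study) < list_len:  # sample without replacement
        study.append(weighted_pick(rng, [w for w in VOCAB if w not in study]))
    candidate = rng.choice(study)
    if rng.random() < TEST_PROB[band(candidate)]:
        return study, candidate, "Yes"  # target trial
    distractor = weighted_pick(rng, [w for w in VOCAB if w not in study])
    return study, distractor, "No"


rng = random.Random(0)
studied, tested = Counter(), Counter()
for _ in range(20000):
    study, probe, answer = make_trial(rng)
    studied.update(band(w) for w in study)
    if answer == "Yes":
        tested[band(probe)] += 1

for b in ("high", "low"):
    print(b, studied[b], round(tested[b] / studied[b], 3))
```

Under these assumptions high frequency words dominate the study lists, while a studied low frequency word is roughly three times as likely to be probed; a network trained on such trials could exploit that asymmetry.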

Structure:

The primary structure to be encoded by the HRN is a list of words. In other recurrent architectures such as the SRN, the only short term data structure is the vector of hidden unit activations. Consequently, the entire list structure must be retained within this vector if the network is to perform well on the recognition task. Learning such an encoding is difficult and requires a substantial number of examples (Phillips, 1991). One of the design principles of the HRN is that it stores items in the Hebbian weights. Consequently, it need only learn to represent each of the items in the vocabulary and does not have to acquire a list representation.

Figures 2 and 3 demonstrate this point using Hierarchical Cluster Analysis (HCA). HCA creates a cluster tree in which similar vectors occupy adjacent leaves. Figure 2 shows the HCA of the SRN's hidden unit space after a three-item study list has been input. At this point the hidden unit pattern must represent the entire list. The patterns are labelled according to whether the list contained the "a" pattern; in addition, if the "a" pattern was in the final position, the pattern is labelled with a three. You can see from the diagram that patterns corresponding to lists containing an "a" are grouped in one area of hidden unit space.

Figure 2: Hierarchical Cluster Analysis (HCA) of the hidden unit patterns of the SRN after the study list has been input. Note that the sequences that contain an "a" are clustered together. Such an organisation occurs since the SRN must respond with an "a" regardless of the position in which the "a" occurred.

Contrast the HCA for the SRN with the HCA for the HRN (Figure 3). The HRN encodes only the items, not the list structure, on the hidden units. Consequently, the third-position "a" patterns are well separated from everything else, but the other "a" patterns are distributed throughout hidden unit space. The information about the occurrence of these items is stored in the Hebbian weights.

Figure 3: Hierarchical Cluster Analysis (HCA) of the hidden unit patterns of the HRN after the study list has been input. In contrast to the cluster analysis of the SRN, the HRN clusters only the third-position "a"s well. The first- and second-position "a"s are mixed in with the non-"a" patterns. The HRN does not need to separate these patterns in hidden unit space, since they are retained in the Hebbian memory.
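For readers who want to reproduce this kind of analysis, the following is a naive average-linkage clustering over Euclidean distances that returns the leaf order of the resulting tree. The source does not say which distance measure or linkage rule was used, so both are assumptions, and the two groups of synthetic vectors are hypothetical stand-ins for hidden unit patterns. In practice, `scipy.cluster.hierarchy.linkage` and `dendrogram` do the same job far more efficiently.

```python
import numpy as np


def hca_leaf_order(patterns, labels):
    """Average-linkage agglomerative clustering (Euclidean distance).

    Returns the labels in leaf order: members of every subtree end up
    adjacent, because merged clusters are concatenated.
    """
    dist = np.linalg.norm(patterns[:, None, :] - patterns[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(patterns))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return [labels[i] for i in clusters[0]]


# Hypothetical hidden patterns: four lists containing "a", four without.
rng = np.random.default_rng(2)
with_a = 0.8 + 0.05 * rng.normal(size=(4, 6))
without_a = -0.8 + 0.05 * rng.normal(size=(4, 6))
patterns = np.vstack([with_a, without_a])
labels = ["a"] * 4 + ["-"] * 4
print(hca_leaf_order(patterns, labels))
```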

Structure in the HRN, then, is encoded both by the hidden unit activations and architecturally through the Hebbian weights. Doing so decreases learning time and increases the degree of generalisation across lists.

Discussion and Conclusions:

Within this case study the mechanism of the Hebbian Recurrent Network has been analysed with respect to memory, time, change and structure:

Memory:
The HRN includes three mechanisms of memory. Context unit activations are analogous to Short Term Memory, the Hebbian weights are analogous to Intermediate Term Memory and the Backpropagation weights are analogous to Long Term Memory.
Time:
As in the SRN, time in the HRN is reduced to sequence information. The effects of absolute time are beyond the scope of the model.
Change:
Long term change within the HRN is accomplished through the backpropagation weights. This mechanism emphasises the role of the environment in determining behaviour. The word frequency effect in recognition memory can be accounted for by considering how the HRN interacts with the statistics of word usage.
Structure:
The HRN encodes items on the hidden units which are then stored in the Hebbian weights. In contrast to the SRN, which must form a list data structure, the HRN learns more quickly and generalises across lists immediately.

References

Brightfield, R. (1995). The contribution of word density to the word frequency effect in item recognition. Honours thesis, Department of Psychology, University of Queensland.

Dennis, S. (1993). The Integration of Learning into Models of Human Memory. Ph.D. thesis, The University of Queensland.

Greene, R. L. (1992). Human Memory: Paradigms and Paradoxes. Lawrence Erlbaum Associates, Hillsdale, NJ.

Phillips, S. (1991). Serial recall using an Elman net with hints. Unpublished manuscript.

Shepard, R. N. (1967). Recognition memory for words, sentences and pictures. Journal of Verbal Learning and Verbal Behavior, 6, 156-163.