Janet Wiles and J. Devin McAuley
Departments of Computer Science and Psychology
University of Queensland
janetw@cs.uq.edu.au, devin@psy.uq.edu.au
For many ANN models of cognitive phenomena, interesting behaviour appears to arise from the model as a total package, and it is often a challenge to understand how aspects of the behaviour are supported by components of the ANN. The goal of the case study analyses is to further such understanding, identifying the functional roles of ANN components and how they combine to explain cognitive phenomena.
Each case study is based on a published ANN simulation (or related architecture) in an area of cognitive science. These existing ANN models have been analysed with respect to three traditional strengths of ANNs (mechanisms for memory, time and change) and one area of weakness (mechanisms for structure); see below for details of these four areas. These four areas provide overlapping viewpoints from which to examine models. In essence, we believe they are the areas in which to look for the "sources of power" in a model.
The case studies are specifically intended for cognitive modellers who use ANNs, but we anticipate that they will also be of interest to the wider ANN community. The case study format grounds the analysis of the functional components of ANNs in the cognitive modelling literature, and focuses on phenomena that do admit a computational explanation. The case study method is intended as much as a learning experience as a communicative one.
Introduction: As background to understanding the functional components, each case study begins by describing the task addressed by the model. The descriptions concern the inputs and outputs: how the task information is encoded in the input representation, and how the model's response is encoded by the output representation. In many cases, there is a distinction between the cognitive task of the model and its instantiation in the input/output task of the network. For studies such as Elman's SRN, this mismatch is intentional, as the cognitive task can be viewed as a by-product of another process (i.e., discovery of lexical classes is via the prediction task in Elman's SRN). In others, the mismatch between cognitive task and input/output task may be less benign, obscuring the contribution of the model towards understanding the phenomenon. The second role of the introduction in each case study is to describe the network used and explain how the network addresses the cognitive task.
Memory: The target area of memory concerns identifying the information to be stored in memory, and the mechanisms that support it. A range of mechanisms has been proposed for storing and retrieving memories in neural networks, including implicit long-term coding of memories distributed in the weights; memory as an attractor; short-term memory as transient decay of activations; and limit-cycle encoding of memories with synchrony as a method of retrieval. The case studies consider the what and how of memory storage and retrieval: What information needs memory? How is it stored and retrieved?
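As one illustration of the mechanisms listed above, short-term memory as transient decay of activations can be sketched with a single leaky unit. This is a minimal sketch, not taken from any of the case-study models: the function name `decay_trace`, the update rule a(t) = decay * a(t-1) + x(t), and the decay value of 0.5 are all assumptions chosen for illustration.

```python
def decay_trace(inputs, decay=0.5):
    """Leaky activation trace (illustrative): a(t) = decay * a(t-1) + x(t).

    An input leaves a trace that fades geometrically, so the unit holds
    recent events only for a short time.
    """
    activation = 0.0
    trace = []
    for x in inputs:
        activation = decay * activation + x
        trace.append(activation)
    return trace


# A single pulse at the first time step, followed by silence.
trace = decay_trace([1.0, 0.0, 0.0, 0.0])
print(trace)  # [1.0, 0.5, 0.25, 0.125]
```

With a decay closer to 1 the trace persists longer; with a decay closer to 0 the unit forgets almost immediately. In this sketch the "what" of memory is the recent input and the "how" is the fading activation.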
Time: The target area of time concerns how time is treated in the data, processing and parameters of the network. For example, does the network consider time as an absolute measure in which events in the input are time-stamped with reference to an external clock, as a sequence in which only the order of events is specified, or as a relative measure in which durations are ratios of one another? Methods for processing temporal information with neural networks have included: using a fixed or sliding time window which maps time into space; learning of time delays in the network weights; sequential processing of time slices; and encoding time as the phase angle of an oscillator. The case studies consider what measures of time are used by the network, how they are represented and what the underlying mechanisms are.
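The first of the methods listed above, a sliding time window that maps time into space, can be sketched as follows. This is a minimal illustration under assumed names (`sliding_windows`, `width`); the window width and the toy signal are hypothetical and not drawn from any particular model.

```python
def sliding_windows(sequence, width):
    """Return each run of `width` consecutive items as one input vector.

    Temporal order within a window becomes position in the vector, so a
    network without recurrence can see a short stretch of the past at once.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    return [list(sequence[i:i + width])
            for i in range(len(sequence) - width + 1)]


signal = [0.1, 0.4, 0.9, 0.3, 0.2]
windows = sliding_windows(signal, 3)
for w in windows:
    print(w)
# [0.1, 0.4, 0.9]
# [0.4, 0.9, 0.3]
# [0.9, 0.3, 0.2]
```

Note that this encoding preserves only the order of events inside each window. Nothing in the vectors records absolute time, which is one reason the choice of time representation matters for what a network can learn.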
Change: In this section, the case studies consider the types of changes that occur in the neural network, parameters, data, etc, over a range of timescales:
They concern the types of changes that occur in the selected model and how they are implemented in the network's mechanisms. (Note that few of these aspects apply to any one model, with many case studies focusing on change as learning.)
Structure: In this section, the case studies consider how structured information in the environment is represented as structured information in the network (e.g., an implicit grammar in training data can be encoded in the hidden-unit space of a recurrent net; and higher-order bindings can be stored using tensors or phase synchrony). Some structure is coded directly into the architecture of the network (e.g., the network may be partitioned into modules to directly encode structure). The way in which structure is encoded in the network constrains the generalization that the network is capable of, and the case studies address whether generalization is due to direct coding of structure or to learning it from the training data. There has been an ongoing debate in the ANN literature on the generalization abilities of networks. Where possible, the case studies address whether there are important aspects of structure that cannot be represented, learned or generalized by the network, or structure that may be expected to be reflected in the ANN model but is not.
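To make the idea of storing bindings with tensors concrete, the following is a minimal sketch of tensor-product binding with rank-2 tensors. The role and filler vectors (`AGENT`, `PATIENT`, `JOHN`, `MARY`) and the helper names are hypothetical, and the sketch assumes orthonormal role vectors so that unbinding is exact.

```python
def outer(role, filler):
    """Bind a filler to a role as the outer product role x filler."""
    return [[r * f for f in filler] for r in role]


def add(a, b):
    """Superimpose two bindings by element-wise addition."""
    return [[x + y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def unbind(tensor, role):
    """Recover the filler bound to `role` by contracting over the role axis."""
    n_cols = len(tensor[0])
    return [sum(role[i] * tensor[i][j] for i in range(len(role)))
            for j in range(n_cols)]


# Hypothetical orthonormal role vectors and one-hot filler vectors.
AGENT, PATIENT = [1.0, 0.0], [0.0, 1.0]
JOHN, MARY = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]

# Both bindings are stored in a single tensor.
memory = add(outer(AGENT, JOHN), outer(PATIENT, MARY))

print(unbind(memory, AGENT))    # [1.0, 0.0, 0.0]
print(unbind(memory, PATIENT))  # [0.0, 1.0, 0.0]
```

Here the structure (which filler goes with which role) is coded directly by the binding operation rather than learned, which is the kind of distinction the case studies examine when asking where a model's generalization comes from.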
Discussion and Conclusions: The final section involves discussion of how the functional components reviewed in the previous sections combine to explain the cognitive phenomena targeted by the model.