Human Analogical Reasoning Capacity: Toward a Neural Net Model

Graeme.S. Halford
University of Queensland

William.H. Wilson
University of New South Wales

Brett Gray
University of Queensland
Australia

and

Steven Phillips
Electrotechnical Laboratory
Japan

Copyright restrictions apply to this article. Contact Graeme Halford for additional information.

Paper presented to XXVI International Congress of Psychology, Montreal, Canada, August 16-21, 1996.

Special characters used in this article may not be dispalyed by some browsers. To view document with complete figures, tables, and special characters download the Microsoft Word rtf document.
 


Abstract

Analogical processes can provide a powerful and psychologically realistic account of human reasoning, but this requires that models of analogy be consistent with human limitations in processing capacity. Human capacity is much more constrained in relational than in associative processing, and there is evidence that adult humans are limited to processing quaternary relations in parallel. The reason is apparent from neural net representation of relations. A model of analogical reasoning that incorporates psychologically realistic information processing capacity is presented, and its implications for human cognitive functioning are discussed.


Human Analogical Reasoning Capacity: Toward a Neural Net Model

The growth of parallel processing models of human analogical reasoning raises the question of the complexity of structures which humans can map in parallel. A good case can be made for parallel processing in analogies (Holyoak & Thagard, 1989) but it is implausible that the most complex analogies can be processed entirely in parallel. Even a relatively simple analogy, such as that between heat-flow and water-flow shown in transparency 1 has a structure that is too complex to be processed entirely in parallel by humans.

Figure 1

If computational models are to be psychologically realistic, a means must be found for quantifying the complexity of structures that can be processed in parallel. It is also necessary to explain how problems that exceed this capacity are processed. For this purpose a metric is needed for quantifying the amount of information that can be processed in parallel. We have worked on this problem for a number of years, and have found that a metric in which complexity of relations is quantified by the number of arguments is the best conceptual complexity metric.

Capacity and complexity

Each argument of a relation provides a source of variation, or dimension, and thereby makes a contribution to the complexity of the relation. An N-ary relation can be thought of as a set of points in N-dimensional space. Relations of higher dimensionality (more arguments) impose higher processing loads. A unary relation is the least complex, and has one argument. It corresponds to a set of points in unidimensional space. A binary relation (e.g. BIGGER-THAN) has two arguments, and a ternary relation has three arguments (e.g. LOVE-TRIANGLE is a ternary relation, and has arguments comprising three people, two of whom love a third).

We have proposed (Halford, et al., 1994; Halford & Wilson, submitted) that processing capacity of higher cognitive processes can be quantified in terms of this dimensionality concept. Assessment of the working memory literature, plus some specific experimentation, has led to the conclusion that adult humans can process a maximum of four dimensions in parallel, equivalent to one quaternary relation (Halford, 1993; Halford, et al., 1994; Halford & Wilson, submitted). Structures more complex than this must be processed by either conceptual chunking or segmentation.

Conceptual chunking is recoding a concept into fewer dimensions. We can illustrate conceptual chunking using the concept of velocity, defined as V= st. The relation between velocity, distance and time is three dimensional, but velocity can be expressed as a single dimension, such as the position of a pointer on a dial; VELOCITY(60 kph). In this single dimensional representation, no relation is defined between velocity, distance and time. If we want to compute (say) the way velocity varies as distance increases and time decreases, we must return to the three-dimensional representation. Thus conceptual chunks save processing capacity, but the cost is that some relations become temporarily inaccessible. There is also a psychological factor which limits chunking, because experience is required in which there is a constant mapping of elements into chunks (Logan, 1979). Chunking is a form of learning, which takes place over time.

The chunking principles are: (1) a chunk functions as a single entity, predicate or argument, in a relation. (2) no relations can be accessed between items within a chunk. (3) relations between the chunk and other items, or other chunks, can be represented.

Segmentation is decomposing tasks into steps small enough not to exceed processing capacity, as in serial processing strategies.

Analogy model

The Structured Tensor Analogical Reasoning (STAR) model (Halford, Wilson, Guo, Gayler, Wiles, & Stewart, 1994) was designed to incorporate realistic human information processing capacities into a PDP model of analogy.

An analogy is a structure-preserving map from a base or source to a target (Gentner, 1983). The structure of base and target are coded in the form of one or more relations. The question of how to represent relations in neural nets is one of the major issues in the field. It may be helpful to consider some of the properties that relational schemas should have.

FIGURE 2

Transparency 2

Properties of relations not shared with associations

An N-ary relation R(a1,a2, . . ,an.) is a subset of the cartesian product of N sets. Its psychological representation is a binding between a relation symbol, R, and the arguments a1,a2, . . ,an.

In relations the type of link can vary and is identified by a symbol

In associations links are all of the same type and have no symbol

Relations can be related to each other in higher-order relations

Associations can be chained, and can converge or diverge, but the associative link per se cannot be an entity in another association, so there is no associative equivalent of higher-order relations

Relational systematicity is possible because of higher-order relations (for example a>b Æ b<a). Implies is a higher-order relation

Associations do not have this property

Omni-directional access means that, given all but one of the components of a relation, we can access the remaining component. For example, given MOTHER-OF(woman,child), and given MOTHER-OF(woman,-) we can access "child", whereas given MOTHER-OF(-,child) we can access "woman", and given -(woman,child) we can access "MOTHER-OF".

Associations may be backward but omnidirectional access is not necessarily implied

Representation of roles or slots, independent of specific instances. Thus BIGGER-THAN(-,-) entails representation of slots for a larger and a smaller entity

Associations do not entail content-independent representations

Dimensionality: each role defines in a dimension*. Thus a unary relation is a set of points in unidimensional space, a binary relation is a set of points in 2 dimensional space, and so on. This provides a conceptual complexity metric.

Associations are always reducible to ordered pairs.

Neural net models which in principle fulfil the criteria for relational knowledge can be categorised by type of binding and by type of architecture. The models of Hummel & Holyoak (in press), Plate (1995), Shastri and Ajjanagadde (1993) and Smolensky (1990) use role-filler bindings, while our own model uses relation symbol-argument-argument bindings (we will refer to this as symbol-argument-argument binding). Architectures can be divided mainly into models based on a matrix (Halford, et al. 1994b; Plate, 1995; Smolensky, 1990) and models based on synchronous oscillation (Hummel & Holyoak, in press; Shastri and Ajjanagadde, 1993).

Matrix models represent roles and fillers by vectors, and the binding is represented by computing a matrix of the vectors. In some models the matrix is the outer product of the vectors, though in the model of Plate (in press) the matrix is compressed by computing the circular convolution of the vectors. The way relations for water-flow are represented in our model is shown in transparency

FIGURE 3

A unary relation, R(a) can be represented by a rank 2 tensor product VRŸV1. A binary relation, R(a1,a2) can be represented by a rank three tensor product VRŸV1ŸV2, and so on. The elements within the matrix are the binding units and their activations can be computed on-line. Therefore matrix represents a dynamic binding.

Roles are not represented by vectors in symbol-argument-argument models, but separate sets of units are used for each argument vector. A structural alignment procedure is used to ensure all vectors are represented on the correct set of units. Roles are not explicitly represented, but arguments are assigned to the correct role position by ensuring that they are corrected related to each other.

Matrix models are limited by the combinatorial explosion in number of binding units required as the rank of the relation increases. If a relation with k entities, one relation symbol and k-1 arguments is a represented by a tensor product of vectors with n elements, there are nk binding units - see transparency. Circular convolution models appear at first sight to avoid this problem, but computational complexity analysis by Steven Phillips has shown that they also have exponential space complexity in the rank of the relation. In synchronous oscillation models the complexity of relations is limited by the number of discriminable phases. Both types of models agree that neural net are limited to representations of low rank.

FIGURE 4

The model

Simple proportional analogy

The representation of base and target in the analogy cat:kitten::horse:foal is shown in transparency 4. There is a vector representing the predicate MOTHER-OF, and other predicates such as LOVES, FEEDS, PROTECTS, LARGER-THAN. The first arguments of both base and target are superimposed on one vector, and the second arguments on another, as shown. Predicate-argument bindings other than those essential to the problem are represented (e.g. LARGER-THAN(mare,rabbit)) to demonstrate that the model can select the appropriate solution and avoid irrelevancies.

The solution of the problem cat:kitten::mare:? is presented schematically in the transparency. In the first step, the input is cat:kitten and the output is all the predicates that have "cat" and "kitten" as arguments (a "predicate bundle"). That is, the output represents the set {MOTHER-OF, LOVES, FEEDS, PROTECTS, LARGER-THAN}.

In the second step, this predicate bundle is used as input, together with "mare". The output is an "argument bundle" representing all possible solutions. The possible solutions can be recognized by computing the inner product of vectors representing each possible solution with the output vector. Alternatively, it can be done by an auto-association technique (Chappell & Humphreys, 1994) which selects the most appropriate solution.

Notice that "foal" is not the only valid solution. For example, "rabbit" is a syntactically correct solution because cat:kitten::mare:rabbit is a valid analogy. The base can be represented as LARGER-THAN(cat,kitten) and the target as LARGER-THAN(mare,rabbit). The preferred solution "foal" can be justified on a number of grounds however. One is the salience of predicates. MOTHER-OF is a more salient predicate relating "cat" and "kitten" than is LARGER-THAN, because of the stronger associations between terms such as "kitten" and the mother-infant relation. Second, the solution "foal" fits more predicates than does "rabbit". The solution "foal" is consistent with all the predicates MOTHER-OF(mare,foal), LOVES(mare,foal), FEEDS(mare,foal), PROTECTS(mare,foal), LARGER-THAN(mare,foal). However "rabbit" fits only one predicate LARGER-THAN(mare,rabbit). The model can use both salience and the number of predicates consistent with a solution to produce an analogy corresponding to the one which we would find most satisfying. Therefore the model acknowledges that more than one solution is syntactically consistent, but can distinguish between solutions according to their plausibility.

Complex analogies - serial and parallel processing

The main focus of this paper is analogies which are too complex to be completely represented in parallel. We will consider two examples, the analogy between water-flow and heat-flow. Water-flow is caused by pressure-difference. There is a higher-order relation CAUSE which has pressure-difference and water-flow as arguments. The way this complex structure can be represented in the model without exceeding processing capacity limitations is shown in the transparency. Pressure-difference and water-flow are each chunked to a single vector. Conceptual chunking is implemented by concatenating the vectors in the tensor product, then associating this concatenation to a new smaller vector representing the chunked term. Cause is then represented as a binary relation, with the chunked representations of pressure-difference and water-flow as arguments.

The model can actively represent the causal relation between pressure-difference and water-flow, but cannot simultaneously represent the structure of the pressure-difference and water-flow concepts, because these are chunked into single entities. These must be unpacked in order for their constituent structure to be represented. However while the constituent structure of pressure-difference and water-flow are being actively represented, the overarching causal relation between them cannot be.

The model implements segmentation by matching relations in base to relations in target one at a time, staying within the limitation of not representing more than one quaternary relation in parallel. The model implies that complex analogies must entail a combination of parallel and serial processing because parallel processing of the entire structure would exceed capacity limitations. The task is segmented into relations, with parallel processing within each relation, but serial processing between relations. This is implemented by coding each relation as a tensor product which binds predicate and argument vectors, and permits predicates and arguments to be recovered. These vectors can be chunked into a single vector, which can be an argument to a higher-order predicate, enabling a hierarchy of relations to be represented.

Analogical mapping process

FIGURE 5

This is based on the set of nets shown in the transparency. There are four main structures in the model, the focus selection network, the argument mapping network, information storage structures and inference processes. The focus selection and argument mapping networks are constraint satisfaction networks, while the storage and inference nets are tensor products of varying ranks.

Because of processing capacity restrictions the model maps at most one quaternary relation at a time. The model first selects a single relational instance in base and target as a focus. This is the function of the focus selection network. The argument mapping network is then loaded with a representation of the relational instances selected from each domain. The base relational instance is mapped into the target relational instance, and the mapping is stored in the map storing network. The system then returns to the focus selection network which selects a new focus and the procedure is repeated.

Argument mapping network

FIGURE 6

This is a constraint satisfaction network, where the rows represent a base relational instance and the columns represent a target relational instance. An example of the network is shown mapping the top nodes of water-flow to heat-flow. That is, it is mapping causal relation between pressure-difference and water-flow to the causal relation between temperature difference and heat flow. The first row and column represent the relation symbol of the base and target respectively. The other rows and columns represent the arguments. The nodes within the matrix represent potential mappings of base and target elements.

Each of the nodes has an associated activation value. The general running of the network involves repeatedly updating the activation value of each of the nodes. Each node receives excitatory and inhibitory input from each of the nodes it is connected to, and this input moves the node's activation value toward a maximum or minimum value. After many iterations the activation settles to a stable state in which a number of nodes have significantly greater activation than the other nodes. These nodes are considered winning nodes, and indicate the mapping adopted.

The shaded nodes in the diagram show the winning nodes for this example. They represent the mapping of the relation symbol and arguments of the base to the relation symbol and arguments of the target.

FIGURE 7

The full argument mapping network is shown in transparency 7.

Inhibitory connections between all nodes in the same row or column tend to make mappings unique; that is each base element is mapped to at most one target element and vice verse. This is because the winning node of each row or column provides enough inhibitory input to the other nodes of the same row or column to stop them developing significant activation.

Excitatory connections exist between all nodes not in the same row or column to allow a stable growth of activation.

A number of heuristics are implemented by influencing nodes that provide constant excitatory or inhibitory input to the mapping nodes, biasing them towards or away from becoming winning nodes. The heuristics implemented include the following biases:

Corresponding argument position node: this gives an advantage to nodes that map arguments in corresponding positions, so the first argument of the base tends to be mapped to the first argument of the target, and so on.

Similarity - there is a bias to map more similar entities

Type - entities of the same type tend to be mapped.

Salience - items of higher salience are given priority

Consistency - there is a bias towards mappings that are consistent with previous mappings.

FIGURE 8

Focus selection network has two layers both with similar structure to the argument mapping network, the main difference being that the rows and columns represent chunked relational instances rather than symbol and argument. The lower layer is influenced by many of the same sources of activation as the mapping network, with the addition of the following:

Height node - there is bias to relational instances that are higher in the target structure.

Symbol node provides a bias towards selecting a pair of terms with the same relation symbol, or previously mapped relation symbol.

Argument node provides a bias towards selecting a pair of terms with common or previously mapped arguments.

Number of arguments node - there is a bias against pairs of relational instances that do not have equal numbers of arguments.

Excitatory connections are placed between mapping nodes in which the terms represented by one node are arguments to the terms represented by the other node. This provides a bias towards selecting similarly-shaped tree structures.

Layer two has a similar structure to layer one except that there are strong inhibitory connections between all nodes, thereby ensuring selection of a single focus.

There are unidirectional excitatory connections from units in layer one to the corresponding units in layer two. The weight of these connections is set to zero for nodes that have already been a focus.

FIGURE 9

Map storing network

Mappings between base and target entities are stored by computing the outer products of the vectors representing the base and target entities. These are then weighted by multiplying by a scalar representing the salience of the relational instances that were mapped. The resulting outer products are then add into a tensor network, which represents the accumulated mappings so far. Tensor products are also used to store other information including similarity.

Mapping score can then be calculated for each item in the base and target, which indicate to what items they are mapped.

Mapping heat-flow/water-flow

We will quickly sketch how the model would map water-flow to heat-flow. The mappings are shown in Figure 1.

1. The focus-selection network would initially select a chunked representation of the causal relation between pressure-difference and water-flow in the base, and the chunked causal relation between temperature-difference and heat-flow in the target.

2. The selected chunked terms are mapped.

3. The argument mapping network is loaded with the unchunked representations of these two terms.

4. The argument mapping network should converge to a solution in which the cause relation symbols in base and target are mapped, the chunked relation pressure-difference is mapped to the chunked relation temperature-difference, and the chunked relation water-flow is mapped to the chunked relation heat-flow.

5. These mappings are stored in map storing network.

6. Control returns to the focus selection network, which is likely to select the chunked terms pressure-difference and temperature-difference.

7. The unchunked terms are loaded into the argument-mapping network. This would map the relational symbols "greater" in base and target. Pressure-vessel-A chunked term would be mapped to temperature-object-A chunked term, and Pressure-vessel-B chunked term would be mapped to temperature-object-B chunked term.

8. These mappings would be stored in the map storing network.

9. This process would be iterated, mapping the corresponding structures until a mapping criterion is met.


References

Chappell, M., & Humphreys, M. S. (1994). An auto-associative neural network for sparse representations: Analysis and application to models of recognition and cued recall. Psychological Review, 101(1) 103-128.

Falkenhainer, B., Forbus, K. D., & Gentner, D. (1989). The structure-mapping engine: Algorithm and examples. Artificial Intelligence, 41 1-63.

Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7 155-170.

Halford, G. S. (1993). Children's understanding: the development of mental models. Hillsdale, N. J.: Erlbaum.

Halford, G. S., Wilson, W. H., Guo, J., Gayler, R. W., Wiles, J., & Stewart, J. E. M. (1994). Connectionist implications for processing capacity limitations in analogies. In K. J. Holyoak & J. Barnden (Eds.), Advances in connnectionist and neural computation theory, Vol. 2: Analogical connections (pp. 363-415). Norwood, NJ: Ablex.

Halford, G. S., & Wilson, W. H. (submitted). Processing capacity defined by relational complexity: Implications for comparative, developmental, and cognitive psychology. .

Holyoak, K. J., & Thagard, P. (1989). Analogical mapping by constraint satisfaction. Cognitive Science, 13(3) 295-355.

Holyoak, K. J., & Thagard, P. (1995). Mental leaps. Cambridge, MA: MIT Press.


Special characters used in this article may not be dispalyed by some browsers.
To view document with complete figures, tables, and special characters, download the Microsoft Word rtf document.


Transparency 3. Representation of water-flow.
 

Transparency 4. Representation of base and target (A), and processing of simple analogy (B).