Friday 28 January

8:30 - 9:30am  Registration and coffee
9:30 - 11:00am  Tutorial on Evolutionary Computation
11:45am - 12:00pm  Opening Address
12:00 - 1:00pm  What is stable enough for language to evolve and still be learnable?
1:00 - 2:00pm  Lunch
2:00 - 3:00pm  Negotiating syntax
3:00 - 4:00pm  Language adapts to aid its own survival: Towards a working model of the emergence of morphosyntax
4:00 - 4:30pm  Afternoon tea
4:30 - 5:30pm  Evolving language to the edge of chaos: Boolean nets and linguistic parameters - Professor James Hurford

Saturday 29 January

9:30 - 10:30am  What are the conditions for language emergence in the record of human evolution? - Professor Iain Davidson
10:30 - 11:00am  Morning tea
11:00am - 12:00pm  The punctuated equilibrium model of language evolution
12:00 - 2:00pm  Poster session and lunch
2:00 - 3:00pm  Survival of the least fit: The prisoner's dilemma in evolutionary computation
3:00 - 4:00pm  Investigating the constraints on the emergence of word order universals: Evidence from connectionist simulations and artificial grammar learning
4:00 - 4:30pm  Afternoon tea
4:30 - 5:30pm  Signs, symbols and words in language evolution models
5:30pm  Closing address - A/Prof Janet Wiles

Poster session: Saturday 29 January, 12:00 - 2:00pm

Evolving recurrent networks for context-free language prediction - Mikael Boden, Henrik Jacobsson, and Tom Ziemke
Training neural networks to predict a context-sensitive language by evolutionary hill-climbing - Stefan Chalup and Alan D. Blair
Modeling sound systems with evolutionary computation techniques - Jinyun Ke, Mieko Ogura, and William S-Y Wang
Peter Linaker
Evolving self-sacrifice: A case study in experimental ethics - Ann Nicholson, Kevin Korb, and Steven Mascaro
Using an evolutionary algorithm to guide problem selection in an online educational game - Elizabeth Sklar and Jordan Pollack
Brad Tonkes and Janet Wiles
Neural weak classifiers for language learning - Michael Towsey, Claire D'Este and Joachim Diederich
An evolutionary model of natural language generation - Huck Turner
Exploring semantic complexity by a computational learning model - Yuan Yao and Jinyun Ke
Human language enables precise communication via propositional structures. This holds despite the relatively rapid rate at which languages change over historical time. The talk will consider how this feature of communication may have evolved and how it remains learnable. It will be suggested that two adaptive pressures need to be considered jointly: the utility of precise expression (e.g., the ability to know who did what to whom) and the way language structures are transmitted via the minds of children. We note that evolutionary arguments must take into account the intensity of selection pressure, the stability of neural processing conditions and the invariance of responses. In discussing what may remain constant and consistent in language, given that it is processed by the same kinds of brain structures, we will consider the suggestion of Pulvermüller (1999) that cell assemblies for "action" and "perception" words are associated with modality-related channels. In terms of learnability we will consider what minimalist syntax and optimality theory have to offer, and try to relate this discussion to empirical aspects of language acquisition and function.
Pulvermüller F. (1999). Words in the brain's language. Behavioral and Brain Sciences, 22, 253-336.
If accurate communication benefits both participants in a communicative event, at least some of the time, the sender will be motivated to produce signals that the receiver is likely to interpret accurately, and the receiver will be motivated to interpret a signal as the meaning it is most often used to express. In a population where this is true, a conventional communication system can result from a process of negotiation, as the agents alternate between contributing to, and conforming with, the emerging system.
A conventional communication system could be based on a simple table in which each meaning to be conveyed is associated with a unique signal. However, as the number of meanings increases, the system becomes increasingly difficult to learn and to use.
If the set of meanings is large, the agents probably do not use a distinct internal representation for each of them. Their representations are more likely constructed from a relatively small number of component types. The interpretation of an internal representation depends on the specific components used to construct it, and on how they are configured.
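The scaling argument above can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that each meaning is a tuple of independent components (for instance who, did what, to whom); the function names, the meaning-space sizes and the toy lexicon are hypothetical and are not taken from the models discussed in the talk. A holistic table needs one signal per whole meaning, so its size is the product of the component sizes, while a code with one signal per component value grows only with their sum.

```python
from math import prod

def holistic_table_size(component_sizes):
    """Entries needed when every whole meaning gets its own arbitrary signal."""
    return prod(component_sizes)

def compositional_lexicon_size(component_sizes):
    """Entries needed when each component value gets a signal and signals are concatenated."""
    return sum(component_sizes)

def compose(meaning, lexicon):
    """Build a signal by concatenating the sub-signal for each component, in a fixed slot order."""
    return "".join(lexicon[slot][value] for slot, value in enumerate(meaning))

# Hypothetical meaning space: 10 agents x 20 actions x 10 patients.
sizes = [10, 20, 10]
print(holistic_table_size(sizes))         # 2000
print(compositional_lexicon_size(sizes))  # 40

# Hypothetical toy lexicon: one sub-signal per value of each component.
people = {"ann": "ka", "bob": "ba"}
lexicon = [people, {"see": "mi", "hit": "ro"}, people]
print(compose(("ann", "see", "bob"), lexicon))  # kamiba
```

The fixed slot order in `compose` stands in for the kind of convention the abstract says agents negotiate; the sketch does not model how such a convention arises.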
When an agent expresses a meaning, it will, in general, use aspects of the structure of its representation of the meaning in its derivation of the structure of the signal used to express that meaning. The receiver's interpretation of the signal will make use of the signal's structure to derive a representation of the meaning it conveys. The derivations performed by the sender and receiver therefore constitute implicit analyses of the relations between the structures of meanings and signals.
The structural derivations performed by the sender and the receiver might have no relation to one another, except that they both involve the same signal, and, if the agents are lucky, equivalent meanings.
For them to achieve better than chance accuracy, I assume that the members of a population obey a set of negotiated conventions regarding possible derivational relationships between the structures of signals and meanings. These conventions can emerge if senders perform their derivations of signals while considering how receivers might interpret them, and if receivers derive their interpretations of signals while considering how they might have been constructed by senders. By including recursive characterizations of structural properties, the set of conventions can be used to coordinate communication of an unbounded set of meanings with a relatively small set of conventions. Analyses of structural mappings between signals and meanings are used by learners as they attempt to discover the conventions the users of a communication system obey, and as they attempt to perform derivations of signals and meanings in accord with the conventions.
In this account, syntactic structure is a representation of the conventional aspects of the structural derivations performed by senders and receivers. It is not reducible to the domains of meaning or phonology, although regularities or constraints in either domain may influence aspects of syntactic structure. Other syntactic regularities and constraints can emerge as a result of the negotiation process.
Results from a number of recent computational models consistent with this account suggest that agents with fairly general representational and learning abilities can negotiate communication systems capable of accurately conveying very large numbers of meanings. The systems that emerge from the negotiations incorporate structural regularities and constraints that resemble some aspects of the syntax of human languages.
Over the past decade there has been a resurgence of interest in the origins of human language. Linguists have started to wonder whether the unique properties of Language, particularly its syntactic structure, might be explained in terms of the evolution of our species, in the same way as other properties of our biological make-up. Unfortunately, a satisfactory account of the origins of syntax in terms of natural selection has proved difficult to formulate.
In this talk, along with many of the other speakers at this workshop, I will explore a different kind of evolutionary approach: one in which the status of languages themselves as complex adaptive systems is taken seriously. In this view, our (innately given) language faculty acts as a set of selection pressures on the persistence of linguistic variants over time. Specifically, for a particular I-language (i.e. competence grammar) to survive it must be repeatedly mapped into E-language (i.e. utterances) by speakers, and mapped back into the same I-language by learners.
The effect of this linguistic (as opposed to natural) selection on the structure of languages can be tested by building working models of populations of learners in computational simulations. With these models we can observe the properties of the languages that emerge over a cultural timescale in populations with particular hand-coded (i.e. innately given) properties. In other words, in contrast to a standard evolutionary model, we fix the structure our computational agents are born with; they are not subject to selection pressure and do not themselves evolve. The only thing that changes is the linguistic system that the agents pass on culturally.
The model starts with no initial language and certain fairly minimal assumptions about language production and language learning. However, the results are surprisingly complex: after an initial stage where the population converges on a rudimentary, unstructured communication system, languages emerge that structurally resemble human ones in many respects. In these "evolved" systems, length of utterance correlates inversely with frequency; meanings are typically expressed using a recursively compositional syntax; but highly frequent meanings are expressed idiosyncratically.
I will analyse these results in terms of a competition between two aspects of linguistic transmission: a learning bottleneck which favours a topographic mapping between meanings and strings; and speaker laziness which favours a minimal-length code. The take-home message will be that this competition results in language-like systems through properly linguistic as opposed to biological evolution. In other words, to understand the origins of morphosyntax, rather than looking at the way humans have adapted to be better at learning language, we should appreciate the way language has adapted to be better at being passed on by us.
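Why a learning bottleneck favours a structure-preserving mapping can be shown with a toy calculation. This is not the model described in the talk (which induces grammars from utterances); it assumes a hypothetical meaning space of pairs (a, b), a hypothetical two-letter compositional language, and a compositional learner that is simply told how to split each string into an a-part and a b-part. The function names are illustrative only.

```python
from itertools import product

# Hypothetical meaning space: pairs (a, b) with five values each, 25 meanings in total.
MEANINGS = list(product(range(5), range(5)))

def compositional_string(meaning):
    """Hypothetical compositional language: first letter encodes a, second encodes b."""
    a, b = meaning
    return "pqrst"[a] + "klmno"[b]

def holistic_coverage(observed):
    """A holistic learner can reproduce only the meanings it actually observed."""
    return len(set(observed)) / len(MEANINGS)

def compositional_coverage(observed, language):
    """A learner that splits each observed string into an a-part and a b-part
    (segmentation assumed known) can recombine parts for unobserved meanings."""
    a_parts = {a: language((a, b))[0] for a, b in observed}
    b_parts = {b: language((a, b))[1] for a, b in observed}
    expressible = [m for m in MEANINGS if m[0] in a_parts and m[1] in b_parts]
    return len(expressible) / len(MEANINGS)

# Bottleneck: the learner hears only 5 of the 25 meanings.
bottleneck = [(i, i) for i in range(5)]
print(holistic_coverage(bottleneck))                             # 0.2
print(compositional_coverage(bottleneck, compositional_string))  # 1.0
```

A holistic language passed through this bottleneck loses 80% of its meaning-string pairs in one generation, while the compositional one survives intact, which is the pressure the abstract attributes to the bottleneck.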
A related paper discussing an earlier version of the model is available:
Kirby, S. (in press). "Learning, Bottlenecks and the Evolution of Recursive Syntax." In Linguistic Evolution through Language Acquisition: Formal and Computational Models, edited by Ted Briscoe. Cambridge University Press.
Postscript: ling.ed.ac.uk/anonftp/pub/staff/kirby/ted.ps.gz
PDF: ling.ed.ac.uk/anonftp/pub/staff/kirby/ted.pdf
This paper describes an attempt to cast several essential, quite abstract, properties of natural languages within the framework of Kauffman's random Boolean nets. These properties are:
COMPLEXITY. Human languages are massively complex, yet, paradoxically, children master them without apparent difficulty. Languages thus seem to be at a level of 'masterable complexity'. Recent work in the theory of complexity has shed light on the region of the phase change between orderly and chaotic systems, and has shown that many natural systems tend to evolve toward a level of complexity just at or below 'the edge of chaos'.
INTERCONNECTEDNESS. 'Une langue est un système où tout se tient'. This maxim of linguistics could be translated as 'a language is a system in which everything depends on everything else'. Until recently there has been no analytical machinery for understanding systems in which everything is interdependent. But now, with the advent of network models, we are beginning to see new ways of conceiving the properties of such systems.
STABILITY. Languages (and speakers' judgements about examples from them) are very stable, but subject to small changes over time.
DIVERSITY. There are many human languages. Putting aside differences in vocabulary, there are probably no two of the 6000 extant languages with exactly the same grammatical system. Of course, the number of possible humanly learnable grammatical systems must be far greater.
UNDERDETERMINEDNESS. The complete knowledge of a language reached by an adult is only partially determined by experiential input.
Specifically, in the research reported here, a language is modelled as an attractor of a Boolean net. (Groups of) nodes in the net might be thought of as linguistic principles or parameters as posited by the Chomskyan theory of the 1980s. According to this theory, the task of the language learner is to set parameters to appropriate values, on the basis of very limited experience of the language in use. The setting of one parameter can have a complex effect on the settings of other parameters. A random Boolean net is generated and run to find an attractor. A state from this attractor is degraded, to represent the degenerate input of language to the language learner, and this degraded state is then input to a net with the same connectivity and activation functions as the original net, to see whether it converges on the same attractor as the original. In practice, many nets fail to converge on the original attractor, and degenerate into attractors representing complete uncertainty. Other nets settle at intermediate levels of uncertainty. And some nets manage to overcome the incompleteness of input and converge on attractors identical to that from which the original inputs were (de)generated. Finally, a genetic algorithm is used to select a population of such successful nets, and the properties of the connections and activation functions of these successfully evolved nets are examined.
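The convergence test described above can be sketched as follows, under stated assumptions. The abstract does not say how degraded input or uncertainty is represented, so this sketch assumes that a hidden node value is `None` ("unknown") and that nodes update by a three-valued rule: a node's output is known only when every way of filling in its unknown inputs gives the same result. The names `random_net`, `find_attractor`, `degrade` and `recovers` are hypothetical, and the genetic-algorithm stage that selects successful nets is omitted.

```python
import itertools
import random
from typing import List, Optional, Sequence, Tuple

Value = Optional[bool]  # True, False, or None meaning "unknown" (an assumed encoding)
State = Tuple[Value, ...]
Net = Tuple[List[List[int]], List[List[bool]]]  # (input wiring, truth table per node)

def random_net(n: int, k: int, rng: random.Random) -> Net:
    """Kauffman-style N-K net: each node reads k random nodes through a random truth table."""
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.random() < 0.5 for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables

def eval_node(table: List[bool], vals: Sequence[Value]) -> Value:
    """Three-valued update: known only if all completions of the unknown inputs agree."""
    unknown = [i for i, v in enumerate(vals) if v is None]
    outputs = set()
    for fill in itertools.product([False, True], repeat=len(unknown)):
        full = list(vals)
        for i, bit in zip(unknown, fill):
            full[i] = bit
        index = sum(1 << j for j, bit in enumerate(full) if bit)
        outputs.add(table[index])
    return outputs.pop() if len(outputs) == 1 else None

def step(net: Net, state: State) -> State:
    """Synchronously update every node from the current state."""
    inputs, tables = net
    return tuple(eval_node(tables[i], [state[j] for j in inputs[i]])
                 for i in range(len(state)))

def find_attractor(net: Net, state: State) -> List[State]:
    """Run the deterministic net until a state repeats; return the cycle it falls into."""
    first_seen = {}
    while state not in first_seen:
        first_seen[state] = len(first_seen)
        state = step(net, state)
    start = first_seen[state]
    return [s for s, t in first_seen.items() if t >= start]

def degrade(state: State, p: float, rng: random.Random) -> State:
    """Hide each node's value with probability p, standing in for degenerate input."""
    return tuple(None if rng.random() < p else v for v in state)

def recovers(net: Net, attractor: List[State], cue: State) -> bool:
    """Does a net with the same wiring, started from the cue, reach the same attractor?"""
    return set(find_attractor(net, cue)) == set(attractor)

if __name__ == "__main__":
    rng = random.Random(1)
    net = random_net(n=12, k=2, rng=rng)
    start = tuple(rng.random() < 0.5 for _ in range(12))
    target = find_attractor(net, start)
    cue = degrade(target[0], p=0.3, rng=rng)
    print(len(target), recovers(net, target, cue))
```

Under this encoding, a net that "degenerates into complete uncertainty" is one whose trajectory settles into states made entirely of `None`; nets whose truth tables let known values override unknown inputs are the ones that can recover.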
Noble and Davidson outlined in their book *Human evolution, language and mind* (CUP 1996) how the interpretation of the evidence of hominin and human evolution can be put together with a social construction view of mindedness to create a coherent account of the evolutionary emergence of mindedness, through the origins of language.
In this paper, I will outline the essential steps in the story of hominin and human evolution. I will preface this with an account of the evidence for the changing cranial capacity of fossil hominins. This will serve to illustrate the key elements of evolutionary argument. I will then use this as a framework to discuss the contexts of the emergence of different important features in the story: the common ancestor of African apes (including humans), bipedalism, thermoregulation, tool-making, meat-eating, throwing, the control of the means of production of communicative utterances, colonisation of new environments, the production of symbols which are arbitrary but conventional, and the use of language as a means of concealing information.
I will conclude the paper by beginning an analysis of the key elements of the Noble and Davidson argument and how these might be considered in the context of attempts to consider how human-like consciousness might be produced in machines.
The lecture begins by discussing limitations on the applicability of the 'family tree' model of linguistic relationships, and the pervasive nature of linguistic diffusion among languages which are in contact for an extended period within a given geographical area.
It puts forward a new hypothesis concerning language development (inspired by recent ideas in biology) - a punctuated equilibrium model. It is suggested that over most of the 100,000 years or more that languages have been spoken, there has existed a state of equilibrium. Within a given region a number of languages (each spoken by a smallish population) would have existed in a state of relative harmony, with no one language having much greater prestige than any other. Linguistic features would have diffused across the languages of the region so that they gradually converged on a common structural prototype.
The state of equilibrium is, from time to time, punctuated. The trigger could be some natural event (e.g. flood or drought), material innovation (the most notable being the evolution of agriculture), the emergence of an ambitious leader or an aggressive religion, or just geographical movement into new territory. During a period of punctuation we get expansion and split of peoples and of languages. It is then that the family tree model is appropriate - a single proto-language develops into a series of distinct daughter languages, all gradually diverging more and more from the proto-language. The period of punctuation (which will be relatively short, at most just a few thousand years) will gradually merge into a new period of equilibrium (which will be relatively long, often tens of thousands of years).
This explains why it has not been possible to scientifically establish any higher-level links between accepted language families; they probably had their origins at the end of a period of equilibrium. (It is not likely that a 'family-tree-type' period of punctuation would have been immediately preceded by an earlier 'family-tree-type' period of punctuation.) The lecture pays particular attention to the criteria for proving genetic relationships between languages, and provides a critique of unscientific work, such as that on lexicostatistics and 'Nostratic'. There is consideration of the nature of putative proto-languages, with the suggestion that some language families may have developed from a group of two or three typologically similar languages (spoken in a small area), rather than from a single language. It is also shown how it may sometimes be impossible to tell whether a particular similarity between languages is evidence for genetic relationship or the result of diffusion.
The final sections consider recent history, and make projections about the future. The expansion of Europeans into every other part of the world - which essentially began in 1492 - has punctuated the linguistic equilibrium that existed in Australia, parts of the Americas and Africa, and so on. A large intrusive culture, with a prestige language that is used almost exclusively in schooling and the media, signals the gradual abandonment of local languages.
In every part of the world the smaller, non-prestige languages are steadily falling towards extinction. The rate of language loss in a community can be slowed, but never halted or reversed. I predict that the only languages which have any chance of survival, in the medium term, are those that are the official language of a nation. And even that will not be the end. With increasing globalisation the end product will be a single language, spoken by everyone.
In genetic algorithms it is often taken for granted that selection of the most successful members of a population will result in individuals whose fitness is higher than their ancestors'. On the contrary, there are circumstances in which "survival of the fittest" is catastrophically bad, and survival of the least fit leads to the highest population fitness over time. Such situations are succinctly described in terms of the Prisoner's Dilemma concept from game theory. This is reminiscent of the 'no free lunch' result of computational learning theory, a connection which we hope to make explicit.
In the Prisoner's Dilemma, two agents make independent choices about whether they will "cooperate" with one another. What makes the situation interesting is the particular relationship between payoffs and actions: the joint payoff is highest for mutual cooperation, and lowest for mutual defection, yet at the individual level it always pays to defect (i.e. fail to cooperate). If fitnesses are determined from payoffs which have this structure, then selection of the fittest individuals leads to an erosion of the fitness of all survivors, as the number of cooperators drops. If one's goal is to evolve individuals (or populations, for that matter) with high fitness, one should actually select the LEAST fit individuals in each generation.
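A deterministic toy version of this argument: agents are pure cooperators ("C") or defectors ("D"), each agent's fitness is its summed payoff from one game against every other agent, and truncation selection keeps half the population, each survivor leaving two copies. The payoff values (5, 3, 1, 0) and the selection scheme are assumptions chosen for illustration, not details from the talk.

```python
# Illustrative payoffs (an assumption): temptation > reward > punishment > sucker,
# and 2 * reward > temptation + sucker, the usual Prisoner's Dilemma conditions.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}

def fitnesses(pop):
    """Round-robin: each agent's fitness is its summed payoff against every other agent."""
    return [sum(PAYOFF[(a, b)] for j, b in enumerate(pop) if j != i)
            for i, a in enumerate(pop)]

def next_generation(pop, keep_fittest):
    """Truncation selection: keep the fittest (or least fit) half, each survivor copied twice."""
    fit = fitnesses(pop)
    order = sorted(range(len(pop)), key=lambda i: fit[i], reverse=keep_fittest)
    survivors = [pop[i] for i in order[: len(pop) // 2]]
    return survivors * 2

def run(pop, generations, keep_fittest):
    for _ in range(generations):
        pop = next_generation(pop, keep_fittest)
    return pop

start = ["C"] * 6 + ["D"] * 4
for keep_fittest in (True, False):
    final = run(start, 3, keep_fittest)
    # prints: keep_fittest flag, number of cooperators left, total population fitness
    print(keep_fittest, final.count("C"), sum(fitnesses(final)))
```

Because a defector always outscores a cooperator facing the same opponents, keeping the fittest half removes every cooperator and drives total fitness down from 222 to 90, whereas keeping the least fit half leaves only cooperators and raises it to 270.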
One widely investigated "escape" from the prisoner's dilemma lies in reciprocation: if contacts between agents are liable to be repeated, it can become "selfish" to cooperate if doing so engenders the cooperation of the other agent. The best-known example of such a successful behaviour is the strategy known as tit-for-tat. A particularly interesting phenomenon in this iterated prisoner's dilemma is that a strategy X may beat strategy Y, which in turn beats strategy Z, and yet Z may beat X, forming a cycle. If we consider a collection of these strategists occupying a two-dimensional spatial array and "invading" one another under very simple dynamics, such cycles can persist indefinitely.
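How reciprocation makes cooperation pay can be seen in a short iterated game between tit-for-tat and an always-defect strategy. The payoff values and the 10-round match length are assumptions for illustration; the strategy functions are hypothetical names, and the cyclic and spatial dynamics mentioned above are not modelled here.

```python
T, R, P, S = 5, 3, 1, 0  # illustrative payoffs (an assumption)
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}

def tit_for_tat(own_history, opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"

def always_defect(own_history, opponent_history):
    return "D"

def play(strategy_a, strategy_b, rounds):
    """Iterated game: returns the total payoffs of both players."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat, 10))      # (30, 30)
print(play(tit_for_tat, always_defect, 10))    # (9, 14)
print(play(always_defect, always_defect, 10))  # (10, 10)
```

In a population of tit-for-tat players each match earns 30, while a defector entering that population earns only 14 per match, so cooperating with reciprocators is the "selfish" choice.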
These spatial models exhibit behaviour that is very like self-organized criticality: they spontaneously form fractally distributed "clumpy" patterns having no characteristic length scale. On the other hand, this does not appear to be true in the time domain - a somewhat anomalous result.
In an echo of the situation with simple prisoner's dilemma agents, in these models the fastest (i.e. most consistent) invader is generally not the most successful, nor is the slowest invader the least successful. Simulation results (for the relative areas occupied in the long term) are in excellent agreement with the appropriate mean field theory - an intriguing result given that the mean field theory ignores all spatial structure.
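The abstract does not give its mean field equations. The sketch below assumes one common mean-field description of three types that invade one another cyclically (a cyclic Lotka-Volterra system): x invades y at rate a, y invades z at rate b, z invades x at rate c, and x, y, z are the fractions of space each type occupies. It solves for the interior point where all three coexist; the function names are hypothetical.

```python
from fractions import Fraction

def rates_of_change(x, y, z, a, b, c):
    """Assumed mean-field dynamics: each type gains by invading its prey
    and loses to its predator, in proportion to how often they meet."""
    return (x * (a * y - c * z),
            y * (b * z - a * x),
            z * (c * x - b * y))

def coexistence_point(a, b, c):
    """Interior fixed point: x's share is set by b (y invading z), y's by c, z's by a.
    Raising x's own rate a therefore enlarges the share of z, the type that invades x."""
    total = a + b + c
    return (b / total, c / total, a / total)

# x is the fastest invader here, yet it ends up with the smallest share;
# y is the slowest invader, yet it is not the least successful.
a, b, c = Fraction(9, 10), Fraction(1, 2), Fraction(3, 5)
x, y, z = coexistence_point(a, b, c)
print(x, y, z)                                                 # 1/4 3/10 9/20
print(all(r == 0 for r in rates_of_change(x, y, z, a, b, c)))  # True
```

Exact fractions are used so the fixed-point check is free of rounding error; a spatial simulation of the invasion dynamics is not included.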
Finally, we note that spatial patchiness among competing species has usually been attributed to variation in the underlying substrate or other extrinsic effects: the above model demonstrates that it can plausibly arise from competitive interactions alone.
One aspect of language that any comprehensive theory of language evolution must explain is the existence of language universals. The notion of language universals refers to the observation that although the space of logically possible linguistic subpatterns is vast, the languages of the world only take up a small part of it. That is, there are certain universal tendencies in how languages are structured and used. In this talk, I will provide an explanation for the emergence of one such universal relating to basic word order in sentences.
Across the languages of the world there is a high degree of consistency with respect to the ordering of heads of phrases. Within the Chomskyan approach to language these correlational universals have been taken to support the idea of innate linguistic constraints on word order. I will present an alternative explanation based on the suggestion by Christiansen (1994) that language has evolved to fit sequential learning and processing mechanisms existing prior to the appearance of language. These mechanisms presumably also underwent changes after the emergence of language, but the selective pressures are likely to have come not only from language but also from other kinds of complex hierarchical processing, such as the need for increasingly complex manual combination following tool sophistication. On this view, head direction consistency is a by-product of non-linguistic constraints on hierarchically organized temporal sequences. In particular, if recursively consistent combinations of grammatical regularities, such as those found in head-first and head-last languages, are easier to learn (and process) than recursively inconsistent combinations, then it seems plausible that recursively inconsistent languages would simply "die out" (or not come into existence), whereas the recursively consistent languages should proliferate. As a consequence languages incorporating a high degree of recursive inconsistency should be far less frequent among the languages of the world than their more consistent counterparts.
This talk will present converging evidence from four lines of research in support of this alternative explanation of word order universals. This evidence is derived from a theoretical analysis of rule interactions, from connectionist simulations (Christiansen & Devlin, 1997), from typological language data, and from artificial grammar learning in normal adults and aphasic patients. Together, this evidence suggests that constraints on the emergence of word order universals derive from non-linguistic constraints on the learning and processing of complex sequential structure. Thus, rather than a biological adaptation of learning mechanisms to fit linguistic structure, the evidence points to the adaptation of linguistic structure to fit pre-existing sequential learning mechanisms.
Christiansen, M.H. (1994). Infinite languages, finite minds: Connectionism, learning and linguistic structure. Unpublished doctoral dissertation, Centre for Cognitive Science, University of Edinburgh, U.K.
Christiansen, M.H. & Devlin, J.T. (1997). Recursive inconsistencies are hard to learn: A connectionist perspective on universal word order correlations. In Proceedings of the 19th Annual Cognitive Science Society Conference (pp. 113-118). Mahwah, NJ: Lawrence Erlbaum Associates.
Evolutionary Computation has recently been applied to studying the evolution of communication and language. Some models have been used for the simulation of the emergence of simple lexicons in populations of simulated organisms (e.g. Cangelosi & Parisi, 1998), in small communities of robots (Steels & Vogt, 1997), or in on-line Internet agents (Steels & Kaplan, 1999). In these studies organisms evolve shared lexicons for describing entities and relations of the environment. These models, which focus on lexicon emergence, do not make any explicit reference to the role of syntax in language origin. Their aim is to model the early stages of the evolution of animal-like communication.
Other evolutionary models have focused on the evolution of syntax (e.g. Batali, 1994; Kirby, 1999). Simulated organisms can evolve different syntactic categories starting from given sets of syntactic structures and constraints. In this type of model only the evolution of syntax is simulated. The associations that simulated organisms learn are self-referential symbol-symbol relationships. Therefore, these models are subject to the symbol grounding problem (Harnad, 1990), since they lack an intrinsic link between their symbols and the entities and relations existing in the organisms' environment. Indeed, internal symbols need some form of sensorimotor grounding. Because of the symbol grounding problem, these models are of limited use for understanding the evolution of cognition.
This talk will describe a series of models for the evolution of communication. It will focus on the distinction between signals, symbols, and words in language evolution. In particular, it will show how evolutionary computation techniques, such as Artificial Life, can be used to study the emergence of syntax and symbols from simple communication signals (Cangelosi, 1999). Language-origin models of this type can overcome the symbol grounding problem. In these models, simulated organisms use symbols whose semantic referents are constituted by categorical representations in the neural network's hidden layer. These semantic representations are activated by the actual presence of their referents in the organism's world.
Initially, computational models that evolve repertoires of isolated signals will be presented. For example, a recent study has simulated the evolution of signals for naming foods in a population of foragers (Cangelosi & Parisi, 1998). Models of this type study communication systems based on simple signal-object associations. Organisms learn and evolve simple stimulus associations between objects in the environment and signals. Communication signals only have referential relationships with the world's entities.
Then models that study the emergence of grounded symbols will be described. For example, in Cangelosi's (1999) work, simple syntactic rules are evolved, such as symbol combination and compositionality. The modeled behavioral task is influenced by Savage-Rumbaugh & Rumbaugh's (1978) ape language experiments. This second type of model mainly focuses on the distinction between simple signal-object associations and complex symbol-symbol relationships. It permits a detailed analysis of the problem of symbol acquisition. For example, comparisons are made between symbol acquisition in animal models (e.g. chimpanzees) and in computational models (e.g. artificial neural networks). Such comparisons allow us to give an operational definition of the signal-symbol-word distinction in language evolution models.
Batali J. (1994). Innate biases and critical periods: Combining evolution and learning in the acquisition of syntax. In R. Brooks & P. Maes (eds), Artificial Life IV, Cambridge: MIT Press, 160-171
Cangelosi A. (1999). Modeling the evolution of communication: From stimulus associations to grounded symbolic associations. In D. Floreano et al. (Eds.), Proceedings of ECAL99 European Conference on Artificial Life, Berlin: Springer-Verlag, 654-663
Cangelosi A., & Parisi D. (1998). The emergence of a "language" in an evolving population of neural networks. Connection Science, 10(2), 83-97
Harnad S. (1990). The Symbol Grounding Problem. Physica D 42: 335-346
Kirby S. (1999). Syntax out of learning: The cultural evolution of structured communication in a population of induction algorithms. In D. Floreano et al. (Eds.), Proceedings of ECAL99 European Conference on Artificial Life, Berlin: Springer-Verlag
Steels L. & Kaplan F. (1999). Collective learning and semiotic dynamics. In D. Floreano et al. (Eds.), Proceedings of ECAL99 European Conference on Artificial Life, Berlin: Springer-Verlag
Steels L. & Vogt P. (1997). Grounding adaptive language games in robotic agents. In P. Husbands & I. Harvey (eds). Proceedings of the Fourth European Conference on Artificial Life, London: MIT Press
(1,3) Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China
(2) Tsurumi University and (2,3) University of California at Berkeley
Email: (1) jyke@ee.cityu.edu.hk and (3) eewsyw@cityu.edu.hk
In recent years there has been a series of computational efforts aiming to simulate the evolution of various language components, including sound systems, lexicon and syntax [Hurford et al].
In this study we propose a GA-based evolutionary game (EG) algorithm, which combines game theory [Smith82] with a genetic algorithm (GA) [Holland75] to model the emergence of human sound systems. We plan to go beyond de Boer's simulation [Boer97] in the following respects:
1) Instead of starting from scratch, we will assume a large initial inventory of possible sounds, which successively merge into fewer and fewer categories as a result of interaction among agents.
2) We extend the inventory to consonants in addition to vowels.
3) We will be guided by criteria including articulatory ease and perceptual distinctiveness, as well as other linguistic constraints [Wang71] [Lindblom98].
We hope that the algorithm we develop will account for language change [Wang91] as well as language emergence: game theory can model horizontal transmission, and the genetic algorithm vertical transmission, in language change.
References
[Boer97] de Boer, Bart, Generating Vowel Systems in a Population of Agents, in Fourth European Conference on Artificial Life, ed. by Phil Husbands and Inman Harvey, 503-510, 1997.
[Holland75] Holland, J. H., Adaptation in Natural and Artificial Systems, The University of Michigan Press, Ann Arbor, 1975.
[Hurford et al] Hurford, James R., Studdert-Kennedy, Michael, Knight, Chris, eds, Approaches to the Evolution of Language: Social and Cognitive Bases, Cambridge University Press, 1998.
[Lindblom98] Lindblom, Björn, Systemic Constraints and Adaptive Change in the Formation of Sound Structure, in [Hurford et al], 242-264, 1998.
[Smith82] Maynard Smith, J., Evolution and the Theory of Games, Cambridge University Press, 1982.
[Wang71] Wang, William S-Y, The Basis of Speech, in The Learning of Language, ed. by C.E. Reed, 267-306, New York, 1971.
[Wang91] Wang, William S-Y, Explorations in Language, Pyramid Press, 105-130, 1991.
Machine Learning Research Centre
School of Computer Science
Queensland University of Technology
PO Box 2434
Brisbane, QLD 4001
towsey@fit.qut.edu.au
We describe a neural implementation of the combination of weak classifiers (CWC) algorithm [Ji and Ma, 1997] which is able to learn the one-step-look-ahead task where the input is natural language sentences. The one-step-look-ahead task is more usually implemented with Simple Recurrent Networks [Elman, 1990] whose architecture typically consists of comparitively few neurons but learning requires many thousands of presentations of the training data. Our implementation includes a four layered architecture which consists of (1) an input layer having one neuron for each input category, (2) a internally recurrent layer which captures the dynamics of the temporal input, (3) a large hidden layer of weakly classifying perceptrons and (4) a winner-take-all output layer having the same number of neurons as the input.
The CWC algorithm displays rapid learning (i.e. it has lower time complexity than training an SRN by back-propagation), but it builds up a hidden layer of possibly thousands of weakly classifying perceptrons (i.e. the algorithm has high space complexity). This sacrifice of 'space' for 'time' appears to be an advantage in the context of learning cognitive skills such as language. Neural processing elements in the brain are presumably not a limiting resource, whereas time may be a limiting factor when a task must be learned within a critical period.
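The trade of space for time can be seen in a rough sketch of how weak classifiers might be accumulated: each randomly drawn perceptron is tested once and then either kept or discarded, so there is no gradient descent, while the hidden layer keeps growing. The toy two-dimensional task, the acceptance rule (training accuracy above 0.5 plus a hypothetical `margin`) and the unweighted majority vote are assumptions, and may differ from the selection criterion used by Ji and Ma (1997).

```python
import random

def make_data(n, rng):
    """Hypothetical toy task: label +1 if x + y > 0, else -1."""
    points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    return [(p, 1 if p[0] + p[1] > 0 else -1) for p in points]

def random_perceptron(rng):
    """A randomly oriented threshold unit; it is never trained."""
    w = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    b = rng.uniform(-0.5, 0.5)
    return lambda p: 1 if w[0] * p[0] + w[1] * p[1] + b > 0 else -1

def accuracy(classify, data):
    return sum(classify(p) == y for p, y in data) / len(data)

def train_cwc(data, n_weak, margin, rng, max_draws=100_000):
    """Keep random perceptrons that beat chance by `margin`.
    Time: each candidate is evaluated once, with no back-propagation.
    Space: the ensemble (hidden layer) grows to n_weak units."""
    ensemble = []
    for _ in range(max_draws):
        if len(ensemble) == n_weak:
            break
        h = random_perceptron(rng)
        if accuracy(h, data) > 0.5 + margin:
            ensemble.append(h)
    return ensemble

def vote(ensemble, p):
    """Combine the weak classifiers by unweighted majority vote."""
    return 1 if sum(h(p) for h in ensemble) > 0 else -1

rng = random.Random(1)
train, test = make_data(200, rng), make_data(200, rng)
ensemble = train_cwc(train, n_weak=101, margin=0.05, rng=rng)
print(len(ensemble), accuracy(lambda p: vote(ensemble, p), test))
```

An odd ensemble size such as 101 rules out tied votes, since a sum of an odd number of ±1 terms can never be zero.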
Our language data consist of strings of lexical categories derived from a natural language text [Towsey et al, 1998]. An Elman SRN with 8 hidden units, trained over 10,000 epochs, achieved a correct prediction score of 65% on a test set. Our neural CWC algorithm achieved a score of 66% after accumulating 10,000 hidden perceptrons over 1,000 epochs, without the need for back-propagation.
La Trobe University, Bundoora, Vic, Australia
Introduction
Dennett (1991) and others (e.g. Calvin, 1996) propose that linguistic utterances are generated by a rapid evolutionary process in the brain, an approach which has a number of advantages over traditional models. One of these is that it avoids the infinite regress present in many traditional models, which need an input specification (some initial representation of the meaning of the utterance to be produced) to initiate the process. Since evolutionary generation would proceed serendipitously, no initial specification would be needed. The resulting utterance would be one of many that a speaker could favour in a given context.
Incorporating theory from several disciplines, the present study develops an evolutionary model of natural language generation, the syntactic aspect of which is simulated computationally. It works in the following manner: from a population of initially random draft utterances, the most promising are selected and then ‘mated’ (and occasionally mutated) to produce a new set of drafts which tends to be an improvement on the previous generation. Through successive application of selection and recombination, the drafts become progressively more refined until a suitably well-formed utterance is produced. In the present implementation, selection was on the basis of syntactic well-formedness, but the approach is potentially scalable to include semantic and pragmatic selection criteria as well.
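The generate, select and recombine cycle can be sketched as a small genetic algorithm. Everything concrete here is hypothetical: the six-word `LEXICON`, the `penalty` function (a crude stand-in for principle-violation scoring, not a Principles and Parameters parser), truncation selection and the mutation rate. The variable-length crossover and the insertion and deletion mutations follow the general description in the Methods, but their exact form in the study is not specified.

```python
import random

# Hypothetical toy lexicon: each morpheme code indexes a (word, category) entry.
LEXICON = [("the", "D"), ("dog", "N"), ("cat", "N"),
           ("chased", "V"), ("saw", "V"), ("quietly", "ADV")]

def penalty(draft):
    """Toy ill-formedness score (0 = well formed)."""
    cats = [LEXICON[m][1] for m in draft]
    score = 2 * abs(cats.count("V") - 1)            # exactly one verb
    for i, c in enumerate(cats):
        if c == "N" and (i == 0 or cats[i - 1] != "D"):
            score += 1                              # noun needs a determiner
        if c == "D" and (i + 1 == len(cats) or cats[i + 1] != "N"):
            score += 1                              # determiner needs a noun
    if "V" in cats:
        v = cats.index("V")
        score += int("N" not in cats[:v])           # subject before the verb
        score += int("N" not in cats[v + 1:])       # object after the verb
    return score

def mutate(draft, rng, rate=0.2):
    """Substitution, insertion and deletion mutations (length may change)."""
    draft = list(draft)
    if rng.random() < rate:
        draft[rng.randrange(len(draft))] = rng.randrange(len(LEXICON))
    if rng.random() < rate:
        draft.insert(rng.randrange(len(draft) + 1), rng.randrange(len(LEXICON)))
    if rng.random() < rate and len(draft) > 1:
        del draft[rng.randrange(len(draft))]
    return draft

def crossover(a, b, rng):
    """Splice a prefix of one parent onto a suffix of the other; cut points
    are chosen independently, so the child's length can differ."""
    child = a[:rng.randint(0, len(a))] + b[rng.randint(0, len(b)):]
    return child or [rng.randrange(len(LEXICON))]

def render(draft):
    return " ".join(LEXICON[m][0] for m in draft)

def evolve(pop_size=60, generations=500, seed=0):
    rng = random.Random(seed)
    population = [[rng.randrange(len(LEXICON)) for _ in range(rng.randint(2, 7))]
                  for _ in range(pop_size)]
    for gen in range(generations):
        population.sort(key=penalty)
        if penalty(population[0]) == 0:
            return gen, population[0]
        parents = population[:pop_size // 2]        # truncation selection
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return generations, min(population, key=penalty)

gen, best = evolve()
print(gen, render(best), penalty(best))
```

Because the better half of each generation is carried over unchanged, the best penalty never increases; if no draft reaches zero within the generation limit, the best draft found is returned instead.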
Methods
Principles and Parameters theory is applied as the syntactic framework of the model. A major advantage is that, because of its modular structure, the theory allows an utterance to be evaluated on a continuous scale from very ungrammatical to grammatical, rather than as strictly one or the other. This graded evaluation is necessary to differentiate between utterances during selection in the evolutionary algorithm. Fitness was evaluated according to the number (and importance) of grammatical principles violated (e.g. the case filter, the theta criterion, the extended projection principle, head-complement order and others).
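The graded score can be sketched as a weighted sum of violations. The weights below are invented for illustration (the abstract names the principles but not their relative importance), and the violation counts are assumed to come from a parser that is not shown.

```python
# Hypothetical weights for the relative importance of each principle.
WEIGHTS = {"case_filter": 3.0, "theta_criterion": 3.0,
           "extended_projection_principle": 2.0, "head_complement_order": 1.0}

def fitness(violations):
    """Higher is better; 0 means no principle is violated."""
    return -sum(WEIGHTS[name] * count for name, count in violations.items())

# Violation counts per draft, assumed to be reported by a parser.
drafts = {
    "draft A": {"case_filter": 1, "head_complement_order": 2},
    "draft B": {"theta_criterion": 2},
    "draft C": {},
}
ranked = sorted(drafts, key=lambda d: fitness(drafts[d]), reverse=True)
print(ranked)  # → ['draft C', 'draft A', 'draft B']
```

Draft A violates more principles than draft B but still outranks it, because under these weights its violations are less serious; this graded ordering is what lets selection discriminate among drafts that are all ungrammatical.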
In the genetic algorithm, an utterance is encoded as a string of morpheme codes, each of which indexes an entry in a simple lexicon. The encoding is allowed to vary in length via insertion and deletion mutations, while various other operators control the amount of variation in the population.
The idea of ‘building blocks’ is used to argue that aspects of the X-Bar schema should emerge spontaneously from the model without explicit constraints on word order. The idea is that the crossover procedure (splicing chromosomes to ‘mate’ two parents) will tend to ensure that features strongly dependent on one another (in terms of their contribution to overall fitness) appear close together on a chromosome. This is specifically tested in relation to complement-adjunct ordering in the present study. The basic argument is that a head-complement-adjunct word order (e.g. kissed him quietly) exhibits greater ‘linkage’ (the dependent constituents are closer together) than the ungrammatical head-adjunct-complement word order (e.g. *kissed quietly him), so the latter is more vulnerable to disruption by the crossover procedure. Since both constructions score equally for fitness when no adjacency tests are applied, it was assumed that, if linkage had no effect, they would be generated with approximately equal frequencies under these conditions.
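The linkage argument rests on a standard property of one-point crossover (the 'defining length' idea behind building blocks): a uniformly chosen cut point separates two loci with probability proportional to the distance between them. The sketch below assumes a fixed-length chromosome and a single cut, which the study's variable-length crossover may not match exactly. Positions 2, 3 and 4 stand for head, complement and adjunct in a hypothetical six-morpheme utterance.

```python
import random
from fractions import Fraction

def separation_probability(i, j, length):
    """Exact chance that one uniformly chosen cut point k in 1..length-1
    (child takes genes [0, k) from one parent) falls between loci i < j."""
    return Fraction(j - i, length - 1)

def simulate(i, j, length, trials, rng):
    """Monte Carlo check of the same quantity."""
    hits = 0
    for _ in range(trials):
        cut = rng.randint(1, length - 1)
        hits += i < cut <= j
    return hits / trials

# Head at position 2: "kissed him quietly" puts the complement at 3,
# "*kissed quietly him" puts it at 4.
print(separation_probability(2, 3, 6), separation_probability(2, 4, 6))  # → 1/5 2/5
```

Doubling the head-complement distance doubles the chance that crossover splits the pair, which is the sense in which the head-adjunct-complement order is more vulnerable.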
Results and Implications
From 72 utterances conforming to either the head-complement-adjunct or the head-adjunct-complement word order, the simulation generated 58 of the former and 14 of the latter. These results are significant and in the predicted direction, suggesting that it might be possible to eliminate the troublesome adjacency principle.
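One way to check whether a 58 to 14 split could plausibly arise if both orders were equally likely (the null assumption stated in the Methods) is an exact two-sided binomial (sign) test. The abstract does not say which test was used, so this is only an illustrative check; the resulting p-value is well below 0.001.

```python
from math import comb

def binomial_two_sided_p(k, n):
    """Exact two-sided binomial test of H0: each order has probability 0.5."""
    tail = max(k, n - k)
    upper = sum(comb(n, x) for x in range(tail, n + 1)) / 2 ** n
    return min(1.0, 2 * upper)  # symmetric null, so double one tail

p = binomial_two_sided_p(58, 72)
print(f"p = {p:.2e}")
```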
An analysis in terms of linkage provides insight into locality in syntactic structure and has the potential to make parsing more computationally feasible. Among the constructions discussed in the present study are those involving heavy-NP shift, for which a linkage analysis provides a particularly elegant explanation.
An evolutionary approach to generation necessarily involves the parsing of competing drafts. It therefore offers an explanation of how processes of generation and understanding can make use of the same knowledge of language.
It seems likely that this kind of evolutionary approach could also be applied to model non-linguistic cognitive processes. Perhaps all intentionality could be cast in these terms. Deciding what to say next might simply be a special case of deciding what to do next. This view has some attractive features. For one, it would make human competence for language more continuous with other cognitive processes, thus providing clues about the origins of the language faculty in the course of biological evolution.
Major References
Bard, E. G., Robertson, D. & Sorace, A. 1996. Magnitude estimation of linguistic acceptability. Language, 72, 32-68.
Berwick, R. C. 1991. Principles of principle-based parsing. In R. C. Berwick, S. P. Abney & C. Tenny (Eds.), Principle-based parsing: Computation and psycholinguistics. Boston: Kluwer Academic Publishers.
Calvin, W. H. 1987. The brain as a Darwin machine. Nature, 330, 33-34.
Calvin, W. H. 1996. How brains think: Evolving intelligence, then and now. London: Weidenfeld & Nicolson.
Dale, R. 1993. The initial specifications for generation. In H. Horacek & M. Zock (Eds.), New concepts in natural language generation. New York: Pinter.
Dawkins, R. 1989. The selfish gene (2nd ed.). Oxford: Oxford University Press.
Dennett, D. C. 1991. Consciousness explained. Boston: Little, Brown.
Goldberg, D. E. 1989. Genetic algorithms in search, optimization, and machine learning. New York: Addison-Wesley.
Inui, K., Tokunaga, T. & Tanaka, H. 1992. Text revision: A model and its implementation. In R. Dale, E. Hovy, D. Rösner & O. Stock (Eds.), Aspects of automated natural language generation. New York: Springer-Verlag.
Levelt, W. J. M. 1989. Speaking: From intention to articulation. Cambridge, MA: MIT Press.
McDonald, D. D. 1993. Does