Turing's "Machines". These machines are humans who calculate.
Ludwig Wittgenstein
This paper goes back to Turing (1936) and treats his machine as a cognitive model (W,D,B), where W is an "external world" represented by a memory device (the tape divided into squares), and (D,B) is a simple robot that consists of the sensory-motor devices, D, and the brain, B. The robot's sensory-motor devices (the "eye", the "hand", and the "organ of speech") allow the robot to simulate the work of any Turing machine. The robot simulates the internal states of a Turing machine by "talking to itself."
At the stage of training, the teacher forces the robot (by acting directly on its motor centers) to perform several examples of an algorithm with different input data presented on tape. Two effects are achieved: 1) the robot learns to perform the shown algorithm with any input data using the tape; 2) the robot learns to perform the algorithm "mentally" using an "imaginary tape."
The robot's brain consists of two interacting context-sensitive associative memories (CSAM). One CSAM (call it AM) is responsible for motor control. It learns to simulate the teacher. The other CSAM (call it AS) is responsible for working memory and mental imagery. It learns to simulate the "external" system (W,D).
The model illustrates the simplest concept of a universal learning neurocomputer, demonstrates the universality of associative learning as a mechanism of programming, and provides a simplified but nontrivial, neurobiologically plausible explanation of the phenomena of working memory and mental imagery. At the neurobiological level, working memory is connected with neuromodulation. The model is implemented as a user-friendly program for Windows called EROBOT.
A detailed theory of this model was first described in Eliashberg (1979). It was later shown how the dynamics of working memory of this model could be connected with the statistical dynamics of conformations of protein molecules (e.g., ion channels). See Eliashberg (1989, 1990, and 2003).
The paper includes the following sections:
Section 1.1 Turing's Machine as a System (Robot,World) This section goes back to Turing (1936) and treats his machine as a cognitive model of system (Man,World).
Section 1.2 System-Theoretical Background introduces some basic system-theoretical concepts and notation needed for understanding this paper.
Section 1.3 Neurocomputing Background provides some neurocomputing background needed for understanding the neural model discussed in Section 1.4.
Section 1.4 Designing a Neural Brain for Turing's Robot describes an associative neural network that can serve as the brain for the robot of Section 1.1.
Section 1.5 Associative Neural Networks as Programmable Look-up Tables illustrates two levels of formalism by replacing the "neurobiological" model of Section 1.4, expressed in terms of differential equations, with a more understandable "psychological" model expressed in terms of "elementary procedures."
Section 1.6 Robot That Learns to Perform Mental Computations enhances the sensory-motor devices, D, and the brain, B, of the robot described in Section 1.1. This enhancement gives the robot "working memory" and allows it to learn to perform "mental" computations with the use of an "imaginary" memory device.
Section 1.7 Experiments with EROBOT discusses some educational experiments with the program EROBOT that simulates the robot from Section 1.6. This section also serves as the user's manual describing the user interface of the program.
Section 1.8 Basic Questions discusses several basic questions related to the brain that are addressed by the discussed model.
Section 1.9 Whither from Here outlines some possibilities for the further development of the basic ideas illustrated by EROBOT.
Consider the cognitive system (W,D,B) shown in Figure 1.1. The diagram illustrates my vision of the idea of Turing's machine in its original biological interpretation (Turing, 1936). The reader not familiar with the concept of a Turing machine should read the part of Turing's original paper that describes the way of thinking that led him to the invention of his machine. A good description of Turing's ideas can be found in Minsky (1967). It is interesting to mention that Turing used the term "computer" to refer to a person performing computations.
To be able to simulate the work of an arbitrary Turing machine the robot (system (D,B) ) shown in Figure 1.1 needs to perform the following elementary operations:
Teaching. To teach the robot, the teacher acts directly on the robot's motor centers, NM. The teacher forces the robot to work as a Turing machine with several sample input data presented on tape. The goal of system AM is to learn to simulate the work of the teacher with any input data.
Examination. The teacher presents new input data written on tape. To pass the exam the robot has to correctly perform the demonstrated algorithm without the teacher.
Note. In a more complex cognitive model discussed in Section 1.6 (Figure 1.16) the robot will be required to perform the demonstrated algorithm without seeing the tape.
Problem of synthesis. Our first goal is to design an associative learning system AM that provides the described performance of the robot of Figure 1.1. We want to make our model neurobiologically consistent, so we shall try to use only those computational resources that can be reasonably postulated in biological neural networks.
Before proceeding with the problem of synthesis formulated in the previous section I need to define some basic system-theoretical concepts and notation needed for dealing with this problem. The reader who is familiar with these concepts should still read this section to make sure that we are using the same definitions.
A (deterministic) combinatorial machine is an abstract system M=(X,Y,f), where
The work of machine M is described in discrete time, ν, by the expression y_{ν} = f( x_{ν} ), where x_{ν} ∈ X and y_{ν} ∈ Y are the input and the output symbols of M, respectively, at the moment ν.
Example: X={a,b,c}; Y={0,1}; f={(a,0),(b,1),(c,0)}. Input 'a' produces output '0', input 'b' produces output '1', and input 'c' produces output '0'.
The pairs of symbols describing function f are called commands, instructions, or productions, of machine M.
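The example above can be written down as a tiny lookup function. This is only an illustrative sketch; the names (`Command`, `m_step`) are mine, not the paper's.

```c
#include <assert.h>

/* Combinatorial machine M = (X, Y, f) from the example above:
   X = {a, b, c}, Y = {0, 1}, f = {(a,0), (b,1), (c,0)}.
   Each (input, output) pair is one command (production) of M. */
typedef struct { char x; int y; } Command;

static const Command f_table[] = { {'a', 0}, {'b', 1}, {'c', 0} };

/* One step of machine M: y_nu = f(x_nu).  Returns -1 for symbols outside X. */
int m_step(char x) {
    for (int i = 0; i < 3; i++)
        if (f_table[i].x == x) return f_table[i].y;
    return -1;
}
```

The table of commands *is* the machine: changing the table reprograms M without touching the stepping code.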
Intuitively, two machines M1 and M2 are equivalent if they cannot be distinguished by observing their input and output signals. In the case of combinatorial machines, machines M1 and M2 are equivalent if they have the same input and output sets and the same output functions. Instead of saying that M1 and M2 are equivalent one can also say that machine M1 simulates machine M2 and vice versa.
A machine universal with respect to the class of combinatorial machines is a system MU=(X,Y,G,F), where
Let MU(g) denote machine MU with a program g∈G. The pair (G,F) satisfies the following condition of universality: for any combinatorial machine M=(X,Y,f) there exists g∈G such that MU(g) is equivalent to M.
Example: X={a,b}; Y={0,1}; and G is the set of all possible functions from X into Y. There exist four such functions: f0={(a,0),(b,0)}; f1={(a,1),(b,0)}; f2={(a,0),(b,1)}; f3={(a,1),(b,1)}, that is G={f0,f1,f2,f3}.
In the general case, there exist n^{m} functions from X into Y, where n=|Y| is the number of elements in Y, and m=|X| is the number of elements in X.
A machine MU from section 1.2.3 is a programmable machine universal with respect to the class of combinatorial machines, if G is a set of (memory) states of MU, and there exists a memory modification procedure (called programming) that allows one to put machine MU in a state corresponding to any combinatorial machine (from the above class).
A programmable machine MU from section 1.2.4 is called a learning machine universal with respect to the class of combinatorial machines, if it has a programming (data storage) procedure that satisfies our intuitive notion of learning.
Note. In this study learning is treated as a "physical" (biological) rather than a "mathematical" problem. Therefore, we shall not attempt to formally define the general concept of a learning system. Instead, we shall try to design examples of programmable systems that can be programmed in a way intuitively similar to the process of human associative learning.
A Programmable Logic Array (PLA) is an example of a programmable machine universal with respect to the class of combinatorial machines. The general architecture of a PLA is shown in Figure 1.2. The system has a programmable AND-array and a programmable OR-array that store, respectively, the input and output parts of commands (productions) of a combinatorial machine. The binary vectors stored in these arrays are represented by the conductivities of fuses (1 is "connected", 0 is "not connected").
The input signals, x[*]=(x[1],..x[m]), are m-dimensional binary vectors: x[*]∈X={0,1}^{m} (The '*' substituted for an index denotes the whole set of components corresponding to this index.). Each bit of x[*] is transformed into two bits of vector x'[*] sent to the AND-array: 0→(0,1), and 1→(1,0). This allows one to use 2*m-input AND-gates as match detectors. Unconnected inputs of an AND-gate are equal to 1. Connected inputs are equal to the corresponding components of vector x'[*].
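The 0→(0,1), 1→(1,0) encoding and the match-detection role of the AND gates can be sketched as follows. The function names and the width M are my choices for illustration.

```c
#include <assert.h>

#define M 3   /* width of the input vector x[*] */

/* Encode one input bit for the AND-array: 0 -> (0,1), 1 -> (1,0). */
void encode(const int x[M], int xp[2 * M]) {
    for (int j = 0; j < M; j++) {
        xp[2 * j]     = x[j];       /* "true" line       */
        xp[2 * j + 1] = 1 - x[j];   /* "complement" line */
    }
}

/* One 2*M-input AND gate.  gx[k] = 1 means the fuse to line k is connected.
   Unconnected inputs count as 1, so the gate fires iff every connected line
   carries a 1 -- i.e. iff x matches the stored pattern exactly. */
int and_gate(const int gx[2 * M], const int xp[2 * M]) {
    for (int k = 0; k < 2 * M; k++)
        if (gx[k] && !xp[k]) return 0;
    return 1;
}
```

For example, connecting the fuses for pattern (1,0,1) means connecting the true line of bits 0 and 2 and the complement line of bit 1; the gate then acts as a match detector for exactly that input vector.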
Matrices gx[*][*] and gy[*][*] describe the conductivities of fuses in the AND-array and the OR-array, respectively. It is convenient to think of vectors gx[*][i], and gy[*][i] as data stored in the i-th locations of Input Long-Term Memory (ILTM) and Output LTM (OLTM), respectively.
Using this terminology, the work of a PLA can be described as follows:
The concept of the PLA was originally introduced at IBM in 1969 as the concept of a Read-Only Associative Memory (ROAM). The term PLA was coined by Texas Instruments in 1970. (See Pellerin et al., 1991.) In Section 1.4, I will show that there is much similarity between the basic topology of the PLA and the topology of some popular associative neural networks (Eliashberg, 1993).
A (deterministic) finite state machine is a system M=(X,Y,S,f_{y},f_{s}), where
The work of machine M is described by the following expressions: s_{ν+1}=f_{s}(x_{ν},s_{ν}), and y_{ν}=f_{y}(x_{ν},s_{ν}), where x_{ν}∈X, y_{ν}∈Y, and s_{ν}∈S are the values of input, output, and state variables at the moment ν, respectively.
Note. There are different equivalent formalizations of the concept of a finite-state machine. The formalization described above is known as a Mealy machine. Another popular formalization is a Moore machine, in which the output is a function of the state alone. These details are not important for our current purpose.
Practical electronic designers usually use the term state machine instead of the term finite-state machine.
Any finite-state machine can be implemented as a combinatorial machine with a one-step delayed feedback. The result is obvious from the diagram shown in Figure 1.3.
In Figure 1.3, M1 is a combinatorial machine and M is a finite-state machine. The one-step delayed feedback x"_{ν}=y"_{ν-1} makes x"_{ν} the state variable of machine M. Since one can specify any output function of machine M1 one can implement any desired output and next-state functions for the finite-state machine M.
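The construction of Figure 1.3 can be sketched in a few lines of C. The concrete machine (a parity detector) and all names are my illustration, not the paper's.

```c
#include <assert.h>

/* A Mealy machine built from a combinatorial block M1 plus a one-step
   delayed feedback x''_nu = y''_nu-1.  Example machine: a parity detector,
   whose output is 1 iff the number of 1s seen so far is odd. */
typedef struct { int y; int s_next; } M1Out;

/* Combinatorial block M1: inputs (x, s), outputs (y, s_next). */
M1Out m1(int x, int s) {
    M1Out o;
    o.s_next = s ^ x;   /* next-state function f_s */
    o.y      = s ^ x;   /* output function f_y     */
    return o;
}

/* The delayed feedback turns s into the state variable of machine M. */
int fsm_run(const int *xs, int n) {
    int s = 0, y = 0;               /* initial state */
    for (int nu = 0; nu < n; nu++) {
        M1Out o = m1(xs[nu], s);
        y = o.y;
        s = o.s_next;               /* one-step delay: used at moment nu+1 */
    }
    return y;                       /* last output */
}
```

Only `m1` is specific to the parity machine; replacing its output and next-state functions yields any other finite-state machine, which is the point of the construction.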
This result can be naturally extrapolated to programmable (learning) machines universal with respect to the class of finite-state machines. A PLA with a one-step delayed feedback gives an example of a programmable machine universal with respect to the class of finite-state machines.
A PLA is often used by logic designers to implement state machines (sequencers). It is also possible to use a PROM (Programmable Read-Only Memory) or a RAM (Random Access Memory) to implement state machines with large numbers of commands, but with a relatively small width of input vectors. (The input width is limited by the number of address bits.)
This section was not intended to serve as a tutorial on finite-state automata and Turing machines. There are many good books (such as Minsky, 1967) which an interested reader should consult for more information. My goal was to illustrate the general concept of an abstract machine and to connect this concept with the notion of a real machine. The main point to keep in mind is that some useful constraints on real machines can be formulated at a rather general system-theoretical level without dealing with specific implementations. When such general constraints exist, it is silly to try to overcome them by designing "smart" implementations, just as it is silly to try to invent a Perpetual Motion machine in violation of the energy conservation law.
Let us return to the robot shown in Figure 1.1. A Turing machine is a finite-state machine coupled with an infinite tape. Therefore, to be able to simulate any Turing machine, the robot (system (D,B)) must be a learning system universal with respect to the class of finite-state machines. Taking into account what was said in Section 1.2.8, and assuming that the proprioceptive feedback utter_symbol→symbol_uttered provides a one-step delay, it is sufficient for system AM to be a learning system universal with respect to the class of combinatorial machines.
Such a system is not difficult to design. For example, the PLA shown in Figure 1.2, with the addition of a universal data storage procedure (such as that discussed in Section 1.5), solves the problem. This solution, however, is not good enough for our purpose. We want to implement AM as a neurobiologically plausible neural network model. This additional requirement makes our design problem less trivial and more educational. Solving this problem will put us in the right position for attacking our main problem: the problem of working memory and mental computations (Section 1.6).
This section provides neurocomputing background needed for understanding the neural model described in Section 1.4.
The anatomical structure of a typical neuron is shown in Figure 1.4. The diagram depicts the three main parts of a neuron:
A typical axon branches several times. Its final branches, terminal fibers, can reach tens of thousands of other neurons. A terminal fiber ends with a thickening called the terminal button. The point of contact between the axon of one neuron and the surface of another neuron is called a synapse.
In most synapses, the axon terminal releases a chemical transmitter that affects protein molecules (receptors) embedded in the postsynaptic membrane. About fifty different neurotransmitters have been identified at the present time. A single neuron can secrete several different neurotransmitters. The width of a typical synaptic gap (cleft) is on the order of 20nm. The neurotransmitter crosses this cleft with a small delay on the order of one millisecond.
All synapses are divided into two categories: a) excitatory synapses, which increase the postsynaptic potential of the receiving neuron, and b) inhibitory synapses, which decrease this potential. The typical resting membrane potential is on the order of -70mV. This potential swings roughly between +30mV and -80mV during the generation of a spike.
Not all axons form synapses. Some serve as "garden sprinklers" that release their neurotransmitters in broader areas. Such non-local chemical messages play an important role in various phenomena of activation. See Nicholls et al. (1992).
It is reasonable to postulate the existence of quite complex computational resources at the level of a single neuron (Kandel, 1968, 1979; Nicholls et al., 1992; Hille, 2001; Eliashberg, 1990a, 2003). In what follows, we won't need this single-cell complexity. The rather simple model of a neuron described below is sufficient for our current purpose. (I must emphasize that single-cell complexity is needed in more complex models.)
A simple concept of a neuron-like computing element is shown in Figure 1.5. The (a) and (b) parts of this figure illustrate two different graphical representations of this model. The graphical notation (b) will be used in Section 1.4.
In the graphical notation shown in Figure 1.5b, the excitatory and inhibitory synapses are represented by small white and black circles, respectively. To illustrate this convention, Figure 1.5b shows an inhibitory synapse located on the body of the neuron. (The incoming line and the outgoing line can be thought of as the dendrites and the axon, respectively.)
The dynamics of the postsynaptic potential u is described by the first-order differential equation (2). The output signal y, described by Exp (3), is a linear threshold function of u. For the sake of simplicity the threshold is equal to zero.
The sigmoid function shown in Figure 1.12b is often used instead of the linear threshold function. This distinction is not important for our current purpose.
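One Euler step of the neuron model of Expressions (2) and (3) can be sketched as follows. The parameter values used in the usage note are illustrative, not taken from the paper.

```c
#include <assert.h>

/* One Euler step of the neuron model of Figure 1.5:
   tau * du/dt + u = s   (Exp. 2),   y = max(u, 0)   (Exp. 3, zero threshold).
   u is the postsynaptic potential; s is the total synaptic input. */
double neuron_step(double *u, double s, double tau, double dt) {
    *u = (1.0 - dt / tau) * (*u) + (dt / tau) * s;
    return (*u > 0.0) ? *u : 0.0;   /* linear threshold output y */
}
```

With a constant positive input s, the potential u converges exponentially toward s with time constant tau; with a negative (inhibitory) input, u goes below the zero threshold and the output y stays at zero.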
The models we are going to study are too complex for traditional scientific notation. Therefore, I am forced to use some elements of a computer language to represent these models. I want to avoid verbal descriptions unsupported by formalism. Bear with me.
I believe in Harold Morowitz's proposition that "computers are to biology what mathematics is to physics." Trying to avoid computer language and stick with traditional mathematical notation will only prolong one's suffering.
I use a C-like notation assuming that this language is widely known. To be on the safe side, in what follows I explain some of this notation.
I also use the following "pseudo-scientific" notation:
This section presents a model of a three-layer associative neural network (Figure 1.6) that can work as a machine universal with respect to the class of combinatorial machines. The network has a PLA-like architecture and, in fact, has all the functional possibilities of a PLA. The use of "analog" neurons (rather than logic gates) gives the network some "extras." The model displays an effect of generalization by similarity and, because of the mechanism of random choice, can simulate, in principle, any probabilistic combinatorial machine (with rational probabilities). A similar model was described in Eliashberg (1967).
The model integrates the following basic ideas:
Consider the neural network schematically shown in Figure 1.6. The big circles represent neurons. The small white and black circles denote excitatory and inhibitory synapses, respectively. The incoming and outgoing lines of a neuron represent its dendrites and its axon, respectively. The network has four sets of neurons:
Notation: I use C-like notation to represent arrays. The '*' substituted for an index denotes the whole set of elements corresponding to this index. In the above description, I should have written N1[*]=(N1[1],...N1[n1]) instead of N1=(N1[1],...N1[n1]), etc. (For the sake of simplicity, I omit "[*]" when such an omission doesn't cause confusion.) In this notation, S21[i][*] is the set of input excitatory synapses of neuron N2[i], and S32[*][i] is the set of output excitatory synapses of this neuron.
Terminology:
This section presents a functional model corresponding to the topological model of Figure 1.6. The same topological model may have many different functional models associated with it, so this is just one such model.
Notation:
Assumptions:
As mentioned in Section 1.3.3, I use a combination of C-like notation and scientific-like notation without subscripts and superscripts, and with variable names (identifiers) containing more than one character:
The model presented below is described in a C-function-like format, with the braces "{ }" indicating the boundaries of the model. I use C++ style comments "//" to give the model the appearance of a computer program. (It is almost a computer program.)
Model ANN0()
//The abbreviation "ANN0" stands for "Associative Neural Network #0"
{
//Beginning of model ANN0
//DECODING (similarity calculation)
for(i=1; i<=n2; i++) s[i] = SUM(j=1,n1)( gx[j][i] * x[j] ); // (1)
//CHOICE (competition of neurons N2 via reciprocal inhibition. Expressions (2) and (3))
for(i=1; i<=n2; i++)
{
u[i]=(1-dt/tau)*u[i] + (dt/tau)*( s[i] - x_inh - beta*SUM(k=1,n2)( r[k] ) + beta*r[i] + noise[i] );
// (2) this expression is the Euler discretization of the differential equation:
// tau*du[i]/dt + u[i] = s[i] - x_inh - beta*SUM(k=1,n2)( r[k] ) + beta*r[i] + noise[i];
if( u[i]>0 ) r[i]=u[i];
else r[i]=0;
// (3)
}
//ENCODING (data retrieval.)
for(j=1; j<=n3; j++) y[j] = SUM(i=1,n2)( gy[j][i] * r[i] );
// (4)
}
//End of Model ANN0
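The DECODING and ENCODING steps of Model ANN0 can be condensed into a runnable sketch. For brevity I replace the analog competition of Expressions (2) and (3) by an idealized winner-take-all (the first maximum wins); the matrices in the usage note are illustrative, not from the paper, and the index order of gx is flipped relative to Expression (1) for C convenience.

```c
#include <assert.h>

#define N1 3   /* input neurons                    */
#define N2 2   /* layer-N2 (memory-location) neurons */
#define N3 2   /* output neurons                   */

/* A compressed sketch of Model ANN0.  DECODING computes the similarity
   s[i] of the input with each stored row of Input LTM; CHOICE is an
   idealized winner-take-all stand-in for the reciprocal-inhibition
   dynamics; ENCODING reads the winner's row of Output LTM. */
void ann0_step(double gx[N2][N1], double gy[N3][N2],
               const double x[N1], double y[N3]) {
    double s[N2];
    int winner = 0;
    /* DECODING: s[i] = <gx row i, x>  (cf. Exp. 1) */
    for (int i = 0; i < N2; i++) {
        s[i] = 0.0;
        for (int j = 0; j < N1; j++) s[i] += gx[i][j] * x[j];
        if (s[i] > s[winner]) winner = i;   /* CHOICE: winner-take-all */
    }
    /* ENCODING: output the winner's Output-LTM vector (cf. Exp. 4),
       with the winner's activity r normalized to 1 */
    for (int j = 0; j < N3; j++) y[j] = gy[j][winner];
}
```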
Figures 1.7a and 1.7b show two possible implementations of the winner-take-all layer described by Expressions (2) and (3) from Section 1.4.2. The topological model of Figure 1.7a can be referred to as the inhibit-everyone-but-itself implementation. The topological model of Figure 1.7b can be called the inhibit-everyone-and-excite-itself implementation. If alpha=beta, the two functional models corresponding to these two topological models are mathematically equivalent. If the absolute value of the positive feedback gain, alpha, is not equal to the absolute value of the negative feedback gain, beta, the model of Figure 1.7b has slightly richer properties than the model of Figure 1.7a.
The model of Figure 1.7a was studied in Eliashberg (1967). The model of Figure 1.7b was studied in Eliashberg (1979). In both cases, the systems of differential equations describing the dynamics of these models have explicit solutions for any number of neurons N2 (parameter n2).
In what follows I describe some properties of Model ANN0 that can be rigorously proved. (In this study I do not present the proof; it can be found in Eliashberg (1979).) Model ANN0 doesn't include a description of a learning procedure, so I assume that any desired matrices gx[*][*] and gy[*][*] can be preprogrammed. The pair (gx[*][*],gy[*][*]) will be called the program of Model ANN0.
The answer is "Yes". Some plausible topological models providing this answer were discussed in Eliashberg (1979).
The goal of this section is to show that some of the well known neural network models have essentially the same PLA-like topology as the network of Figure 1.6. They do not look similar to this network because of the use of "connectionist" notation. Switching to "engineering" (PLA-like) notation reveals the similarity.
Counterpropagation Network (CPN)
Figure 1.8a shows the topological structure of the CPN as it was presented in Hecht-Nielsen (1987). Figure 1.8b displays another representation of this network borrowed from Freeman, et al. (1991). The reader will probably agree that these "connectionist" diagrams are difficult to understand.
A "miracle" is achieved by translating these diagrams into the PLA-like "engineering" notation. The CPN architecture now looks as shown in Figure 1.9.
This architecture is remarkably similar to the architecture of the associative neural network of Figure 1.6.
Adaptive Resonance Theory (ART1) Network
Figures 1.10 and 1.11 demonstrate a similar transformation in the case of the ART1 network. (Grossberg, 1982)
In Figure 1.11, the "bottom-up LTM traces" are similar to the input synaptic matrix S21[*][*] of Figure 1.6 or the AND-array of Figure 1.2.
The "top-down LTM traces" are similar to the output synaptic matrix S32[*][*] of Figure 1.6 or the OR-array of Figure 1.2. The other blocks shown in Figure 1.10 are of no interest for our current discussion. Our main issue is how the LTM can be represented in the brain and how the information stored in this LTM can be accessed and retrieved.
So far we have been dealing with a "local" representation of data in the LTM of the network of Figure 1.6: "one neuron - one memory location." In Feldman's terminology this "local" approach is referred to as the "grandmother cell" approach. (This, of course, is just an easy-to-remember metaphor. There is no single cell in the brain representing one's grandmother.)
With beta<1, layer N2 no longer works as a winner-take-all mechanism. Instead, it produces an effect of contrasting and selects a set of several (more than one) locations of Output LTM (call it ACTIVE_SET). A superposition of vectors gy[*][i] (with i∈ ACTIVE_SET) is sent to the output y[*] of Model ANN0. We can no longer treat this model as a "local" associative memory.
Let us assume that the output of a neuron is a sigmoid function (Figure 1.12b) of its postsynaptic potential (instead of the linear threshold function used in Model ANN0). Let us also completely turn off reciprocal inhibition by setting beta=0. Model ANN0 becomes a traditional three-layer "connectionist" neural network shown in "connectionist" notation in Figure 1.12a.
In the current paper we are interested only in the "local" case corresponding to beta>1. A "distributed" case (beta<1) becomes important in the models with hierarchical structure of associative memory (Eliashberg, 1979).
This section discusses a discrete-time counterpart of the continuous-time neural model described in the previous section. It is convenient to view this discrete-time system as a programmable look-up table (LUT). Introduction of the states of "residual excitation" (E-states) transforms this model into a look-up table with "dynamical bias" and leads to the concept of a primitive E-machine. (Eliashberg, 1967, 1979, 1981, 1989.)
Let us assume that the input vectors of Model ANN0 are changing step-wise with a time-step ΔT>>tau. Let beta>1, so the layer N2 performs a random winner-take-all choice. Let x_inh provide a periodic inhibition needed to reset the layer after each step. The exact values of the parameters are not important for the current discussion.
In the above step-wise mode of operation, the network of Figure 1.6 can be replaced by the programmable "look-up table" schematically shown in Figure 1.13. The functional model presented below is referred to as Model AF0 (Associative Field #0.) The model is described as a composition of the following blocks:
Notation:
As before, I use a C-like notation mixed with scientific-like notation. I use special notation for two important operations: select a set, and randomly select an element from a set.
Note. I want to remind the reader that all models in this study are aimed at humans. For the purpose of computer simulation, it is easy to replace operations ":=" and ":∈" with valid C or C++ functions.
Model AF0()
{ //Beginning of Model AF0
//DECODING
for(i=0;i<n;i++) s[i]=Similarity(x[*],gx[*][i]);
// (1) compare input vector with all vectors in Input LTM
//CHOICE Exprs (2) and (3)
MAXSET:={i : s[i]=max(s[*]) };
// (2) select the set of locations with the maximum value of s[i]
i_read :∈ MAXSET;
// (3) randomly select a winner (i_read) from MAXSET
if(s[i_read] > x_inh) ym[*] = gy[*][i_read]; else ym[*]=NULL;
// (4) ; read output vector
//from the selected location of Output LTM. "NULL" stands for "no signals".
//OUTPUT CENTERS
if(select==0) y[*]=yt[*]; else y[*]=ym[*];
// (5) if( select==0) the output is from teacher, else it
//is read from memory
//LEARNING
if (learning_enabled) {gx[*][i_write]=x[*]; gy[*][i_write]=y[*]; i_write++;}
// (6)
//if learning is enabled record X-sequence and Y-sequence in Input LTM and Output LTM, respectively.
}// End of Model_AF0
Note. Don't be discouraged by the simplicity of the "dumb" learning algorithm described by Exp. (6). Theoretically, it is the most universal and powerful learning procedure possible (it stores all available input and output experience just in case). Practically, it is not too bad, because the size of the required memory grows only linearly with time. Since the memory is addressed by content, the decision-making time doesn't increase much with the length of the recorded XY-sequence. (Keep in mind that the presently popular "smart" learning algorithms throw away a lot of information available for learning.) It is easy to improve this "dumb" learning algorithm to make it less "memory hungry." The first obvious improvement is "selection by novelty." In the program EROBOT the user can select one of two learning modes: 1) storing the whole XY-sequence, 2) storing only new XY pairs.
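Model AF0's store/recall cycle can be condensed into a runnable sketch. Here the random choice among equally good matches is replaced by "first best match wins," the similarity function is one concrete choice (count of agreeing components), and all names and sizes are mine.

```c
#include <assert.h>
#include <string.h>

#define W   4    /* width of the x and y vectors */
#define CAP 16   /* capacity of the LTM          */

/* A minimal sketch of Model AF0 as a programmable look-up table. */
static int xs_ltm[CAP][W], ys_ltm[CAP][W], i_write = 0;

/* Similarity: number of matching components. */
int similarity(const int a[W], const int b[W]) {
    int s = 0;
    for (int j = 0; j < W; j++) s += (a[j] == b[j]);
    return s;
}

/* LEARNING (cf. Exp. 6): record an (x, y) pair in Input LTM and Output LTM. */
void af0_store(const int x[W], const int y[W]) {
    memcpy(xs_ltm[i_write], x, sizeof(int) * W);
    memcpy(ys_ltm[i_write], y, sizeof(int) * W);
    i_write++;
}

/* DECODING + CHOICE + ENCODING: return y from the best-matching location. */
void af0_recall(const int x[W], int y[W]) {
    int best = 0;
    for (int i = 1; i < i_write; i++)
        if (similarity(x, xs_ltm[i]) > similarity(x, xs_ltm[best])) best = i;
    memcpy(y, ys_ltm[best], sizeof(int) * W);
}
```

Note that recall with a noisy query still retrieves the nearest stored pair, which is the "generalization by similarity" effect discussed in the text.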
We didn't specify the similarity function. Any combination of input encoding (set X) and similarity function ( Similarity() ) will work as long as this combination satisfies the following correct decoding condition.
DEFINITION. Let X be the set of allowed values of input variable x[*], and let f: X × X → R be a function from X × X into the set of real numbers (usually the set of non-negative numbers with some upper limit). We will say that set X satisfies the correct decoding condition with the similarity function f, if
∀ a,b ∈ X if(a!=b) then f(a,a) > f(a,b); (1)
where
EXAMPLE
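As one concrete instance of the condition (this particular choice is my illustration): let X = {0,1}^M and let the similarity be the number of agreeing components. Then f(a,a) = M, while f(a,b) < M whenever a ≠ b, so the condition holds. The sketch below checks this exhaustively for small M.

```c
#include <assert.h>

#define M 3   /* vector width; X = {0,1}^M */

/* One similarity function satisfying the correct decoding condition on
   X = {0,1}^M: count the number of agreeing components. */
int f_sim(const int a[M], const int b[M]) {
    int s = 0;
    for (int j = 0; j < M; j++) s += (a[j] == b[j]);
    return s;
}

/* Check the condition f(a,a) > f(a,b) for all a != b in {0,1}^M. */
int correct_decoding_holds(void) {
    for (int u = 0; u < (1 << M); u++)
        for (int v = 0; v < (1 << M); v++) {
            int a[M], b[M];
            for (int j = 0; j < M; j++) {
                a[j] = (u >> j) & 1;
                b[j] = (v >> j) & 1;
            }
            if (u != v && f_sim(a, a) <= f_sim(a, b)) return 0;
        }
    return 1;
}
```

Note that the plain dot product would NOT work here: for a = (0,...,0) it gives f(a,a) = 0 = f(a,b), violating the strict inequality. Counting agreements avoids this.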
The work of the "psychological" Model AF0 is much easier to understand than the work of "neurobiological" Model ANN0. Nevertheless, the information processing (psychological) possibilities of Model AF0 are essentially the same as those of Model ANN0 (with beta>1 and with the input signals changing step-wise with the time step much larger than tau). We no longer need to talk about neurons and synapses, and can treat Model AF0 as a programmable look-up table with some effect of "generalization by similarity". It is heuristically important, however, to keep in mind the relationship between Model AF0 and Model ANN0.
In what follows I describe some properties of Model AF0 without a proof. (The proof was given in Eliashberg, 1979.) I assume that the input set X and the Similarity() function are selected in such a way that the correct decoding condition from Section 1.5.2 is satisfied.
Let us use x1[*] as input variable, y2[*] as output variable, and x2[*] as state variable. It is easy to prove that the resulting learning system is universal with respect to the class of probabilistic finite-state machines (with rational probabilities).
It is useful to compare the general structure of the "neurobiological" model ANN0 from Section 1.4.2 with that of the "psychological" model AF0 from Section 1.5.1.
Terminology and notation
General structure of model ANN0
The work of neurobiological model ANN0 can be described in the following general form:
y_{t}=f_{y}(x_{t},u_{t},g_{t})    (1)

General structure of model AF0
The work of psychological model AF0 can be described in the following general form:
y_{t}=F_{y}(x_{t},g_{t})    (3)

Because of its simplicity, model AF0 has room for development. The most important such development is the introduction of a "psychological" STM. The term "psychological" means that the duration of this memory must be longer than the psychological time step Δt. The states of such memory are referred to in this study as the states of "residual excitation," or E-states. Let us add E-states to model AF0.
y_{t}=F_{y}(x_{t},e_{t},g_{t})    (5)

An interesting possibility of connecting the dynamics of the postulated phenomenological E-states with the statistical dynamics of the conformations of protein molecules in neural membranes is discussed in Eliashberg (1989, 1990a, 2003).
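The role of e_{t} in Exp. (5) can be sketched as a decaying bias added to the similarity scores before the choice is made. The decay rule and the constants below are my assumptions, offered only to make the idea concrete; they are not the paper's equations.

```c
#include <assert.h>

#define N 3   /* number of LTM locations */

static double e[N];   /* E-states: residual excitation, a "psychological" STM */

/* A sketch of the "dynamical bias" idea: each location i carries a residual
   excitation e[i] that decays geometrically and is refreshed when the
   location wins the choice.  Recently used locations are thereby favored. */
int estate_choose(const double s[N], double lambda, double bonus) {
    int best = 0;
    for (int i = 0; i < N; i++) e[i] *= lambda;         /* slow decay (lambda < 1) */
    for (int i = 1; i < N; i++)
        if (s[i] + e[i] > s[best] + e[best]) best = i;  /* biased choice */
    e[best] += bonus;        /* the winner keeps residual excitation */
    return best;
}
```

The effect: with equal similarities the previous winner keeps winning (a short-term memory of the last choice), but a sufficiently stronger input still overrides the bias.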
Here are some possibilities associated with E-states (Eliashberg, 1979).
The general structure of Model AF1 is shown in Figure 1.15. The model includes the following blocks:
Note. The program EROBOT uses a slightly more complex data storage procedure than that described by Exp. (12). To allow the user to erase and reuse parts of the robot's memory, the recording is done in the first "empty" location. Non-empty locations are skipped. Also, Exp. (6) has a parameter that allows the user to switch between "teacher" mode and "memory" mode.
This section enhances the structure of the cognitive model shown in Figure 1.1 to give the robot an ability to learn to perform mental computations.
Compare the cognitive model shown in Figure 1.16 with the model shown in Figure 1.1. The model of Figure 1.16 has the following enhancements:
To avoid ambiguity, in what follows I present an explicit description of the work of external system (W,D). This description is referred to as Model WD1.
Inputs The inputs of system (W,D) are the motor outputs of centers NM.
States
Outputs
To get a complete working functional model of the whole cognitive system (W,D,B) shown in Figure 1.16, one needs to connect all blocks as shown in this figure. For simplicity, I do not formally describe these connections, assuming that they are sufficiently clear from Figure 1.16.
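As a rough illustration, the update cycle of the external system (W,D) can be sketched in a few lines of Python. The signal names (write_symbol, move, symbol_read_eye) and the move encoding are my assumptions for illustration, not EROBOT's actual identifiers:

```python
# A minimal sketch of the external system (W,D): a tape, an "eye" that reads
# the scanned square, and a "hand" that writes and moves the head.
# Signal names and the 'L'/'R'/'S' move encoding are illustrative assumptions.
class ExternalSystemWD:
    def __init__(self, tape, scanned=0):
        self.tape = list(tape)      # state: tape contents
        self.scanned = scanned      # state: index of the scanned square

    def step(self, write_symbol, move):
        """Apply one motor command from NM; return the new sensory input."""
        self.tape[self.scanned] = write_symbol
        self.scanned += {'L': -1, 'R': +1, 'S': 0}[move]
        return self.tape[self.scanned]   # symbol_read_eye, sent to block NS

wd = ExternalSystemWD("A()A", scanned=1)
seen = wd.step('X', 'R')   # write 'X' in square 1, move right: eye sees ')'
```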
In this simple case, the descriptions of blocks NM and NS (call them nuclei) were included in the descriptions of blocks AM and AS (Models AF0 and AF1). In more complex cases it is more convenient to describe nuclei as separate blocks. Note that signals from the eye (symbol_read_eye) play the same role for NS as signals from the teacher play for NM.
At this point, I also do not explicitly describe the timing details associated with the coordinated work of the blocks. It is easy to solve such timing problems in computer simulation by doing the computations associated with different blocks in the right order. Such timing details, however, become critically important when one addresses the problem of "analog neural implementation" of complex E-machines composed of several primitive E-machines and nuclei and including various feedback loops.
This very interesting and complex neurodynamical problem will not be discussed in this paper. There is a vast unexplored world of sophisticated neurodynamical problems to grapple with. The best I hope to achieve in this paper is to show where to "dig."
The best way to understand how the robot of Figure 1.16 works is to experiment with the program EROBOT. In this section, I assume that you have acquired this program, and have it running on your computer.
To get the program go to software.
When you run the program for the first time, four windows are shown on the screen. The windows are resizable, so you can rearrange them to your liking. If you want to preserve the new arrangement, go to the File menu and select the Save as default item. The next time, the program will start with your arrangement. For now, leave the windows as they are displayed.
The windows have the following titles:
EROBOT.EXE This is the program window. This window has the menu bar with the following menu titles: File, Examples, and Help. The File menu allows you to save and load projects. The Examples menu has five examples that demonstrate how the program works and help you learn the user interface. The interface is simple and intuitive, so not much help is needed to master it.
WORKING MEMORY AND MENTAL IMAGERY (AS and NS) This window corresponds to the Associative Field AS and to the sensory nucleus NS of Figure 1.16. This Associative Field (primitive E-machine) forms MS→S associations and learns to simulate the external system (W,D). The long table on the right displays the contents of the Input and Output LTM of AS. The shorter table on the left displays the input and output signals. You can edit the names of these signals by clicking on them. The upper control, from: (with positions tape and memory), determines where the input signals come from. The lower control, learn: (with positions all, new, and none), switches the learning mode. Click on the desired mode to activate it; the red color corresponds to the active mode. Symbols can be entered in the right table, and the table can be scrolled. Click the left mouse button in a square to place the yellow cursor in that square; you can then enter the desired character from the keyboard. To empty a square, press the space bar. Try the Backspace and Del keys to see how they work. The table can store up to 1000 associations. To scroll the table toward higher addresses, press the F12 key or move the yellow cursor by pressing the → key. To scroll toward lower addresses, press the F11 key or move the yellow cursor by pressing the ← key. Press the End key to go to the end of the table and the Home key to return to the beginning of the table.
Note. The keys work only when the window is selected by clicking the left button inside the table.
MOTOR CONTROL (AM and NM) This window corresponds to the Associative Field AM and the motor nucleus NM of Figure 1.16. This Associative Field forms SM→M associations and learns to simulate the teacher. The editing and scrolling functions, and the from and learn controls, are similar to those of the AS window. When the from control is in the teacher position, the user can enter symbols in the lower three squares (below the red line) of the right column of the input/output table. Click the left mouse button in one of these squares to position the yellow cursor in it. You can then enter a symbol from the keyboard. When the from control is in the memory position, you cannot enter symbols in these squares, because these "motor" symbols are read from memory. You can force the robot's motor reactions only in teacher mode.
EXTERNAL SYSTEM (W,D): Tape, Eye, Hand, and Speech organ This window corresponds to the external system (W,D) of Figure 1.16. It displays the current state of the tape (the top white row) and the tape history (the gray area below the current tape). The tape can be up to 1000 squares long. The tape history stores 199 previous states of the tape, so you can trace the performance of your Turing machine during the last 200 steps. To edit the tape, click the left mouse button on the desired square; the yellow cursor is positioned in this square, and you can then enter a symbol from the keyboard. The green cursor represents the scanned square. To position this cursor, click the right mouse button in the desired square. The yellow cursor can be positioned in the history area, but only the current (white) tape can be edited. To scroll the tape left and right, press the F12 and F11 keys, respectively, or move the yellow cursor by pressing the → or ← keys. To scroll the history table up and down, press the PgUp and PgDn keys or move the yellow cursor up and down by pressing the ↑ and ↓ keys. The Home key returns the user to the beginning of the current tape. The End key displays the end of the tape. The leftmost column displays discrete time (the step number). The next columns display the command (SM→M association) executed by block AM at this step.
The buttons in AS and AM windows perform the following functions.
The buttons Clr S,E in blocks AS and AM clear the Similarity array, s[*], and the E-state array, e[*], displayed in the bottom right part of the window. The value s[winner] is displayed in red, s[i] for other locations is displayed in green, and e[i] is displayed in magenta. (In this model the bias in block AM is turned off, so there is no E-state display in that block.) The control tau = 50 displays the time constant of decay for e[i] in block AS. This value can be changed by clicking on the number.
The buttons Clr G in blocks AS and AM clear the right table displaying the associations stored in LTM.
The button Clr TH in AM window clears the Tape History in the (W,D) window. The button Clr T clears the current tape (the upper white row) in (W,D).
The Init and Step buttons in the AM window control the work of the robot. Pressing the Init button performs one step of computations without affecting the state of the tape and without incrementing time. This button is used to put the robot in the desired initial state. Pressing the F2 key produces the same effect. Pressing the Step button or the F1 key performs a complete cycle of one-step computations. The state of the tape and the time are changed, and the tape history is scrolled.
To understand the display of s[i] and e[i] in the lower right part of AS and AM windows you should know how these values are calculated. The similarity, s[i], is computed as follows
s[i] = SUM(j=0,nx-1) truth(x[j]==gx[j][i] && x[j]!=' ') / SUM(j=0,nx-1) truth(x[j]!=' ')
where truth(TRUE)=1 and truth(FALSE)=0. The blank (' ') character represents "no signals." If the SUM in the denominator is equal to zero, then s[i]=0. It is easy to see that the maximum value of similarity is 1. Many other Similarity() functions would work as well.
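The formula above can be transcribed directly into Python. This is a reading of the displayed formula, not EROBOT's source code; the array names follow the display:

```python
def similarity(x, gx_i):
    """s[i]: fraction of non-blank input components matched by stored location i."""
    matches = sum(1 for xj, gj in zip(x, gx_i) if xj == gj and xj != ' ')
    active = sum(1 for xj in x if xj != ' ')       # blank = "no signal"
    return matches / active if active else 0.0     # s[i] = 0 if denominator is 0

# e.g. two of the three non-blank input components match this location
s = similarity(['(', '0', 'R'], ['(', '0', 'L'])   # -> 2/3
```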
Model AF1 described in Section 1.5.6 is used as AS and AM. In the case of AS, the "multiplicative" biasing coefficient bm=0.5 and the "additive" biasing coefficient ba=0.0 (see Exp. (2) of Section 1.5.6). In the case of AM, bm=ba=0 (no bias). Accordingly, the value of the time constant tau is needed only in block AS, and the E-state array, e[*], is displayed only in this block (magenta).
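Exp. (2) of Section 1.5.6 is not reproduced in this section, so the exact biasing formula is not shown here. One plausible reading of "multiplicative" and "additive" biasing coefficients is a combined score of the following form, which I give purely as an illustrative assumption:

```python
def choose_winner(s, e, bm=0.5, ba=0.0):
    """Pick the winning location from similarities s[i] biased by E-states e[i].

    The combination rule below is an illustrative guess at how bm and ba
    might enter; with bm = ba = 0 (as in block AM) the E-states have no effect.
    """
    scores = [s[i] * (1.0 + bm * e[i]) + ba * e[i] for i in range(len(s))]
    return max(range(len(s)), key=lambda i: scores[i])

# Two equally similar locations: the one with higher residual excitation wins.
winner = choose_winner(s=[1.0, 1.0], e=[0.1, 0.9])   # -> 1
```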
Go to Examples menu and select Example 1. A set of 12 commands representing a Turing machine is loaded in the LTM of block AM. This Turing machine is a parentheses checker similar to that described in Minsky (1967). The tape shows a parentheses expression that this Turing machine will check.
Symbols A on both sides of the parentheses expression serve as the delimiters indicating the expression boundaries. The green cursor indicating the scanned square is in square 1. Note that symbol_uttered='0', showing that the Turing machine is in the initial "state of mind" represented by symbol '0'.
Press the Step button or the F1 key to see how this machine works. The machine reaches the halt state 'H' and writes symbol 'T' on the tape, indicating that the checked parentheses expression was correct: for each left parenthesis there was a matching right parenthesis.
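The machine's behavior can be reproduced with an ordinary Turing-machine simulator. The rule set below is my reconstruction of a Minsky-style parentheses checker; it is illustrative and not necessarily the exact twelve commands stored in EROBOT:

```python
# Illustrative reconstruction of a Minsky-style parentheses checker.
# (state, read) -> (write, move, next_state); move: +1 right, -1 left, 0 stay.
RULES = {
    ('0', '('): ('(', +1, '0'),  # scan right past '('
    ('0', 'X'): ('X', +1, '0'),  # scan right past already-matched symbols
    ('0', ')'): ('X', -1, '1'),  # found ')': cross it out, look left for '('
    ('0', 'A'): ('A', -1, '2'),  # right delimiter: verify nothing is left over
    ('1', '('): ('X', +1, '0'),  # matching '(': cross it out, resume rightward
    ('1', 'X'): ('X', -1, '1'),  # keep looking left
    ('1', 'A'): ('F', 0, 'H'),   # hit left delimiter: unmatched ')', fail
    ('2', 'X'): ('X', -1, '2'),  # verify leftward
    ('2', '('): ('F', 0, 'H'),   # unmatched '(': fail
    ('2', 'A'): ('T', 0, 'H'),   # all matched: success
}

def run(tape, pos=1, state='0', max_steps=1000):
    tape = list(tape)
    for _ in range(max_steps):
        if state == 'H':                 # halt: the result symbol is on the tape
            return ''.join(tape)
        write, move, state = RULES[(state, tape[pos])]
        tape[pos] = write
        pos += move
    raise RuntimeError("machine did not halt")
```

For example, run("A(())A") halts with 'T' on the tape, while run("A)(A") halts with 'F'.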
Experiment with this program. Enter a new parentheses expression. Click the left button in the desired square to place the yellow cursor in this square and type a parenthesis. Don't forget to place symbols A on both sides of the expression. Click the right button in square 1 to position the green cursor in this square. If the yellow cursor is also in this square the square will become blue.
To put the system in the initial state '0' (symbol_uttered='0'), go to block AM and click on teacher. Position the yellow cursor in the square to the right of the name utter_symbol and enter '0' in this square. Note that if you are in memory mode you cannot enter the symbol. Once the symbol is entered, press the Init button or the F2 key. The initial state is set, that is, symbol_uttered='0'. Return to memory mode and press the Step button or the F1 key to do another round of computations.
Note. Block AS must be in tape mode indicating that the robot can see the tape. This block will be in memory mode in Example 2 where the robot performs mental computations.
Write down all twelve commands of the parentheses checker and clear the LTM of block AM by pressing the Clr G button. In the following experiment you will teach the robot by entering the output parts of the commands you wrote down in response to the input parts of these commands. The input parts are displayed in the two upper squares of the XY-column of block AM. Block AM must be in from teacher mode, and block AS in tape mode.
To teach the robot all twelve commands of the parentheses checker it is sufficient to use the following three training examples: A(A, A)A, and A()A. Write the first expression on the tape, place the green cursor in square 1, set utter_symbol='0', and press the Init button. Put block AM in learn new mode and start pressing the Step button or the F1 key. See how the new commands (associations) are recorded in the LTM of block AM.
Repeat the same teaching experiment with the other two training examples. If you did everything correctly, all twelve commands are now in LTM. The robot can now perform the parentheses-checker algorithm with any parentheses expression.
Go to the Examples menu and select Example 2. You can see that both blocks AM and AS have some programs in their LTMs. In the next section (Example 3) I will explain how the program in block AS was created as a result of learning. For now it is sufficient to mention that this program allows the block to simulate the work of the external system (W,D) with a tape containing up to ten squares and with the external alphabet {'A','(',')','X','T','F'}.
This read/write working memory has a limited duration that depends on the time constant tau: the larger this time constant, the longer the memory lasts. A qualitative theory of this effect was described in Eliashberg (1979). In this study, it will be discussed in Chapter 4. Now I want to show how this "working memory" works. Interestingly enough, block AS no longer needs to use its LTM. The effect of read/write working memory is achieved without moving symbols, by simply changing the levels of "residual excitation" of the already stored associations.
Block AM has two programs. The program in locations 0-11 is the parentheses checker used in the previous sections. The program in locations 14-26 is a tape scanner. This program starts with the state '3' (symbol_uttered='3'). Block AM is now in memory mode, so the program will run. Block AS is in tape mode, and the robot can see the tape.
Start pressing Step button or F1 key and see how the robot scans the tape. It first scans to the right, reaches the right boundary, and goes back. Once it reaches the left boundary, it moves to square 1, and changes its "state of mind" to '0'. This transfers control to the parentheses checker (the association in location 0).
At this moment, close the robot's eye by clicking the left button on the word memory in block AS. Make sure this word becomes red. Continue pressing the Step button or the F1 key and see how the robot performs mental computations.
Note. Block AS is in learn none mode. It could be in learn all mode. In this case it would keep recording its XY-sequence. The performance would not change. That is, the effect of read/write working memory is combined with the ability to remember everything.
To make it more interesting, I leave it to the reader to figure out why this works. (Hint: the most recently used association has the highest level of "residual excitation", e[i], among all competing associations.)
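The hint can be made concrete with a toy model of exponentially decaying residual excitation. The decay law is an assumption suggested by the tau control in the AS window; the model's actual E-state dynamics may differ:

```python
import math

def touch(e, winner, tau=50.0):
    """Decay every E-state by one time step, then refresh the winner's level.

    Exponential decay with time constant tau is an illustrative assumption.
    """
    e = [v * math.exp(-1.0 / tau) for v in e]
    e[winner] = 1.0
    return e

e = [0.0] * 5
for used in [0, 1, 2, 1, 3]:   # order in which associations are retrieved
    e = touch(e, used)

# Association 3, used last, now has the highest residual excitation, so it
# wins any tie in similarity: the recent past acts as read/write memory
# without any symbols being moved in LTM.
most_excited = max(range(5), key=lambda i: e[i])   # -> 3
```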
Go to Examples menu and select Example 3. Block AM has two programs in its LTM. The program in locations 0-11 is the program that scans the tape and rewrites it. The program starts in state '8'. In this state (symbol_uttered='8') it rewrites the symbol_read and goes to state '9' (utter_symbol='9'). In state '9' it moves one step to the right, and goes to state '8'.
Run this program by pressing the Step button or the F1 key to see what the program is doing. Note that block AS is in learn all mode, so it records the XY-sequence produced by the external system (W,D). This is sufficient to learn to simulate this system with a tape containing no more than ten squares and with a single external symbol {'A'}.
Prepare a new tape with symbol '(' in squares 0-9. Switch to from teacher mode and restore the initial state '8'. To do so, enter utter_symbol='8' and press the Init button or the F2 key. The symbol_uttered is now equal to '8'. Switch back to from memory mode, and run the program as before. Block AS is now trained to simulate the work of (W,D) with the external symbols {'A','('}. Repeat this experiment for the remaining external symbols {')','X','T','F'}. Block AS now has a state of LTM identical to that used in Example 1.
Switch block AS to learn none mode. Write a parentheses expression in squares 0-9 of tape. Don't forget to enter delimiters 'A' on both sides of the expression. The whole expression including delimiters must fit in squares 0-9, because you have taught block AS to simulate the tape only for these squares. Put the green cursor representing the scanned square in square 1 by clicking the right button in this square.
To run the test program in locations 14-25 (this program is the same parentheses checker as before) you need to set initial state '0'. To set this state switch to teacher mode, enter utter_symbol='0', and press Init button or F2 key. Switch back to memory mode and run the program by pressing Step button or F1 key. You are repeating Example 2, but now you trained block AS yourself.
Note. You could train and test block AS in teacher mode by doing manually what the discussed programs did. Or you can write your own training and testing programs in block AM and run them in memory mode.
Select Example 4 in Examples menu. Block AM now has four output channels. The fourth output channel controls closing and opening the robot's eye. ('1' closes the eye and '0' opens it). Run this example by pressing Step button or F1 key. You will see that block AS switches to memory mode once scanning is done and parentheses checking starts. From this moment on the robot performs mental computations.
The interesting thing about this example is that the robot itself decides where to switch from "external world" to "imaginary world." This switching doesn't affect the robot's performance, because its mental imagery predicts the results of the robot's actions in the "external world." To make this effect more obvious, select Example 5 and run it. The robot switches between "external world" and "imaginary world" several times while performing computations.
The cognitive model (W,D,B) implemented by the program EROBOT offers some simplified but nontrivial answers to the following set of basic questions associated with the brain.
What is the nature of computational universality of the human brain?
Computational universality of the human brain is a result of the interaction of two associative learning systems: a "processor" system, AM, responsible for motor control, and a "memory" system, AS, responsible for working memory and mental imagery. I intentionally used the terms "processor" and "memory" to emphasize a similarity with the general "processor-memory" architecture of a traditional universal programmable computer.
The big difference is that, in the discussed model, both systems AM and AS are arranged on a similar principle (primitive E-machines). Both "processor" and "memory" are learning systems that create their software (firmware) almost completely in the course of learning. Another big difference is that the "processor" switches back and forth between the "real" world (which serves as external "memory") and the "imaginary" world (which serves as internal "memory").
Remark. If we identify our "I" with the "processor" part, we can say that our "I" wanders in an infinite loop, switching between the real and imaginary worlds. The notion of this infinite loop allows our theory to get rid of the notion of a "homunculus." The need for a "homunculus" is a result of a lack of universality. If a model of B(0) can learn, in principle, to do anything, then a cognitive theory associated with such a model doesn't need a "homunculus."
It is reasonable to suggest that the SM→M associative learning system, AM, is located mostly in the frontal lobe (see the picture below), whereas the MS→S associative learning system, AS, is located mostly in the other three lobes.
Note. It should be emphasized that it is not necessary to identify "functional systems" AM and AS with specific
anatomical structures.
How can a person using an external memory device learn to perform, in principle, any algorithm?
It is sufficient to memorize SM→M and MS→S associations produced by performing several examples of computations. In the universal learning architecture (AM,AS) these associations can serve as software that allows this system to perform the demonstrated algorithm with any input data.
Note. To get the effect of universal programming via associative learning, we needed to dedicate a "free" motor channel to the representation of the internal states of the finite-state part of the simulated Turing machine. In EROBOT this was achieved via the "speech" motor channel. This metaphor sheds light on one important role of language: a brain without language cannot achieve the highest level of computing power. Another interesting implication of the EROBOT metaphor is that any sufficiently expressive "free" motor channel can serve as a language channel. (Some possibilities of the metaphor of the brain as an E-machine with respect to the problem of natural language were discussed in Eliashberg, 1979.)
Why does a person performing computations with the use of an external memory device learn to perform similar mental computations?
The experiment discussed in Section 1.7.7 provides an answer to this question. While "processor" AM interacts with external system (W,D) the "memory" system AS learns the MS→S associations allowing it to simulate the external system (W,D). Once AS is trained, EROBOT can perform mental computations by switching to the "imaginary" system (W,D).
What is the simplest universal learning algorithm consistent with questions 1-3?
The simplest universal learning algorithm consistent with questions 1-3 is memorizing all XY-experience. As EROBOT shows, this "dumb" algorithm is not too bad when combined with an associative memory in which data is addressed in parallel. The time of DECODING doesn't depend on the size of memory (parameter n2 in the neural network of Figure 1.6). The time of CHOICE and ENCODING is no worse than log(n2). As mentioned in Section 1.5.1, there are many ways to make this algorithm less "memory hungry" and more efficient.
Note. A "smart" learning algorithm (such as, for example, backpropagation) is not universal. It optimizes performance in a given context and throws away information needed in a large number of other contexts. I argue that a learning algorithm of this type cannot be employed by the human brain.
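A toy version of the "memorize everything" learner can be sketched as follows. The class and the sample associations are illustrative; in the neural network of Figure 1.6 the decoding step runs in parallel over all locations, which is what makes recall time independent of memory size:

```python
# Toy version of the "dumb" universal learner: store every X -> Y pair verbatim.
class MemorizeAll:
    def __init__(self):
        self.ltm = []                      # every (x, y) association ever seen

    def learn(self, x, y):
        self.ltm.append((x, y))            # ENCODING: just append, discard nothing

    def recall(self, x):
        # DECODING + CHOICE: score every stored input against x, pick the best.
        best = max(self.ltm,
                   key=lambda pair: sum(a == b for a, b in zip(x, pair[0])))
        return best[1]

m = MemorizeAll()
m.learn(('(', '0'), ('X', 'L', '1'))       # illustrative TM-style commands
m.learn((')', '0'), (')', 'R', '0'))
```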
What is working memory and mental imagery?
The effect of a read/write symbolic working memory is achieved in EROBOT via "dynamical" E-state without moving symbols in "symbolic" LTM. That is, working memory is not a read/write memory buffer. It is one of many effects associated with dynamic reconfiguration of data stored in LTM. The metaphor the brain as an E-machine suggests that such a dynamic reconfiguration is associated with the postulated phenomenological E-states. Once the effect of working memory is produced, system AS can work as an "imaginary" external system (W,D). This explains the nature and the functional importance of mental imagery. (There is nothing "imaginary" about mental imagery.)
What is the simplest neural implementation of a model of B(0) consistent with all the above questions?
The neural network of Figure 1.6 gives some answer to this question. A connection between phenomenological E-states and the statistical conformational dynamics of ensembles of protein molecules (such as ion channels) is discussed in Eliashberg (1990a, 2003).
1. Introduction of E-states in the Motor Control System (AM)
This enhancement produces several critically important effects mentioned in Section 1.5.
2. Introduction of "centers of emotion" and activating system
Besides the SM→M and MS→S associations employed in EROBOT, the human brain forms associations of other modalities, the most important of which is the "emotional" modality. We remember and recognize our emotional states. This means that our brain has some recognizable (encoded) signals that carry information about these states. I use the letter "H" to denote the "emotional" modality. ("H" stands for "Hedonic"; the letter "E" is already used in the name "E-machine".)
Let us assume that besides associative learning systems AM and AS responsible, respectively, for motor control, and mental imagery, there is also an associative learning system, call it AH, responsible for motivation. System AH forms SH→H associations that serve as "motivational software." (At this general level, I treat "pain" modality as one of "S" modalities.)
This approach can produce much more sophisticated effects of "reward" and "punishment" than are available in the traditional models of reinforcement learning. In the latter models, the effect of reinforcement is limited to the modification of SM→M associations.
In the (AM,AS,AH) architecture, SH→H associations can interact with SM→M associations via an activating system, the effect of this activation depending on sensory inputs and temporal context. This creates a very complex and interesting situation.
3. Introduction of hierarchical structure
Primitive E-machines slightly more complex than Model AF1 described in Section 1.5.6 can be arranged in hierarchical structures. The corresponding complex E-machines can produce various effects of data compression, statistical filtering, and generalization. Some effects of this type were demonstrated in Eliashberg (1979).
4. Introduction of "BIOS"
The biggest advantage of the "whole-brain" metaphor "the brain as an E-machine" as compared with traditional "partial" brain modeling metaphors is that the former metaphor has room for arbitrarily complex "brain software." Traditional learning algorithms do create some "neural firmware." This firmware, however, can be compared with the firmware of Programmable Logic Devices (PLD) rather than with the software of a universal programmable computer.
I believe that this "initial brain firmware" is the longest part of the description of the untrained human brain, B(0). Assuming it is true that B(0) fits on a single floppy disk, there is still enough room for a rather complex "BIOS." This "BIOS" should include the initial "motivational software" (SH→H associations) that determines the direction of learning, and some initial "motor software" (SM→M associations). It may also have some initial software representing initial knowledge about W (MS→S associations).