1. System (Man,World) as a composition of two "machines"
Consider a cognitive system (W,D,B), where W is an external world, D is a set of human-like sensory and motor devices, and B is a hypothetical computing system simulating the work of the human nervous system. One can think of system (D,B) as a human-like robot.
From the system-theoretical viewpoint, it is useful to divide system (W,D,B) into two subsystems: (W,D) and B, where (W,D) is the external world as it appears to the brain B via devices D. In this representation, both subsystems can be treated as abstract "machines", the inputs of B being the outputs of (W,D) and vice versa. (For the sake of simplicity, I refer to B as the brain. At this general level, the rest of the nervous system can be treated as a part of block D.)
Let B(t) denote the state of B at time t, where t=0 corresponds to the beginning of learning. The project aimed at "reverse engineering" the basic principles of organization and functioning of B(0) will be referred to as the Brain Zero or the Brain 0 project.
Note. In the case of the biological brain, it is difficult to distinguish the stage of learning from the stage of development. This makes the notion of the "beginning of learning" somewhat vague. Nevertheless, in a zero-approximation model of B, it is convenient to think of t=0 as a point separating learning from development. (In such a model, the stage of development corresponds to t < 0).
2. "Unprogrammed" brain can have a relatively short formal representation
The Brain 0 project is motivated by the following general beliefs:
3. How can a "simple" Brain 0 produce complex behavior?
I argue that an adequate theory of behavior of system (W,D,B(0)) can (and should) have a general structure somewhat similar to that of a traditional physical theory. Such a theory derives arbitrarily complex implications from basic equations that describe fundamental principles (basic constraints), coupled with arbitrarily complex, specific "external" constraints.
For example, the same simple Maxwell equations (basic constraints) coupled with different boundary conditions and sources (specific external constraints) describe arbitrarily complex classical electromagnetic phenomena. The complexity comes "from the outside." In a similar, but much more sophisticated way, arbitrarily complex psychological phenomena should follow from a relatively simple model of B(0). Such a model doesn't need to be overly complex. It only needs to be complex enough to start an efficient process of learning, with the ability to learn itself increasing rapidly as learning proceeds.
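The "simple basic constraints plus complex external constraints" argument can be illustrated with a deliberately toy example (mine, not the author's): an elementary cellular automaton. The update rule is a fixed 8-entry table, yet the complexity of the resulting pattern comes entirely from the externally supplied initial row.

```python
# Toy illustration (not a brain model): a fixed, simple update rule
# (elementary cellular automaton rule 110) whose behavioral complexity
# comes from the externally supplied "boundary condition" (initial row).

def step(row, rule=110):
    """One update of an elementary CA; the rule table is the 'basic constraint'."""
    n = len(row)
    return [
        (rule >> (row[(i - 1) % n] * 4 + row[i] * 2 + row[(i + 1) % n])) & 1
        for i in range(n)
    ]

def run(initial, steps):
    """Iterate the same simple rule; all variety comes from `initial`."""
    rows = [initial]
    for _ in range(steps):
        rows.append(step(rows[-1]))
    return rows

# A simple external constraint: a single active cell.
simple = run([0] * 10 + [1] + [0] * 10, 20)
```

The same `step` function, fed a richer initial row, produces far more intricate histories; nothing about the rule itself changes.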
4. Useful computer metaphor
The problem of understanding B(0) can be loosely compared with the problem of understanding a conventional computer's hardware and its initial firmware (BIOS). The problem of understanding B(t), with big t, can be loosely compared with the problem of understanding a conventional computer with complex software.
There is no meaningful computer metaphor for characterizing the process of human associative learning, the latter being intuitively quite different from the process of conventional computer programming. I argue, however, that learning by forming associations has the same possibilities as conventional programming, so there are no limitations, in principle, on what can be learned. (See Section 8.)
Note. There is a big difference between the basic principles of organization of the brain and those of a conventional computer, so it is important not to stretch the above computer metaphor too far. I argue, however, that it is also important not to go to another extreme and reject the whole notion of a "software-driven" sequential computing process. (See the falling-maple-leaf metaphor.)
5. G-states and E-states: "symbolic" knowledge and "dynamical" context
I postulate that the brain has two main types of states:
The basic idea of a context-sensitive dynamically-reconfigurable associative learning system employing the above interaction of "symbolic" G-states and "dynamical" E-states was originally introduced in Eliashberg (1967). The systems discussed in that paper were called "Associative Automata." The idea was further developed in Eliashberg (1979) as the "concept of E-machine." The letter "E" was meant to emphasize a connection between this class of brain models and the old neurophysiological notion of "residual Excitation" in the brain as the mechanism of mental set. (See Vvedensky, 1901.)
6. How does the brain handle a combinatorial number of contexts?
In a conventional computer, changing context means saving the contents of a few registers (and some other memory locations) and jumping to a new subroutine. The number of such contexts is a linear function of the size of the computer's software. This computer metaphor doesn't match our intuitive notion of context in the case of the brain.
Our intuition suggests that the brain is capable of dealing with a much greater (exponential) number of contexts (see Zopf Jr. 1961). The metaphor the brain as an E-machine provides a powerful formalization of this intuition by connecting the notion of context with the idea of dynamic reconfiguration of brain's "symbolic" knowledge (G-states) via transformations of its "dynamical" E-states.
This formalization leads to a combinatorial number of different contexts and explains why it is impossible to adequately simulate human context-dependent behavior by dealing independently with different contexts. An attempt to do so is immediately punished by the "curse of dimensionality." The brain doesn't simply switch among available "subroutines" to deal with different contexts. It dynamically synthesizes new "subroutines" depending on context. This makes all the difference in the world.
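The counting argument behind the "curse of dimensionality" claim can be made concrete. Assume (my simplification) that a context is described by n independent binary features. Storing one precompiled "subroutine" per context then requires 2**n entries, while synthesizing a response from n per-feature modulations requires only n stored pieces:

```python
# Hypothetical counting sketch of the combinatorial-contexts argument:
# n independent binary contextual features yield 2**n distinct contexts.

def stored_subroutines(n_features):
    """Precompiled approach: one subroutine per distinct context."""
    return 2 ** n_features

def synthesized_pieces(n_features):
    """Dynamic-synthesis approach: one reusable modulation per feature."""
    return n_features

# Even 30 binary features already make per-context storage hopeless:
print(stored_subroutines(30))   # over a billion contexts
print(synthesized_pieces(30))   # 30 reusable pieces
```

The exponential gap is why, on this argument, context must be handled by dynamic reconfiguration rather than by enumeration.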
7. What constitutes the brain's software?
I postulate that the brain forms at least three general types of associations (which serve as its software): SM→M, MS→S, and SH→H.
Note. Notation SM→M implies a motor feedback. That is, the system receives signals of "Sensory" and "Motor" modalities on its input, and produces signals of "Motor" modality on its output. Similarly, notations MS→S and SH→H imply Sensory and Hedonic feedback, respectively. (We can recognize our emotional states, so there must be some input signals encoding these states.)
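The three association types can be sketched as instances of one generic associative store (my notation and data structure, not the author's code): each system memorizes input→output pairs, differing only in which modalities feed its input and output.

```python
# A minimal sketch of the three postulated association types as three
# instances of one generic store of (input, output) pairs. Illustrative
# only; names and structure are my assumptions.
from dataclasses import dataclass, field

@dataclass
class AssociativeStore:
    pairs: list = field(default_factory=list)   # [(input_tuple, output), ...]

    def learn(self, inp, out):
        self.pairs.append((inp, out))

    def recall(self, inp):
        # Most recent matching association wins; None if nothing matches.
        for stored, out in reversed(self.pairs):
            if stored == inp:
                return out
        return None

sm_to_m = AssociativeStore()   # (Sensory, Motor) -> Motor: motor control
ms_to_s = AssociativeStore()   # (Motor, Sensory) -> Sensory: mental imagery
sh_to_h = AssociativeStore()   # (Sensory, Hedonic) -> Hedonic: evaluation

sm_to_m.learn(("see_ball", "idle"), "reach")
```

The feedback structure mentioned in the Note appears here as the output modality reappearing inside the next input tuple.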
8. What is working memory and mental imagery?
Program EROBOT simulates a simple cognitive system (W,D,B) in which the robot (D,B) learns to perform mental computations by forming SM→M and MS→S associations. The robot interacts with an external world, W, represented by a tape divided into squares (similar to the tape of a Turing machine). The robot's sensory and motor devices, D, allow it to perform, in principle, any algorithm by simulating the work of a Turing machine. The robot simulates the internal states of the Turing machine by "talking to itself."
At the stage of training, the teacher forces the robot, by "clamping" its motor centers, to perform several examples of an algorithm with different input data presented on tape. Two results are achieved:
The model provides a simple illustration of the concept of E-machine and explains how the effect of "symbolic" read/write memory (working memory) can be achieved via transformations of "dynamical" E-states without actually moving symbols. It also sheds light on the nature of the universality of B(0) as a learning system. It shows that the brain's universality (in Turing's sense) can be naturally understood as a result of the interaction of two associative learning systems: an SM→M system (called AM) responsible for motor control, and an MS→S system (called AS) responsible for mental imagery (simulation of the external system (W,D)).
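The training scheme described above can be sketched in a few lines (my simplification, not the EROBOT source): during "clamped" demonstrations the teacher forces the motor output, and the robot simply records (state, sensed symbol) → action associations; afterwards it replays them autonomously on new tapes.

```python
# Sketch of learning a tape algorithm from clamped demonstrations.
# The association table plays the role of recorded SM->M pairs.

class TapeRobot:
    def __init__(self):
        self.assoc = {}   # (state, symbol) -> (write, move, next_state)

    def demonstrate(self, state, symbol, action):
        """Teacher 'clamps' the motor centers: record the forced action."""
        self.assoc[(state, symbol)] = action

    def run(self, tape, pos=0, state="start", max_steps=100):
        """Autonomous replay of the memorized associations on a new tape."""
        tape = list(tape)
        for _ in range(max_steps):
            action = self.assoc.get((state, tape[pos]))
            if action is None:          # no association: stop
                break
            write, move, state = action
            tape[pos] = write
            pos += move
        return "".join(tape)

# Teach "append a 1 to a unary string" by example:
r = TapeRobot()
r.demonstrate("start", "1", ("1", +1, "start"))   # skip over 1s
r.demonstrate("start", "_", ("1", 0, "halt"))     # write 1 at the blank
```

After two demonstrated rules the robot generalizes to inputs of any length, because the associations encode the algorithm, not the particular data.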
9. How can the brain process symbolic information without moving symbols in memory?
Von Neumann computers and universal Turing machines process information by moving encoded data in a read/write memory. In contrast, the metaphor the "brain as an E-machine" suggests that the brain achieves a similar effect of universal computing by manipulating "dynamical" E-states, the encoded data staying in the same locations in LTM. This approach is critically important because the brain is too slow to process symbolic information in a conventional way.
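The "data stays put, excitations change" idea can be sketched as follows (my reading of the E-machine metaphor, not the published formalism): long-term memory is a fixed list of associations, and a vector of residual excitations, one per entry, biases which stored association wins on the next input.

```python
# Sketch: fixed LTM entries never move; only per-entry residual
# excitations ("E-states") change, biasing subsequent retrieval.
# Parameters decay/boost are illustrative assumptions.

class EMachineSketch:
    def __init__(self, ltm):
        self.ltm = ltm                   # fixed list of (input, output) pairs
        self.e = [0.0] * len(ltm)        # one E-state per stored entry

    def respond(self, inp, decay=0.5, boost=0.6):
        # Score = direct match + residual excitation; data itself stays put.
        scores = [
            (1.0 if stored == inp else 0.0) + self.e[i]
            for i, (stored, _) in enumerate(self.ltm)
        ]
        winner = max(range(len(scores)), key=scores.__getitem__)
        self.e = [x * decay for x in self.e]   # old excitations fade
        self.e[winner] += boost                # winner stays "primed"
        return self.ltm[winner][1]

m = EMachineSketch([("A", "x"), ("B", "y")])
```

An input that matches nothing is interpreted according to the residual excitations, i.e. by the most recently primed entry, which is a crude stand-in for context-dependent retrieval without moving any symbols.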
Interestingly enough, this "high level" approach sheds light on the information processing significance of such "low level" effects as neuromodulation (see Hille, 2001) by connecting these effects with the dynamics of phenomenological E-states.
10. Molecular implementation of E-states and working memory
I argue that traditional models of artificial neurons don't have enough computational resources to implement nontrivial models of B(0). Most importantly, they have no room for various phenomena of neuromodulation. The well known theoretical fact that any computing system can be built, in principle, from such simple artificial neurons doesn't resolve this issue. There are not enough neurons in the brain to implement the needed computations with the use of the above simple neurons.
I postulate that, in the biological brain, the required computational resources can be found at the level of ensembles of protein molecules embedded in membranes of individual neurons (rather than at the level of collective dynamical properties of neural networks).
The proposed formalism (referred to as the EPMM formalism, where EPMM stands for Ensemble of Probabilistic Molecular Machines) offers a nontrivial interpretation of the above-mentioned phenomenological E-states. This interpretation means that a broad range of psychological phenomena of working memory, temporal association, and mental set can be formally connected with the properties of EPMMs.
11. Pitfall of a "smart" learning algorithm
A large part of today's research in learning is devoted to the development and study of what can be referred to as "smart" learning algorithms. Such algorithms attempt to create "optimal" representations of the learner's experience in the learner's memory. I argue that this general approach (however interesting and important from the engineering and mathematical viewpoints) cannot be employed by a universal learning system similar to the human brain. The catch is that a smart learning algorithm aimed at a "single-context" optimization is not universal. While optimizing performance in a selected context, it throws away a lot of information needed in a variety of other contexts.
Consider, for example, the question: "What is this?" In the context created by this question a person will behave as a pattern classifier. He/she may answer, for example, that this is a book. His/her brain was able to distinguish a book from other objects, say, a box, a hard disk drive, etc. Now consider the instruction: "take this." In this context, it is no longer important that "this" is a book. What is important is the object's size, weight, etc. The experience that the brain acquired by learning to take a book is applicable to the situation when one needs to take a box or a disk drive. In other words, the same object can be treated as a member of different classes depending on context. That is, in the case of the human brain, there is no such thing as the optimal context-independent classification.
The main issue is not "how" to learn (Hebbian learning, backpropagation, simulated annealing, etc.), and how to store data (distributed, local, synaptic, optical, etc.) but "what" to learn. The human concepts of "good", "bad", "important", and "unimportant" change with experience. Therefore, a "smart" learning algorithm with a fixed criterion of optimality -- the criterion that is not affected by the contents (semantics) of data -- cannot serve as an adequate metaphor for human learning. What seems unimportant today may become important tomorrow when new information is acquired.
I argue that a really smart universal learning system must use a "dumb" but universal learning algorithm. Instead of doing much pre-processing of data before placing it in memory, such a system must use an efficient decision-making (data interpretation) procedure to process "raw experience" dynamically depending on context. Theoretically, a powerful enough decision-making procedure can always make up for a "dumb" learning algorithm as long as this algorithm doesn't lose data. In contrast, no decision-making procedure can make up for a "smart" learning algorithm that throws away a lot of information. The loss of data is irremediable.
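The book/box example above can be sketched directly (an illustrative toy, with field names of my own invention): a "dumb" learner appends raw records to memory, and a context-dependent query procedure extracts different answers from the same records. A "smart" learner that had compressed each object down to its class label could never answer the weight question later.

```python
# "Dumb but lossless" storage plus context-dependent interpretation.
# Field names ("kind", "weight_kg") are illustrative assumptions.

raw_memory = []   # the dumb learning algorithm: just append everything

def observe(obj):
    raw_memory.append(obj)

def answer(question, name):
    """Interpret the same raw records differently depending on the question."""
    for obj in raw_memory:
        if obj["name"] == name:
            return obj.get(question)
    return None

observe({"name": "book", "kind": "reading matter", "weight_kg": 0.4})
observe({"name": "box",  "kind": "container",      "weight_kg": 1.2})

# "What is this?" and "Take this" query different fields of the same record.
```

Had observation stored only `kind` (a single-context "optimal" representation), the `weight_kg` query, unanticipated at learning time, would be unanswerable; the loss is irremediable.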
The program EROBOT mentioned in Section 8 provides some insight into this problem. The model shows that it is possible to learn to perform, in principle, any mental computations by simply memorizing "raw" SM→M and MS→S associations.
12. Pitfall of a "pure phenomenology": underestimating the power of basic mechanisms
The problem of information processing in the human brain is a "physical" problem. The brain is designed by Mother Nature -- not by human system engineers. We, humans, intentionally make our systems easier to understand, manufacture, test and debug. This costs us extra resources. In contrast, Mother Nature tends to solve its design problems with minimum resources. This makes Her designs look very clever. It also makes them difficult to understand. In such minimum-resource designs, different functions are necessarily strongly intertwined and cannot be easily structured as independent blocks.
The example of physics warns us that one should not underestimate the power of clever integration of the simple basic mechanisms of Mother Nature. I argue that this warning is relevant to the problem of reverse engineering B(0). Recall what was said in Sections 2 and 3 about the existence of a "simple" model for B(0). Sometimes, an adequate formalization, extrapolation and integration of a set of simple basic mechanisms produces a "critical mass" effect. The introduction of the so-called "displacement current" into the Maxwell equations gives a classical example of this interesting phenomenon. All of a sudden, this simple addition to the set of known basic laws of electricity and magnetism allowed J.C. Maxwell to create his famous equations that cover the whole range of arbitrarily complex classical electromagnetic phenomena.
To understand the pitfall of a "pure phenomenology" consider the following gedanken experiment. Imagine a physicist who wants to simulate the behavior of electromagnetic field in a complex microwave device, say, the Stanford Linear Accelerator (SLAC). Assume that this physicist doesn't know about the existence of the Maxwell equations and, even more importantly, doesn't believe that the complex behavior he observes may have something to do with such simple equations. (In the AI jargon this physicist would be called "scruffy." If he believed in the existence of the basic equations he would be called "neat.")
So this "scruffy physicist" sets out to do a purely phenomenological computer simulation of the observed complex behavior per se. Anyone who was involved in a computer simulation of the behavior of electromagnetic field in a linear accelerator can easily predict the results of this gedanken experiment. (I spent a few years doing such a simulation as a computational physicist at the medical division of Varian Associates.)
In the best case scenario, the above mentioned scruffy physicist comes up with a computer program (with a large number of empirical parameters) that would be able to simulate the behavior of electromagnetic field in a very narrow range. This computer program would have no extrapolating power and would not be accepted by the SLAC community as a theory of a linear accelerator.
Note that it would be impossible to reverse engineer the Maxwell equations (a metaphorical counterpart of B(0)) from the analysis of the behavior of electromagnetic field in such a complex "external world" as SLAC. I argue that, similarly, it is impossible to reverse engineer B(0) from the analysis of such complex cognitive phenomena in system (W,D,B(t)) as playing chess, solving complex mathematical problems, story telling, etc.
Download Eliashberg, V. (1981). The concept of E-machine: On brain hardware and the algorithms of thinking. Proceedings of the Third Annual Meeting of the Cognitive Science Society, U.C. Berkeley, 289-291 (.pdf file), if you haven't done so before.
13. Pitfall of a wrong dimensionality
In some cases, the whole can be easier to describe and understand than its parts. The well known relationship between a complex analytical function and its real and imaginary parts gives an illustration of this phenomenon. As another example, consider the relationship between the behavior of the "whole" electromagnetic field and the behaviors of the electric and magnetic components of this field.
To get a more relevant example consider a mapping between the input and the output sequences of a state machine. Let us limit the length of the sequences to a finite number N. Theoretically, this finite mapping can be represented by a combinatorial machine, without introducing an additional state variable. The price for this reduction of dimensionality is the combinatorial explosion of the size of this inadequate representation. For N → ∞, it becomes theoretically impossible to represent the behavior of a state machine as a behavior of a combinatorial machine.
The same curse of dimensionality punishes an attempt to represent the behavior of a Turing machine with a finite tape as a behavior of a state machine. A tape with 100 squares and two symbols has 2^100 states. Again, if the length of the tape goes to infinity it becomes theoretically impossible to represent the behavior of a Turing machine as a behavior of a state machine.
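The counts used in these two paragraphs are easy to check. A combinatorial (table-lookup) representation of a state machine over input sequences of length N needs one entry per possible sequence, and a binary tape of 100 squares has 2^100 configurations:

```python
# Checking the dimensionality counts in the text.

def table_entries(alphabet_size, n):
    """Entries in a combinatorial (stateless lookup) representation
    covering all input sequences of length n."""
    return alphabet_size ** n

def tape_states(squares, symbols=2):
    """Configurations of a finite tape: symbols ** squares."""
    return symbols ** squares

print(table_entries(2, 10))   # modest N is already 1024 entries
print(tape_states(100))       # the 2^100 figure from the text
```

Both quantities grow exponentially in N, which is the formal content of the "wrong dimensionality" penalty: flattening a stateful description into a stateless one trades a small state variable for an exponential table.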
A similar thing happens when one tries to describe the behavior of an E-machine with different "mental sets" without introducing E-states. One is immediately penalized by the combinatorial explosion of the number of such mental sets (contexts).
This pitfall of a wrong dimensionality can be formulated as the following "Catch 22": An attempt to artificially reduce the dimensionality of a cognitive problem to simplify this problem makes the problem more difficult (unsolvable).
I argue that the curse of dimensionality that plagues traditional information processing cognitive theories is a natural penalty for the attempts to represent insufficiently large parts of brain's performance in a state space of insufficiently large dimensionality.
14. Keyword is system integration
The main thrust of the Brain 0 project is system integration. There exists a large and rapidly growing number of publications devoted to what can be loosely referred to as computational models of the "parts" of the brain and/or the "parts" of the brain's performance. Important and interesting as they are, such "partial" models leave open the critical question as to how (and whether) such hypothetical parts can be integrated into a single computing system with the basic information processing characteristics similar to those of the whole human brain. The catch is that not all parts fit the whole. In real life, it is seldom possible to reverse engineer parts of a complex system without having a good working hypothesis as to how this system is organized and what it is doing as a whole. Read the story about the blind men and the elephant.
15. What should we do to reverse engineer B(0)?
In Section 3, I argued that the role of B(0) in a cognitive theory can be compared with the role of the Maxwell equations in the classical electrodynamics. That is, it is impossible to have an adequate information processing cognitive theory without having an adequate computational model for the brain. This means that reverse engineering B(0) should be the main goal of brain modeling.
Unlike the case of the Maxwell equations, it is clearly impossible to arrive at an adequate model for B(0) at once. Therefore, it is important to adopt the right strategy for developing such a model.
For many years, I've been using an iterative reverse engineering strategy that can be loosely described as the following algorithm. (The real reverse engineering process is never so straightforward, so this algorithm is just a useful metaphor.)
Step 0 Formulate a set of questions addressing some fundamental properties of B(0) as a computing system. Invent an initial approximation for B(0), call it B0(0), that seems to answer these questions. Study B0(0) to show that it does answer the above mentioned set of questions.
Step 1 Expand the set of questions and improve the current model, call it Bi(0), to answer this expanded set of questions.
Step 2 Study Bi(0) to see if it indeed answers the current set of questions. If not all questions can be answered, try to adjust the model to answer all the questions.
Step 3 If step 2 is a success, go to step 1. Else, go to step 0 and try to invent a better initial approximation.
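Steps 0-3 above can be sketched as Python control flow (the predicates are placeholders of my own; the real process is a research activity, not a program):

```python
# Sketch of the iterative reverse-engineering loop from Steps 0-3.
# invent / expand / try_adjust are hypothetical callables standing in
# for research activities.

def reverse_engineer(invent, expand, try_adjust, n_rounds=10):
    model, questions = invent()                   # Step 0: initial approximation
    for _ in range(n_rounds):
        questions = expand(questions)             # Step 1: expand the questions
        ok, model = try_adjust(model, questions)  # Step 2: study and adjust
        if not ok:                                # Step 3: subcritical start,
            model, questions = invent()           #         reinvent from scratch
    return model
```

A "supercritical" initial approximation corresponds to `try_adjust` succeeding round after round, so the loop keeps deepening one model; a "subcritical" one keeps falling back to Step 0.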
As in the case of any iterative process, the most important thing is to start with the right initial approximation. One needs to invent such an initial model for B(0) in whatever way possible. It can be just a stroke of luck. The loop 1-2-3-1 will show whether this initial approximation is good or bad.
If the initial approximation for B(0), invented at Step 0, is "supercritical" (has a sufficient initial level of system integration), one gets a "converging" iterative process. The more one enhances such an initial approximation, the more possibilities for its further enhancement one finds -- more and more experimental data become available for formulating the right questions at Step 1.
In contrast, if the initial approximation is "subcritical" (has a sub-threshold level of integration), one gets a "diverging" iterative process. The more one tries to improve such an inadequate initial approximation, the more discrepancies one sees. Soon it becomes obvious that this initial approximation doesn't work.
Using this terminology, the metaphor "the brain as an E-machine" (see Section 5) can be presented as a candidate for a supercritical initial approximation for B(0). It has been my experience that the more one tries to falsify this metaphor, the more new possibilities for its further enhancement one finds. The "aha" feeling that led me to this initial approximation dates back to October 1966. The initial approximation was first described in Eliashberg (1967).
16. On the brain, the mind, and intelligence
The problem of reverse engineering B(0) must not be confused with the problem of understanding the behavior of the cognitive system (W,D,B(0)), still less that of system (W,D,B(t)) with big t. The first problem is complex but finite and solvable. The second problem is infinite, as is the external world W. Given an adequate model for B(0), we will always be able to improve our understanding of the behavior of system (W,D,B(t)), for bigger and bigger t, but will never be able to reach a "complete understanding." (The last thing we want is to completely understand ourselves. Fortunately, it is impossible, in principle.)
It is important to realize that the human mind and human intelligence are not the properties of the human brain alone, but the properties of the whole system (W,D,B(t)), where W includes other people. As the generations of people come and go, our world becomes more and more complex. New brains B(0) are exposed to this more complex world and become more complex systems B(t). The whole system (W,D,B(t)) is developing, not just the brain. A lot of "external brain software" is created and stored in W. We are all parts of the giant system (PEOPLE,WORLD), and we cannot separate ourselves from this system. I argue that the theories trying to "extract" human intelligence from the properties of the human brain alone miss the main point.
It is in the infinite world, W, that the main mystery of the human mind resides. If we ever create really intelligent robots -- which is possible, in principle, since system (D,B(0)) has a finite formal representation -- we will also create the mystery of the robot's mind. The robot will become intelligent in the human sense only if it is exposed to the human world W.