Analog Neural Networks: Computational Power

The urge to understand the computational capabilities of neural network models has attracted many researchers ever since the introduction of the first neural model as a Boolean discrete device. Marvin Minsky summarized the state of the art in his 1967 book: “Every finite-state machine is equivalent to, and can be simulated by, some neural networks [of finite size].” Marking the 25th anniversary of the ACM in 1972, Michael Arbib proclaimed that the most important challenge for automata theory “in the next 25 years [was] to adequately treat those problems of neurophysiology and psychology which provided the main impetus to its original growth,” because “it is not automata theory as the study of finite state acceptors… and Turing machines which will help us understand the brain.” Arbib urged researchers to “move towards more realistic neurons with much greater complexity.” Progress in this direction began to materialize with the newer analog neural models of the late 1980s.

While theoretical computer science has historically rested upon the hitherto unquestioned assumption that computers are discrete, static machines (as in the classical Turing model), the nervous system, with its 10¹⁴ synaptic connections that adapt with experience, should not be conceived of as a static algorithm, and the chemical and physical processes affecting the neuronal states are not discrete. The newer analog networks capture these two properties. In the most general terms, analog neural models consist of assemblies of simple processors, or “neurons,” operating in parallel, where each computes a continuous scalar activation function of its input and affects its neighbors in proportion to the adaptable “weight” associated with the directed link between them. Equivalent are spiking models, in which the inter-spike time intervals are analog, and probabilistic and asynchronous models, which rely on continuous probability values. In the absence of an adequate theory, analyzing the computational power of analog networks seemed profoundly difficult.

Turing Machine Equivalence.

In my Ph.D. work (1993), together with my advisor, Eduardo Sontag, I proposed a mathematically simple, but biologically relevant, model of a finite-size circuit of continuous neurons, called the Analog Recurrent Neural Network (ARNN). The network is recurrent in that loops and cycles are allowed, giving rise to dynamical systems, long-term memory, and computation. This model is conceptually different from the classical computer: a neuron is both a processor and a memory, and the two abstractions cannot be separated; the values of all neurons change at every computational step, as in a dynamical system; the neural value domain is analog/continuous rather than discrete; and all local operations are continuous, with no comparisons between neural values allowed. Because of these fundamental differences, there is no a priori reason to assume equivalence between classical computers and analog neural networks.
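To make the dynamics concrete, here is a minimal sketch of one synchronous ARNN update step, using the saturated-linear activation employed in this line of work; the dimensions, weight values, and function names are illustrative, not the formal construction from the papers:

```python
import numpy as np

def sigma(z):
    # Saturated-linear activation: continuous, piecewise linear, values in [0, 1].
    return np.clip(z, 0.0, 1.0)

def arnn_step(x, u, W, V, b):
    """One synchronous update of an analog recurrent neural network.

    x : current neuron values (the network's entire memory), shape (n,)
    u : external input at this step, shape (m,)
    W : neuron-to-neuron weights, shape (n, n)
    V : input weights, shape (n, m)
    b : biases, shape (n,)
    Every neuron is updated at every step; there is no separate memory store.
    """
    return sigma(W @ x + V @ u + b)

# Illustrative run: a 3-neuron network driven by a short binary input stream.
rng = np.random.default_rng(0)
W, V, b = rng.uniform(-1, 1, (3, 3)), rng.uniform(-1, 1, (3, 1)), rng.uniform(-1, 1, 3)

x = np.zeros(3)                        # all neurons start at 0
for bit in [1, 0, 1, 1]:
    x = arnn_step(x, np.array([bit]), W, V, b)
print(x)                               # the state after consuming the input
```

With rational weights, networks of this form can simulate any Turing machine; with real weights, the same update rule yields the stronger analog model discussed below.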

We nonetheless quantified the dynamical and computational properties of the analog network and proved that it is computationally equivalent to the Turing machine. Simon Haykin’s authoritative textbook Neural Networks: A Comprehensive Foundation states: “In a generic sense, the computational power of a recurrent neural network is embodied in two main theorems,” and then describes in the following pages the original Turing equivalence of analog recurrent networks and the theorem on NARX networks, a model useful in engineering. With Joe Kilian, I also proved the Turing equivalence of sigmoidal networks, the underlying model used in adaptive neural algorithms. With Margenstern, we reduced the universal network to nine neurons.

In related work (1987), Pollack suggested that a rather complex high-order structure is required to achieve computational power beyond that of a finite automaton, even with analog neurons. Pollack’s thoughtful conjecture was widely accepted by the neural network community and frequently cited as the motivation for high-order networks in the application and design of neural networks. Many theoretical articles in neural computation were influenced by it. Our equivalence theorem refutes this conjecture.

Analog Computation Device.

The surprising finding has been that when analog networks have real weights, their power encompasses and transcends that of digital computers. Yet, although able to compute super-Turing functions, analog networks are not infinitely powerful or coextensive with all conceivable computation: imposing resource constraints still yields a nontrivial reduction in computational power. (Technically, under a polynomial-time constraint they compute P/poly, exactly like Turing machines that consult non-uniform advice. This class of functions is richer than P and includes some non-recursive functions, yet it is still small.) This was the first uniform model shown to exhibit non-uniform behavior.
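The source of the non-uniform power is that a single real weight can store an unbounded amount of information. A standard device in this line of work encodes a binary string in a Cantor-like base-4 number, so that individual bits can be recovered using only affine maps and the saturated-linear function, with no exact comparisons. The sketch below is illustrative (the helper names are mine, and floating point only approximates the idealized real arithmetic):

```python
def encode(bits):
    # Cantor-like base-4 encoding: bit b contributes digit 2*b + 1, so the
    # value of the string never comes arbitrarily close to a digit boundary.
    q = 0.0
    for b in reversed(bits):
        q = (2 * b + 1 + q) / 4.0
    return q

def sigma(z):
    # Saturated-linear activation: the only nonlinearity allowed.
    return min(1.0, max(0.0, z))

def pop(q):
    # Recover the leading bit and the encoding of the remaining string,
    # using only affine maps and sigma (continuous operations).
    top = sigma(4 * q - 2)              # 0 if the leading bit is 0, 1 if it is 1
    rest = sigma(4 * q - 2 * top - 1)   # encoding of the remaining bits
    return top, rest

q = encode([1, 0, 1, 1])
for _ in range(4):
    bit, q = pop(q)
    print(int(bit), end=" ")            # prints: 1 0 1 1
```

A polynomial-time network can consult only polynomially many such bits, which is what yields P/poly rather than the class of all functions.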

One can advance from Turing computation to super-Turing computation not only by changing the weights from rational to real numbers: introducing analog versions of probabilistic or asynchronous computation, for example, has the same effect.

Real Numbers vs. Analog Computation.

Fields Medalist Stephen Smale, Michael Shub, and Lenore Blum proposed the well-known BSS computational model. This model incorporates real constants and real registers, and applies real-number algebraic operations together with the discontinuous operation of exact comparison. We, on the other hand, retain real numbers to gain a continuous state space but allow continuous operations only, since our focus is analog computation: the machine can be in distinct states even when we cannot know exactly where it is or act on the exact internal values. Hence, our model captures nature’s manifest “computation” of the future physical world from the present, in which constants that are not known to us, or cannot even be measured, do affect the evolution of the system. (For example, planets revolve according to the exact values of G, the gravitational coupling constant, and their masses, regardless of our inability to measure these values.) With Gavaldà, we investigated the computational differences between the BSS and ARNN models, providing a partial solution to the important problem of continuous versus discontinuous real-valued models. Implications for biological modeling were also noted.
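The distinction between the two models can be seen in their basic primitives: a BSS machine may branch on an exact test of a real value, a discontinuous operation, whereas the ARNN may only pass real values through continuous maps such as the saturated-linear function. A toy illustration (the function names are mine):

```python
def bss_branch(x):
    # BSS-style primitive: exact comparison on a real number.
    # The output jumps from 0 to 1 at x = 0 -- a discontinuous operation.
    return 1.0 if x >= 0.0 else 0.0

def arnn_primitive(x):
    # ARNN-style primitive: a saturated-linear map.
    # Continuous everywhere: nearby states always yield nearby outputs.
    return min(1.0, max(0.0, x))

# Near x = 0 the exact test separates states that differ by an unmeasurably
# small amount, something a continuous-only machine is never allowed to do.
for x in (-1e-9, 0.0, 1e-9):
    print(f"{x:+.1e}  bss={bss_branch(x)}  arnn={arnn_primitive(x):.1e}")
```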

In 1995, I published a paper in Science showing that many dynamical systems and idealized chaotic systems that cannot be described by the Turing machine are well captured within the framework of analog neural networks. This work supports the proposition that, just as the Turing machine serves as the standard model of digital computation, the ARNN can serve as a standard model in the field of analog computation. Stated analogously to the Church-Turing thesis, we proposed: no possible abstract analog device can have more computational capabilities (up to polynomial time) than analog recurrent networks.

Another way to expand on the Turing model is to view our theory as describing an extension not from digital to analog, but rather from static to adaptive computational schemes. Interleaving computation with weight adaptation on a continuum, within a single machine, enables the model to outperform the static Turing machine in computational power.

Bibliography

H.T. Siegelmann and E.D. Sontag, “Turing Computability with Neural Networks,” Applied Mathematics Letters, 4(6), 1991: 77-80

H.T. Siegelmann and E.D. Sontag, “Analog Computation via Neural Networks,” Theoretical Computer Science, 131, 1994: 331-360

H.T. Siegelmann and E.D. Sontag, “Computational Power of Neural Networks,” Journal of Computer and System Sciences, 50(1), 1995: 132-150

H.T. Siegelmann, “Computation Beyond the Turing Limit,” Science, 268(5210), April 1995: 545-548

H.T. Siegelmann, “Stochastic Analog Networks and Computational Complexity,” Journal of Complexity, 15(4), 1999: 451-475

J. Kilian and H.T. Siegelmann, “The Dynamic Universality of Sigmoidal Neural Networks,” Information and Computation, 128(1), 1996: 45-56

R. Gavaldà and H.T. Siegelmann, “Discontinuities in Recurrent Neural Networks,” Neural Computation, 11(3), April 1999: 715-745

H.T. Siegelmann, Neural Networks and Analog Computation: Beyond the Turing Limit, Birkhauser, Boston, December 1998

H.T. Siegelmann, B.G. Horne, and C.L. Giles, “Computational Capabilities of Recurrent NARX Neural Networks,” IEEE Transactions on Systems, Man and Cybernetics – Part B: Cybernetics, 27(2), 1997: 208-215

H.T. Siegelmann and M. Margenstern, “Nine Neurons Suffice for Turing Universality,” Neural Networks, 12, 1999: 593-600