Automatic Learning System
a trainable machine, or self-adjusting system, whose control algorithm changes in conformity with an evaluation of the results of control so that with the passage of time the machine improves its characteristics and quality of performance.
Technological systems can only be designed and built with initial, a priori information on the nature of the processes occurring in the system and on the conditions that accompany the operation of the system and can have disturbing effects on it. When the initial, a priori information is complete, it is possible to determine with sufficient precision what values must be designed into the system in order to ensure the required level of performance. In this case there is no necessity for training the machine. However, where the initial information is not complete, the only way of building a system with the required performance features is to incorporate the principle of training into the system during its development.
In the training process, numerous inputs are entered into the system, and the system’s reaction to these inputs is corrected. External correction, or encouragement and punishment as it is still called, is carried out by an instructor who knows the desired reaction to a given set of inputs. The instructor may be either a human operator or an automaton. It is by the processing of control data, that is, a posteriori information, that the missing initial information is supplied. If the instruction is carried out without an external training device, the system is called self-organizing.
The training is done by means of algorithms. Depending on whether the automatic learning system is of the discrete or of the continuous type, these training algorithms consist of either a system of stochastic difference equations or stochastic differential equations. The algorithms are programmed into digital or analog computers (especially electronic integrators) or hybrid computer systems. As the training process progresses, the learning machine accumulates experience that it uses to gradually work out the required reactions of the system to external inputs.
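As a minimal sketch of a training algorithm of the discrete type, the stochastic difference equation c[n] = c[n-1] - g[n]·(c[n-1] - x[n]) can be written out directly. The task (estimating an unknown constant from noisy observations), the step size g[n] = 1/n, and all names below are assumptions chosen for illustration, not a form taken from the article.

```python
import random

def train_estimate(samples, c0=0.0):
    """Run the recurrence c[n] = c[n-1] - g[n] * (c[n-1] - x[n])."""
    c = c0
    for n, x in enumerate(samples, start=1):
        gain = 1.0 / n          # decreasing step size (assumed choice)
        c = c - gain * (c - x)  # correction driven by the newest noisy observation
    return c

# Hypothetical data: an unknown constant observed through additive noise.
rng = random.Random(0)
true_value = 3.0
observations = [true_value + rng.gauss(0.0, 1.0) for _ in range(5000)]
estimate = train_estimate(observations)
```

With this particular step size the recurrence reproduces the running sample mean, so the estimate approaches the unknown value only gradually, as experience accumulates.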
The automatic learning system is an asymptotically optimal system: the optimal reaction to external inputs is not achieved at once, but only with the passage of time and as the result of training. The most complete studies have been carried out on automatic learning systems designed for pattern recognition, identification, filtering, and control.
In learning machines designed for pattern recognition, the entire set of objects to be recognized is subdivided, before the system begins to function, into classes that conform to a selected principle of classification. A dictionary of signs (that is, distinguishing features) of the objects to be recognized is then compiled, and technical means for measuring these signs are created. If the volume of initial, a priori information is sufficient to describe the classes in the language of the signs, a recognition system can be built without automatic training. However, if the volume of initial information is not sufficient to describe the classes, or if for some reason it is not convenient to compile such a description, the pattern recognition system may be formed by means of training.
Before the automatic learning system begins functioning as a recognition system, an instructor shows the machine practice objects from all the selected classes and indicates just what classes they belong to. Then the instructor “tests” the system, correcting its answers until the average number of errors is brought down to the desired level. The initial, a priori information is supplemented through the teaching process, making it possible for the pattern recognition system to describe classes by using the dictionary of selected signs. The greater the precision achieved in the description of classes in the language of the dictionary of signs, the better the system will work and the fewer errors there will be in recognizing unknown objects or phenomena.
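The teaching procedure described above can be sketched with an error-correction rule of the perceptron type, in which the instructor's correction is applied only when the machine's answer is wrong. The linear decision rule, the two-sign practice objects, and the function names are assumptions made for illustration; the article does not prescribe a particular rule.

```python
def classify(weights, signs):
    """Assign class +1 or -1 from a weighted sum of the object's signs."""
    total = weights[0] + sum(w * s for w, s in zip(weights[1:], signs))
    return 1 if total >= 0 else -1

def train(practice_objects, max_errors=0, epochs=100, rate=1.0):
    """Show labeled practice objects and correct wrong answers until few enough remain."""
    weights = [0.0] * (len(practice_objects[0][0]) + 1)
    for _ in range(epochs):
        errors = 0
        for signs, label in practice_objects:
            if classify(weights, signs) != label:   # the instructor detects an error
                errors += 1
                weights[0] += rate * label          # correct the bias term
                for i, s in enumerate(signs):
                    weights[i + 1] += rate * label * s
        if errors <= max_errors:                    # desired error level reached
            break
    return weights

# Hypothetical practice objects: (signs, class indicated by the instructor).
practice = [((2.0, 1.0), 1), ((3.0, 2.5), 1), ((2.5, 3.0), 1),
            ((-1.0, -2.0), -1), ((-2.0, -0.5), -1), ((-3.0, -1.5), -1)]
weights = train(practice)
unknown_object_class = classify(weights, (4.0, 4.0))
```

After training, the weights serve as the description of the classes in terms of the selected signs, and previously unseen objects are classified with the same rule.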
Automatic learning systems for filtering are designed to separate a useful signal from noise, an operation that is particularly necessary in radar and in long-range radio communication. Given full a priori information on the inputs (useful signal and noise), a filtering system can be built that maximizes the value of the optimality criterion appropriate to the work of the system. However, where the a priori information is insufficient, training is the only way to build an optimal filtering system. During the training process the parameters of the filtering system are changed, and sometimes even its structure changes. As a result the optimality criterion approaches its maximum value asymptotically.
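One way to illustrate a filter whose parameters are changed by training is the least-mean-squares (LMS) rule, sketched below. It is used here as a stand-in, not as the article's method: the optimality criterion is assumed to be the mean squared error (to be minimized, which is equivalent to maximizing its negative), a reference copy of the useful signal is assumed to be available during training, and the signal, noise level, tap count, and step size are all hypothetical.

```python
import math
import random

def lms_train(observed, desired, taps=4, step=0.01):
    """Adjust the filter weights sample by sample to reduce the squared error."""
    weights = [0.0] * taps
    outputs = []
    for n in range(len(observed)):
        # Most recent `taps` input samples, padded with zeros at the start.
        window = [observed[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y = sum(w * x for w, x in zip(weights, window))   # filter output
        error = desired[n] - y                            # deviation from the reference
        weights = [w + step * error * x for w, x in zip(weights, window)]
        outputs.append(y)
    return weights, outputs

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Hypothetical training data: a slow sinusoid buried in white noise.
rng = random.Random(1)
signal = [math.sin(2 * math.pi * n / 50) for n in range(5000)]
observed = [s + rng.gauss(0.0, 0.5) for s in signal]
weights, outputs = lms_train(observed, signal)

# Compare the error before and after filtering on the last 1000 samples.
raw_mse = mse(observed[-1000:], signal[-1000:])
filtered_mse = mse(outputs[-1000:], signal[-1000:])
```

Only the parameters of a fixed-structure filter are adjusted in this sketch; changing the structure itself, which the text also mentions, is outside its scope.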
Automatic learning systems for control can be used in aircraft, complexes of production machinery, and elsewhere. Figure 1 shows the schematic diagram of a model automatic control system in which the desired optimal (in a given sense) process of control is achieved by means of training. Suppose that the goal of the control is to ensure a minimum value for a certain magnitude or functional R, which in the general case depends on the functions of the given input x̄νx(t) and the control input ū(t) and on the magnitude being controlled x̄(t); that is:
(1) R[x̄νx(t), x̄(t), ū(t)] = Rmin
This goal must be achieved under the constraint that certain magnitudes or functionals Fi, where i = 1, 2, …, m, must not exceed the values Fi* set for them; that is:
(2) Fi[x̄νx(t), x̄(t), ū(t), z̄(t)] ≤ Fi*
where z̄(t) is a disturbance that acts on the object of control. Suppose further that full a priori information with regard to z̄(t) and x̄νx(t) is not available, for if it were, an optimal control system could in principle be constructed without training.
In the system under consideration, the main part of the control device, A1, has a control algorithm that is capable of modification over a broad range, while another part, A2, can act on A1, revising its algorithm. Orienting itself toward the goal of the control, device A2 uses training algorithms and accumulates experience from the totality of reactions of A1 to possible changes in the modes of operation of object B. A2 processes this information and works out inputs ȳ*(t), which come progressively closer to the required values. The required values are those values of ȳ(t) which, in conformity with the values obtained in the computer device C for the criterion of optimality R (with constraints Fi*), revise the working algorithm of A1 in such a way that conditions (1) and (2) are met. The automatic learning system for control considered here is asymptotically optimal.
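The interaction of A1, A2, B, and C can be sketched under strong simplifying assumptions: A1 is a single adjustable gain, object B is a static gain unknown to the trainer and subject to a random disturbance, the constraint is a bound on the control magnitude, the criterion R is the mean squared deviation of the output from the given input, and A2 adjusts the gain by trial perturbations. None of these particular choices comes from the article; all names and numbers are hypothetical.

```python
import random

def criterion(gain, inputs, rng, b=2.5, noise=0.1, u_max=1.0):
    """Device C: run A1 and object B on the inputs and return the criterion R."""
    total = 0.0
    for x_in in inputs:
        u = max(-u_max, min(u_max, gain * x_in))  # A1, constrained to |u| <= u_max
        x = b * u + rng.gauss(0.0, noise)         # object B with disturbance z
        total += (x_in - x) ** 2
    return total / len(inputs)

def train_controller(steps=200, delta=0.05, rate=0.05, seed=2):
    """Device A2: revise the gain of A1 by comparing R for perturbed gains."""
    rng = random.Random(seed)
    gain = 0.0
    for _ in range(steps):
        inputs = [rng.uniform(-1.0, 1.0) for _ in range(50)]
        r_plus = criterion(gain + delta, inputs, rng)
        r_minus = criterion(gain - delta, inputs, rng)
        # Move the gain in the direction that lowers the estimated criterion.
        gain -= rate * (r_plus - r_minus) / (2 * delta)
    return gain

learned_gain = train_controller()
```

In this sketch the trainer never uses the object's gain b directly; it relies only on the measured values of R, and the learned gain drifts toward 1/b as training proceeds.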
A. L. GORELIK