# Controlled Stochastic Process


A stochastic process whose probability characteristics can be changed by means of control actions. The main goal of the theory of stochastic control is to find optimal or near-optimal controls that provide an extremum of a given performance criterion.

Let us take the simple case of controlled Markov chains and consider one of the ways in which a mathematical statement of the problem of finding the optimal control can be formulated. Suppose $X^{d}$, $d \in D$, is a family of homogeneous Markov chains with a finite number of states *E* = {0, 1, ..., *N*} and matrices of transition probabilities $P^{d} = \| p_{xy}(d) \|$, $x, y \in E$. The transition probabilities depend on the parameter *d*, which belongs to some set of control actions *D*. The set of functions α = {α_{0}(*x*_{0}), α_{1}(*x*_{0}, *x*_{1}), ...} with values in *D* is called the strategy, and each of the functions α_{n} = α_{n}(*x*_{0}, ..., *x*_{n}) is called the control at time *n*. To every strategy α there corresponds a controlled Markov chain whose transition probabilities are

$$P(x_{n+1} = y \mid x_0, \dots, x_n) = p_{x_n y}(\alpha_n(x_0, \dots, x_n)).$$
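The objects just defined can be sketched in code. The two-control chain below (its states, matrices, and control names) is an invented example for illustration, not taken from the article:

```python
import random

# Illustrative family of transition matrices P^d (invented numbers):
# states E = {0, 1, 2}, controls D = {"slow", "fast"}, P[d][x][y] = p_{xy}(d).
# State 0 is absorbing under every control.
P = {
    "slow": [[1.0, 0.0, 0.0], [0.3, 0.7, 0.0], [0.0, 0.5, 0.5]],
    "fast": [[1.0, 0.0, 0.0], [0.6, 0.2, 0.2], [0.1, 0.3, 0.6]],
}

def simulate(strategy, x0, n_steps, rng):
    """Run the controlled chain: at each step n the control
    alpha_n(x_0, ..., x_n) is computed from the whole history so far."""
    path = [x0]
    for _ in range(n_steps):
        d = strategy(path)                 # alpha_n(x_0, ..., x_n) in D
        row = P[d][path[-1]]               # transition row p_{x_n, .}(d)
        path.append(rng.choices([0, 1, 2], weights=row)[0])
    return path

# A strategy that happens to look only at the current state x_n.
path = simulate(lambda hist: "fast" if hist[-1] == 2 else "slow",
                x0=2, n_steps=20, rng=random.Random(0))
```

Once the trajectory reaches the absorbing state 0 it stays there, whatever the strategy does.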

Let

$$V^{\alpha}(x) = \mathsf{E}^{\alpha}_{x} \sum_{n=0}^{\infty} f(\alpha_n, x_n),$$

where the function $f(d, x) \geq 0$ and $f(d, 0) = 0$. (If the point {0} is an absorbing state and $f(d, x) = 1$ for $d \in D$, *x* = 1, ..., *N*, then $V^{\alpha}(x)$ is the mathematical expectation of the time of transition from point *x* to point 0.) The function

$$V(x) = \inf_{\alpha} V^{\alpha}(x)$$

is called the value, and the strategy α* is said to be optimal if

$$V^{\alpha^{*}}(x) = V(x)$$

for all *x* ∊ *E*.
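In the hitting-time specialization mentioned above (f(*d*, *x*) = 1 for *x* ≠ 0), $V^{\alpha}$ for a fixed homogeneous strategy can be approximated by iterating V ← f + TᵅV. A minimal sketch on an invented two-control chain (the numbers are assumptions, not from the article):

```python
# V^alpha(x) as the expected time to reach the absorbing state 0, with
# f(d, x) = 1 for x != 0 and f(d, 0) = 0. P[d][x][y] = p_{xy}(d) is an
# invented example; state 0 is absorbing under both controls.
P = {
    "slow": [[1.0, 0.0, 0.0], [0.3, 0.7, 0.0], [0.0, 0.5, 0.5]],
    "fast": [[1.0, 0.0, 0.0], [0.6, 0.2, 0.2], [0.1, 0.3, 0.6]],
}

def policy_value(alpha, n_iter=2000):
    """Iterate V <- f + T^alpha V; for this absorbing chain the iterates
    converge to V^alpha (geometric convergence off state 0)."""
    V = [0.0, 0.0, 0.0]
    for _ in range(n_iter):
        V = [0.0 if x == 0 else
             1.0 + sum(P[alpha[x]][x][y] * V[y] for y in range(3))
             for x in range(3)]
    return V

V = policy_value(alpha=["slow", "slow", "slow"])  # always use control "slow"
# Under "slow", state 1 jumps to 0 with probability 0.3 each step, so
# V[1] = 1/0.3 = 10/3, and V[2] = 2 + V[1] = 16/3.
```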

Under quite general assumptions regarding the set *D*, it can be shown that the value *V*(*x*) satisfies the following optimality equation (the Bellman equation):

$$V(x) = \min_{d \in D} \left[ f(d, x) + T^{d} V(x) \right],$$

where

$$T^{d} \varphi(x) = \sum_{y \in E} p_{xy}(d)\, \varphi(y).$$
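A standard way to solve the Bellman equation numerically is successive approximation (value iteration), V ← min over *d* of [f(*d*, ·) + TᵈV]. The sketch below uses an invented two-control chain with f(*d*, *x*) = 1 for *x* ≠ 0, so the value is the least expected time to reach 0:

```python
# Value iteration for V(x) = min_d [f(d, x) + T^d V(x)], with f(d, x) = 1
# for x != 0. The two-control chain is an invented example; state 0 is
# absorbing under both controls.
P = {
    "slow": [[1.0, 0.0, 0.0], [0.3, 0.7, 0.0], [0.0, 0.5, 0.5]],
    "fast": [[1.0, 0.0, 0.0], [0.6, 0.2, 0.2], [0.1, 0.3, 0.6]],
}

def T(d, V, x):
    """(T^d V)(x) = sum_y p_{xy}(d) V(y)."""
    return sum(P[d][x][y] * V[y] for y in range(3))

def value_iteration(n_iter=2000):
    V = [0.0, 0.0, 0.0]
    for _ in range(n_iter):
        V = [0.0 if x == 0 else min(1.0 + T(d, V, x) for d in P)
             for x in range(3)]
    # Read off a minimizing homogeneous Markovian strategy alpha*(x);
    # here f does not depend on d, so it suffices to minimize T^d V.
    alpha = [min(P, key=lambda d: T(d, V, x)) for x in range(3)]
    return V, alpha

V, alpha = value_iteration()
# Fixed point of the Bellman equation for this chain: V[1] = 30/13,
# V[2] = 55/13, attained by choosing "fast" in states 1 and 2.
```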

In the class of all strategies, homogeneous Markovian strategies, which are characterized by a single function α(*x*) such that α_{n}(*x*_{0}, ..., *x*_{n}) = α(*x*_{n}) for all *n* = 0, 1, ..., are of the greatest interest.

The following optimality criterion, or sufficient condition for optimality, can be used to verify that a given homogeneous Markovian strategy is optimal: if there exist functions α* = α*(*x*) and *V** = *V**(*x*) such that for any *d* ∊ *D*

$$0 = f(\alpha^{*}(x), x) + L^{\alpha^{*}(x)} V^{*}(x) \leq f(d, x) + L^{d} V^{*}(x)$$

(where $L^{d} = T^{d} - I$, *I* being the identity operator), then *V** is the value (*V** = *V*) and the strategy α* = α*(*x*) is optimal.
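For a finite chain this criterion reduces to finitely many numerical checks. In the sketch below the chain is an invented two-control example with f(*d*, *x*) = 1 for *x* ≠ 0, and the candidate pair (α*, *V**) is assumed to have been computed beforehand (e.g. by value iteration):

```python
# Numerical check of the sufficient condition: for a candidate pair
# (alpha*, V*), verify 0 = f(alpha*(x), x) + L^{alpha*(x)} V*(x) and
# f(d, x) + L^d V*(x) >= 0 for every d in D. Chain and candidate values
# are an invented example; state 0 is absorbing under both controls.
P = {
    "slow": [[1.0, 0.0, 0.0], [0.3, 0.7, 0.0], [0.0, 0.5, 0.5]],
    "fast": [[1.0, 0.0, 0.0], [0.6, 0.2, 0.2], [0.1, 0.3, 0.6]],
}

def f(d, x):
    return 0.0 if x == 0 else 1.0

def L(d, V, x):
    """(L^d V)(x) = (T^d V)(x) - V(x), where T^d V(x) = sum_y p_{xy}(d) V(y)."""
    return sum(P[d][x][y] * V[y] for y in range(3)) - V[x]

V_star = [0.0, 30 / 13, 55 / 13]           # candidate value function
alpha_star = ["fast", "fast", "fast"]      # candidate homogeneous strategy

optimal = all(
    abs(f(alpha_star[x], x) + L(alpha_star[x], V_star, x)) < 1e-9  # equality part
    and all(f(d, x) + L(d, V_star, x) >= -1e-9 for d in P)         # inequality part
    for x in range(3)
)
```

If `optimal` is true, the criterion certifies that *V** is the value and α* is an optimal strategy for this chain.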


A. N. SHIRIAEV