FinStateMarkSequence-Summary
University of Southern California, Computer Science 407 (Oct 30, 2023)
A Summary of Time-Homogeneous Finite State Markov Sequences.

Main objects.
The state space $S = \{a_1, \dots, a_N\}$;
the sequence $X = (X_n,\ n \ge 0)$, $X_n \in S$;
the tail sigma-algebra of $X$: $\mathcal{I} = \bigcap_{n \ge 0} \sigma(X_k,\ k \ge n+1)$;
the (one-step) transition probability matrix $P = (p_{ij},\ i, j = 1, \dots, N)$: $p_{ij} = P(X_{n+1} = a_j \mid X_n = a_i)$;
the $m$-step transition probability matrix $P^{(m)} = (p^{(m)}_{ij},\ i, j = 1, \dots, N)$: $p^{(m)}_{ij} = P(X_{n+m} = a_j \mid X_n = a_i)$; $P^{(1)} = P$.

Basic results.
(1) Markov property: $P(X_n = a_i \mid X_{n-1}, \dots, X_0) = P(X_n = a_i \mid X_{n-1})$, $n \ge 1$.
(2) Transition probability matrices are stochastic: $p^{(m)}_{ij} \ge 0$, $\sum_{j=1}^{N} p^{(m)}_{ij} = 1$, $i = 1, \dots, N$, $m = 1, 2, \dots$.
(3) Chapman-Kolmogorov equation: $P^{(m)} = P^m$. Proof. Start with $m = 2$:
$$p^{(2)}_{ij} = P(X_2 = a_j \mid X_0 = a_i) = \sum_{\ell=1}^{N} P(X_2 = a_j,\, X_1 = a_\ell \mid X_0 = a_i) = \sum_{\ell=1}^{N} P(X_2 = a_j \mid X_1 = a_\ell,\, X_0 = a_i)\, P(X_1 = a_\ell \mid X_0 = a_i) = \sum_{\ell=1}^{N} P(X_2 = a_j \mid X_1 = a_\ell)\, P(X_1 = a_\ell \mid X_0 = a_i) = \sum_{\ell=1}^{N} p_{i\ell}\, p_{\ell j},$$
where the third equality is a particular case of $P(A \cap B \mid C) = P(A \mid B \cap C)\, P(B \mid C)$ and the fourth equality is the Markov property. The general $m$ is similar.
(4) Distribution of $X_n$: if $\mu_i = P(X_0 = a_i)$, $i = 1, \dots, N$, then $P(X_n = a_j) = \sum_{i=1}^{N} \mu_i\, p^{(n)}_{ij}$.
(5) Linear model representation: if $X_n$ is identified with the column vector in $\mathbb{R}^N$ whose component number $i$ equals $I(X_n = a_i)$, then $X_{n+1} = P^{\top} X_n + V_{n+1}$, where $P^{\top}$ is the transpose of the matrix $P$ and the random variables $V_n = X_n - P^{\top} X_{n-1}$, $n \ge 1$, have zero mean [because, by construction, $E(X_n \mid X_{n-1}) = P^{\top} X_{n-1}$] and are uncorrelated: $E\, V_n V_m^{\top}$ is the zero matrix for $m \ne n$ [similarly, by conditioning].
(6) Recursive generation of a sample path of $X$: on step $n$, generate a random variable $U$ that is uniform on $(0,1)$ and, if $X_n = a_i$, set $X_{n+1} = a_1$ if $U \le p_{i1}$, $X_{n+1} = a_2$ if $p_{i1} < U \le p_{i1} + p_{i2}$, etc.
(7) Exponential (geometric) ergodicity: if there exists an $\ell$ such that $p^{(\ell)}_{ij} > 0$ for all $i, j = 1, \dots, N$, then there exist a unique probability distribution $\pi = (\pi_1, \dots, \pi_N)$ on $S$ and numbers $C > 0$ and $r \in (0,1)$ with the following properties:
$\pi_i > 0$ for all $i = 1, \dots, N$;
$\pi_i$ is the almost-sure limit of $\frac{1}{n} \sum_{k=1}^{n} I(X_k = a_i)$ as $n \to +\infty$;
$1/\pi_i$ is the average return time to $a_i$ if $X_0 = a_i$;
$\pi_j = \sum_{i=1}^{N} \pi_i\, p_{ij}$ (invariance of $\pi$: if $P(X_0 = a_i) = \pi_i$, then $P(X_n = a_i) = \pi_i$ for all $n > 0$; also, as a row vector, $\pi$ is the left eigenvector of $P$);
the distribution of $X_n$ converges to $\pi$ exponentially quickly regardless of the initial distribution of $X_0$:
$$\max_{i,j} \big| \pi_j - p^{(n)}_{ij} \big| \le C r^n, \qquad \max_j \Big| \sum_{i=1}^{N} \mu_i\, p^{(n)}_{ij} - \pi_j \Big| = \max_j \Big| \sum_{i=1}^{N} \mu_i \big( p^{(n)}_{ij} - \pi_j \big) \Big| \le C r^n. \quad (1)$$
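Item (6) above translates directly into a simulator. A minimal sketch, using a small hypothetical two-state matrix (not from the text) for illustration:

```python
import random

def sample_path(P, x0, n_steps, seed=0):
    """Simulate the chain by the rule of item (6): at each step draw
    U ~ Uniform(0,1) and move to the first state j whose cumulative
    row probability p_i1 + ... + p_ij reaches U."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        i = path[-1]
        u = rng.random()
        cum = 0.0
        nxt = len(P[i]) - 1          # fallback for rounding at the boundary
        for j, p in enumerate(P[i]):
            cum += p
            if u <= cum:
                nxt = j
                break
        path.append(nxt)
    return path

# Hypothetical two-state chain, states indexed 0 and 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]
path = sample_path(P, x0=0, n_steps=2000)
frac0 = path.count(0) / len(path)    # long-run fraction of time in state 0
```

By property (7), `frac0` approximates $\pi_0$ (here $\pi = (5/6, 1/6)$, found by hand from $\pi P = \pi$).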
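The convergence in property (7) can be observed numerically: all rows of $P^n$ approach $\pi$ geometrically fast. A sketch with the same hypothetical two-state chain, whose invariant distribution $\pi = (5/6, 1/6)$ is easy to find by hand:

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * B[k][j] for k, a in enumerate(row))
             for j in range(len(B[0]))] for row in A]

# Hypothetical two-state chain (not from the text).
P = [[0.9, 0.1],
     [0.5, 0.5]]

# Rows of P^n converge to pi as n grows (property (7)).
Pn = P
for _ in range(60):
    Pn = mat_mul(Pn, P)
pi = Pn[0]

# Invariance check: pi P = pi, i.e. pi is a left eigenvector of P.
piP = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]
```

After 61 powers the two rows of `Pn` agree to within rounding, and `piP` reproduces `pi`.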
An outline of the proof. Taking for granted that $\pi$ exists, let $\delta > 0$ be such that $p^{(\ell)}_{ij} > \delta \pi_j$ for all $i, j$. Take $\theta = 1 - \delta$ and define $Q$ by $P^{\ell} = (1 - \theta)\Pi + \theta Q$, where the matrix $\Pi$ has all rows equal to $\pi$. Next, argue by induction that $P^{n\ell} = (1 - \theta^n)\Pi + \theta^n Q^n$, so that, for $m \ge 1$, $P^{n\ell + m} - \Pi = \theta^n (Q^n P^m - \Pi)$, from which (1) follows with $r = \theta^{1/\ell}$ and $C = 1/\theta$.

Further definitions.
(1) Chain indicates that the state space is countable; sequence indicates that the time is discrete.
(2) Irreducible Markov chain $X$: for every $i, j$ there is an $n \ge 1$ such that $p^{(n)}_{ij} > 0$ (each state is accessible from every other state).
(3) The period $d_i$ of the state $a_i$ is the greatest common divisor of the numbers $n$ for which $p^{(n)}_{ii} > 0$. An aperiodic state has $d_i = 1$.
(4) The total variation distance between two probability distributions $\mu$ and $\nu$ on $S$ is $\|\mu - \nu\|_{TV} = \frac{1}{2} \sum_{i=1}^{N} |\mu_i - \nu_i|$.

Further results.
(1) A state $a_i$ is aperiodic if and only if there exists an $n_0 \ge 1$ such that, for all $n > n_0$, $p^{(n)}_{ii} > 0$.
(2) All states in an irreducible chain have the same period.
(3) If the chain is irreducible and aperiodic, then there exists an $m$ such that $p^{(m)}_{ij} > 0$ for all $i, j = 1, \dots, N$. [Indeed, $p^{(m+q+r)}_{ij} \ge p^{(m)}_{ik}\, p^{(q)}_{kk}\, p^{(r)}_{kj}$.]
(4) A stronger version of (1) is $\|\pi - p^{(n)}_{i\cdot}\|_{TV} \le C r^n$.
The fundamental question: given the chain $X$, how to find the $r$ from (1). Further developments: mixing time and cut-off phenomena; the Metropolis-Hastings algorithm and MCMC; hidden Markov models (HMM) and the estimation algorithms of Viterbi and Baum-Welch.
The reference: D. A. Levin and Y. Peres. Markov Chains and Mixing Times, Second edition. American Mathematical Society, Providence, RI, 2017. xvi+447 pp. ISBN: 978-1-4704-2962-1.
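The hypothesis of property (7) and further result (3) ask whether some power of $P$ has all entries positive; only the zero/nonzero pattern of $P$ matters, so boolean matrix powers suffice. A minimal sketch (the search bound $(N-1)^2 + 1$ is Wielandt's bound for primitive non-negative matrices; the example matrices are hypothetical):

```python
def is_regular(P, max_n=None):
    """Return True iff all entries of P^n are positive for some n >= 1,
    checking only the zero/nonzero pattern via boolean matrix powers."""
    N = len(P)
    if max_n is None:
        max_n = (N - 1) ** 2 + 1     # Wielandt bound for primitive matrices
    A = [[P[i][j] > 0 for j in range(N)] for i in range(N)]
    B = [row[:] for row in A]        # pattern of P^1
    for _ in range(max_n):
        if all(all(row) for row in B):
            return True
        B = [[any(B[i][k] and A[k][j] for k in range(N))
              for j in range(N)] for i in range(N)]
    return False

flip = [[0.0, 1.0], [1.0, 0.0]]      # irreducible but period 2: never regular
lazy = [[0.9, 0.1], [0.5, 0.5]]      # irreducible and aperiodic: regular
```

`flip` alternates between two patterns and never fills in, while `lazy` already has a strictly positive first power.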
The bottom line: a nice discrete-time Markov chain is time-homogeneous, finite-state, irreducible, and aperiodic; for a time-homogeneous finite-state Markov chain $X$ with transition matrix $P$, the following three properties are equivalent: (a) all entries of the matrix $P^n$ are non-zero for some $n \ge 1$; (b) as $n \to \infty$, every row of $P^n$ converges to the same (unique) invariant distribution $\pi$ for $X$, and $\pi_i > 0$ for all $i$; (c) $X$ is irreducible and aperiodic.

Beyond the nice setting. Starting point: a discrete-time, time-homogeneous Markov chain $X = (X_0, X_1, X_2, \dots)$ with a countable state space $S = \{a_1, a_2, \dots\}$ and transition matrix $P = (p_{ij},\ i, j \ge 1)$, $p_{ij} = P(X_{n+1} = a_j \mid X_n = a_i)$, and, for $m \ge 2$, $P^m = (p^{(m)}_{ij},\ i, j \ge 1)$, $p^{(m)}_{ij} = P(X_{n+m} = a_j \mid X_n = a_i)$, with the conventions $p^{(0)}_{ii} = 1$, $p^{(1)}_{ij} = p_{ij}$.

Definitions based on the arithmetic properties of $P$.
(1) State $a_i$ is absorbing if $p_{ii} = 1$.
(2) State $a_j$ is accessible from $a_i$, $i \ne j$, if $p^{(m)}_{ij} > 0$ for some $m \ge 1$ [by convention, every state is accessible from itself].
(3) Communicating states $a_i$ and $a_j$ are mutually accessible: $p^{(n_1)}_{ij} > 0$, $p^{(n_2)}_{ji} > 0$ for some $n_1, n_2 \ge 0$ [by convention, each state communicates with itself, making communication an equivalence relation].
(4) State $a_j$ is essential if it is absorbing or if it communicates with every state that is accessible from it: either $p_{jj} = 1$ or, for every $i \ne j$, if $p^{(n)}_{ji} > 0$ for some $n$, then $p^{(m)}_{ij} > 0$ for some $m$.
(5) Inessential states are those that are not essential: $X$ eventually gets out of an inessential state and never comes back to it.
(6) An irreducible class is an equivalence class of (essential) communicating states.
(7) The period $d_i$ of the state $a_i$ is the greatest common divisor of the numbers $n$ for which $p^{(n)}_{ii} > 0$.
(8) An irreducible chain has only one irreducible class.
(9) An aperiodic chain has all states with period 1.
(10) An invariant (stationary) measure for $P$ (or for $X$) is a positive measure $\mu$ on $S$ such that $\sum_i \mu_i p_{ij} = \mu_j$, with $\mu_i = \mu(a_i)$. Note that $\mu(S) = +\infty$ is allowed. If $\mu(S) < +\infty$ and $\pi_i = \mu_i / \mu(S)$, then $\pi$ is the invariant (stationary) distribution for $P$.
(11) An invariant measure $\mu$ for $P$ is called reversible if
$$\mu_i p_{ij} = \mu_j p_{ji} \ \text{ for all } i, j. \quad (2)$$
Note that a measure satisfying (2) must be invariant: just sum both sides over $i$ (or over $j$).
(12) A function $f : S \to \mathbb{R}$ is called harmonic if $f(a_i) = \sum_j p_{ij} f(a_j)$ for all $i$, sub-harmonic if $f(a_i) \le \sum_j p_{ij} f(a_j)$, and super-harmonic if $f(a_i) \ge \sum_j p_{ij} f(a_j)$.

Definitions based on the asymptotic properties of $P^n$ as $n \to \infty$.
(1) State $a_i$ is recurrent if $\sum_{n \ge 1} p^{(n)}_{ii} = +\infty$ and is transient otherwise.
(2) State $a_i$ is positive recurrent if $\sum_{n=1}^{\infty} n f^{(n)}_i < \infty$, where the first-return probabilities $f^{(n)}_i$ are defined by $f^{(1)}_i = p_{ii}$ and $p^{(n)}_{ii} = \sum_{k=1}^{n} f^{(k)}_i\, p^{(n-k)}_{ii}$, $n \ge 2$.
(3) State $a_i$ is null recurrent if it is recurrent but not positive recurrent.
(4) A chain $X$ is (positive) recurrent if all the states are (positive) recurrent.
(5) A chain $X$ is ergodic if, as $n \to \infty$, all rows of $P^n$ converge to the (unique) invariant distribution $\pi$ for $P$ and $\pi_i > 0$ for all $i$.
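The renewal identity in item (2) lets one recover the first-return probabilities $f^{(n)}_i$ from the diagonal entries $p^{(n)}_{ii}$, and hence the mean return time $E(\tau_i \mid X_0 = a_i)$, which for an ergodic chain equals $1/\pi_i$. A sketch with a hypothetical two-state chain (here $\pi_0 = 5/6$, so the mean return time to state 0 should be 1.2):

```python
def first_return_probs(P, i, n_max):
    """Recover f_i^(n) from the renewal identity
    p_ii^(n) = sum_{k=1}^{n} f_i^(k) p_ii^(n-k),  with p_ii^(0) = 1."""
    N = len(P)
    p_diag = [1.0]                                   # p_ii^(0), p_ii^(1), ...
    row = [1.0 if j == i else 0.0 for j in range(N)]  # row i of P^0
    for _ in range(n_max):
        row = [sum(row[k] * P[k][j] for k in range(N)) for j in range(N)]
        p_diag.append(row[i])
    f = [0.0]                                        # f^(0) unused
    for n in range(1, n_max + 1):
        f.append(p_diag[n] - sum(f[k] * p_diag[n - k] for k in range(1, n)))
    return f

# Hypothetical two-state chain with pi_0 = 5/6.
P = [[0.9, 0.1],
     [0.5, 0.5]]
f = first_return_probs(P, 0, 60)
mean_return = sum(k * f[k] for k in range(len(f)))   # approximates 1/pi_0
```

Here $f^{(1)}_0 = 0.9$ (stay put) and $f^{(2)}_0 = 0.1 \cdot 0.5 = 0.05$ (leave and come straight back), and the truncated sum $\sum_k k f^{(k)}_0$ converges to $1/\pi_0 = 1.2$.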
Basic facts:
(1) If the state space has $N$ elements, then $X$ is irreducible if and only if the matrix $I_{N \times N} + P + P^2 + \cdots + P^N$ has all entries strictly positive.
(2) All states within one irreducible class have the same period, leading to a cycle, or cyclic, decomposition of the class.
(3) If $a_i$ is recurrent and $a_j$ is accessible from $a_i$, then $a_j$ is also recurrent and communicates with $a_i$.
(4) With $\tau_i = \inf\{n \ge 1 : X_n = a_i\}$, we have $f^{(k)}_i = P(\tau_i = k \mid X_0 = a_i)$, so that $P(\tau_i < +\infty \mid X_0 = a_i) = \sum_{k=1}^{\infty} f^{(k)}_i \le 1$ and
$$\sum_{n \ge 1} p^{(n)}_{ii} = P(\tau_i < +\infty \mid X_0 = a_i) \Big( 1 + \sum_{n \ge 1} p^{(n)}_{ii} \Big),$$
and the state $a_i$ is transient if and only if $P(\tau_i = +\infty \mid X_0 = a_i) > 0$; recurrent if and only if $P(\tau_i < +\infty \mid X_0 = a_i) = 1$; positive recurrent if and only if $E(\tau_i \mid X_0 = a_i) < \infty$.
(5) Inessential states are transient and, if the state space is finite, then all transient states are inessential (that is, in a finite-state Markov sequence, a state is (positive) recurrent if and only if it is essential).
(6) If $a_j$ is accessible from $a_i$, $i \ne j$, and $a_j$ is absorbing, then $a_i$ is inessential.
(7) In the terminology of graph theory, a chain on a graph is irreducible if and only if the graph is connected; irreducible and aperiodic if and only if the graph is connected and non-bipartite.

Further facts.
(1) If $a_i$ is a recurrent state and $\tau_i = \inf\{n \ge 1 : X_n = a_i\}$, then $\mu^{(i)}_j = \sum_{n=0}^{\infty} P(X_n = a_j,\ \tau_i > n \mid X_0 = a_i)$ defines an invariant measure for $P$; the measure is finite if $a_i$ is positive recurrent.
(2) If the chain is irreducible, then the following three statements are equivalent: (a) at least one state is positive recurrent; (b) all states are positive recurrent; (c) an invariant distribution exists.
(3) If $\pi$ is an invariant distribution for $P$ and $\pi(a_i) > 0$, then $a_i$ is recurrent.
(4) If $X$ is irreducible, aperiodic, and recurrent, then the tail sigma-algebra of $X$ is trivial.
(5) If $X$ is irreducible and recurrent, and all states have period $d > 1$, then the tail sigma-algebra of $X$ is determined by the cycle decomposition of the state space $S$ and is non-trivial.
(6) A function $f : S \to \mathbb{R}$ is [sub](super)-harmonic if and only if the sequence $f(X_n)$, $n \ge 0$, is a [sub](super)-martingale with respect to the filtration generated by $X$. An irreducible $X$ is recurrent if and only if every non-negative super-harmonic function is constant.

Examples.
(1) The simple symmetric random walk on the line with arbitrary starting point is irreducible; each state is null recurrent and has period 2; the tail sigma-algebra consists of four sets.
(2) The Ehrenfest chain has $S = \{0, 1, \dots, N\}$ and
$$p_{ij} = \begin{cases} i/N, & j = i - 1, \\ (N - i)/N, & j = i + 1, \\ 0, & \text{otherwise}, \end{cases}$$
and, even though all states have period 2, there is a unique invariant distribution $\pi$ with $\pi_k = 2^{-N} \binom{N}{k}$. Still, the chain is not ergodic in the sense that the rows of $P^n$ do not converge to $\pi$ as $n \to \infty$.
(3) The birth-and-death chain with $S = \{0, 1, 2, \dots\}$, $p_{00} = 2/3$, $p_{01} = 1/3$, and, for $i \ge 1$,
$$p_{ij} = \begin{cases} 2/3, & j = i - 1, \\ 1/3, & j = i + 1, \\ 0, & \text{otherwise}, \end{cases}$$
is reversible, with invariant distribution $\pi_k = 2^{-(k+1)}$, $k = 0, 1, 2, \dots$.
(4) The M/G/1 queue with arrival intensity $\lambda$ and service-time cdf $F$: if $a_k = \frac{1}{k!} \int_0^{+\infty} e^{-\lambda t} (\lambda t)^k \, dF(t)$ [the probability that $k$ customers arrive while one is served], then the number of customers $X_n$ in the buffer at the time the $n$-th customer enters service is a Markov chain with
$$p_{ij} = \begin{cases} a_0 + a_1, & i = j = 0, \\ a_k, & j = i - 1 + k \text{ and } i \ge 1 \text{ or } k > 1, \\ 0, & \text{otherwise}. \end{cases}$$
With $\nu = \sum_{k \ge 1} k a_k$ [the average number of new customers arriving while one is served], the chain is transient if $\nu > 1$, null recurrent if $\nu = 1$, and positive recurrent, with a unique invariant distribution, if $\nu < 1$.
(5) A particular example of an M/M/$\infty$ queue: if $X_{n+1} = \sum_{m=1}^{X_n} \xi_{mn} + Y_n$, where the $\xi_{mn}$ are iid Bernoulli with probability of success $p \in (0,1)$ [representing the customers still being served at time $n+1$] and the $Y_n$ are iid (also independent of $\xi$) Poisson random variables with mean $\lambda$ [representing new customers arriving in one time unit], then the chain $(X_n)$, $n \ge 1$, is ergodic, with the unique invariant distribution equal to Poisson with mean $\lambda / (1 - p)$.
(6) The gambler's ruin model has $a_i = i$, $i = 0, \dots, N$, with absorbing states $a_0$ and $a_N$ and
$$p_{ij} = \begin{cases} p, & j = i + 1,\ i < N \text{ [winning one unit]}, \\ q = 1 - p, & j = i - 1,\ i > 0 \text{ [losing one unit]}, \\ 0, & \text{otherwise}. \end{cases}$$
Here, all states $a_i$, $i = 1, \dots, N - 1$, are inessential, and there are infinitely many invariant distributions: any probability distribution of the form $\pi_0 = \alpha \in [0,1]$, $\pi_N = 1 - \alpha$, $\pi_i = 0$ otherwise. Accordingly, for this problem, the object of interest is not the long-time behavior but the probability of ruin $r_n$, that is, the probability of reaching $a_0$ before reaching $a_N$ when the starting state is $a_n$, so that $n$ represents the gambler's starting capital. Because $r_n = p r_{n+1} + q r_{n-1}$ and $r_0 = 1$, $r_N = 0$, we have $r_n = 1 - (n/N)$ for $p = 1/2$ and $r_n = (\beta^n - \beta^N)/(1 - \beta^N)$, $\beta = q/p \ne 1$, otherwise. When the odds are not in the gambler's favor [that is, $p < q$], ruin is essentially certain: for example, with $N = 100$, $p = 0.455$, and $q = 0.545$ we have $\beta \approx 1.1978$ and $r_{80} \approx 0.97$.
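The invariance of the binomial distribution for the Ehrenfest chain in Example (2) can be verified numerically; a minimal sketch (the choice N = 6 is arbitrary):

```python
from math import comb

def ehrenfest(N):
    """Transition matrix of the Ehrenfest chain on S = {0, 1, ..., N}:
    from state i, move down with probability i/N, up with (N - i)/N."""
    P = [[0.0] * (N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        if i > 0:
            P[i][i - 1] = i / N
        if i < N:
            P[i][i + 1] = (N - i) / N
    return P

N = 6
P = ehrenfest(N)
pi = [comb(N, k) / 2 ** N for k in range(N + 1)]      # Binomial(N, 1/2)
piP = [sum(pi[i] * P[i][j] for i in range(N + 1)) for j in range(N + 1)]
err = max(abs(a - b) for a, b in zip(piP, pi))        # should vanish
```

The identity $\binom{N}{j-1} \frac{N-j+1}{N} + \binom{N}{j+1} \frac{j+1}{N} = \binom{N}{j}$ is what makes `err` vanish up to rounding.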
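The closed-form ruin probability of Example (6) reproduces the quoted numbers; a short sketch (the first branch is the fair-game formula $r_n = 1 - n/N$):

```python
def ruin_probability(n, N, p):
    """Probability of hitting 0 before N starting from capital n, using
    the closed-form solution of r_n = p r_{n+1} + q r_{n-1},
    r_0 = 1, r_N = 0."""
    q = 1.0 - p
    if abs(p - q) < 1e-12:               # fair game
        return 1.0 - n / N
    beta = q / p
    return (beta ** n - beta ** N) / (1.0 - beta ** N)

# The text's numbers: N = 100, p = 0.455, so beta ~ 1.1978.
r80 = ruin_probability(80, 100, 0.455)   # ~ 0.97: ruin is essentially certain
```

For large $N$ and $\beta > 1$ the formula is approximately $1 - \beta^{-(N-n)}$, which explains why even a modest house edge makes ruin nearly certain.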
