Go to top, next, previous, or johnstachurski.net

Application: Finite State Optimal Growth

This lecture:

The lecture is based on section 5.1 of EDTC

Outline of the Problem

Model behavior of Colonel Kurtz, who can be found on small island in the Nung river living on catfish

Catfish bite at dawn, colonel bags a random quantity W

Catfish last only if refrigerated, and the Colonel's freezer holds up to M fish

Let Xt be the state variable: stock of fish at noon on day t

In the evening, the Colonel consumes quantity Ct, and freezes Rt = Xt - Ct

Following the next morning's fishing trip, the state is



Assume that Kurtz follows policy function g: on observing state X, saves R = g(X)

State space (possible values for X) is S = 0,...,M+B

Policy function g maps S into 0,...,M, and



Denote the set of all feasible policies by G. For given g in G, state evolves according to



Letting U be the utility function, optimization problem of Colonel Kurtz is



Given that we are currently at state x, the reward from following policy g is



The value function is defined as



The precise definition of optimality is as follows:



The Bellman Equation

Set of all feasible actions (number of fish that can be frozen) when the current state is x:



Bellman equation can be written as



The idea is as follows

Suppose we know the values of different states in terms of maximum future rewards

Then best action is found by trading off the two effects inherent in choosing an action:

Making this trade off optimally gives maximum value from the current state

Given a real-valued function w on S, we call g in G w-greedy if



It is known that a policy is optimal if and only if it is v*-greedy

Hence computing an optimal policy is trivial if we know the value function

How to solve for the value function?

Let bS be the real-valued functions on S

Define the Bellman operator sending bS into bS by



Facts (see the text):



In consequence, iteration of T on any v in bS converges to v*

Value iteration algorithm

Iterate with T on some initial v in bS, producing (Tnv)

Continue iteration until



Computer a w-greedy policy g, where w is the final iterate

If the tolerance is small, then g is almost optimal (see the text)

Exercises

Solve for the optimal policy

For the utility function set



For the parameters, use beta, rho, B, M = 0.5, 0.9, 10, 5

Let q be uniform on 0,...,B

Plot the sequence of functions created in the value iteration algorithm



Print the greedy policy (as a list) calcuated from the last iterate

The solution is

>>> g
[0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5, 5]



Solution

import pylab

beta, rho, B, M = 0.5, 0.9, 10, 5
S = range(B + M + 1)  # State space
Z = range(B + 1)      # Shock space

def d_infty(v, w):
    return max(abs(v[x] - w[x]) for x in S)

def U(c):      
    "Utility function."
    return c**beta

def q(z):    
    "Probability mass function, uniform distribution."
    return 1.0 / len(Z) if 0 <= z <= B else 0

def Gamma(x):  
    "The correspondence of feasible actions."
    return range(min(x, M) + 1)

def T(v):      
    """An implementation of the Bellman operator.
    Parameters: v is a sequence representing a function defined on S.
    Returns: Tv, a list."""
    Tv = []        
    for x in S:
        vals = []  # Stores rewards for a in Gamma(x)
        for a in Gamma(x):
            y = U(x - a) + rho * sum(v[a + z] * q(z) for z in Z)
            vals.append(y)
        Tv.append(max(vals))
    return Tv 

def greedy(v):
    g = []
    for x in S:
        runningmax = -1
        for a in Gamma(x):
            y = U(x - a) + rho * sum(v[a + z] * q(z) for z in Z)
            if y > runningmax:
                runningmax = y
                maximizer = a
        g.append(maximizer)
    return g
 
def main():
    current = [U(x) for x in S]
    tolerance = 0.001
    while 1:
        pylab.plot(current, 'bo-')
        new = T(current)
        if d_infty(new, current) < tolerance:
            break
        current = new
    g = greedy(new)
    print 'The optimal policy is:'
    print g
    pylab.show()