Go to top, next, previous, or johnstachurski.net

Continuous State Dynamic Programming

In this section we solve for the optimal policy of a stochastic growth model

The model is quite specific, but the method is fairly general

You should be able to apply the same ideas to other kinds of models

The lecture is based on section 6.2 of EDTC

The presentation here is very terse: see the book for more details

To learn more about dynamic programming in general see chapter 10 of EDTC

An Optimal Growth Model

At each point in time, agent receives income yt

Split into consumption ct and savings kt

Output at t+1 is



where φ is a density function

The agent's behavior is specified by a policy function



Here σ(y) interpreted as agent's choice of savings when income = y

The set of all such policies will be denoted by Σ

The agent's decision problem is



and



The utility function U is assumed to be bounded and continuous, and f is assumed to be continuous

Dynamic Programming

We will solve for the optimal policy using dynamic programming (DP)

We start with principles of DP and then go on to implementation

Basic Principles

The value function is defined by



A policy in Σ is called optimal if it attains this supremum for all y in S

The Bellman equation states that



Let bcS be the set of continuous bounded real-valued functions on S.

Given a function w in bcS, we say that σ in Σ is w-greedy if



At least one such policy exists

Moreover, a policy is optimal if and only if it is v*-greedy

Hence to find an optimal policy, we can compute v* and then solve for a v*-greedy policy

The value function v* can be obtained by iterating on the Bellman operator

A map from bcS into itself defined by



A common starting point is to set w = U

Iteration produces a sequence of functions



Fitted Value Iteration

In practice, the functions on this sequence typically cannot be stored on a computer

Many people use discretization: Replace continuous state problem with a similar discrete state problem

In general, this is not the most efficient way

Instead, we approximate the iterates at each step, using simple functions which can be stored on a computer

This is called fitted value iteration

For a theoretical discussion of fitted value iteration, see section 6.2.2 of EDTC

Key idea:

What approximation to use when we go from Tw to ATw?

The simplest option is step functions (i.e., piecewise constant functions)

Fitted Value Iteration with Step Functions

We can now construct an approximate Bellman operator

Corresponds to the model



When iterating, we will use step functions via the StepFun class, as defined in this lecture

One of the problems we need to solve is how to evaluate the integral



Recall that a StepFun instance w has an expectation() method

In the program, we find the cdf of f(k, W) and pass it to this method

## Filename rpd.py
## Author: John Stachurski

import numpy as np
from scipy.optimize import fminbound
from scipy.stats import lognorm
from stepfun import StepFun

theta, alpha, rho = 0.5, 0.8, 0.9      # Parameters

def U(c): 
    return 1 - np.exp(- theta * c)  # Utility

L = lognorm(1)
G = L.cdf   # Cumulative distribution (cdf) of lognormal shock

def dist(c):
    """
    Creates and returns (as a function) the cdf of c * W, where W is 
    lognormal and c is a constant.
    """
    if c == 0.0:
        return np.vectorize(lambda x: 0 if x < 0 else 1)
    return lambda x: G(x / c)

# Next we make a grid.  The grid is constructed in a nonlinear way, 
# so that there are more points near the bottom of the grid.
# The reason is that the iterates have more curvature at the bottom,
# so we need more data there to approximate well.
gridmax, gridsize = 8, 150
grid = np.linspace(0, gridmax**1e-1, gridsize)**10

def maximum(h, a, b):
    "Computes the max of h on the interval [a, b]."
    return h(fminbound(lambda x: -h(x), a, b))  

def bellman(w):
    """
    The approximate Bellman operator.

     Parameters: 
     
        * w is an instance of StepFun with w.X = grid

    Returns: A new instance of StepFun 
    """
    Tw = np.empty(gridsize)
    for i, y in enumerate(grid):
        h = lambda k: U(y - k) + rho * w.expectation(dist(k**alpha))
        Tw[i] = maximum(h, 0, y)
    return StepFun(grid, Tw)

Problem:

Generate iterates of the approximate Bellman operator and plot them

Start from initial condition v = StepFun(grid, U(grid))

Iterate until deviation between current and previous iterate is less than 0.005

Here, deviation between functions v and w is measured by



When these functions are instances of StepFun, the supremum equals the supremum over the grid points alone

You should produce the following graph



Solution:

from rpd import *
from scipy import absolute as abs
import matplotlib.pyplot as plt

v = StepFun(grid, U(grid))
tol = 0.005     # Error tolerance 

while 1:
    plt.plot(grid, v.Y, 'k-')
    new_v = bellman(v)
    if max(abs(new_v.Y - v.Y)) < tol:
        break
    v = new_v

plt.show()