Generators

A generator is a kind of iterator (i.e., it implements a next() method)

We will study two ways to build generators

  • Generator expressions
  • Generator functions

Generator Expressions

The easiest way to build generators is using generator expressions

Just like a list comprehension, but with round brackets

Here is the list comprehension:

>>> singular = ('dog', 'cat', 'bird')
>>> type(singular)
<type 'tuple'>
>>> plural = [string + 's' for string in singular]  # Creates a list
>>> plural
['dogs', 'cats', 'birds']
>>> type(plural)
<type 'list'>

And here is the generator expression

>>> singular = ('dog', 'cat', 'bird')
>>> plural = (string + 's' for string in singular)  # Creates a generator
>>> type(plural)
<type 'generator'>
>>> plural.next()
'dogs'
>>> plural.next()
'cats'
>>> plural.next()
'birds'
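A generator can be consumed only once. The sketch below assumes Python 2.6 or later, where the built-in next(plural) is equivalent to plural.next(). It suggests what happens after the third item: a further request raises StopIteration, and the generator does not restart.

```python
singular = ('dog', 'cat', 'bird')
plural = (string + 's' for string in singular)

# Consume all three items; next(plural) is equivalent to plural.next()
items = [next(plural), next(plural), next(plural)]

# A fourth request finds nothing left and raises StopIteration
try:
    next(plural)
    exhausted = False
except StopIteration:
    exhausted = True

# The generator does not start over, so nothing remains to collect
leftover = list(plural)
```

To step through the items again, build a new generator from singular.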

Since sum() can be called on iterators, we can do this

>>> sum((x * x for x in range(10)))
285

The function sum() calls next() to get the items and adds successive terms

In fact, we can omit the outer brackets in this case

>>> sum(x * x for x in range(10))
285
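The way sum() pulls items from an iterator can be sketched roughly as follows. The name my_sum is hypothetical, and this is only an illustration of the idea, not the real implementation of sum() (which, among other things, accepts any iterable rather than only iterators).

```python
def my_sum(iterator):
    """Hypothetical stand-in for sum(); an illustration only."""
    total = 0
    while True:
        try:
            item = next(iterator)   # same as iterator.next() in Python 2
        except StopIteration:       # no items left, so we are done
            return total
        total += item

squares = (x * x for x in range(10))
result = my_sum(squares)
```

At no point do all ten squares exist in memory together; each one is produced, added to total, and discarded.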

Generator Functions

Generator functions are the most flexible way to create generator objects

(Note that this section is technical, and you can probably get by without it)

Here’s an example

Example 1

def f():
    yield 'start'
    yield 'middle'
    yield 'end'

Here f() is called a generator function

Looks like an ordinary function, but uses the keyword yield

Let’s see how it works

john@c246:~/sync_dir/teaching/kyoto_08$ python -i temp.py
>>> type(f)           # f itself is a function
<type 'function'>
>>> gen = f()         # Creates a generator object
>>> gen
<generator object at 0xb7cf31ac>
>>> gen.next()
'start'
>>> gen.next()
'middle'
>>> gen.next()
'end'
>>> gen.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>

The function f() is used to create generator objects (in this case gen)

Generators are iterators, because they support a next() method

The first call to gen.next()

  • Executes code in the body of f() until it meets a yield statement
  • Returns that value to the caller of gen.next()

The second call to gen.next()

  • Starts executing from the next line

    def f():
        yield 'start'
        yield 'middle'  # This line!
        yield 'end'

  • Continues until the next yield statement
  • Returns that value to the caller of gen.next()
  • Etc.

When the body of f() finishes, the call to gen.next() raises a StopIteration exception
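The pause-and-resume behaviour described above can be made visible by recording which parts of the body have run. This is a sketch only: the list log and the extra log.append lines are additions for illustration, and next(gen) is the built-in equivalent of gen.next() (Python 2.6 or later).

```python
log = []   # records which parts of the body have run so far

def f():
    log.append('before start')
    yield 'start'
    log.append('between start and middle')
    yield 'middle'
    log.append('between middle and end')
    yield 'end'
    log.append('after end')

gen = f()
log_after_creation = list(log)   # calling f() runs none of the body

first = next(gen)                # runs the body up to the first yield
log_after_first = list(log)

second = next(gen)               # resumes just after the first yield
log_after_second = list(log)
```

Calling f() only creates the generator; the body starts running at the first request for an item.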

Example 2

Our next example receives an argument x from the caller

def g(x):
    while x < 100:
        yield x
        x = x * x

Let’s see how it works

john@c246:~$ python -i test.py
>>> g
<function g at 0xb7d6b25c>
>>> gen = g(2)  # Call generator function to make a generator
>>> type(gen)   # gen is an object of type generator
<type 'generator'>
>>> gen.next()  # Generators are iterators
2
>>> gen.next()
4
>>> gen.next()
16
>>> gen.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>

The call gen = g(2) binds gen to a generator

Inside the generator, the name x is bound to 2

When we call gen.next()

  • The body of g() executes until the line yield x
  • The value of x is returned

Note that the value of x is retained inside the generator

When we call gen.next() again, execution continues from where it left off

def g(x):
    while x < 100:
        yield x
        x = x * x  # execution continues from here

Continues until yield x, returns the value of x, repeats

When the test x < 100 fails, the loop ends and a StopIteration exception is raised
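Each generator created by g() keeps its own copy of x. A small sketch, with the hypothetical names gen_a and gen_b, interleaves calls on two generators to illustrate this (next(gen_a) is the built-in equivalent of gen_a.next(), Python 2.6 or later).

```python
def g(x):
    while x < 100:
        yield x
        x = x * x

gen_a = g(2)   # inside gen_a, x starts at 2
gen_b = g(3)   # inside gen_b, x starts at 3

# Interleave the calls; neither generator disturbs the other
a1 = next(gen_a)
b1 = next(gen_b)
a2 = next(gen_a)
b2 = next(gen_b)
a3 = next(gen_a)
b3 = next(gen_b)
```

Here a1, a2, a3 are 2, 4, 16 and b1, b2, b3 are 3, 9, 81, just as if each generator had been run on its own.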

Here’s the generator used with for

gen = g(2)
for v in gen:
    print v
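Roughly speaking, the for loop calls next() repeatedly and stops quietly when StopIteration is raised. The sketch below compares the two forms; the lists from_for and from_while are hypothetical names used to collect the values instead of printing them.

```python
def g(x):
    while x < 100:
        yield x
        x = x * x

# Version 1: the for loop
from_for = []
for v in g(2):
    from_for.append(v)

# Version 2: approximately what the for loop does behind the scenes
from_while = []
gen = iter(g(2))            # for a generator, iter() returns the generator itself
while True:
    try:
        v = next(gen)       # same as gen.next() in Python 2
    except StopIteration:   # raised once the test x < 100 fails
        break
    from_while.append(v)
```

This is why the StopIteration traceback never appears when a generator is used in a for loop.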

Note that the loop inside the generator can be infinite

def g(x):
    while 1:
        yield x
        x = x * x

Here’s how it works

>>> gen = g(3)
>>> gen.next()
3
>>> gen.next()
9
>>> gen.next()
81
>>> gen.next()
6561
>>> gen.next()
43046721
>>> gen.next()
1853020188851841L

Don’t use this in a for loop without a break: the loop never ends ; )
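An infinite generator can still be used safely if the consumer decides when to stop. The sketch below shows two common options: itertools.islice from the standard library, and a for loop with an explicit break. The cutoff of 10000 is an arbitrary value chosen for illustration.

```python
from itertools import islice

def g(x):
    while 1:
        yield x
        x = x * x

# Option 1: islice takes the first four items and then stops asking
first_four = list(islice(g(3), 4))

# Option 2: a for loop that breaks out once the values get large
small = []
for v in g(3):
    if v > 10000:   # arbitrary cutoff, for illustration only
        break
    small.append(v)
```

In both cases the generator is simply abandoned once no more items are requested.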

Advantages of Iterators

What’s the advantage of using an iterator here?

Suppose we want to draw a sample from the binomial(n, 0.5) distribution

One way to do it is as follows

>>> import random
>>> n = 10000000
>>> draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
>>> sum(draws)

But we are creating two huge lists here

  • range(n), and
  • draws

This uses up lots of memory and is very slow

If I make n even bigger then my computer refuses to allocate the memory

>>> n = 1000000000
>>> draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

We can avoid these problems using iterators

Here is the generator function:

import random

def f(n):
    i = 1
    while i <= n:
        yield random.uniform(0, 1) < 0.5
        i += 1

Now let’s do the sum:

john@c246:~/sync_dir/teaching/kyoto_08$ python -i temp.py
>>> n = 10000000
>>> draws = f(n)
>>> draws
<generator object at 0xb7d8b2cc>
>>> sum(draws)
4999141
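The memory difference can be checked roughly with sys.getsizeof (available from Python 2.6). This is a sketch under some assumptions: n is kept small so the list version is harmless, the names draws_list, draws_gen, list_bytes and gen_bytes are hypothetical, and getsizeof counts only the list's own storage, not the objects it refers to.

```python
import random
import sys

def f(n):
    i = 1
    while i <= n:
        yield random.uniform(0, 1) < 0.5
        i += 1

n = 100000   # small enough that the list fits comfortably in memory

draws_list = [random.uniform(0, 1) < 0.5 for i in range(n)]
draws_gen = f(n)

list_bytes = sys.getsizeof(draws_list)   # grows in proportion to n
gen_bytes = sys.getsizeof(draws_gen)     # small, whatever the value of n

total = sum(draws_gen)   # number of True draws, somewhere between 0 and n
```

The generator holds only its current state (the values of i and n), so its size stays the same however large n is.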

In summary

  • Iterators avoid the need to create big lists/tuples

  • Provide a uniform interface to iteration
    • Can be used transparently in for loops

Exercises

Exercise 1

Write a generator which yields a time series for the quadratic map

\[x_{t+1} = 4 (1 - x_t) x_t \tag{1}\]

Inputs to the generator are \(x_0\) and n, the length of the series

Plot a series with Matplotlib

Exercise 2

Complete the following code, and test it using [this](programs/table.csv) file

def column_iterator(target_file, column_number):
    """A generator function for CSV files.
    When called with a file name target_file (string) and column number
    column_number (integer), the generator function returns a generator
    which steps through the elements of column column_number in file
    target_file.
    """
    # put your code here

dates = column_iterator('table.csv', 1)

for date in dates:
    print date

Solutions

Solution to Exercise 1:

## Filename: quadmap.py
## Author: John Stachurski

import pylab

def qm(x, n):
    i = 0
    while i < n:
        yield x
        x = 4 * (1 - x) * x
        i += 1

h = qm(0.1, 200)

time_series = [x for x in h]
pylab.plot(time_series)
pylab.show()

Solution to Exercise 2:

def column_iterator(target_file, column_number):
    """A generator function for CSV files.
    When called with a file name target_file (string) and column number
    column_number (integer), the generator function returns a generator
    which steps through the elements of column column_number in file
    target_file.
    """
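    # The with block closes the file even if the caller stops iterating early
    with open(target_file, 'r') as f:
        for line in f:
            # Strip the newline so the last column comes back clean
            yield line.rstrip('\n').split(',')[column_number - 1]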

dates = column_iterator('table.csv', 1)

for date in dates:
    print date
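For readers without table.csv to hand, the solution can be tried on a small temporary file. The rows below are made up purely for illustration and are not the contents of table.csv. A compact version of the generator function is repeated so that the block runs on its own.

```python
import os
import tempfile

def column_iterator(target_file, column_number):
    """Step through column column_number of the CSV file target_file."""
    f = open(target_file, 'r')
    for line in f:
        yield line.split(',')[column_number - 1]
    f.close()

# Made-up rows, for illustration only
rows = ['2008-01-02,10.5\n', '2008-01-03,11.0\n', '2008-01-04,10.8\n']

# Write the rows to a temporary CSV file
handle, path = tempfile.mkstemp(suffix='.csv')
os.close(handle)
out = open(path, 'w')
out.writelines(rows)
out.close()

# Collect the first column, then clean up the temporary file
dates = list(column_iterator(path, 1))
os.remove(path)
```

With these rows, dates holds the three strings from the first column, one per line of the file.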