A generator is a kind of iterator (i.e., it implements a next() method)
We will study two ways to build generators
The easiest way to build generators is using generator expressions
Just like a list comprehension, but with round brackets
Here is the list comprehension:
>>> singular = ('dog', 'cat', 'bird')
>>> type(singular)
<type 'tuple'>
>>> plural = [string + 's' for string in singular] # Creates a list
>>> plural
['dogs', 'cats', 'birds']
>>> type(plural)
<type 'list'>
And here is the generator expression
>>> singular = ('dog', 'cat', 'bird')
>>> plural = (string + 's' for string in singular) # Creates a generator
>>> type(plural)
<type 'generator'>
>>> plural.next()
'dogs'
>>> plural.next()
'cats'
>>> plural.next()
'birds'
Since sum() can be called on iterators, we can do this
>>> sum((x * x for x in range(10)))
285
The function sum() calls next() repeatedly to get the items and adds up the successive terms
In fact, we can omit the outer brackets in this case
>>> sum(x * x for x in range(10))
285
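Unlike a list, a generator expression produces its items only on demand, and it can be consumed only once. The sketch below illustrates this. It uses the built-in next(gen) in place of gen.next(), which assumes Python 2.6 or later (including Python 3). The names squares, first, rest and again are only for illustration.

```python
# A generator expression: nothing is computed until items are requested
squares = (x * x for x in range(1, 5))   # will yield 1, 4, 9, 16

first = next(squares)   # consumes the first item only
rest = sum(squares)     # sum() consumes the remaining items: 4 + 9 + 16
again = sum(squares)    # the generator is now exhausted, so nothing is summed

print("%d %d %d" % (first, rest, again))
```

If the items are needed more than once, a list comprehension (or a fresh generator) is the safer choice.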
The most flexible way to create generator objects is with generator functions
(Note that this section is technical, and you can probably get by without it)
Here’s an example
Example 1
def f():
    yield 'start'
    yield 'middle'
    yield 'end'
Here f is called a generator function
It looks like an ordinary function, but uses the keyword yield
Let’s see how it works
john@c246:~/sync_dir/teaching/kyoto_08$ python -i temp.py
>>> type(f) # f itself is a function
<type 'function'>
>>> gen = f() # Creates a generator object
>>> gen
<generator object at 0xb7cf31ac>
>>> gen.next()
'start'
>>> gen.next()
'middle'
>>> gen.next()
'end'
>>> gen.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>>
The function f() is used to create generator objects (in this case gen)
Generators are iterators, because they support a next() method
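The first call to gen.next() executes the body of f() up to the first yield and returns 'start'
The second call to gen.next() resumes after that yield, runs to the next yield, and returns 'middle'
def f():
    yield 'start'
    yield 'middle' # This line!
    yield 'end'
When the code block ends, the generator raises a StopIteration exception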
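The pause-and-resume behaviour can be made visible by recording when each part of the body runs. This is only a sketch: the list log and its messages are illustrative additions, and next(gen) is used in place of gen.next() on the assumption of Python 2.6 or later (including Python 3).

```python
log = []   # illustrative helper: records which parts of the body have run

def f():
    log.append('running up to first yield')
    yield 'start'
    log.append('resumed, running up to second yield')
    yield 'middle'
    log.append('resumed, running up to third yield')
    yield 'end'
    log.append('resumed, reached end of body')

gen = f()
steps_before = list(log)        # calling f() runs none of the body

first = next(gen)               # runs the body up to the first yield
steps_after_first = list(log)

second = next(gen)              # resumes, stops at the second yield
third = next(gen)               # resumes, stops at the third yield

try:
    next(gen)                   # the body finishes, so StopIteration is raised
    finished = False
except StopIteration:
    finished = True
```

Note that steps_before is empty: creating the generator object does not execute any of the function body.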
Example 2
Our next example receives an argument x from the caller
def g(x):
    while x < 100:
        yield x
        x = x * x
Let’s see how it works
john@c246:~$ python -i test.py
>>> g
<function g at 0xb7d6b25c>
>>> gen = g(2) # Call generator function to make a generator
>>> type(gen) # gen is an object of type generator
<type 'generator'>
>>> gen.next() # Generators are iterators
2
>>> gen.next()
4
>>> gen.next()
16
>>> gen.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>>
The call gen = g(2) binds gen to a generator
Inside the generator, the name x is bound to 2
When we call gen.next(), the body of g runs until yield x, which returns the current value of x
Note that the value of x is retained inside the generator
When we call gen.next() again, execution continues from where it left off
def g(x):
    while x < 100:
        yield x
        x = x * x # execution continues from here
Continues until yield x, returns the value of x, repeats
When the test x < 100 fails, the generator raises a StopIteration exception
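Since each generator retains its own value of x, two generators made from the same generator function should not interfere with each other. A small sketch of this, where gen_a and gen_b are illustrative names and next(gen) stands in for gen.next() (assuming Python 2.6 or later, including Python 3):

```python
def g(x):
    while x < 100:
        yield x
        x = x * x

gen_a = g(2)   # inside gen_a, x starts at 2
gen_b = g(3)   # inside gen_b, x starts at 3

a1 = next(gen_a)   # 2
b1 = next(gen_b)   # 3
a2 = next(gen_a)   # 4: gen_a resumes with its own x, unaffected by gen_b
b2 = next(gen_b)   # 9

rest_a = list(gen_a)   # remaining items of gen_a
rest_b = list(gen_b)   # remaining items of gen_b
```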
Here’s the generator used with for
gen = g(2)
for v in gen:
    print v
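Roughly speaking, the for loop above requests items with next() and stops quietly when StopIteration is raised. The following is a hand-written approximation of that behaviour, not the interpreter's actual implementation. The list values stands in for the loop body, and next(it) assumes Python 2.6 or later (including Python 3).

```python
def g(x):
    while x < 100:
        yield x
        x = x * x

gen = g(2)
it = iter(gen)            # for a generator, iter() returns the generator itself
values = []
while True:
    try:
        v = next(it)      # ask for the next item
    except StopIteration:
        break             # the for loop ends silently at this point
    values.append(v)      # stands in for the loop body (print v in the notes)
```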
Note that the loop inside the generator can be infinite
def g(x):
    while True:
        yield x
        x = x * x
Here’s how it works
>>> gen = g(3)
>>> gen.next()
3
>>> gen.next()
9
>>> gen.next()
81
>>> gen.next()
6561
>>> gen.next()
43046721
>>> gen.next()
1853020188851841L
Don’t use this in a for loop ; )
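If only a finite number of items is wanted from an infinite generator, one option is itertools.islice from the standard library, and another is a for loop with an explicit break. A minimal sketch, where the cut-off values 5 and 1000 are arbitrary choices for illustration:

```python
import itertools

def g(x):
    while True:
        yield x
        x = x * x

# Take only the first five items; the generator is never run past that point
first_five = list(itertools.islice(g(3), 5))

# A for loop is safe as long as something breaks out of it
small = []
for v in g(3):
    if v > 1000:
        break
    small.append(v)
```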
What’s the advantage of using an iterator here?
Suppose we want to sample a binomial(n,0.5)
One way to do it is as follows
>>> import random
>>> n = 10000000
>>> draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
>>> sum(draws)
But we are creating two huge lists here: range(n) and draws
This uses up lots of memory and is very slow
If I make n even bigger then my computer refuses to allocate the memory
>>> n = 1000000000
>>> draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
MemoryError
We can avoid these problems using iterators
Here is the generator function:
import random
def f(n):
    i = 1
    while i <= n:
        yield random.uniform(0, 1) < 0.5
        i += 1
Now let’s do the sum:
john@c246:~/sync_dir/teaching/kyoto_08$ python -i temp.py
>>> n = 10000000
>>> draws = f(n)
>>> draws
<generator object at 0xb7d8b2cc>
>>> sum(draws)
4999141
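The same idea can probably be expressed more briefly with a generator expression, and the memory difference can be checked with sys.getsizeof. This is a sketch under a few assumptions: n is deliberately much smaller than in the notes, sys.getsizeof requires Python 2.6 or later, and in Python 2 one would use xrange(n) instead of range(n) so that no list of indices is built. Note that getsizeof reports only the size of the container object itself, not of the items it refers to.

```python
import random
import sys

n = 100000   # smaller than in the notes so the list version is cheap to build

# List version: all n draws are stored at once
draws_list = [random.uniform(0, 1) < 0.5 for i in range(n)]

# Generator version: draws are produced one at a time as sum() asks for them
draws_gen = (random.uniform(0, 1) < 0.5 for i in range(n))

list_bytes = sys.getsizeof(draws_list)   # grows with n
gen_bytes = sys.getsizeof(draws_gen)     # small, independent of n

total = sum(draws_gen)                   # number of successes out of n
```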
In summary
Iterators such as generators avoid the need to create big lists/tuples
Exercise 1
Write a generator which yields a time series for the quadratic map
Inputs to the generator are x_0 and n, the length of the series
Plot a series with Matplotlib
Exercise 2
Complete the following code, and test it using [this](programs/table.csv) file
def column_iterator(target_file, column_number):
    """A generator function for CSV files.
    When called with a file name target_file (string) and column number
    column_number (integer), the generator function returns a generator
    which steps through the elements of column column_number in file
    target_file.
    """
    # put your code here

dates = column_iterator('table.csv', 1)
for date in dates:
    print date
Solution to Exercise 1:
## Filename: quadmap.py
## Author: John Stachurski
import pylab
def qm(x, n):
    i = 0
    while i < n:
        yield x
        x = 4 * (1 - x) * x
        i += 1
h = qm(0.1, 200)
time_series = [x for x in h]
pylab.plot(time_series)
pylab.show()
Solution to Exercise 2:
def column_iterator(target_file, column_number):
    """A generator function for CSV files.
    When called with a file name target_file (string) and column number
    column_number (integer), the generator function returns a generator
    which steps through the elements of column column_number in file
    target_file.
    """
    f = open(target_file, 'r')
    for line in f:
        yield line.split(',')[column_number - 1]
    f.close()

dates = column_iterator('table.csv', 1)
for date in dates:
    print date
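One possible variation on this solution, offered only as a sketch, uses the standard csv module so that quoted fields containing commas are split correctly, and a try/finally block so that the file is closed even if the caller stops iterating early. The function name column_iterator_csv and the sample rows written to the temporary file are made up for illustration; they are not the contents of table.csv.

```python
import csv
import os
import tempfile

def column_iterator_csv(target_file, column_number):
    """Yield the entries of column column_number (counting from 1)."""
    f = open(target_file, 'r')
    try:
        for row in csv.reader(f):
            yield row[column_number - 1]
    finally:
        f.close()   # runs even if the caller stops iterating early

# Build a small throwaway file so the example is self-contained
handle, path = tempfile.mkstemp(suffix='.csv')
os.close(handle)
f = open(path, 'w')
f.write('2008-01-02,"Kyoto, Japan",10\n2008-01-03,Osaka,12\n')
f.close()

dates = list(column_iterator_csv(path, 1))
places = list(column_iterator_csv(path, 2))
os.remove(path)
```

Here the second column of the first row contains a comma inside quotes, which a plain line.split(',') would break into two pieces.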