Iterators

Iterators provide a uniform interface for stepping through the elements of a collection

They are one of the many nice features of the Python language

In this lecture we’ll talk about using iterators

In a later lecture we’ll learn how to build our own

Definitions

First we define iterators and iterables

Iterators

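An iterator is an object with a next() method that returns the next element each time it is called, raising StopIteration once no elements remain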

For example, file objects (which we met in an earlier lecture) are iterators

Recall that we had a file test.txt with contents

Foo foo
Bar bar

Let’s create a file object linked to this file

>>> f = open('test.txt', 'r')

This object has a next() method:

>>> f.next()
'Foo foo\n'
>>> f.next()
'Bar bar\n'

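Calling f.next() is essentially the same as calling f.readline(), except at the end of the file: there f.next() raises StopIteration, whereas f.readline() returns an empty string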
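A minimal sketch of this difference, assuming Python 3, where the method is spelled __next__() and is normally called through the built-in next() (also available from Python 2.6). To avoid needing test.txt on disk, an io.StringIO object stands in for the file object; that substitution is ours, not the lecture's.

```python
import io

# In-memory stand-in for test.txt (an assumption, so no file is needed)
text = u'Foo foo\nBar bar\n'

# next(): one line per call, StopIteration once the lines run out
f = io.StringIO(text)
lines_via_next = [next(f), next(f)]
try:
    next(f)
    next_raised = False
except StopIteration:
    next_raised = True

# readline(): one line per call, an empty string once the lines run out
g = io.StringIO(text)
lines_via_readline = [g.readline(), g.readline()]
end_of_file = g.readline()

print(lines_via_next == lines_via_readline)  # True
print(next_raised, repr(end_of_file))        # True ''
```

Two separate objects are used because, in Python 2, mixing next() and readline() on the same real file object can raise a ValueError.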

Other examples are

  • enumerate objects
>>> e = enumerate(['foo', 'bar'])
>>> e.next()
(0, 'foo')
>>> e.next()
(1, 'bar')
  • reader objects from the csv module (which is used to manipulate CSV files)
>>> from csv import reader
>>> nikkei_data = reader(open('table.csv'))  # The reader() function is passed a file object
>>> nikkei_data.next()
['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
>>> nikkei_data.next()
['2008-05-19', '14294.52', '14343.19', '14219.08', '14269.61', '133800', '14269.61']
  • objects returned by urllib.urlopen()
>>> import urllib
>>> webpage = urllib.urlopen("http://www.cnn.com")
>>> webpage.next()
'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN""http://www.w3.org/...' # etc
>>> webpage.next()
'<meta http-equiv="refresh" content="1800;url=?refresh=1">\n'
>>> webpage.next()
'<meta name="Description" content="CNN.com delivers the latest breaking news and information..' # etc
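Since csv.reader() accepts any iterator that yields lines of text, not only a file, the Nikkei example can be reproduced without table.csv. The sketch below assumes Python 3 and feeds the reader an io.StringIO object holding just the two lines shown above.

```python
import csv
import io

# In-memory copy of the first two lines of table.csv, as shown above
table = io.StringIO(
    u'Date,Open,High,Low,Close,Volume,Adj Close\n'
    u'2008-05-19,14294.52,14343.19,14219.08,14269.61,133800,14269.61\n'
)

nikkei_data = csv.reader(table)
header = next(nikkei_data)     # the column names
first_row = next(nikkei_data)  # fields come back as strings

# Like any iterator, the reader raises StopIteration when the data runs out
try:
    next(nikkei_data)
    exhausted = False
except StopIteration:
    exhausted = True
```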

Iterables

The built-in function iter() can be used for creating iterators from certain objects

An object is said to be iterable if it can be passed to iter()

A good example is a list:

>>> X = ['foo', 'bar']
>>> type(X)
<type 'list'>
>>> Y = iter(X)
>>> type(Y)
<type 'listiterator'>
>>> Y.next()
'foo'
>>> Y.next()
'bar'
>>> Y.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Another example is a dictionary

>>> d = {'name': 'godzilla', 'height in meters': 10}
>>> it = iter(d)
>>> type(it)
<type 'dictionary-keyiterator'>
>>> it.next()
'height in meters'
>>> it.next()
'name'

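The next() method steps through the keys of the dictionary

  • The keys are not stored in any particular order, so there is no notion of “first”, “second”, etc. (this holds in Python 2; from Python 3.7 on, dictionaries preserve insertion order)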

Incidentally, dictionaries also have methods that return iterators directly

  • d.iterkeys() steps through the same elements as iter(d.keys()) or iter(d)
  • d.itervalues() steps through the same elements as iter(d.values())
  • d.iteritems() steps through the same elements as iter(d.items())
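The iterkeys(), itervalues() and iteritems() methods exist only in Python 2. A sketch of the Python 3 equivalents, where keys(), values() and items() return views that iter() accepts; the results are sorted so the sketch does not depend on dictionary order.

```python
d = {'name': 'godzilla', 'height in meters': 10}

# iter(d) steps through the keys, just like iter(d.keys())
keys = sorted(iter(d))
same_keys = sorted(iter(d.keys()))

# Values and (key, value) pairs, via the views returned by values() and items()
values = sorted(iter(d.values()), key=str)  # key=str: values mix int and str
items = sorted(iter(d.items()))

print(keys == same_keys)  # True
print(items)              # [('height in meters', 10), ('name', 'godzilla')]
```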

Of course, not all objects are iterable

>>> iter(42)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable
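The definition of an iterable translates directly into a check: try iter() and catch the TypeError. The function is_iterable() below is a hypothetical helper for illustration, not a built-in; the sketch runs under Python 2 or 3.

```python
def is_iterable(obj):
    """Return True if iter() accepts obj (hypothetical helper, not a built-in)."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


print(is_iterable(['foo', 'bar']))        # True: a list
print(is_iterable({'name': 'godzilla'}))  # True: a dictionary
print(is_iterable(42))                    # False: an int
```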

Using Iterators

Let’s look at some different ways we can use iterators

Iterators in For Loops

A very common use of iterators is in for loops

In fact this is how the for loop works!

for x in iterator:
    <code block>

This is what happens:

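  • The interpreter calls iterator.next() and binds x to the result
  • It executes the code block
  • It repeats until iterator.next() raises StopIteration, at which point the loop ends quietly (the exception is not passed on)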
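These steps can be written out by hand with a while loop. A sketch, assuming Python 3 (or Python 2.6+) and the built-in next(); manual_for() is a hypothetical name, and the loop body is passed in as a function.

```python
def manual_for(iterable, body):
    """Run body(x) for each x in iterable, the way a for loop does."""
    iterator = iter(iterable)      # obtain an iterator first
    while True:
        try:
            x = next(iterator)     # get the next element and bind it to x
        except StopIteration:      # no elements left: stop quietly
            break
        body(x)                    # execute the loop body


collected = []
manual_for(['foo', 'bar'], collected.append)
print(collected)  # ['foo', 'bar']
```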

Remember that in an earlier lecture we introduced the syntax

f = open('somefile.txt')
for line in f:
    pass  # Do something with line

Now you know how it works:

  • f is bound to an iterator
    • A file object, which implements a next() method
  • The interpreter
    • Calls f.next() and binds line to the return value
    • Executes the body of the loop
    • Repeats until f.next() raises StopIteration

Another example

for i, x in enumerate(X):
    pass  # Do something with i and x

Again, enumerate(X) is an iterator

What about this example?

X = ['a', 'b']
for x in X:
    print x

Here X is a list (an iterable), not an iterator

Internally, Python calls iter(X) to make an iterator

More generally,

  • for loops work on either iterators or iterables

  • In the second case, the iterable is converted into an iterator
    • iter(iterable)
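A quick check of both cases, assuming Python 3: iter() applied to an iterator hands back that same object, while iter() applied to a list builds a fresh iterator each time. One consequence is that a for loop over a partly used iterator picks up where the last next() call left off.

```python
X = ['a', 'b', 'c']

it = iter(X)
print(iter(it) is it)      # True: iter() returns the iterator itself
print(iter(X) is iter(X))  # False: each call on a list makes a new iterator

next(it)                   # consume 'a'
rest = []
for x in it:               # the loop continues from 'b'
    rest.append(x)
print(rest)                # ['b', 'c']
```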

Here’s another example

d = {'name': 'godzilla', 'height in meters': 10}
for key in d:
    pass  # Do something with key

Now you know how this works

Internally, the iterable d is passed to iter()

The resulting iterator steps through the keys of d

Iterators and built-ins

Some built-in functions that act on sequences also work with iterators and other iterables

  • max(), min(), sum(), all(), any()
>>> X = [10, -10]
>>> max(X)
10
>>> Y = iter(X)
>>> type(Y)
<type 'listiterator'>
>>> max(Y)
10
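The same holds for the other functions in the list. A sketch, assuming Python 3; each call gets its own fresh iterator from iter(X), because an iterator can only be traversed once.

```python
X = [10, -10]

# Each built-in receives a fresh iterator over X
results = {
    'max': max(iter(X)),
    'min': min(iter(X)),
    'sum': sum(iter(X)),
    'all': all(iter(X)),  # True: neither element is zero
    'any': any(iter(X)),  # True: at least one element is nonzero
}
print(results['max'], results['min'], results['sum'])  # 10 -10 0
```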

Use and reuse

A major difference in usage is that iterators are exhausted by use, whereas a list can be traversed as many times as we like

>>> X = [10, -10]
>>> Y = iter(X)
>>> max(Y)
10
>>> max(Y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: max() arg is an empty sequence
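When the values are needed more than once, one option is to store them in a list while traversing the iterator; another is to go back to the original iterable and call iter() again. A sketch, assuming Python 3:

```python
X = [10, -10]
Y = iter(X)

values = list(Y)                 # traverses Y once, keeping every element
print(max(values), min(values))  # 10 -10: a list can be reused freely

print(list(Y))                   # []: Y itself is now exhausted
print(max(iter(X)))              # 10: a fresh iterator over the original list
```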

Application: Web Data

The application involves downloading data with the module urllib

URL stands for uniform resource locator

For example, http://www.cnn.com, which we opened with urllib.urlopen() above, is a URL

Some URLs have a query string

The part after (but not including) ? is the query string

Passed to the server as an argument

We can obtain stock price data from Yahoo Finance by appending a query string to the base URL http://ichart.finance.yahoo.com/table.csv

The query string is a collection of field/value pairs, separated by &

The meanings of the main fields are

  • a: start month, base zero (e.g., jan = 0, feb = 1, etc.)
  • b: start day
  • c: start year
  • d: end month, base zero
  • e: end day
  • f: end year
  • g: period (in this case, d = daily)
  • s: ticker symbol for the stock (in this case, Google)
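To see the field/value structure concretely, the sketch below builds a query string from a few of the fields above and then splits it back apart. It assumes Python 3, where urlencode() lives in urllib.parse rather than urllib; the fields are sorted only to make the output predictable, and no request is sent.

```python
from urllib.parse import urlencode  # Python 3 location of urllib.urlencode

base_url = 'http://ichart.finance.yahoo.com/table.csv'
fields = {'s': 'GOOG', 'a': '00', 'b': '01', 'c': '2005', 'g': 'd'}

query = urlencode(sorted(fields.items()))  # field=value pairs joined by &
url = base_url + '?' + query
print(url)
# http://ichart.finance.yahoo.com/table.csv?a=00&b=01&c=2005&g=d&s=GOOG

# Splitting on & and then on = recovers the pairs
pairs = dict(pair.split('=') for pair in query.split('&'))
print(pairs == fields)  # True
```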

Here is an example of usage

import urllib

base_url = 'http://ichart.finance.yahoo.com/table.csv'

request_data = {'s': 'GOOG',          # Ticker symbol for Google
                'a': '00',            # Start month, base zero
                'b': '01',            # Start day
                'c': '2005',          # Start year
                'd': '05',            # End month, base zero
                'e': '03',            # End day
                'f': '2009',          # End year
                'g': 'd',             # Daily
                'ignore': '.csv'}     # Data type

encoded = urllib.urlencode(request_data)  # Formats the query string
response = urllib.urlopen(base_url + '?' + encoded)

After running this script, we can get successive lines of the data as follows

>>> response.next()
'Date,Open,High,Low,Close,Volume,Adj Close\n'
>>> response.next()
'2009-06-03,426.00,432.46,424.00,431.65,3532800,431.65\n'
>>> response.next()
'2009-06-02,426.25,429.96,423.40,428.40,2623600,428.40\n'
>>> response.next()
'2009-06-01,418.73,429.60,418.53,426.56,3322400,426.56\n'

We see that Google’s share price opened at 426.00 on the 3rd of June 2009, etc.
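Because the response is an iterator over lines, a natural pattern is to pull off the header with next() and then loop over the remaining rows. The sketch below assumes Python 3 and replays the four lines shown above from an io.StringIO object, since the ichart.finance.yahoo.com service has since been shut down; pairing each row with the header via zip() is our choice, not something the lecture prescribes.

```python
import io

# Replay of the first four lines of the response shown above
response = io.StringIO(
    u'Date,Open,High,Low,Close,Volume,Adj Close\n'
    u'2009-06-03,426.00,432.46,424.00,431.65,3532800,431.65\n'
    u'2009-06-02,426.25,429.96,423.40,428.40,2623600,428.40\n'
    u'2009-06-01,418.73,429.60,418.53,426.56,3322400,426.56\n'
)

header = next(response).strip().split(',')  # the column names
rows = [dict(zip(header, line.strip().split(','))) for line in response]

print(rows[0]['Date'], rows[0]['Open'])  # 2009-06-03 426.00
```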

Note: If you have problems running this, your internet connection might be using a proxy server

Try searching online for help with urllib and proxy servers

Exercise:

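Write a program to print out the percentage change in value since the start of the year for all of the stocks in this list (the solution below assumes the list is saved as portfolio.txt, one "ticker, company name" pair per line)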

  • Change is from Jan 1st until the most recent price available

  • Use the last column (i.e., Adj Close) as the price

  • Stock prices should be downloaded at runtime from Yahoo Finance

  • If you can, print returns in order, from largest to smallest
    • Hint: use the sorted() function

A hint: if

line = '2009-06-01,418.73,429.60,418.53,426.56,3322400,426.56\n'

then line.split(',') returns the elements as a list of strings
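For example, a sketch of pulling the adjusted closing price out of that line (it runs under Python 2 or 3):

```python
line = '2009-06-01,418.73,429.60,418.53,426.56,3322400,426.56\n'

fields = line.split(',')
print(fields[0])          # 2009-06-01
print(repr(fields[-1]))   # '426.56\n': the newline is still attached

price = float(fields[-1])  # float() ignores surrounding whitespace
print(price)               # 426.56
```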

Solution

## Filename: yahoo_fin.py
## Author: John Stachurski

from urllib import urlopen, urlencode
from datetime import date
from operator import itemgetter

# Record current day, month and year as strings; month is base zero
today = date.today()
mm = str(today.month - 1)
dd = str(today.day)
yyyy = str(today.year)

base_url = 'http://ichart.finance.yahoo.com/table.csv'

request_data = {'a': '00',            # Start month, base zero
                'b': '01',            # Start day
                'c': yyyy,            # Start year
                'd': mm,              # End month, base zero
                'e': dd,              # End day
                'f': yyyy,            # End year
                'g': 'd',             # Daily
                'ignore': '.csv'}     # Data type

# Main loop

portfolio = open('portfolio.txt')  # Each line has the form: ticker, company name
percent_change = {}
for line in portfolio:
    ticker, company_name = [item.strip() for item in line.split(',')]
    request_data['s'] = ticker
    response = urlopen(base_url + '?' + urlencode(request_data))
    response.next()  # Skip the header line
    # Adj Close is the last column; rows run from newest to oldest
    prices = [row.split(',')[-1] for row in response]
    old_price, new_price = float(prices[-1]), float(prices[0])
    percent_change[company_name] = 100 * (new_price - old_price) / old_price
portfolio.close()

items = percent_change.items()

for name, change in sorted(items, key=itemgetter(1), reverse=True):
    print '%-12s %10.2f' % (name, change)
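The price calculation inside the main loop can be pulled out into a function and checked offline. A sketch, assuming Python 3; percent_change_from_lines() is a hypothetical name, and it assumes the layout seen earlier: a header line, then rows ordered from newest to oldest with Adj Close as the last column.

```python
def percent_change_from_lines(lines):
    """Percent change in Adj Close from the oldest row to the newest row.

    Hypothetical helper: assumes a header line followed by CSV rows
    ordered newest first, with Adj Close as the last column.
    """
    rows = iter(lines)
    next(rows)  # skip the header line
    prices = [float(row.split(',')[-1]) for row in rows]
    old_price, new_price = prices[-1], prices[0]
    return 100 * (new_price - old_price) / old_price


sample = [
    'Date,Open,High,Low,Close,Volume,Adj Close\n',
    '2009-06-03,426.00,432.46,424.00,431.65,3532800,431.65\n',
    '2009-06-02,426.25,429.96,423.40,428.40,2623600,428.40\n',
    '2009-06-01,418.73,429.60,418.53,426.56,3322400,426.56\n',
]
print('%.2f' % percent_change_from_lines(sample))  # 1.19
```

Separating the parsing from the download in this way means the arithmetic can be tested on saved data even when the web service is unavailable.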