Go to top, next, previous, or johnstachurski.net

NumPy

NumPy is a first class library for numerical programming

Numerical libraries such as NumPy provide code for common numerical operations

In non-compiled languages such as Python, MATLAB, etc., there is an extra value:

These languages are inherently slower than compiled languages like C and Fortran

Using numerical libraries, individual operations on large arrays can be shifted out to fast compiled code

Introduction to NumPy

Used by scientists all around the world

Mature, fast and and stable

NumPy can be installed individually or as part of SciPy

NumPy provides the definition of a numerical array class, plus some utilities

The utilities are Python interfaces to fast numerical routines compiled from C and Fortran

The rest of the lecture assumes that NumPy has been imported as np

>>> import numpy as np

Although this lecture cannot cover all the details of NumPy, remember

NumPy Arrays

Let's now turn to the basics of NumPy arrays

Internally, a NumPy array is an instance of numpy.ndarray (np.ndarray)

Like a native Python list, except that

Data Types

Some dtypes:

bool 8 bit True or False
int32 32 bit integer
int64 64 bit integer
float32 32 bit floating point number
float64 64 bit floating point number

There are also dtypes to represent complex numbers, unsigned integers, etc

At present, the default dtype for arrays is float64

Here's an example

>>> Z = np.zeros(10)
>>> Z
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
>>> type(Z)
<type 'numpy.ndarray'>
>>> type(Z[0])
<type 'numpy.float64'>

zeros() is a NumPy function that returns an ndarray of zeros

If we had wanted to use integers we could have specified as follows:

>>> Z = np.zeros(10, dtype=int)  # Here 'int' is equivalent to 'np.int32'
>>> Z
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

There's no problem mixing native Python numeric types and NumPy dtypes:

>>> Z = np.zeros(1)
>>> Z
array([ 0.])
>>> Z[0] = Z[0] + 1    # Add a Python int to a NumPy float64
>>> Z
array([ 1.])           # The int is upcast to a float64 before addition
>>> type(Z[0])
<type 'numpy.float64'>

Shape and Dimension

Suppose we create an array such as

>>> Z = np.zeros(10)

At present, Z has no dimension:

The dimension is recorded in the shape attribute, which is a tuple

>>> Z.shape
(10,)

The shape tuple has only one element, which is the length of the array

To give it dimension, we need to change the shape attribute

>>> Z.shape = (10, 1)   # Column vector: 10 rows and 1 column
>>> Z
array([[ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.]])
>>> Z = np.zeros(4)
>>> Z.shape = (2, 2)
>>> Z
array([[ 0.,  0.],
       [ 0.,  0.]])

In the last case, we could also pass a tuple to the zeros() function

>>> Z = np.zeros((2, 2))
>>> Z.shape
(2, 2)
>>> Z
array([[ 0.,  0.],
       [ 0.,  0.]])

Creating Arrays

NumPy arrays can be created from Python lists, tuples, etc. using array() or asarray()

>>> Z = np.array([10, 20])                 # ndarray from Python list
>>> Z
array([10, 20])
>>> type(Z)
<type 'numpy.ndarray'>
>>> Z = np.array((10, 20), dtype=float)    # Here 'float' is equivalent to 'np.float64'
>>> Z
array([ 10.,  20.])
>>> Z = np.array([[1, 2], [3, 4]])         # 2D array from a list of lists
>>> Z
array([[1, 2],
       [3, 4]])
>>> Z.shape
(2, 2)

To read in the data from a text file use loadtxt()

Here is the text file test.txt

1 2
3 4

Now

>>> Z = np.loadtxt('test.txt')
>>> Z
array([[ 1.,  2.],
       [ 3.,  4.]])

In addition there are utility functions for creating standard arrays

>>> Z = np.empty(3)
>>> Z
array([ -2.41128814e-039,   1.48539705e-313,  -8.30185262e-040])

What are those numbers, by the way?

To generate an array of ones, use ones():

>>> Z = np.ones((4, 4))
>>> Z
array([[ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.]])

To set up a grid of evenly spaced numbers use linspace()

>>> Z = np.linspace(2, 4, 5)  # From 2 to 4, with 5 elements
>>> Z
array([ 2. ,  2.5,  3. ,  3.5,  4. ])
>>> Z = np.linspace(2, 4, 6)  # From 2 to 4, with 6 elements
>>> Z
array([ 2. ,  2.4,  2.8,  3.2,  3.6,  4. ])

To create an identity matrix use identity()

>>> Z = np.identity(2)
>>> Z
array([[ 1.,  0.],
       [ 0.,  1.]])

Array Indexing

For a flat array, indexing is the same as Python sequences:

>>> Z = np.linspace(1, 2, 5)
>>> Z
array([ 1.  ,  1.25,  1.5 ,  1.75,  2.  ])
>>> Z[0]
1.0
>>> Z[0:2]
array([ 1.  ,  1.25])
>>> Z[-1]
2.0

For 2D arrays the syntax is as follows:

>>> Z = np.array([[1, 2], [3, 4]])
>>> Z
array([[1, 2],
       [3, 4]])
>>> Z[0, 0]
1
>>> Z[0, 1]
2

And so on

Note that indices are still zero-based, to maintain compatibility with Python sequences

Columns and rows can be extracted as follows

>>> Z[0,:]   # First row
array([1, 2])
>>> Z[:,1]   # Second column
array([2, 4])

NumPy arrays of integers can be used to extract elements:

>>> Z = np.linspace(2, 4, 5)
>>> Z
array([ 2. ,  2.5,  3. ,  3.5,  4. ])
>>> indices = np.array((0, 2, 3))
>>> Z[indices]
array([ 2. ,  3. ,  3.5])

We can also use an array of dtype bool to extract elements

>>> Z
array([ 2. ,  2.5,  3. ,  3.5,  4. ])
>>> d = np.array([0, 1, 1, 0, 0], dtype=bool)
>>> d
array([False,  True,  True, False, False], dtype=bool)
>>> Z[d]
array([ 2.5,  3. ])

An aside: all elements of an array can be set equal to one number using slice notation:

>>> Z = np.empty(3)
>>> Z
array([  3.45740277e-269,  -6.35113306e-040,   5.88173752e-313])
>>> Z[:] = 42
>>> Z
array([ 42.,  42.,  42.])

Comparisons

In general, comparisons are done elementwise:

>>> Z = np.array([2, 3])
>>> Y = np.array([2, 3])
>>> Z == Y
array([ True,  True], dtype=bool)
>>> Y[0] = 5
>>> Z == Y
array([False,  True], dtype=bool)
>>> Z != Y
array([ True, False], dtype=bool)

The situation is similar for >, <, >= and <=

>>> Z > Y
array([False, False], dtype=bool)
>>> Z < Y
array([ True, False], dtype=bool)
>>> Z >= Y
array([False,  True], dtype=bool)
>>> Z <= Y
array([ True,  True], dtype=bool)

We can also do comparisons against scalars:

>>> Z = np.linspace(0, 10, 5)
>>> Z
array([  0. ,   2.5,   5. ,   7.5,  10. ])
>>> Z > 3
array([False, False,  True,  True,  True], dtype=bool)

This is particularly useful for conditional extraction

>>> b = Z > 3
>>> b
array([False, False,  True,  True,  True], dtype=bool)
>>> Z[b]
array([  5. ,   7.5,  10. ])   # All elements of Z that are > 3

Of course we can do this in one step

>>> Z[Z > 3]   
array([  5. ,   7.5,  10. ])

Algebraic Operations in NumPy

The algebraic operators +, -, *, / and ** all act elementwise on arrays

>>> A = np.array([1, 2, 3, 4])
>>> B = np.array([5, 6, 7, 8])
>>> A + B
array([ 6,  8, 10, 12])
>>> A * B
array([ 5, 12, 21, 32])

We can also add a scalar to each element as follows

>>> A + 10   
array([11, 12, 13, 14])

Scalar multiplication is similar

>>> A = np.array([1, 2, 3, 4])
>>> A * 10
array([10, 20, 30, 40])

The two dimensional arrays follow the same general rules:

>>> A = np.ones((2, 2))
>>> B = np.ones((2, 2))
>>> A + B
array([[ 2.,  2.],
       [ 2.,  2.]])
>>> A + 10
array([[ 11.,  11.],
       [ 11.,  11.]])
>>> A * B
array([[ 1.,  1.],
       [ 1.,  1.]])

Note that A * B is not the matrix product, it is an elementwise product

To do matrix multiplication we use the dot() function

>>> A
array([[ 1.,  1.],
       [ 1.,  1.]])
>>> B
array([[ 1.,  1.],
       [ 1.,  1.]])
>>> np.dot(A, B)
array([[ 2.,  2.],
       [ 2.,  2.]])

With dot() we can take the inner product of two flat arrays

>>> A = np.array([1, 2])
>>> B = np.array([10, 20])
>>> np.dot(A, B)   # Returns a scalar in this case
50

We can use dot() with a Python list or tuple

>>> A = np.empty((2, 2))
>>> A
array([[ -2.41124330e-039,   0.00000000e+000],
       [  2.12199579e-314,   9.88131292e-324]])
>>> np.dot(A, (0, 1))
array([  0.00000000e+000,   9.88131292e-324])

Here dot() knows we are postmultiplying, so (0, 1) is treated as a column vector

I could go on but you get the general idea...

Array Methods

Here are some useful methods of the ndarray class

>>> A = np.array((4, 3, 2, 1))
>>> A
array([4, 3, 2, 1])
>>> A.sort()              # Sorts A in place
>>> A
array([1, 2, 3, 4])        
>>> A.sum()               # Sum
10
>>> A.mean()              # Mean
2.5
>>> A.max()               # Max
4
>>> A.argmax()            # Returns the index of the maximal element
3
>>> A.cumsum()            # Cumulative sum of the elements of A
array([ 1,  3,  6, 10])
>>> A.cumprod()           # Cumulative product of the elements of A
array([ 1,  2,  6, 24])
>>> A.var()               # Variance
1.25
>>> A.std()               # Standard deviation
1.1180339887498949
>>> A.shape = (2, 2)
>>> A.T                   # Equivalent to A.transpose()
array([[1, 3],
       [2, 4]])

A slightly more complicated method is searchsorted()

>>> Z = np.linspace(2, 4, 5)
>>> Z
array([ 2. ,  2.5,  3. ,  3.5,  4. ])
>>> Z.searchsorted(2.2)
1
>>> Z.searchsorted(2.5)
1
>>> Z.searchsorted(2.6)
2

Note that these methods have equivalent functions in the NumPy namespace

>>> A = np.array((4, 3, 2, 1))
>>> np.sum(A)    #  Use NumPy's sum(). The built-in sum() is slower.
10
>>> np.mean(A)
2.5
>>> np.sort(A)
array([1, 2, 3, 4])

Exercises

Problem 1

Consider evaluating the polynomial



at some x given the vector of coefficients

We can do this using np.poly1d, but for the sake of the exercise don't use this class

We also want to avoid for and while loops, which are slow

Write a function which takes the vector of coefficients and a number x, and returns the value p(x)

Solution

import numpy as np

def p(coef, x): 
    X = np.empty(len(coef))
    X[0] = 1 
    X[1:] = x 
    Y = np.cumprod(X)   # Y = [1, x, x**2,...]
    return np.dot(coef, Y)