NumPy is a first class library for numerical programming
Numerical libraries such as NumPy provide code for common numerical operations
In non-compiled languages such as Python, MATLAB, etc., there is an extra value:
These languages are inherently slower than compiled languages like C and Fortran
Using numerical libraries, individual operations on large arrays can be shifted out to fast compiled code
Used by scientists all around the world
Mature, fast and and stable
NumPy can be installed individually or as part of SciPy
NumPy provides the definition of a numerical array class, plus some utilities
The utilities are Python interfaces to fast numerical routines compiled from C and Fortran
The rest of the lecture assumes that NumPy has been imported as np
>>> import numpy as np
Although this lecture cannot cover all the details of NumPy, remember
help(np.poly1d)
np.poly1d?
Let's now turn to the basics of NumPy arrays
Internally, a NumPy array is an instance of numpy.ndarray (np.ndarray)
Like a native Python list, except that
dtypes) provided by NumPy
Some dtypes:
| bool | 8 bit True or False |
| int32 | 32 bit integer |
| int64 | 64 bit integer |
| float32 | 32 bit floating point number |
| float64 | 64 bit floating point number |
There are also dtypes to represent complex numbers, unsigned integers, etc
At present, the default dtype for arrays is float64
Here's an example
>>> Z = np.zeros(10)
>>> Z
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
>>> type(Z)
<type 'numpy.ndarray'>
>>> type(Z[0])
<type 'numpy.float64'>
zeros() is a NumPy function that returns an ndarray of zeros
If we had wanted to use integers we could have specified as follows:
>>> Z = np.zeros(10, dtype=int) # Here 'int' is equivalent to 'np.int32'
>>> Z
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
There's no problem mixing native Python numeric types and NumPy dtypes:
>>> Z = np.zeros(1)
>>> Z
array([ 0.])
>>> Z[0] = Z[0] + 1 # Add a Python int to a NumPy float64
>>> Z
array([ 1.]) # The int is upcast to a float64 before addition
>>> type(Z[0])
<type 'numpy.float64'>
Suppose we create an array such as
>>> Z = np.zeros(10)
At present, Z has no dimension:
The dimension is recorded in the shape attribute, which is a tuple
>>> Z.shape
(10,)
The shape tuple has only one element, which is the length of the array
To give it dimension, we need to change the shape attribute
>>> Z.shape = (10, 1) # Column vector: 10 rows and 1 column
>>> Z
array([[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 0.]])
>>> Z = np.zeros(4)
>>> Z.shape = (2, 2)
>>> Z
array([[ 0., 0.],
[ 0., 0.]])
In the last case, we could also pass a tuple to the zeros() function
>>> Z = np.zeros((2, 2))
>>> Z.shape
(2, 2)
>>> Z
array([[ 0., 0.],
[ 0., 0.]])
NumPy arrays can be created from Python lists, tuples, etc. using array() or asarray()
>>> Z = np.array([10, 20]) # ndarray from Python list
>>> Z
array([10, 20])
>>> type(Z)
<type 'numpy.ndarray'>
>>> Z = np.array((10, 20), dtype=float) # Here 'float' is equivalent to 'np.float64'
>>> Z
array([ 10., 20.])
>>> Z = np.array([[1, 2], [3, 4]]) # 2D array from a list of lists
>>> Z
array([[1, 2],
[3, 4]])
>>> Z.shape
(2, 2)
To read in the data from a text file use loadtxt()
Here is the text file test.txt
1 2
3 4
Now
>>> Z = np.loadtxt('test.txt')
>>> Z
array([[ 1., 2.],
[ 3., 4.]])
In addition there are utility functions for creating standard arrays
zeros()
empty()
>>> Z = np.empty(3)
>>> Z
array([ -2.41128814e-039, 1.48539705e-313, -8.30185262e-040])
What are those numbers, by the way?
float64
To generate an array of ones, use ones():
>>> Z = np.ones((4, 4))
>>> Z
array([[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.]])
To set up a grid of evenly spaced numbers use linspace()
>>> Z = np.linspace(2, 4, 5) # From 2 to 4, with 5 elements
>>> Z
array([ 2. , 2.5, 3. , 3.5, 4. ])
>>> Z = np.linspace(2, 4, 6) # From 2 to 4, with 6 elements
>>> Z
array([ 2. , 2.4, 2.8, 3.2, 3.6, 4. ])
To create an identity matrix use identity()
>>> Z = np.identity(2)
>>> Z
array([[ 1., 0.],
[ 0., 1.]])
For a flat array, indexing is the same as Python sequences:
>>> Z = np.linspace(1, 2, 5)
>>> Z
array([ 1. , 1.25, 1.5 , 1.75, 2. ])
>>> Z[0]
1.0
>>> Z[0:2]
array([ 1. , 1.25])
>>> Z[-1]
2.0
For 2D arrays the syntax is as follows:
>>> Z = np.array([[1, 2], [3, 4]])
>>> Z
array([[1, 2],
[3, 4]])
>>> Z[0, 0]
1
>>> Z[0, 1]
2
And so on
Note that indices are still zero-based, to maintain compatibility with Python sequences
Columns and rows can be extracted as follows
>>> Z[0,:] # First row
array([1, 2])
>>> Z[:,1] # Second column
array([2, 4])
NumPy arrays of integers can be used to extract elements:
>>> Z = np.linspace(2, 4, 5)
>>> Z
array([ 2. , 2.5, 3. , 3.5, 4. ])
>>> indices = np.array((0, 2, 3))
>>> Z[indices]
array([ 2. , 3. , 3.5])
We can also use an array of dtype bool to extract elements
>>> Z
array([ 2. , 2.5, 3. , 3.5, 4. ])
>>> d = np.array([0, 1, 1, 0, 0], dtype=bool)
>>> d
array([False, True, True, False, False], dtype=bool)
>>> Z[d]
array([ 2.5, 3. ])
An aside: all elements of an array can be set equal to one number using slice notation:
>>> Z = np.empty(3)
>>> Z
array([ 3.45740277e-269, -6.35113306e-040, 5.88173752e-313])
>>> Z[:] = 42
>>> Z
array([ 42., 42., 42.])
In general, comparisons are done elementwise:
>>> Z = np.array([2, 3])
>>> Y = np.array([2, 3])
>>> Z == Y
array([ True, True], dtype=bool)
>>> Y[0] = 5
>>> Z == Y
array([False, True], dtype=bool)
>>> Z != Y
array([ True, False], dtype=bool)
The situation is similar for >, <, >= and <=
>>> Z > Y
array([False, False], dtype=bool)
>>> Z < Y
array([ True, False], dtype=bool)
>>> Z >= Y
array([False, True], dtype=bool)
>>> Z <= Y
array([ True, True], dtype=bool)
We can also do comparisons against scalars:
>>> Z = np.linspace(0, 10, 5)
>>> Z
array([ 0. , 2.5, 5. , 7.5, 10. ])
>>> Z > 3
array([False, False, True, True, True], dtype=bool)
This is particularly useful for conditional extraction
>>> b = Z > 3
>>> b
array([False, False, True, True, True], dtype=bool)
>>> Z[b]
array([ 5. , 7.5, 10. ]) # All elements of Z that are > 3
Of course we can do this in one step
>>> Z[Z > 3]
array([ 5. , 7.5, 10. ])
The algebraic operators +, -, *, / and ** all act elementwise on arrays
>>> A = np.array([1, 2, 3, 4])
>>> B = np.array([5, 6, 7, 8])
>>> A + B
array([ 6, 8, 10, 12])
>>> A * B
array([ 5, 12, 21, 32])
We can also add a scalar to each element as follows
>>> A + 10
array([11, 12, 13, 14])
Scalar multiplication is similar
>>> A = np.array([1, 2, 3, 4])
>>> A * 10
array([10, 20, 30, 40])
The two dimensional arrays follow the same general rules:
>>> A = np.ones((2, 2))
>>> B = np.ones((2, 2))
>>> A + B
array([[ 2., 2.],
[ 2., 2.]])
>>> A + 10
array([[ 11., 11.],
[ 11., 11.]])
>>> A * B
array([[ 1., 1.],
[ 1., 1.]])
Note that A * B is not the matrix product, it is an elementwise product
To do matrix multiplication we use the dot() function
>>> A
array([[ 1., 1.],
[ 1., 1.]])
>>> B
array([[ 1., 1.],
[ 1., 1.]])
>>> np.dot(A, B)
array([[ 2., 2.],
[ 2., 2.]])
With dot() we can take the inner product of two flat arrays
>>> A = np.array([1, 2])
>>> B = np.array([10, 20])
>>> np.dot(A, B) # Returns a scalar in this case
50
We can use dot() with a Python list or tuple
>>> A = np.empty((2, 2))
>>> A
array([[ -2.41124330e-039, 0.00000000e+000],
[ 2.12199579e-314, 9.88131292e-324]])
>>> np.dot(A, (0, 1))
array([ 0.00000000e+000, 9.88131292e-324])
Here dot() knows we are postmultiplying, so (0, 1) is treated as a column vector
I could go on but you get the general idea...
Here are some useful methods of the ndarray class
>>> A = np.array((4, 3, 2, 1))
>>> A
array([4, 3, 2, 1])
>>> A.sort() # Sorts A in place
>>> A
array([1, 2, 3, 4])
>>> A.sum() # Sum
10
>>> A.mean() # Mean
2.5
>>> A.max() # Max
4
>>> A.argmax() # Returns the index of the maximal element
3
>>> A.cumsum() # Cumulative sum of the elements of A
array([ 1, 3, 6, 10])
>>> A.cumprod() # Cumulative product of the elements of A
array([ 1, 2, 6, 24])
>>> A.var() # Variance
1.25
>>> A.std() # Standard deviation
1.1180339887498949
>>> A.shape = (2, 2)
>>> A.T # Equivalent to A.transpose()
array([[1, 3],
[2, 4]])
A slightly more complicated method is searchsorted()
Z is a nondecreasing array, then
Z.searchsorted(a) returns index of first z in Z such that z >= a
>>> Z = np.linspace(2, 4, 5)
>>> Z
array([ 2. , 2.5, 3. , 3.5, 4. ])
>>> Z.searchsorted(2.2)
1
>>> Z.searchsorted(2.5)
1
>>> Z.searchsorted(2.6)
2
Note that these methods have equivalent functions in the NumPy namespace
>>> A = np.array((4, 3, 2, 1))
>>> np.sum(A) # Use NumPy's sum(). The built-in sum() is slower.
10
>>> np.mean(A)
2.5
>>> np.sort(A)
array([1, 2, 3, 4])
Problem 1
Consider evaluating the polynomial
at some x given the vector of coefficients
We can do this using np.poly1d, but for the sake of the exercise don't use this class
We also want to avoid for and while loops, which are slow
Write a function which takes the vector of coefficients and a number x, and returns the value p(x)
np.cumprod()
Solution
import numpy as np
def p(coef, x):
X = np.empty(len(coef))
X[0] = 1
X[1:] = x
Y = np.cumprod(X) # Y = [1, x, x**2,...]
return np.dot(coef, Y)