Go to top, next, previous, or johnstachurski.net

More NumPy

Additional features of NumPy

Linear Algebra

NumPy provides a module for linear algebra called linalg

This is a sub-module of NumPy

The module provides functions for computing determinants, eigenvalues, etc.

>>> A = np.array([[1, 2], [3, 4]])
>>> np.linalg.det(A)           # Compute the determinant
-2.0
>>> np.linalg.inv(A)           # Compute the inverse
array([[-2. ,  1. ],
       [ 1.5, -0.5]])
>>> np.linalg.eigvals(A)       # Compute eigenvalues
array([-0.37228132+0.j,  5.37228132+0.j])

To solve the linear system A x = b for x we can use solve()

>>> A
array([[1, 2],
       [3, 4]])
>>> b = (5, 6)
>>> np.linalg.solve(A, b)
array([-4. ,  4.5])

Another alternative is to invert A

>>> np.dot(np.linalg.inv(A), b)
array([-4. ,  4.5])

But solve() is usually more efficient

Random Sampling

NumPy has a sub-module called random

Populates ndarrays with samples of various distributions

For example, to generate 3 draws from the beta distribution



we use random.beta

>>> a = 0.5
>>> b = 0.5
>>> np.random.beta(a, b, size=3)
array([ 0.23607443,  0.90941746,  0.49067643])

Binomial distribution:

>>> y = np.random.binomial(10, 0.5, size=1000)    # 1,000 observations of Bin(10, 0.5)
>>> y.mean()
5.1109999999999998
>>> y = np.random.binomial(10, 0.8, size=1000)
>>> y.mean()
7.9909999999999997

The normal distribution can be accessed via np.random.normal, but there is also the convenience function randn, which gives standard normals

>>> Z = np.random.randn(10000)
>>> Z = 5 * Z + 3   # Each element is now N(3, 25)
>>> Z.mean()
2.997125110268561
>>> Z.var()
24.970266678202606

All the other standard distributions can be found in np.random

Other Goodies

NumPy provides some vectorized functions, which act elementwise on arrays

>>> Z = np.linspace(2, 4, 4)
>>> np.log(Z)
array([ 0.69314718,  0.98082925,  1.2039728 ,  1.38629436])
>>> np.exp(Z)
array([  7.3890561 ,  14.3919161 ,  28.03162489,  54.59815003])
>>> np.cos(Z)
array([-0.41614684, -0.88932657, -0.981674  , -0.65364362])
>>> np.sin(Z)
array([ 0.90929743,  0.45727263, -0.19056796, -0.7568025 ])

The advantage is that we avoid looping

Note that not all Python functions will be vectorized

>>> f = lambda x: -1 if x < 0 else 1
>>> f(-2)
-1
>>> f(3)
1
>>> Z = np.array([-2, 0, 2])
>>> f(Z)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
ValueError: The truth value...

We can vectorize f using np.vectorize()

>>> f = np.vectorize(f)
>>> f(Z)
array([-1,  1,  1])

In addition

See docstrings for details

A Note on Pylab and Matplotlib

A quick aside on the relationship between Pylab and Matplotlib

One way to do plots with Matplotlib is like this

import pylab
pylab.plot([1, 2, 3])
pylab.show()

The same can be achieved by

import matplotlib.pyplot as plt
plt.plot([1, 2, 3])
plt.show()

What is the difference?

We can see the difference in the pylab initialization file:

## some stuff
from numpy import *
from numpy.fft import *
from numpy.random import *
from numpy.linalg import *
## some more stuff
from matplotlib.pyplot import *
## some more stuff

Thus, import pylab brings in

The plotting functions are in matplotlib.pyplot

Exercises

Here is a common problem in simulation

Let's say we have an array q which represents a probability mass function

We want to generate a random variable X such that

The standard (inverse transform) algorithm is as follows

The probability of drawing i is the length of Ii, which is equal to q[i]

EDTC provides some code for this operation

from random import uniform

def sample(q):
    """
    Returns i with probability q[i], where q is an array 
    (e.g., list or tuple).
    """
    a = 0.0
    U = uniform(0,1)  
    N = len(q)
    for i in range(N):
        if a < U <= a + q[i]:
            return i
        a = a + q[i]

Can you see how it works?

Problem 1

Make the following two improvements to this code:

  1. Speed it up using NumPy, avoiding for and while loops
    • Hint: Use searchsorted() and cumsum()
  2. Implement as a class, where each instance stores its own value of q
    • Give the class a draw() method, which returns one draw from {0,...,len(q)-1} according to q
      • Or even better, write the method so that draw(n) returns n draws from q

Solution:

## Filename: discreterv.py
## Author: John Stachurski

from numpy import cumsum
from numpy.random import uniform

class DiscreteRV:
    """
    Each instance is provided with an array of probabilities q. 
    The draw() method returns x with probability q[x].
    """

    def __init__(self, q):
        self.set_q(q)

    def set_q(self, q):
        self.Q = cumsum(q)   # Cumulative sum

    def draw(self, n=1):
        """
        Returns n draws from q
        """
        return self.Q.searchsorted(uniform(0, 1, size=n)) 

A bit tricky, but with some thought you will be able to understand it

Problem 2

In a previous lecture and EDTC, section 6.1.2, I discuss the empirical (cumulative) distribution function



Thus Fn(x) is the fraction of the sample that falls below x.

This function approximates the true distribution function F

The ECDF is implemented as follows:

## Filename: ecdf.py
## Author: John Stachurski

class ECDF:

    def __init__(self, observations):
        self.observations = observations

    def __call__(self, x):
        counter = 0.0
        for obs in self.observations:
            if obs <= x:
                counter += 1
        return counter / len(self.observations)

Here's an example of usage:

from ecdf import ECDF
samples = np.random.uniform(0, 1, size=10)
F = ECDF(samples)
F(0.5)  # Returned 0.29
F.observations = np.random.uniform(0, 1, size=1000)
F(0.5)  # Returned 0.479

Our implementation is not very efficient because it uses a for loop

Problem:

Solution:

## Filename: ecdf2.py
## Author: John Stachurski

import numpy as np
import matplotlib.pyplot as plt

class ECDF:

    def __init__(self, observations):
        "Parameters: observations is a NumPy array."
        self.observations = observations

    def __call__(self, x): 
        try:
            return (self.observations <= x).mean()
        except AttributeError:
            # self.observations probably needs to be converted to
            # a NumPy array
            self.observations = np.array(self.observations)
            return (self.observations <= x).mean()

    def plot(self, a=None, b=None): 
        if a == None:
            # Set up a reasonable default
            a = self.observations.min() - self.observations.std()
        if b == None:
            # Set up a reasonable default
            b = self.observations.max() + self.observations.std()
        X = np.linspace(a, b, num=100)
        f = np.vectorize(self.__call__)
        plt.plot(X, f(X))
        plt.show()