Additional features of NumPy
NumPy provides a module for linear algebra called linalg
This is a sub-module of NumPy
np.linalg.attribute_nameThe module provides functions for computing determinants, eigenvalues, etc.
>>> A = np.array([[1, 2], [3, 4]])
>>> np.linalg.det(A) # Compute the determinant
-2.0
>>> np.linalg.inv(A) # Compute the inverse
array([[-2. , 1. ],
[ 1.5, -0.5]])
>>> np.linalg.eigvals(A) # Compute eigenvalues
array([-0.37228132+0.j, 5.37228132+0.j])
To solve the linear system A x = b for x we can use solve()
>>> A
array([[1, 2],
[3, 4]])
>>> b = (5, 6)
>>> np.linalg.solve(A, b)
array([-4. , 4.5])
Another alternative is to invert A
>>> np.dot(np.linalg.inv(A), b)
array([-4. , 4.5])
But solve() is usually more efficient
NumPy has a sub-module called random
Populates ndarrays with samples of various distributions
For example, to generate 3 draws from the beta distribution
we use random.beta
>>> a = 0.5
>>> b = 0.5
>>> np.random.beta(a, b, size=3)
array([ 0.23607443, 0.90941746, 0.49067643])
Binomial distribution:
>>> y = np.random.binomial(10, 0.5, size=1000) # 1,000 observations of Bin(10, 0.5)
>>> y.mean()
5.1109999999999998
>>> y = np.random.binomial(10, 0.8, size=1000)
>>> y.mean()
7.9909999999999997
The normal distribution can be accessed via np.random.normal, but there
is also the convenience function randn, which gives standard normals
>>> Z = np.random.randn(10000)
>>> Z = 5 * Z + 3 # Each element is now N(3, 25)
>>> Z.mean()
2.997125110268561
>>> Z.var()
24.970266678202606
All the other standard distributions can be found in np.random
dir(np.random)NumPy provides some vectorized functions, which act elementwise on arrays
>>> Z = np.linspace(2, 4, 4)
>>> np.log(Z)
array([ 0.69314718, 0.98082925, 1.2039728 , 1.38629436])
>>> np.exp(Z)
array([ 7.3890561 , 14.3919161 , 28.03162489, 54.59815003])
>>> np.cos(Z)
array([-0.41614684, -0.88932657, -0.981674 , -0.65364362])
>>> np.sin(Z)
array([ 0.90929743, 0.45727263, -0.19056796, -0.7568025 ])
The advantage is that we avoid looping
Note that not all Python functions will be vectorized
>>> f = lambda x: -1 if x < 0 else 1
>>> f(-2)
-1
>>> f(3)
1
>>> Z = np.array([-2, 0, 2])
>>> f(Z)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <lambda>
ValueError: The truth value...
We can vectorize f using np.vectorize()
>>> f = np.vectorize(f)
>>> f(Z)
array([-1, 1, 1])
In addition
np.interp provides linear interpolation in 1 dim np.poly1d is a class for manipulating polynomials np.histogram gives histogram informationSee docstrings for details
A quick aside on the relationship between Pylab and Matplotlib
One way to do plots with Matplotlib is like this
import pylab
pylab.plot([1, 2, 3])
pylab.show()
The same can be achieved by
import matplotlib.pyplot as plt
plt.plot([1, 2, 3])
plt.show()
What is the difference?
We can see the difference in the pylab initialization file:
## some stuff
from numpy import *
from numpy.fft import *
from numpy.random import *
from numpy.linalg import *
## some more stuff
from matplotlib.pyplot import *
## some more stuff
Thus, import pylab brings in
random, linalg, etc.)matplotlib.pyplotThe plotting functions are in matplotlib.pyplot
Here is a common problem in simulation
Let's say we have an array q which represents a probability mass
function
q.sum() equal to onelen(q) be equal to NWe want to generate a random variable X such that
X takes values in 0,...,N-1X = i is equal to q[i]The standard (inverse transform) algorithm is as follows
q[i]U on [0, 1] and i such that U is in IiThe probability of drawing i is the length of Iq[i]
EDTC provides some code for this operation
from random import uniform
def sample(q):
"""
Returns i with probability q[i], where q is an array
(e.g., list or tuple).
"""
a = 0.0
U = uniform(0,1)
N = len(q)
for i in range(N):
if a < U <= a + q[i]:
return i
a = a + q[i]
Can you see how it works?
q = [0.25, 0.75]U Problem 1
Make the following two improvements to this code:
for and while loopsqdraw() method, which returns one draw from {0,...,len(q)-1} according to qdraw(n) returns n draws from qSolution:
## Filename: discreterv.py
## Author: John Stachurski
from numpy import cumsum
from numpy.random import uniform
class DiscreteRV:
"""
Each instance is provided with an array of probabilities q.
The draw() method returns x with probability q[x].
"""
def __init__(self, q):
self.set_q(q)
def set_q(self, q):
self.Q = cumsum(q) # Cumulative sum
def draw(self, n=1):
"""
Returns n draws from q
"""
return self.Q.searchsorted(uniform(0, 1, size=n))
A bit tricky, but with some thought you will be able to understand it
Problem 2
In a previous lecture and EDTC, section 6.1.2, I discuss the empirical (cumulative) distribution function
Thus Fn(x) is the fraction of the sample that falls below x.
This function approximates the true distribution function F
The ECDF is implemented as follows:
## Filename: ecdf.py
## Author: John Stachurski
class ECDF:
def __init__(self, observations):
self.observations = observations
def __call__(self, x):
counter = 0.0
for obs in self.observations:
if obs <= x:
counter += 1
return counter / len(self.observations)
Here's an example of usage:
from ecdf import ECDF
samples = np.random.uniform(0, 1, size=10)
F = ECDF(samples)
F(0.5) # Returned 0.29
F.observations = np.random.uniform(0, 1, size=1000)
F(0.5) # Returned 0.479
Our implementation is not very efficient because it uses a for loop
Problem:
__call__ method more efficient using NumPy.Solution:
## Filename: ecdf2.py
## Author: John Stachurski
import numpy as np
import matplotlib.pyplot as plt
class ECDF:
def __init__(self, observations):
"Parameters: observations is a NumPy array."
self.observations = observations
def __call__(self, x):
try:
return (self.observations <= x).mean()
except AttributeError:
# self.observations probably needs to be converted to
# a NumPy array
self.observations = np.array(self.observations)
return (self.observations <= x).mean()
def plot(self, a=None, b=None):
if a == None:
# Set up a reasonable default
a = self.observations.min() - self.observations.std()
if b == None:
# Set up a reasonable default
b = self.observations.max() + self.observations.std()
X = np.linspace(a, b, num=100)
f = np.vectorize(self.__call__)
plt.plot(X, f(X))
plt.show()