Introduction to Data Science

1MS041, 2022

©2022 Raazesh Sainudiin, Benny Avelin. Attribution 4.0 International (CC BY 4.0)

Common discrete random variables

Bernoulli random variable

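A single trial with success probability $p$: the variable takes the value 1 (success) with probability $p$ and the value 0 (failure) with probability $1-p$.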

In [1]:
from Utils import plotEMF
In [2]:
p = 0.1
# (value, probability) pairs: P(X=0) = 1-p and P(X=1) = p
plotEMF([(0,1-p),(1,p)])
In [3]:
import numpy as np
In [4]:
# randint(0,2) draws uniformly from {0,1}, i.e. Bernoulli(1/2) samples (not p = 0.1)
np.random.randint(0,2,size=10)
Out[4]:
array([0, 0, 1, 1, 1, 1, 1, 1, 0, 1])
In [5]:
from Utils import plotEDF,emfToEdf
In [6]:
plotEDF(emfToEdf([(0,0),(0,1-p),(1,p)]))
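
To draw Bernoulli samples for a general success probability $p$, one option is sketched below. It is a minimal illustration, not part of the course `Utils` module: the seeded generator `rng`, the sample size, and the variable names are assumptions made for the example.

```python
import numpy as np

# Assumption: a seeded Generator is used only to make the sketch reproducible.
rng = np.random.default_rng(0)

p = 0.1          # success probability
size = 100_000   # number of simulated trials (arbitrary choice)

# Option 1: a Bernoulli(p) variable is a Binomial(1, p) variable.
samples_binomial = rng.binomial(1, p, size=size)

# Option 2: threshold uniform draws; U < p happens with probability p.
samples_threshold = (rng.random(size) < p).astype(int)

# Both empirical success frequencies should be close to p.
freq_binomial = samples_binomial.mean()
freq_threshold = samples_threshold.mean()
print(freq_binomial, freq_threshold)
```

Both approaches produce only the values 0 and 1, and the printed frequencies estimate $p$.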

Binomial random variable

If we perform $n$ independent trials, each with success probability $p$, then the binomial random variable is the number of successes. The PMF is $$ f(x) = {n \choose x} p^x (1-p)^{n-x} $$ for $x \in \{0,1,\ldots,n\}$; the variable takes no other values.

In [7]:
from scipy.special import binom as binomial
n = 20
p = 0.5
# The support is 0,1,...,n, so the range must include n
plotEMF([(i,binomial(n,i)*(p**i)*((1-p)**(n-i))) for i in range(n+1)])
In [8]:
np.random.binomial(20,0.5,size=10)
Out[8]:
array([ 7,  9,  7, 11, 11,  8,  8,  8,  9, 10])
In [9]:
plotEDF(emfToEdf([(i,binomial(n,i)*(p**i)*((1-p)**(n-i))) for i in range(n+1)]))
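
As a sanity check on the binomial PMF, the following sketch evaluates it on the full support $0,1,\ldots,n$, confirms that the probabilities sum to 1 and that the mean is $np$, and compares the result with `scipy.stats.binom.pmf`. The names `support`, `pmf`, and `binom_dist` are chosen here for illustration.

```python
import numpy as np
from scipy.special import binom as binomial
from scipy.stats import binom as binom_dist

n = 20
p = 0.5

# Support is 0, 1, ..., n, so the range must run to n + 1.
support = np.arange(n + 1)
pmf = binomial(n, support) * p**support * (1 - p)**(n - support)

total = pmf.sum()                 # should be 1 up to rounding
mean = (support * pmf).sum()      # should equal n * p
matches_scipy = np.allclose(pmf, binom_dist.pmf(support, n, p))
print(total, mean, matches_scipy)
```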

Poisson random variable

Pois($\lambda$), where $\lambda \in (0,\infty)$ is called the rate, has PMF $$ f(x) = \frac{\lambda^x e^{-\lambda}}{x!} $$ for $x = 0,1,2,\ldots$.

In [10]:
from scipy.special import factorial
from math import exp
l = 2
plotEMF([(i,l**i*exp(-l)/factorial(i)) for i in range(10)])
In [11]:
np.random.poisson(2,size=10)
Out[11]:
array([1, 3, 2, 1, 5, 0, 2, 2, 1, 3])
In [12]:
plotEDF(emfToEdf([(i,l**i*exp(-l)/factorial(i)) for i in range(10)]))
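
The Poisson variable has infinite support, so any plot over `range(10)` shows a truncated PMF. The sketch below, under the assumption that 10 terms and a seeded generator are adequate for illustration, measures how much probability mass the truncation keeps and checks empirically that the sample mean and sample variance are both close to $\lambda$ (a standard property of the Poisson distribution).

```python
import numpy as np
from math import exp
from scipy.special import factorial

lam = 2  # rate; named lam here because lambda is a Python keyword

# Truncate the infinite support at the first 10 values 0, ..., 9.
ks = np.arange(10)
pmf = lam**ks * exp(-lam) / factorial(ks)

truncated_mass = pmf.sum()          # slightly below 1: the tail k >= 10 is missing
truncated_mean = (ks * pmf).sum()   # slightly below lam for the same reason

# Empirical check: sample mean and variance should both be near lam.
rng = np.random.default_rng(0)      # assumption: seed fixed for reproducibility
samples = rng.poisson(lam, size=100_000)
print(truncated_mass, truncated_mean, samples.mean(), samples.var())
```

For larger rates the truncation point would need to grow, since more mass sits above 9.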

Empirical means

In [13]:
from random import randint

def X():
    """Produces a single random number from DeMoivre(1/3,1/3,1/3)"""
    return randint(0,2)

def empirical_mean(n=1):
    """Produces the empirical mean of n experiments of the X above"""
    Z = [X() for i in range(n)]
    return sum(Z)/n
In [14]:
# Run this to get an observation of X and rerun for another
X()
Out[14]:
0
In [15]:
# Run this to get an observation of the empirical mean of X
# when doing 100 experiments
empirical_mean(100)
Out[15]:
0.94
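
Since $X$ is uniform on $\{0,1,2\}$, its expectation is $(0+1+2)/3 = 1$, and the empirical mean should settle near 1 as the number of experiments grows. The sketch below redefines `X` and `empirical_mean` so that it is self-contained; the fixed seed and the chosen sample sizes are assumptions made for the illustration.

```python
import random

random.seed(0)  # assumption: fixed seed, only for reproducibility

def X():
    """One draw from the uniform distribution on {0, 1, 2}."""
    return random.randint(0, 2)

def empirical_mean(n=1):
    """Empirical mean of n independent draws of X."""
    return sum(X() for _ in range(n)) / n

# E[X] = (0 + 1 + 2) / 3 = 1, so the empirical means should approach 1.
true_mean = 1.0
sample_sizes = (10, 100, 1_000, 10_000, 100_000)
errors = {n: abs(empirical_mean(n) - true_mean) for n in sample_sizes}
for n, err in errors.items():
    print(n, err)
```

The error need not shrink at every step, but it is typically of order $1/\sqrt{n}$.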

Common continuous random variables

The uniform [0,1] random variable

In this case we have

$$ f(x) = \begin{cases} 1 & \text{if } 0 \leq x \leq 1 \\ 0 & \text{otherwise} \end{cases} $$

Also, for $x \in [0,1]$ we have

$$ F(x) = \int_{-\infty}^x f(v) dv = \int_0^x dv = x $$
[Figures: PDF and CDF of the uniform distribution (images from Wikipedia)]
In [16]:
import matplotlib.pyplot as plt
In [17]:
x = np.random.uniform(0,1,size=1000000)
In [18]:
plt.hist(x,density=True)
Out[18]:
(array([0.9976107 , 1.0003007 , 0.9973307 , 0.9995207 , 1.0000907 ,
        0.9975607 , 1.00215071, 1.00900071, 0.9992407 , 0.9972007 ]),
 array([3.28943159e-07, 1.00000258e-01, 2.00000188e-01, 3.00000118e-01,
        4.00000047e-01, 4.99999977e-01, 5.99999906e-01, 6.99999836e-01,
        7.99999765e-01, 8.99999695e-01, 9.99999624e-01]),
 <BarContainer object of 10 artists>)
In [19]:
from Utils import makeEDF,makeEMF
plotEMF(makeEMF(np.random.uniform(size=100)))
In [21]:
import numpy as np
from Utils import makeEDF,makeEMF,plotEDF
plotEDF(makeEDF(np.random.uniform(size=100)))
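
The identity $F(x) = x$ on $[0,1]$ can also be checked numerically without the plotting helpers. In the sketch below, `empirical_cdf` is a hypothetical helper written for this example (it is not from `Utils`), and the grid of 11 points is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: seed fixed for reproducibility
samples = rng.uniform(0, 1, size=100_000)

def empirical_cdf(data, x):
    """Fraction of observations less than or equal to x."""
    return np.mean(data <= x)

# Compare the empirical CDF with F(x) = x on an evenly spaced grid.
grid = np.linspace(0, 1, 11)
ecdf_values = np.array([empirical_cdf(samples, x) for x in grid])
max_gap = np.max(np.abs(ecdf_values - grid))
print(max_gap)
```

The largest gap on the grid should be small and should shrink as the sample size increases.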

The Gaussian random variable (Normal)

In this case we have $$ f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2} \left ( \frac{x-\mu}{\sigma}\right )^2} $$ Here we have two parameters, the mean $\mu$ and the standard deviation $\sigma$.

In [22]:
np.random.normal(size=10)
Out[22]:
array([-0.01565326, -1.37283243, -0.79421853, -0.6526662 , -0.42412326,
        0.39694522,  0.53652667, -0.03408596,  1.2976797 ,  0.69897693])
In [23]:
_=plt.hist(np.random.normal(size=100000),bins=200)
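
The normal density can be written out directly and checked numerically. The sketch below evaluates the density with the squared term $\left(\frac{x-\mu}{\sigma}\right)^2$ in the exponent, verifies by a Riemann sum that it integrates to approximately 1, and compares it with `scipy.stats.norm.pdf`. The function name `gaussian_pdf` and the parameter values are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import norm

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2 * np.pi))

mu, sigma = 1.0, 2.0   # arbitrary illustrative parameters
xs = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 20_001)
dx = xs[1] - xs[0]

density = gaussian_pdf(xs, mu, sigma)
area = density.sum() * dx   # Riemann sum, should be close to 1
matches_scipy = np.allclose(density, norm.pdf(xs, loc=mu, scale=sigma))
print(area, matches_scipy)
```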