Introduction to Data Science¶

1MS041, 2024¶

©2024 Raazesh Sainudiin, Benny Avelin. Attribution 4.0 International (CC BY 4.0)

Standard error simulations¶

Balance scale problem¶

With the help of a balance scale we have measured the weights of two items with true weights $m_1,m_2$. We have measured $m_1,m_2,m_1-m_2,m_1+m_2$. Whenever we measure we make a measurement error with standard deviation $\sigma$. $X_1,X_2,X_3,X_4$ are the four measurements

  • We can use $X_1$ as an estimate for $m_1$
  • We could also use $(X_1-X_2+X_4)/2$ as an estimate for $m_1$
  • To estimate $m_1+m_2$ we could use $(X_3+X_4)/2$.
  • To estimate $m_1+m_2$ we could also use $(X_1+X_2+X_3)/2$
In [55]:
import numpy as np
def experiment():
    m1 = 25
    m2 = 16

    err = np.random.normal(size=4)
    X1 = m1+err[0]
    X2 = m2+err[1]
    X3 = m1-m2+err[2]
    X4 = m1+m2+err[3]

    theta1 = X1
    theta2 = (X1-X2+X4)/2
    theta3 = (X3 + X4)/2
    theta4 = (X1 + X2 + X3)/2

    return (theta1,theta2,theta3,theta4)
In [57]:
experiments = np.array([experiment() for i in range(10000)])
In [58]:
experiments.shape
Out[58]:
(10000, 4)
In [61]:
import matplotlib.pyplot as plt
for i in range(experiments.shape[1]):
    plt.figure(figsize=(3,3))
    _=plt.hist(experiments[:,i],bins=20)
    plt.xlim(22,28)
    plt.show()
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
In [62]:
np.var(experiments,axis=0)
Out[62]:
array([1.00503568, 0.73489716, 0.50705038, 0.75441519])

Reliability of parallel systems¶

A reliability system consists of two parallell circuits which break independently of each other and has the probabilities $p_1$ and $p_2$ to break during a week. The weekwise probability that the system breaks is thus $p_1p_2$. We now test $n$ such systems and find that the first circuit breaks $x_1$ times and the second system breaks $x_2$ times and that the full system has broken down $x$ times. Two estimators for $p_1p_2$ has been proposed. \begin{align*} p^\ast &:= \frac{x}{n} \\ \hat p &:= \frac{x_1}{n}\frac{x_2}{n} \end{align*} Show that they are unbiased and compute the corresponding variances.

In [63]:
import numpy as np
def experiment(n=100):
    p1 = 0.1
    p2 = 0.3

    X1 = np.random.binomial(1,p1,size=n)
    X2 = np.random.binomial(1,p2,size=n)

    past = np.mean(X1*X2)
    phat = np.mean(X1)*np.mean(X2)
    return (past,phat)
In [64]:
experiments = np.array([experiment() for i in range(1000)])
In [65]:
true_value = 0.1*0.3
true_value
Out[65]:
0.03
In [48]:
np.mean(experiments,axis=0)
Out[48]:
array([0.02971  , 0.0297426])
In [54]:
np.std(experiments,axis=0)
Out[54]:
array([0.01732674, 0.01040136])