{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": false }, "source": [ "# [Introduction to Data Science: A Comp-Math-Stat Approach](http://datascience-intro.github.io/1MS041-2020/)\n", "## 1MS041, 2020 \n", "©2020 Raazesh Sainudiin, Benny Avelin. [Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 10. Convergence of Limits of Random Variables, Confidence Set Estimation and Testing\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Inference and Estimation: The Big Picture\n", "\n", "- [Limits](#Limits)\n", " - Limits of Sequences of Real Numbers\n", " - Limits of Functions\n", " - Limit of a Sequence of Random Variables\n", "- [Convergence in Distribution](#Convergence-in-Distribution)\n", "- [Convergence in Probability](#Convergence-in-Probability)\n", "- [Some Basic Limit Laws in Statistics](#Some-Basic-Limit-Laws-in-Statistics)\n", "- [Weak Law of Large Numbers](#Weak-Law-of-Large-Numbers)\n", "- [Central Limit Theorem](#Central-Limit-Theorem)\n", "- [Asymptotic Normality of the Maximum Likelihood Estimator](#Properties-of-the-MLE)\n", "- [Set Estimators - Confidence Intervals and Sets from Maximum Likelihood Estimators](#Confidence-Interval-and-Set-Estimation-from-MLE)\n", "- [Parametric Hypothesis Test - From Confidence Interval to Wald test](#Hypothesis-Testing)\n", " \n", "\n", "### Inference and Estimation: The Big Picture\n", "\n", "The Models and their maximum likelihood estimators we discussed earlier fit into our Big Picture, which is about inference and estimation and especially inference and estimation problems where computational techniques are helpful. \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
| | Point estimation | Set estimation | Hypothesis Testing |\n", "|:---|:---|:---|:---|\n", "| **Parametric** | MLE of finitely many parameters<br>done | Asymptotically Normal Confidence Intervals<br>about to see ... | Wald Test from Confidence Interval<br>about to see ... |\n", "| **Non-parametric**<br>(infinite-dimensional parameter space) | coming up ... | coming up ... | coming up ... |
\n", "\n", "But before we move on, we have to discuss what makes it all work: the idea of limits, that is, where do you end up if you just keep going?\n", "\n", "## Limits\n", "\n", "We talked about the likelihood function and maximum likelihood estimators for making point estimates of model parameters. For example, for the $Bernoulli(\\theta^*)$ RV (a $Bernoulli$ RV with true but possibly unknown parameter $\\theta^*$), we found that the likelihood function was $L_n(\\theta) = \\theta^{t_n}(1-\\theta)^{(n-t_n)}$ where $t_n = \\displaystyle\\sum_{i=1}^n x_i$. We also found the maximum likelihood estimator (MLE) for the $Bernoulli$ model, $\\widehat{\\theta}_n = \\frac{1}{n}\\displaystyle\\sum_{i=1}^n x_i$. \n", "\n", "We demonstrated these ideas using samples simulated from a $Bernoulli$ process with a secret $\\theta^*$. We had an interactive plot of the likelihood function in which we could increase $n$, the number of simulated samples (the amount of data we base our estimate on), and see the effect on the shape of the likelihood function. The animation below shows the changing likelihood function for the Bernoulli process with unknown $\\theta^*$ as $n$ (the amount of data) increases.\n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", " \n", "
Likelihood function for Bernoulli process, as $n$ goes from 1 to 1000 in a continuous loop.
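
As a quick numerical check of the likelihood and MLE above, here is a minimal plain-Python sketch. It does not use the SageMath helper functions defined below. It simulates $n = 1000$ observations from a $Bernoulli$ process with an assumed $\\theta^* = 0.3$, maximises the log-likelihood $t_n \\log \\theta + (n - t_n) \\log(1-\\theta)$ over a grid of $\\theta$ values, and compares the grid maximiser with $\\widehat{\\theta}_n = t_n/n$. The function names `bernoulli_sample`, `log_likelihood` and `grid_mle`, the seed and the grid are hypothetical choices made for illustration.

```python
import math
import random

def bernoulli_sample(n, theta, seed=None):
    '''Simulate n Bernoulli(theta) observations as 0/1 integers.'''
    rng = random.Random(seed)
    return [1 if rng.random() < theta else 0 for _ in range(n)]

def log_likelihood(theta, n, t_n):
    '''log L_n(theta) = t_n*log(theta) + (n - t_n)*log(1 - theta), for 0 < theta < 1.'''
    return t_n * math.log(theta) + (n - t_n) * math.log(1 - theta)

def grid_mle(n, t_n, grid_size=999):
    '''Maximise the log-likelihood over the grid 1/(grid_size+1), ..., grid_size/(grid_size+1).'''
    grid = [k / (grid_size + 1) for k in range(1, grid_size + 1)]
    return max(grid, key=lambda th: log_likelihood(th, n, t_n))

xs = bernoulli_sample(1000, theta=0.3, seed=30)  # hypothetical seed, for reproducibility only
n, t_n = len(xs), sum(xs)
print('closed-form MLE t_n/n:', t_n / n)
print('grid maximiser:       ', grid_mle(n, t_n))
```

When $0 < t_n < n$, the log-likelihood is strictly concave in $\\theta$. The grid maximiser therefore coincides with $t_n/n$ whenever $t_n/n$ lies on the grid, which it does here because the grid spacing is $1/1000$.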
\n", "\n", "For large $n$, you can probably make your own guess about the true value of $\\theta^*$ even without knowing $t_n$. As the animation progresses, we can see the likelihood function 'homing in' on $\\theta = 0.3$. \n", "\n", "We can see this in another way, by just looking at the sample mean as $n$ increases. An easy way to do this is with running means: generate a very large sample, then calculate the mean over just the first observation in the sample, then over the first two, then the first three, and so on (running means were discussed in an earlier worksheet if you want to go back and review them in detail in your own time). Here we just define a function so that we can easily generate sequences of running means for our $Bernoulli$ process with the unknown $\\theta^*$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Preparation: Let's just evaluate the next cell and focus on concepts.\n", "\n", "You can look at the individual function definitions as you need them." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "def likelihoodBernoulli(theta, n, tStatistic):\n", " '''Bernoulli likelihood function.\n", " theta in [0,1] is the theta to evaluate the likelihood at.\n", " n is the number of observations.\n", " tStatistic is the sum of the n Bernoulli observations.\n", " return a value for the likelihood of theta given the n observations and tStatistic.'''\n", " retValue = 0 # default return value\n", " if (theta >= 0 and theta <= 1): # check on theta\n", " mpfrTheta = RR(theta) # make sure we use a Sage mpfr \n", " retValue = (mpfrTheta^tStatistic)*(1-mpfrTheta)^(n-tStatistic)\n", " return retValue\n", " \n", "def bernoulliFInverse(u, theta):\n", " '''A function to evaluate the inverse CDF of a bernoulli.\n", " \n", " Param u is the value to evaluate the inverse CDF at.\n", " Param theta is the distribution parameter.\n", " Returns inverse CDF under theta evaluated at u'''\n", " \n", " return floor(u + theta)\n", " \n", "def bernoulliSample(n, theta, simSeed=None):\n", " '''A function to simulate samples from a bernoulli distribution.\n", " \n", " Param n is the number of samples to simulate.\n", " Param theta is the bernoulli distribution parameter.\n", " Param simSeed is a seed for the random number generator, defaulting to None (no fixed seed).\n", " Returns a simulated Bernoulli sample as a list.'''\n", " \n", " set_random_seed(simSeed)\n", " us = [random() for i in range(n)]\n", " set_random_seed(None)\n", " return [bernoulliFInverse(u, theta) for u in us] # use bernoulliFInverse in a list comprehension\n", " \n", "def bernoulliSampleSecretTheta(n, theta=0.30, simSeed=30):\n", " '''A function to simulate samples from a bernoulli distribution.\n", " \n", " Param n is the number of samples to simulate.\n", " Param theta is the bernoulli distribution parameter.\n", " Param simSeed is a seed for the random number generator, defaulting to 30.\n", " Returns a simulated Bernoulli sample as a list.'''\n", " \n", " set_random_seed(simSeed)\n", " us = [random() for i in range(n)]\n", " set_random_seed(None)\n", " return [bernoulliFInverse(u, theta) for u in us] # use bernoulliFInverse in a list comprehension\n", "\n", "def bernoulliRunningMeans(n, myTheta, mySeed = None):\n", " '''Function to give a list of n running means from bernoulli with specified theta.\n", " \n", " Param n is the number of running means to generate.\n", " Param myTheta is the theta for the Bernoulli distribution.\n", " Param mySeed is a value for the seed of the random number generator, defaulting to None.'''\n", " \n", " sample = bernoulliSample(n, theta=myTheta, simSeed = mySeed)\n", " from pylab import cumsum # we can import in the middle of code\n", " csSample = list(cumsum(sample))\n", " samplesizes = range(1, n+1,1)\n", " return [RR(csSample[i])/samplesizes[i] for i in range(n)]\n", " \n", "#return a plot object for BernoulliLikelihood using the secret theta bernoulli generator\n", "def plotBernoulliLikelihoodSecretTheta(n):\n", 
" '''Return a plot object for BernoulliLikelihood using the secret theta bernoulli generator.\n", " \n", " Param n is the number of simulated samples to generate and do likelihood plot for.'''\n", " \n", " thisBSample = bernoulliSampleSecretTheta(n) # make sample\n", " tn = sum(thisBSample) # summary statistic\n", " from pylab import arange\n", " ths = arange(0,1,0.01) # get some values to plot against\n", " liks = [likelihoodBernoulli(t,n,tn) for t in ths] # use the likelihood function to generate likelihoods\n", " redshade = 1*n/1000 # fancy colours\n", " blueshade = 1 - redshade\n", " return line(zip(ths, liks), rgbcolor = (redshade, 0, blueshade))\n", " \n", "def cauchyFInverse(u):\n", " '''A function to evaluate the inverse CDF of a standard Cauchy distribution.\n", " \n", " Param u is the value to evaluate the inverse CDF at.'''\n", " \n", " return RR(tan(pi*(u-0.5)))\n", " \n", "def cauchySample(n):\n", " '''A function to simulate samples from a standard Cauchy distribution.\n", " \n", " Param n is the number of samples to simulate.'''\n", " \n", " us = [random() for i in range(n)]\n", " return [cauchyFInverse(u) for u in us]\n", "\n", "def cauchyRunningMeans(n):\n", " '''Function to give a list of n running means from standardCauchy.\n", " \n", " Param n is the number of running means to generate.'''\n", " \n", " sample = cauchySample(n)\n", " from pylab import cumsum\n", " csSample = list(cumsum(sample))\n", " samplesizes = range(1, n+1,1)\n", " return [RR(csSample[i])/samplesizes[i] for i in range(n)]\n", "\n", "def twoRunningMeansPlot(nToPlot, iters):\n", " '''Function to return a graphics array containing plots of running means for Bernoulli and Standard Cauchy.\n", " \n", " Param nToPlot is the number of running means to simulate for each iteration.\n", " Param iters is the number of iterations or sequences of running means or lines on each plot to draw.\n", " Returns a graphics array object containing both plots with titles.'''\n", " xvalues = 
range(1, nToPlot+1,1)\n", " for i in range(iters):\n", " shade = 0.5*(iters - 1 - i)/iters # to get different colours for the lines\n", " bRunningMeans = bernoulliSecretThetaRunningMeans(nToPlot)\n", " cRunningMeans = cauchyRunningMeans(nToPlot)\n", " bPts = zip(xvalues, bRunningMeans)\n", " cPts = zip(xvalues, cRunningMeans)\n", " if (i < 1):\n", " p1 = line(bPts, rgbcolor = (shade, 0, 1))\n", " p2 = line(cPts, rgbcolor = (1-shade, 0, shade))\n", " cauchyTitleMax = max(cRunningMeans) # for placement of cauchy title\n", " else:\n", " p1 += line(bPts, rgbcolor = (shade, 0, 1))\n", " p2 += line(cPts, rgbcolor = (1-shade, 0, shade))\n", " if max(cRunningMeans) > cauchyTitleMax: cauchyTitleMax = max(cRunningMeans)\n", " titleText1 = \"Bernoulli running means\" # make title text\n", " t1 = text(titleText1, (nToPlot/2,1), rgbcolor='blue',fontsize=10) \n", " titleText2 = \"Standard Cauchy running means\" # make title text\n", " t2 = text(titleText2, (nToPlot/2,ceil(cauchyTitleMax)+1), rgbcolor='red',fontsize=10)\n", " return graphics_array((p1+t1,p2+t2))\n", "\n", "def pmfPointMassPlot(theta):\n", " '''Returns a pmf plot for a point mass function with parameter theta.'''\n", " \n", " ptsize = 10\n", " linethick = 2\n", " fudgefactor = 0.07 # to fudge the bottom line drawing\n", " pmf = points((theta,1), rgbcolor=\"blue\", pointsize=ptsize)\n", " pmf += line([(theta,0),(theta,1)], rgbcolor=\"blue\", linestyle=':')\n", " pmf += points((theta,0), rgbcolor = \"white\", faceted = true, pointsize=ptsize)\n", " pmf += line([(min(theta-2,-2),0),(theta-0.05,0)], rgbcolor=\"blue\",thickness=linethick)\n", " pmf += line([(theta+.05,0),(theta+2,0)], rgbcolor=\"blue\",thickness=linethick)\n", " pmf+= text(\"Point mass f\", (theta,1.1), rgbcolor='blue',fontsize=10)\n", " pmf.axes_color('grey') \n", " return pmf\n", " \n", "def cdfPointMassPlot(theta):\n", " '''Returns a cdf plot for a point mass function with parameter theta.'''\n", " \n", " ptsize = 10\n", " linethick = 2\n", " fudgefactor = 0.07 # to fudge the bottom line drawing\n", " cdf = line([(min(theta-2,-2),0),(theta-0.05,0)], rgbcolor=\"blue\",thickness=linethick) # padding\n", " cdf += points((theta,1), rgbcolor=\"blue\", pointsize=ptsize)\n", " cdf += line([(theta,0),(theta,1)], rgbcolor=\"blue\", linestyle=':')\n", " cdf += line([(theta,1),(theta+2,1)], rgbcolor=\"blue\", thickness=linethick) # padding\n", " cdf += points((theta,0), rgbcolor = \"white\", faceted = true, pointsize=ptsize)\n", " cdf+= text(\"Point mass F\", (theta,1.1), rgbcolor='blue',fontsize=10)\n", " cdf.axes_color('grey') \n", " return cdf\n", " \n", "def uniformFInverse(u, theta1, theta2):\n", " '''A function to evaluate the inverse CDF of a uniform(theta1, theta2) distribution.\n", " \n", " u (0 <= u <= 1) is the value to evaluate the inverse CDF at.\n", " theta1, theta2 (theta2 > theta1) are the uniform distribution parameters.'''\n", " \n", " return theta1 + (theta2 - theta1)*u\n", "\n", "def uniformSample(n, theta1, theta2):\n", " '''A function to simulate samples from a uniform distribution.\n", " \n", " n > 0 is the number of samples to simulate.\n", " theta1, theta2 (theta2 > theta1) are the uniform distribution parameters.'''\n", " \n", " us = [random() for i in range(n)]\n", " \n", " return [uniformFInverse(u, theta1, theta2) for u in us]\n", "\n", "def exponentialFInverse(u, lam):\n", " '''A function to evaluate the inverse CDF of an exponential distribution.\n", " \n", " u is the value to evaluate the inverse CDF at.\n", " lam is the exponential distribution parameter.'''\n", " \n", " # log without a base is the natural logarithm\n", " return (-1.0/lam)*log(1 - u)\n", " \n", "def exponentialSample(n, lam):\n", " '''A function to simulate samples from an exponential distribution.\n", " \n", " n is the number of samples to simulate.\n", " lam is the exponential distribution parameter.'''\n", " \n", " us = [random() for i in range(n)]\n", " \n", " return [exponentialFInverse(u, lam) 
for u in us]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get back to our running means of Bernoulli RVs:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def bernoulliSecretThetaRunningMeans(n, mySeed = None):\n", " '''Function to give a list of n running means from Bernoulli with unknown theta.\n", " \n", " Param n is the number of running means to generate.\n", " Param mySeed is a value for the seed of the random number generator, defaulting to None.\n", " Note: the unknown theta parameter for the Bernoulli process is defined in bernoulliSampleSecretTheta\n", " Return a list of n running means.'''\n", " \n", " sample = bernoulliSampleSecretTheta(n, simSeed = mySeed)\n", " from pylab import cumsum # we can import in the middle of code\n", " csSample = list(cumsum(sample))\n", " samplesizes = range(1, n+1,1)\n", " return [RR(csSample[i])/samplesizes[i] for i in range(n)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can use this function to look at, say, 5 different sequences of running means (they will be different, because for each iteration we simulate a different sample of $Bernoulli$ observations)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "Graphics object consisting of 5 graphics primitives" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "nToGenerate = 1500\n", "iterations = 5\n", "xvalues = range(1, nToGenerate+1,1)\n", "for i in range(iterations):\n", " redshade = 0.5*(iterations - 1 - i)/iterations # to get different colours for the lines\n", " bRunningMeans = bernoulliSecretThetaRunningMeans(nToGenerate)\n", " pts = zip(xvalues,bRunningMeans)\n", " if (i == 0):\n", " p = line(pts, rgbcolor = (redshade,0,1))\n", " else:\n", " p += line(pts, rgbcolor = (redshade,0,1))\n", "show(p, figsize=[5,3], axes_labels=['n','sample mean'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What we notice is how the different lines **converge** to a sample mean close to 0.3. \n", "\n", "Is life always this easy? Unfortunately not. In the plot below we show the well-behaved running means for the $Bernoulli$ and, beside them, the running means for simulated standard $Cauchy$ random variables. The $Cauchy$ running means are all over the place, and each time you re-evaluate the cell you'll get different all-over-the-place behaviour." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "Graphics Array of size 1 x 2" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "nToGenerate = 15000\n", "iterations = 5\n", "g = twoRunningMeansPlot(nToGenerate, iterations) # uses above function to make plot\n", "show(g,figsize=[10,5])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We talked about the Cauchy in more detail in an earlier notebook. If you cannot recall the details and are interested, go back to that in your own time. The message here is that although the sample means of the Bernoulli process converge as the number of observations increases, those of the Cauchy do not (the standard Cauchy distribution does not even have a finite mean). 
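
You can reproduce the contrast between the two panels without SageMath. The following minimal sketch is an illustration, not the notebook's own `twoRunningMeansPlot`. It draws $Bernoulli(0.3)$ and standard $Cauchy$ samples with Python's `random` module and prints the final running mean of five independent sequences of each. The Cauchy draws use the same inverse-CDF formula, $\\tan(\\pi(u - 0.5))$, as `cauchyFInverse`. The names `running_means`, `bernoulli_draws` and `cauchy_draws`, and the seed, are hypothetical.

```python
import math
import random

def running_means(xs):
    '''Return the running means x_1, (x_1+x_2)/2, (x_1+x_2+x_3)/3, ... of xs.'''
    total, means = 0.0, []
    for k, x in enumerate(xs, start=1):
        total += x
        means.append(total / k)
    return means

def bernoulli_draws(n, theta, rng):
    '''n Bernoulli(theta) draws as 0/1 integers.'''
    return [1 if rng.random() < theta else 0 for _ in range(n)]

def cauchy_draws(n, rng):
    '''n standard Cauchy draws by the inverse-CDF method: tan(pi*(u - 0.5)).'''
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

rng = random.Random(2020)  # hypothetical seed, only for reproducibility
n = 15000
for rep in range(5):
    b = running_means(bernoulli_draws(n, 0.3, rng))
    c = running_means(cauchy_draws(n, rng))
    print(f'sequence {rep}: Bernoulli mean {b[-1]:.4f}, Cauchy mean {c[-1]:.4f}')
```

Re-running with other seeds, the final Bernoulli running means stay close to 0.3. The final Cauchy running means change erratically from one sequence to the next.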
\n", "\n", "\n", "\n", "# Limits of a Sequence of Real Numbers\n", "\n", "A sequence of real numbers $x_1, x_2, x_3, \\ldots $ (which we can also write as $\\{ x_i\\}_{i=1}^\\infty$) is said to converge to a limit $a \\in \\mathbb{R}$,\n", "\n", "$$\\underset{i \\rightarrow \\infty}{\\lim} x_i = a$$\n", "\n", "if for every natural number $m \\in \\mathbb{N}$, a natural number $N_m \\in \\mathbb{N}$ exists such that for every $j \\geq N_m$, $\\left|x_j - a\\right| \\leq \\frac{1}{m}$.\n", "\n", "What is this saying? $\\left|x_j - a\\right|$ measures the closeness of the $j$th value in the sequence to $a$. If we pick bigger and bigger $m$, $\\frac{1}{m}$ gets smaller and smaller. The definition of the limit says that if $a$ is the limit of the sequence, then we can get the sequence to become as close as we want ('arbitrarily close') to $a$, and to stay that close, by going far enough into the sequence ('for every $j \\geq N_m$, $\\left|x_j - a\\right| \\leq \\frac{1}{m}$').\n", "\n", "($\\mathbb{N}$, the natural numbers, are just the 'counting numbers' $\\{1, 2, 3, \\ldots\\}$.)\n", "\n", " \n", "\n", "Take a trivial example: the sequence $\\{x_i\\}_{i=1}^\\infty = 17, 17, 17, \\ldots$\n", "\n", "Clearly, $\\underset{i \\rightarrow \\infty}{\\lim} x_i = 17$, but let's do this formally:\n", "\n", "For every $m \\in \\mathbb{N}$, take $N_m =1$, then\n", "\n", "$\\forall$ $j \\geq N_m=1, \\left|x_j -17\\right| = \\left|17 - 17\\right| = 0 \\leq \\frac{1}{m}$, as required.\n", "\n", "($\\forall$ is mathspeak for 'for all' or 'for every'.)\n", "\n", "\n", "\n", "What about $\\{x_i\\}_{i=1}^\\infty = \\displaystyle\\frac{1}{1}, \\frac{1}{2}, \\frac{1}{3}, \\ldots$, i.e., $x_i = \\frac{1}{i}$?\n", "\n", "$\\underset{i \\rightarrow \\infty}{\\lim} x_i = \\underset{i \\rightarrow \\infty}{\\lim}\\frac{1}{i} = 0$\n", "\n", "For every $m \\in \\mathbb{N}$, take $N_m = m$, then $\\forall$ $j \\geq m$, $\\left|x_j - 0\\right| = \\frac{1}{j} \\leq \\frac{1}{m}$, as required.\n", "\n", "### YouTry\n", "\n", "Think about $\\{x_i\\}_{i=1}^\\infty = \\frac{1}{1^p}, \\frac{1}{2^p}, \\frac{1}{3^p}, \\ldots$ with $p > 0$. The limit $\\underset{i \\rightarrow \\infty}{\\lim} \\displaystyle\\frac{1}{i^p} = 0$, provided $p > 0$.\n", "\n", "You can draw the plot of this very easily using the Sage symbolic expressions we have already met (`f.subs(...)` allows us to substitute a particular value for one of the symbolic variables in the symbolic function `f`, in this case a value to use for $p$)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "Graphics object consisting of 1 graphics primitive" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "var('i, p')\n", "f = 1/(i^p)\n", "# make and show plot, note we can use f in the label\n", "plot(f.subs(p=1), (i, 0.1, 3), axes_labels=('i',f)).show(figsize=[6,3]) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What about $\\{x_i\\}_{i=1}^\\infty = 1^{\\frac{1}{1}}, 2^{\\frac{1}{2}}, 3^{\\frac{1}{3}}, \\ldots$? The limit $\\underset{i \\rightarrow \\infty}{\\lim} i^{\\frac{1}{i}} = 1$.\n", "\n", "This one is not as easy to see intuitively, but again we can plot it with SageMath." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "Graphics object consisting of 2 graphics primitives" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "var('i')\n", "f = i^(1/i)\n", "n=500\n", "p=plot(f, (i, 0, n), axes_labels=('i',f)) # main plot, in the variable i\n", "p+=line([(0,1),(n,1)],linestyle=':') # add a dotted line at height 1\n", "p.show(figsize=[6,3]) # show the plot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, $\\{x_i\\}_{i=1}^\\infty = p^{\\frac{1}{1}}, p^{\\frac{1}{2}}, p^{\\frac{1}{3}}, \\ldots$, with $p > 0$. 
The limit $\\underset{i \\rightarrow \\infty}{\\lim} p^{\\frac{1}{i}} = 1$, provided $p > 0$.\n", "\n", "You can cut and paste (with suitable adaptations) to try to plot this one as well ..." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "x" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(end of YouTry)\n", "\n", "---\n", "\n", "*back to the real stuff ...*\n", "\n", "# Limits of Functions\n", "\n", "We say that a function $f(x): \\mathbb{R} \\rightarrow \\mathbb{R}$ has a limit $L \\in \\mathbb{R}$ as $x$ approaches $a$:\n", "\n", "$$\\underset{x \\rightarrow a}{\\lim} f(x) = L$$\n", "\n", "provided $f(x)$ is arbitrarily close to $L$ for all ($\\forall$) values of $x$ that are sufficiently close to, but not equal to, $a$.\n", "\n", "For example, consider the function $f(x) = (1+x)^{\\frac{1}{x}}$:\n", "\n", "$\\underset{x \\rightarrow 0}{\\lim} f(x) = \\underset{x \\rightarrow 0}{\\lim} (1+x)^{\\frac{1}{x}} = e \\approx 2.71828\\cdots$\n", "\n", "even though $f(0) = (1+0)^{\\frac{1}{0}}$ is undefined!" 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# x is defined as a symbolic variable by default by Sage so we do not need var('x')\n", "f = (1+x)^(1/x)\n", "# uncomment and try evaluating next line\n", "#f.subs(x=0) # this will give you an error message" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can get some idea of what is going on with two plots on different scales:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "Graphics Array of size 1 x 2" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "f = (1+x)^(1/x)\n", "n1=5\n", "p1=plot(f, (x, 0.001, n1), axes_labels=('x',f)) # large scale plot\n", "t1 = text(\"Large scale plot\", (n1/2,e), rgbcolor='blue',fontsize=10) \n", "n2=0.1\n", "p2=plot(f, (x, 0.0000001, n2), axes_labels=('x',f)) # small scale plot near 0\n", "p2+=line([(0,e),(n2,e)],linestyle=':') # add a dotted line at height e\n", "t2 = text(\"Small scale plot\", (n2/2,e+.01), rgbcolor='blue',fontsize=10) \n", "show(graphics_array((p1+t1,p2+t2)),figsize=[6,3]) # show the plot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All this has been laying the groundwork for the topic of real interest to us ...\n", "\n", "# Limit of a Sequence of Random Variables\n", "\n", "We want to be able to say things like $\\underset{i \\rightarrow \\infty}{\\lim} X_i = X$ in some sensible way. Here the $X_i$ are random variables and $X$ is some 'limiting random variable', but what do we mean by a 'limiting random variable'?\n", "\n", "To help us, let's introduce a very simple random variable, one that puts all its mass in one place." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "Graphics Array of size 1 x 2" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "theta = 2.0\n", "show(graphics_array((pmfPointMassPlot(theta),cdfPointMassPlot(theta))),\\\n", " figsize=[8,2]) # show the plots" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is known as the $Point\\,Mass(\\theta)$ random variable, $\\theta \\in \\mathbb{R}$: its probability mass function (PMF) $f(x)$ is 1 if $x=\\theta$ and 0 everywhere else,\n", "\n", "$$\n", "f(x;\\theta) =\n", "\\begin{cases}\n", "0 & \\text{ if } x \\neq \\theta \\\\\n", "1 & \\text{ if } x = \\theta\n", "\\end{cases}\n", "$$\n", "\n", "and its distribution function (DF) is\n", "\n", "$$\n", "F(x;\\theta) =\n", "\\begin{cases}\n", "0 & \\text{ if } x < \\theta \\\\\n", "1 & \\text{ if } x \\geq \\theta\n", "\\end{cases}\n", "$$\n", "\n", "So, if we had some sequence $\\{\\theta_i\\}_{i=1}^\\infty$ with $\\underset{i \\rightarrow \\infty}{\\lim} \\theta_i = \\theta$\n", "\n", "and a sequence of random variables $X_i \\sim Point\\,Mass(\\theta_i)$, $i = 1, 2, 3, \\ldots$,\n", "\n", "then we could talk about a limiting random variable $X \\sim Point\\,Mass(\\theta)$,\n", "\n", "i.e., we could talk about $\\underset{i \\rightarrow \\infty}{\\lim} X_i = X$." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "Graphics object consisting of 202 graphics primitives" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# mock up a picture of a sequence of point mass rvs converging on theta = 0\n", "ptsize = 20\n", "i = 1\n", "theta_i = 1/i\n", "p = points((theta_i,1), rgbcolor=\"blue\", pointsize=ptsize)\n", "p += line([(theta_i,0),(theta_i,1)], rgbcolor=\"blue\", linestyle=':')\n", "while theta_i > 0.01:\n", " i+=1\n", " theta_i = 1/i\n", " p += points((theta_i,1), rgbcolor=\"blue\", pointsize=ptsize)\n", " p += 
line([(theta_i,0),(theta_i,1)], rgbcolor=\"blue\", linestyle=':')\n", "p += points((0,1), rgbcolor=\"red\", pointsize=ptsize)\n", "p += line([(0,0),(0,1)], rgbcolor=\"red\", linestyle=':')\n", "p.show(xmin=-1, xmax = 2, ymin=0, ymax = 1.1, axes=false, gridlines=[None,[0]], \\\n", " figsize=[7,2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we want to generalise this notion of a limit to other random variables (that are not necessarily $Point\\,Mass(\\theta_i)$ RVs).\n", "\n", "What about one that many of you will be familiar with: the 'bell-shaped curve'?\n", "\n", "## The $Gaussian(\\mu, \\sigma^2)$ or $Normal(\\mu, \\sigma^2)$ RV?\n", "\n", "The probability density function (PDF) $f(x)$ is given by\n", "\n", "$$\n", "f(x ;\\mu, \\sigma) = \\displaystyle\\frac{1}{\\sigma\\sqrt{2\\pi}}\\exp\\left(\\frac{-1}{2\\sigma^2}(x-\\mu)^2\\right)\n", "$$\n", "\n", "The two parameters, $\\mu \\in \\mathbb{R} := (-\\infty,\\infty)$ and $\\sigma \\in (0,\\infty)$, are sometimes referred to as the location and scale parameters.\n", "\n", "To see why, use the interactive plot below to look at what happens to the shape of the density function $f(x)$ when you change $\\mu$ or increase or decrease $\\sigma$:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "da67126df9d24c6f9634d0e068ca5d7c", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Interactive function with 2 widgets\n", " my_mu: EvalText(value='0', description='mu',…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "@interact\n", "def _(my_mu=input_box(0, label='mu') ,my_sigma=input_box(1,label='sigma')):\n", " '''Interactive function to plot the normal pdf.'''\n", " \n", " if my_sigma > 0:\n", " html('$Normal('+str(my_mu)+','+str(my_sigma)+'^2)$')\n", " var('mu sigma')\n", " f = (1/(sigma*sqrt(2.0*pi)))*exp(-1.0/(2*sigma^2)*(x - mu)^2)\n", " p1=plot(f.subs(mu=my_mu,sigma=my_sigma), \\\n", " (x, my_mu - 3*my_sigma - 2, my_mu + 3*my_sigma + 2),\\\n", " axes_labels=('x','f(x)'))\n", " show(p1,figsize=[8,3])\n", " else:\n", " print( \"sigma must be greater than 0\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Consider the sequence of random variables $X_1, X_2, X_3, \\ldots$, where\n", "\n", "- $X_1 \\sim Normal(0, 1)$\n", "- $X_2 \\sim Normal(0, \\frac{1}{2})$\n", "- $X_3 \\sim Normal(0, \\frac{1}{3})$\n", "- $X_4 \\sim Normal(0, \\frac{1}{4})$\n", "- $\\vdots$\n", "- $X_i \\sim Normal(0, \\frac{1}{i})$\n", "- $\\vdots$\n", "\n", "We can use the animation below to see how the PDF $f_{i}(x)$ looks as we move through the sequence of $X_i$ (the animation only goes to $i = 25$, $\\sigma = 0.04$, but you get the picture ...).\n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", " \n", "
Normal curve animation, looping through $\\sigma = \\frac{1}{i}$ for $i = 1, \\dots, 25$
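
The concentration of probability mass about 0 can also be checked numerically. This minimal sketch reads the second parameter as the variance, $\\sigma^2 = \\frac{1}{i}$, as in the $Normal(\\mu, \\sigma^2)$ notation. It uses the standard identity $F(t) = \\frac{1}{2}\\left(1 + \\mathrm{erf}\\left(\\frac{t-\\mu}{\\sqrt{2\\sigma^2}}\\right)\\right)$ to compute $P(|X_i| \\leq 0.1)$ for increasing $i$. The helper name `normal_cdf` and the choice of $0.1$ are illustrative assumptions.

```python
import math

def normal_cdf(t, mu=0.0, var=1.0):
    '''Distribution function of Normal(mu, var), where var is the variance sigma^2.'''
    return 0.5 * (1.0 + math.erf((t - mu) / math.sqrt(2.0 * var)))

eps = 0.1
probs = []
for i in [1, 10, 100, 1000, 10000]:
    var_i = 1.0 / i  # X_i ~ Normal(0, 1/i), second parameter read as the variance
    mass_near_zero = normal_cdf(eps, var=var_i) - normal_cdf(-eps, var=var_i)
    probs.append(mass_near_zero)
    print(f'i = {i:>5}:  P(|X_i| <= {eps}) = {mass_near_zero:.6f}')
```

The printed probabilities increase towards 1 as $i$ grows. Even so, `normal_cdf(0.0, var=1.0 / i)` is exactly $0.5$ for every $i$.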
\n", "\n", "We can see that the probability mass of $X_i \\sim Normal(0, \\frac{1}{i})$ increasingly concentrates about 0 as $i \\rightarrow \\infty$ and $\\frac{1}{i} \\rightarrow 0$.\n", "\n", "Does this mean that $\\underset{i \\rightarrow \\infty}{\\lim} X_i = Point\\,Mass(0)$?\n", "\n", "No: for any $i$, however large, $P(X_i = 0) = 0$ because $X_i$ is a continuous RV (for any continuous RV $X$ and any $x \\in \\mathbb{R}$, $P(X=x) = 0$), whereas a $Point\\,Mass(0)$ RV takes the value 0 with probability 1.\n", "\n", "So we need to refine our notions of convergence when we are dealing with random variables.\n", "\n", "# Convergence in Distribution\n", "\n", "Let $X_1, X_2, \\ldots$ be a sequence of random variables and let $X$ be another random variable. Let $F_i$ denote the distribution function (DF) of $X_i$ and let $F$ denote the distribution function of $X$.\n", "\n", "If, for every real number $t$ at which $F$ is continuous,\n", "\n", "$$\\underset{i \\rightarrow \\infty}{\\lim} F_i(t) = F(t)$$\n", "\n", "(i.e., for each such fixed $t$, the sequence of real numbers $F_1(t), F_2(t), \\ldots$ converges to $F(t)$ in the sense we talked about earlier),\n", "\n", "then we say that the sequence of RVs $X_i$, $i = 1, 2, \\ldots$ **converges to $X$ in distribution** and write $X_i \\overset{d}{\\rightarrow} X$.\n", "\n", "An equivalent way of describing convergence in distribution is to go right back to the probability space 'under the hood' of a random variable. A random variable $X$ is a mapping from the sample space $\\Omega$ to the real line ($X: \\Omega \\rightarrow \\mathbb{R}$) that sends each sample point or outcome $\\omega \\in \\Omega$ to the real number $X(\\omega)$. We can look at the set of $\\omega$ such that $X(\\omega) \\leq t$, i.e., the set of outcomes that map to some value on the real line less than or equal to $t$: $\\{\\omega: X(\\omega) \\leq t \\}$. 
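
To make the set $\\{\\omega: X(\\omega) \\leq t \\}$ concrete, here is a minimal sketch on a hypothetical finite sample space. $\\Omega$ consists of the four equally likely outcomes of two fair coin tosses, and $X(\\omega)$ counts the heads in $\\omega$. For each $t$ the sketch lists the event $\\{\\omega: X(\\omega) \\leq t \\}$ and adds up its probability, which is exactly the distribution function $F(t)$. The names `omega`, `prob`, `X`, `event_le` and `F` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: the outcomes of two fair coin tosses, each with probability 1/4
omega = [''.join(toss) for toss in product('HT', repeat=2)]  # ['HH', 'HT', 'TH', 'TT']
prob = {w: Fraction(1, 4) for w in omega}

def X(w):
    '''The random variable X: Omega -> R, the number of heads in outcome w.'''
    return w.count('H')

def event_le(t):
    '''The event {w in Omega : X(w) <= t}, as a list of outcomes.'''
    return [w for w in omega if X(w) <= t]

def F(t):
    '''Distribution function F(t) = P({w : X(w) <= t}).'''
    return sum((prob[w] for w in event_le(t)), Fraction(0))

for t in [-0.5, 0, 0.5, 1, 2]:
    print(t, event_le(t), F(t))
```

$F$ jumps at $t = 0, 1, 2$, exactly where the event $\\{\\omega: X(\\omega) \\leq t \\}$ gains new outcomes.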
\n", "\n", "Saying that for any $t \\in \\mathbb{R}$, $\\underset{i \\rightarrow \\infty}{\\lim} F_i(t) = F(t)$ is the equivalent of saying that for any $t \\in \\mathbb{R}$, \n", "\n", "$$\\underset{i \\rightarrow \\infty}{\\lim} P\\left(\\{\\omega:X_i(\\omega) \\leq t \\}\\right) = P\\left(\\{\\omega: X(\\omega) \\leq t\\right)$$\n", "\n", "Armed with this, we can go back to our sequence of $Normal$ random variables $X_1, X_2, X_3, \\ldots$, where\n", "\n", "- $X_1 \\sim Normal(0, 1)$\n", "- $X_2 \\sim Normal(0, \\frac{1}{2})$\n", "- $X_3 \\sim Normal(0, \\frac{1}{3})$\n", "- $X_4 \\sim Normal(0, \\frac{1}{4})$\n", "- $\\vdots$\n", "- $X_i \\sim Normal(0, \\frac{1}{i})$\n", "- $\\vdots$\n", "\n", "and let $X \\sim Point\\,Mass(0)$,\n", "\n", "and say that the $X_i$ **converge in distribution** to the $x \\sim Point\\,Mass$ RV $X$,\n", "\n", "$$X_i \\overset{d}{\\rightarrow} X$$\n", "\n", "What we are saying with convergence in distribution, informally, is that as $i$ increases, we increasingly expect to see the next outcome in a sequence of random experiments becoming better and better modeled by the limiting random variable. In this case, as $i$ increases, the $Point\\,Mass(0)$ is becoming a better and better model for the next outcome of a random experiment with outcomes $\\sim Normal(0,\\frac{1}{i})$." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "Graphics object consisting of 14 graphics primitives" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# mock up a picture of a sequence of converging normal distributions\n", "my_mu = 0\n", "upper = my_mu + 5; lower = -upper; # limits for plot\n", "var('mu sigma')\n", "stop_i = 12\n", "html('

N(0,1) to N(0, 1/'+str(stop_i)+')

')\n", "f = (1/(sigma*sqrt(2.0*pi)))*exp(-1.0/(2*sigma^2)*(x - mu)^2)\n", "p=plot(f.subs(mu=my_mu,sigma=1.0), (x, lower, upper), rgbcolor = (0,0,1))\n", "for i in range(2, stop_i, 1): # just do a few of them\n", " shade = 1-11/i # make them different colours\n", " p+=plot(f.subs(mu=my_mu,sigma=1/i), (x, lower, upper), rgbcolor = (1-shade, 0, shade))\n", "textOffset = -0.2 # offset for placement of text - may need adjusting \n", "p+=text(\"0\",(0,textOffset),fontsize = 10, rgbcolor='grey') \n", "p+=text(str(upper.n(digits=2)),(upper,textOffset),fontsize = 10, rgbcolor='grey') \n", "p+=text(str(lower.n(digits=2)),(lower,textOffset),fontsize = 10, rgbcolor='grey') \n", "p.show(axes=false, gridlines=[None,[0]], figsize=[7,3])" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "Graphics object consisting of 15 graphics primitives" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# mock up a picture of a sequence of converging normal distributions\n", "my_mu = 0\n", "upper = my_mu + 5; lower = -upper; # limits for plot\n", "var('mu sigma')\n", "stop_i = 12\n", "html('

N(0,1) to N(0, 1/'+str(stop_i)+')

')\n", "f = (1/2)*(1+erf((x - mu)/(sqrt(2)*sigma)))\n", "p=plot(f.subs(mu=my_mu,sigma=1.0), (x, lower, upper), rgbcolor = (0,0,1))\n", "for i in range(2, stop_i, 1): # just do a few of them\n", " shade = 1-11/i # make them different colours\n", " p+=plot(f.subs(mu=my_mu,sigma=1/i), (x, lower, upper), rgbcolor = (1-shade, 0, shade))\n", "textOffset = -0.2 # offset for placement of text - may need adjusting \n", "p+=text(\"0\",(0,textOffset),fontsize = 10, rgbcolor='grey') \n", "p+=text(str(upper.n(digits=2)),(upper,textOffset),fontsize = 10, rgbcolor='grey') \n", "p+=text(str(lower.n(digits=2)),(lower,textOffset),fontsize = 10, rgbcolor='grey') \n", "p.show(axes=false, gridlines=[None,[0]], figsize=[7,3])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### There is an interesting point to note about this convergence: \n", "\n", "We have said that the $X_i \\sim Normal(0,\\frac{1}{i})$ with distribution functions $F_i$ converge in distribution to $X \\sim Point\\,Mass(0)$ with distribution function $F$, which means that we must be able to show that for any real number $t$ at which $F$ is continuous,\n", "\n", "$$\\underset{i \\rightarrow \\infty}{\\lim} F_i(t) = F(t)$$\n", "\n", "Note that for any of the $X_i \\sim Normal(0, \\frac{1}{i})$, $F_i(0) = \\frac{1}{2}$, and also note that for $X \\sim Point\\,Mass(0)$, $F(0) = 1$, so $\\underset{i \\rightarrow \\infty}{\\lim} F_i(0) = \\frac{1}{2} \\neq 1 = F(0)$. \n", "\n", "What has gone wrong? \n", "\n", "Nothing: we said that we had to be able to show that $\\underset{i \\rightarrow \\infty}{\\lim} F_i(t) = F(t)$ for any $t \\in \\mathbb{R}$ at which $F$ is continuous, but the $Point\\,Mass(0)$ distribution function $F$ is not continuous at 0! At every other $t$ the limit does hold: $F_i(t) \\rightarrow 0 = F(t)$ for $t < 0$ and $F_i(t) \\rightarrow 1 = F(t)$ for $t > 0$." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "Graphics Array of size 1 x 2" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "theta = 0.0\n", "# show the plots\n", "show(graphics_array((pmfPointMassPlot(theta),cdfPointMassPlot(theta))),figsize=[8,2]) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Convergence in Probability\n", "\n", "Let $X_1, X_2, \\ldots$ be a sequence of random variables and let $X$ be another random variable. Let $F_i$ denote the distribution function (DF) of $X_i$ and let $F$ denote the distribution function of $X$.\n", "\n", "Now, if for any real number $\\varepsilon > 0$,\n", "\n", "$$\\underset{i \\rightarrow \\infty}{\\lim} P\\left(|X_i - X| > \\varepsilon\\right) = 0$$\n", "\n", "then we say that the sequence $X_i$, $i = 1, 2, \\ldots$ **converges to $X$ in probability** and write $X_i \\overset{P}{\\rightarrow} X$.\n", "\n", "Or, going back again to the probability space 'under the hood' of a random variable, we could look at the way each $X_i$ maps each outcome $\\omega \\in \\Omega$ to $X_i(\\omega)$, which is some point on the real line, and compare this to the mapping $X(\\omega)$. 
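For the Normal sequence with variance 1/i introduced above and X = 0, the probability in this definition has a closed form. Since X_i times sqrt(i) is standard normal, P(|X_i - 0| > epsilon) = 2(1 - Phi(epsilon sqrt(i))). The standard-library sketch below uses hypothetical helper names. It computes that exact probability next to a Monte Carlo estimate from simulated draws of X_i, for epsilon = 0.1.

```python
import random
from statistics import NormalDist


def exact_tail_prob(i, eps):
    # P(|X_i - X| > eps) with X = 0 and X_i ~ Normal(0, 1/i):
    # |X_i| > eps exactly when |Z| > eps * sqrt(i) for Z ~ Normal(0, 1)
    return 2.0 * (1.0 - NormalDist().cdf(eps * i ** 0.5))


def simulated_tail_prob(i, eps, reps=20000, seed=1234):
    # Monte Carlo estimate of the same probability from reps draws of X_i
    rng = random.Random(seed)
    sd = (1.0 / i) ** 0.5
    hits = sum(abs(rng.gauss(0.0, sd)) > eps for _ in range(reps))
    return hits / reps


eps = 0.1
for i in (1, 10, 100, 1000):
    print(i, round(exact_tail_prob(i, eps), 4), simulated_tail_prob(i, eps))
```

Both columns shrink towards 0 as i grows, which is what the limit in the definition requires.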
\n", "\n", "Saying that for any $\\varepsilon > 0$, $\\underset{i \\rightarrow \\infty}{\\lim} P\\left(|X_i - X| > \\varepsilon\\right) = 0$ is equivalent to saying that for any $\\varepsilon > 0$, \n", "\n", "$$\\underset{i \\rightarrow \\infty}{\\lim} P\\left(\\{\\omega:|X_i(\\omega) - X(\\omega)| > \\varepsilon \\}\\right) = 0$$\n", "\n", "Informally, we are saying $X$ is a limit in probability if, by going far enough into the sequence $X_i$, we can make the probability of the set of outcomes $\\omega$ for which $X_i(\\omega)$ and $X(\\omega)$ are more than $\\varepsilon$ apart on the real line as small as we like.\n", "\n", "**Note** that convergence in probability implies convergence in distribution: of the two notions, convergence in distribution is the weaker one; any sequence of RVs that converges in probability to some RV $X$ also converges in distribution to $X$ (but not necessarily vice versa). " ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "Graphics object consisting of 25 graphics primitives" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# mock up a picture of a sequence of converging normal distributions\n", "my_mu = 0\n", "var('mu sigma')\n", "upper = 0.2; lower = -upper\n", "i = 20 # start part way into the sequence\n", "lim = 100 # how far to go\n", "stop_i = 12\n", "html('

N(0,1/'+str(i)+') to N(0, 1/'+str(lim)+')

')\n", "f = (1/(sigma*sqrt(2.0*pi)))*exp(-1.0/(2*sigma^2)*(x - mu)^2)\n", "p=plot(f.subs(mu=my_mu,sigma=1.0/i), (x, lower, upper), rgbcolor = (0,0,1))\n", "for j in range(i, lim+1, 4): # just do a few of them\n", " shade = 1-(j-i)/(lim-i) # make them different colours\n", " p+=plot(f.subs(mu=my_mu,sigma=1/j), (x, lower,upper), rgbcolor = (1-shade, 0, shade))\n", "textOffset = -1.5 # offset for placement of text - may need adjusting \n", "p+=text(\"0\",(0,textOffset),fontsize = 10, rgbcolor='grey') \n", "p+=text(str(upper.n(digits=2)),(upper,textOffset),fontsize = 10, rgbcolor='grey') \n", "p+=text(str(lower.n(digits=2)),(lower,textOffset),fontsize = 10, rgbcolor='grey') \n", "p.show(axes=false, gridlines=[None,[0]], figsize=[7,3])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For our sequence of $Normal$ random variables $X_1, X_2, X_3, \\ldots$, where\n", "\n", "- $X_1 \\sim Normal(0, 1)$\n", "- $X_2 \\sim Normal(0, \\frac{1}{2})$\n", "- $X_3 \\sim Normal(0, \\frac{1}{3})$\n", "- $X_4 \\sim Normal(0, \\frac{1}{4})$\n", "- $\\vdots$\n", "- $X_i \\sim Normal(0, \\frac{1}{i})$\n", "- $\\vdots$\n", "\n", "and $X \\sim Point\\,Mass(0)$,\n", "\n", "it can be shown that the $X_i$ converge in probability to $X$,\n", "\n", "$$X_i \\overset{P}{\\rightarrow} X$$\n", "\n", "Since we are going to be using Markov's inequality later, we might as well take a look at it here and prove it.\n", "\n", "### Markov's inequality\n", "\n", "Let $X$ be a nonnegative random variable. Then for $a > 0$,\n", "$$\n", " P(X \\geq a) \\leq \\frac{E[X]}{a}\n", "$$\n", "\n", "#### Proof\n", "\n", "Let us begin by assuming that $X$ is a continuous random variable with density $f$ (so $f(x) = 0$ for $x < 0$) and let us write\n", "$$\n", " P(X \\geq a) = \\int_{a}^\\infty f(x) dx = \\frac{1}{a} \\int_{a}^\\infty a f(x) dx \\leq \\frac{1}{a} \\int_{a}^\\infty x f(x) dx \\leq \\frac{1}{a} \\int_{0}^\\infty x f(x) dx = \\frac{1}{a} E[X]\n", "$$\n", "\n", "The first inequality holds because $x \\geq a$ on the range of integration, and the second because $x f(x) \\geq 0$ on $[0, a)$. The discrete case is analogous, with sums in place of integrals.\n", "\n", "#### Convergence in probability of our sequence\n", "Let's now use it! 
Remember we need to prove that for every $\\varepsilon > 0$,\n", "\n", "$$\\underset{i \\rightarrow \\infty}{\\lim} P\\left(|X_i - X| > \\varepsilon\\right) = 0$$\n", "\n", "but $X = 0$ since it is a $Point\\,Mass(0)$ RV, so applying Markov's inequality to the nonnegative RV $|X_i - X|^2$ with $a = \\varepsilon^2$, and using $E[X_i] = 0$, we get\n", "\n", "$$\n", " P\\left(|X_i - X| \\geq \\varepsilon\\right) = P\\left(|X_i - X|^2 \\geq \\varepsilon^2 \\right) \\leq \\frac{E[|X_i-X|^2]}{\\varepsilon^2} = \\frac{E[X_i^2]}{\\varepsilon^2} = \\frac{\\text{Var}[X_i]}{\\varepsilon^2} = \\frac{1}{i\\,\\varepsilon^2}\n", "$$\n", "\n", "Since $P\\left(|X_i - X| > \\varepsilon\\right) \\leq P\\left(|X_i - X| \\geq \\varepsilon\\right) \\leq \\frac{1}{i\\,\\varepsilon^2} \\rightarrow 0$ as $i \\rightarrow \\infty$, we conclude that $X_i \\overset{P}{\\rightarrow} X$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Some Basic Limit Laws in Statistics\n", "\n", "Intuition behind the Law of Large Numbers and the Central Limit Theorem\n", "\n", "Take a look at the Khan Academy videos on the Law of Large Numbers and the Central Limit Theorem. They will give you a working idea of these theorems. In the sequel, we will strive for a deeper understanding of these theorems on the basis of the two notions of convergence of sequences of random variables we just saw.\n", "\n", "## Weak Law of Large Numbers\n", "\n", "Remember that a statistic is a random variable, so a sample mean is a random variable. If we are given a sequence of independent and identically distributed RVs, $X_1,X_2,\\ldots \\overset{IID}{\\sim} X_1$, then we can also think of a sequence of random variables $\\overline{X}_1, \\overline{X}_2, \\ldots, \\overline{X}_n, \\ldots$ ($n$ being the sample size). 
\n", "\n", "Since $X_1, X_2, \\ldots$ are $IID$, they all have the same expectation, which by convention we write as $E(X_1)$.\n", "\n", "If $E(X_1)$ exists, then the sample mean $\\overline{X}_n$ converges in probability to $E(X_1)$ (i.e., to the expectation of any one of the individual RVs):\n", "\n", "$$\n", "\\text{If} \\quad X_1,X_2,\\ldots \\overset{IID}{\\sim} X_1 \\ \\text{and if } \\ E(X_1) \\ \\text{exists, then } \\ \\overline{X}_n \\overset{P}{\\rightarrow} E(X_1) \\ .\n", "$$\n", "\n", "Going back to our definition of convergence in probability, we see that this means that for any real number $\\varepsilon > 0$, $\\underset{n \\rightarrow \\infty}{\\lim} P\\left(|\\overline{X}_n - E(X_1)| > \\varepsilon\\right) = 0$.\n", "\n", "Informally, this means that by taking larger and larger samples we can make the probability that the average of the observations is more than $\\varepsilon$ away from the expected value as small as we like.\n", "\n", "A full proof of this is beyond the scope of this course. However, when $V(X_1)$ also exists, the Markov-inequality argument we used above for our $Normal$ sequence, together with $V(\\overline{X}_n) = \\frac{V(X_1)}{n}$, gives $P\\left(|\\overline{X}_n - E(X_1)| \\geq \\varepsilon\\right) \\leq \\frac{V(X_1)}{n\\,\\varepsilon^2} \\rightarrow 0$. We have already seen the law in action when we looked at the $Bernoulli$ running means. Have another look, this time with only one sequence of running means. You can increase $n$, the sample size, and change $\\theta$. Note that the seed for the random number generator is also under your control. This means that you can get replicable samples: in particular, in this interact, when you increase the sample size it looks as though you are just adding more to an existing sample rather than starting from scratch with a new one. 
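Here is a minimal standalone sketch of the same experiment that does not rely on any notebook helpers. The function `bernoulli_running_means` is a hypothetical pure-Python stand-in for the notebook helper `bernoulliRunningMeans`. The function `prob_far_from_mean`, also hypothetical, estimates by simulation the probability that the sample mean of n trials lands more than epsilon away from theta, for growing n.

```python
import random


def bernoulli_running_means(n, theta, seed=1234):
    # Running sample means of n IID Bernoulli(theta) trials; a hypothetical
    # pure-Python stand-in for the notebook helper bernoulliRunningMeans
    rng = random.Random(seed)
    total = 0
    means = []
    for k in range(1, n + 1):
        total += 1 if rng.random() < theta else 0
        means.append(total / k)
    return means


def prob_far_from_mean(n, theta, eps, reps=1000, seed=42):
    # Monte Carlo estimate of P(|sample mean of n trials - theta| > eps)
    rng = random.Random(seed)
    far = 0
    for _ in range(reps):
        xbar = sum(rng.random() < theta for _ in range(n)) / n
        far += abs(xbar - theta) > eps
    return far / reps


theta, eps = 0.3, 0.05
means = bernoulli_running_means(1500, theta)
print([round(means[k - 1], 3) for k in (10, 100, 1000, 1500)])
for n in (10, 100, 1000):
    print(n, prob_far_from_mean(n, theta, eps))
```

Fixing the seed plays the same role as the random seed box in the interact: rerunning gives the same sample path.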
" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "347a97c419934a3ab3d97757233c72cc", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Interactive function with 3 widgets\n", " nToGen: TransformIntSlider(value=100, descri…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "@interact\n", "def _(nToGen=slider(1,1500,1,100,label='n'),my_theta=input_box(0.3,label='theta'),rSeed=input_box(1234,label='random seed')):\n", " '''Interactive function to plot running mean for a Bernoulli with specified n, theta and random number seed.'''\n", " \n", " if my_theta >= 0 and my_theta <= 1:\n", " html('

Bernoulli('+str(my_theta.n(digits=2))+')

')\n", " xvalues = range(1, nToGen+1,1)\n", " bRunningMeans = bernoulliRunningMeans(nToGen, myTheta=my_theta, mySeed=rSeed)\n", " pts = zip(xvalues, bRunningMeans)\n", " p = line(pts, rgbcolor = (0,0,1))\n", " p+=line([(0,my_theta),(nToGen,my_theta)],linestyle=':',rgbcolor='grey')\n", " show(p, figsize=[5,3], axes_labels=['n','sample mean'],ymax=1)\n", " else:\n", " print ('Theta must be between 0 and 1')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Central Limit Theorem\n", "\n", "You have probably all heard of the Central Limit Theorem before, but now we can relate it to our definition of convergence in distribution. \n", "\n", "Let $X_1,X_2,\\ldots \\overset{IID}{\\sim} X_1$ and suppose $E(X_1)$ and $V(X_1)$ both exist,\n", "\n", "then\n", "\n", "$$\n", "\\overline{X}_n = \\frac{1}{n} \\sum_{i=1}^n X_i \\overset{d}{\\rightarrow} X \\sim Normal \\left(E(X_1),\\frac{V(X_1)}{n} \\right)\n", "$$\n", "\n", "And remember $Z \\sim Normal(0,1)$?\n", "\n", "Consider $Z_n := \\displaystyle\\frac{\\overline{X}_n-E(\\overline{X}_n)}{\\sqrt{V(\\overline{X}_n)}} = \\displaystyle\\frac{\\sqrt{n} \\left( \\overline{X}_n -E(X_1) \\right)}{\\sqrt{V(X_1)}}$\n", "\n", "If $\\overline{X}_n = \\displaystyle\\frac{1}{n} \\displaystyle\\sum_{i=1}^n X_i \\overset{d}{\\rightarrow} X \\sim Normal \\left(E(X_1),\\frac{V(X_1)}{n} \\right)$, then $\\overline{X}_n -E(X_1) \\overset{d}{\\rightarrow} X-E(X_1) \\sim Normal \\left( 0,\\frac{V(X_1)}{n} \\right)$\n", "\n", "and $\\sqrt{n} \\left( \\overline{X}_n -E(X_1) \\right) \\overset{d}{\\rightarrow} \\sqrt{n} \\left( X-E(X_1) \\right) \\sim Normal \\left( 0,V(X_1) \\right)$\n", "\n", "so $Z_n := \\displaystyle \\frac{\\overline{X}_n-E(\\overline{X}_n)}{\\sqrt{V(\\overline{X}_n)}} = \\displaystyle\\frac{\\sqrt{n} \\left( \\overline{X}_n -E(X_1) \\right)}{\\sqrt{V(X_1)}} \\overset{d}{\\rightarrow} Z \\sim Normal \\left( 0,1 \\right)$\n", "\n", "Thus, for sufficiently large $n$ (say $n>30$), probability statements about 
$\\overline{X}_n$ can be approximated using the $Normal$ distribution. \n", "\n", "The beauty of the CLT, as you have probably seen from other courses, is that $\\overline{X}_n \\overset{d}{\\rightarrow} Normal \\left( E(X_1), \\frac{V(X_1)}{n} \\right)$ does not require the $X_i$ to be normally distributed. \n", "\n", "We can try this with our $Bernoulli$ RV generator. First, a small number of samples:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[7/10, 2/5, 3/5, 4/5, 3/5]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "theta, n, samples = 0.6, 10, 5 # concise way to set some variable values\n", "sampleMeans=[] # empty list\n", "for i in range(0, samples, 1): # loop \n", " thisMean = QQ(sum(bernoulliSample(n, theta)))/n # get a sample and find the mean\n", " sampleMeans.append(thisMean) # add mean to the list of means\n", "sampleMeans # disclose the sample means" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can use the interactive plot to increase the number of samples and make a histogram of the sample means. According to the CLT, for lots of reasonably-sized samples we should get a nice symmetric bell-curve-ish histogram centred on $\\theta$. You can adjust the number of bins in the histogram as well as the number of samples, sample size, and $\\theta$. 
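The same check can also be run without the interactive widget. This standard-library sketch uses hypothetical function names. It standardises replicated Bernoulli(theta) sample means as Z_n = sqrt(n)(sample mean - theta)/sqrt(theta(1 - theta)) and records how often |Z_n| is at most 1.96. For a Normal(0,1) variable that probability is about 0.95.

```python
import math
import random
from statistics import NormalDist


def standardized_bernoulli_means(n, theta, reps=2000, seed=2020):
    # Z_n = sqrt(n) * (sample mean - E(X_1)) / sqrt(V(X_1)) for Bernoulli(theta),
    # where E(X_1) = theta and V(X_1) = theta * (1 - theta)
    rng = random.Random(seed)
    sd1 = math.sqrt(theta * (1 - theta))
    zs = []
    for _ in range(reps):
        xbar = sum(rng.random() < theta for _ in range(n)) / n
        zs.append(math.sqrt(n) * (xbar - theta) / sd1)
    return zs


theta, n = 0.3, 100
zs = standardized_bernoulli_means(n, theta)
z = NormalDist().inv_cdf(0.975)  # about 1.96
inside = sum(abs(v) <= z for v in zs) / len(zs)
# compare the empirical P(|Z_n| <= 1.96) with the Normal(0,1) value 0.95
print(round(inside, 3))
```

Because a Bernoulli sample mean is discrete, the empirical fraction for n = 100 sits a little below 0.95. It moves closer as n grows.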
" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "f9e4807b9c30430e97cdca69156c3f51", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Interactive function with 4 widgets\n", " replicates: TransformIntSlider(value=100, de…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pylab\n", "@interact\n", "def _(replicates=slider(1,3000,1,100,label='replicates'), \\\n", " nToGen=slider(1,1500,1,100,label='sample size n'),\\\n", " my_theta=input_box(0.3,label='theta'),Bins=5):\n", " '''Interactive function to plot distribution of replicates of sample means for n IID Bernoulli trials.'''\n", " \n", " if my_theta >= 0 and my_theta <= 1 and replicates > 0:\n", " sampleMeans=[] # empty list\n", " for i in range(0, replicates, 1): \n", " thisMean = RR(sum(bernoulliSample(nToGen, my_theta)))/nToGen\n", " sampleMeans.append(thisMean)\n", " pylab.clf() # clear current figure\n", " n, bins, patches = pylab.hist(sampleMeans, Bins, density=true) \n", " pylab.ylabel('normalised count')\n", " pylab.title('Normalised histogram for Bernoulli sample means')\n", " pylab.savefig('myHist') # to actually display the figure\n", " pylab.show()\n", " #show(p, figsize=[5,3], axes_labels=['n','sample mean'],ymax=1)\n", " else:\n", " print ('Theta must be between 0 and 1, and replicates > 0')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Increase the sample size and the number of bins in the above interact and see if the histograms of the sample means look more and more normal, as the CLT would have us believe." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But although the $X_i$ do not have to be $\\sim Normal$ for $\\overline{X}_n \\overset{d}{\\rightarrow} X \\sim Normal\\left(E(X_1),\\frac{V(X_1)}{n} \\right)$, remember that the CLT began with \"Let $X_1,X_2,\\ldots \\overset{IID}{\\sim} X_1$ and suppose $E(X_1)$ and $V(X_1)$ both exist\", and only then concluded that\n", "\n", "$$\n", "\\overline{X}_n = \\frac{1}{n} \\sum_{i=1}^n X_i \\overset{d}{\\rightarrow} X \\sim Normal \\left(E(X_1),\\frac{V(X_1)}{n} \\right)\n", "$$\n", "\n", "This is where it all goes horribly wrong for the standard $Cauchy$ distribution (any $Cauchy$ distribution in fact): neither the expectation nor the variance exists for this distribution. The Central Limit Theorem cannot be applied here. In fact, if $X_1,X_2,\\ldots \\overset{IID}{\\sim}$ standard $Cauchy$, then $\\overline{X}_n = \\displaystyle \\frac{1}{n} \\sum_{i=1}^n X_i \\sim$ standard $Cauchy$.\n", "\n", "### YouTry\n", "\n", "Try looking at samples from two other RVs where the expectation and variance do exist, the $Uniform$ and the $Exponential$:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "67196e1715f14e4d9430c6784e6cfe66", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Interactive function with 5 widgets\n", " replicates: EvalText(value='100', descriptio…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pylab\n", "@interact\n", "def _(replicates=input_box(100,label='replicates'), \\\n", " nToGen=slider(1,1500,1,100,label='sample size n'),\\\n", " my_theta1=input_box(2,label='theta1'),\\\n", " my_theta2=input_box(4,label='theta2'),Bins=5):\n", " '''Interactive function to plot distribution of \n", " sample means for n IID Uniform(theta1, theta2) trials.'''\n", " \n", " if (my_theta1 < my_theta2) and replicates > 0:\n", " sampleMeans=[] # empty list\n", " for i in range(0, replicates, 1):\n", " \n", " 
thisMean = RR(sum(uniformSample(nToGen, my_theta1, my_theta2)))/nToGen\n", " sampleMeans.append(thisMean)\n", " pylab.clf() # clear current figure\n", " n, bins, patches = pylab.hist(sampleMeans, Bins, density=true) \n", " pylab.ylabel('normalised count')\n", " pylab.title('Normalised histogram for Uniform sample means')\n", " pylab.savefig('myHist') # to actually display the figure\n", " pylab.show()\n", " #show(p, figsize=[5,3], axes_labels=['n','sample mean'],ymax=1)\n", " else:\n", " print ('theta1 must be less than theta2, and samples > 0')" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "746dca82f4bf4d3ba949aea898903c16", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Interactive function with 4 widgets\n", " replicates: EvalText(value='100', descriptio…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pylab\n", "@interact\n", "def _(replicates=input_box(100,label='replicates'), \\\n", " nToGen=slider(1,1500,1,100,label='sample size n'),\\\n", " my_lambda=input_box(0.1,label='lambda'),Bins=5):\n", " '''Interactive function to plot distribution of \\\n", " sample means for an Exponential(lambda) process.'''\n", " \n", " if my_lambda > 0 and replicates > 0:\n", " sampleMeans=[] # empty list\n", " for i in range(0, replicates, 1): \n", " thisMean = RR(sum(exponentialSample(nToGen, my_lambda)))/nToGen\n", " sampleMeans.append(thisMean)\n", " pylab.clf() # clear current figure\n", " n, bins, patches = pylab.hist(sampleMeans, Bins, density=true) \n", " pylab.ylabel('normalised count')\n", " pylab.title('Normalised histogram for Exponential sample means')\n", " pylab.savefig('myHist') # to actually display the figure\n", " pylab.show()\n", " #show(p, figsize=[5,3], axes_labels=['n','sample mean'],ymax=1)\n", " else:\n", " print ('lambda must be greater than 0, and samples > 0')" ] }, { "cell_type": "markdown", "metadata": {}, 
"source": [ "# Properties of the MLE\n", "\n", "The LLN (law of large numbers) and CLT (central limit theorem) are statements about the limiting distribution of the sample mean of IID random variables whose expectation and variance exist. How do they apply to the MLE (maximum likelihood estimator)?\n", "\n", "Consider the following generic parametric model for our data or observations:\n", "\n", "$$\n", "X_1,X_2,\\ldots,X_n \\overset{IID}{\\sim} F(x; \\theta^*) \\ \\text{ or } \\ f(x; \\theta^*)\n", "$$\n", "\n", "We do not know the true parameter $\\theta^*$ under the model for our data. Our task is to estimate the unknown parameter $\\theta^*$ using the MLE:\n", "\n", "$$\\widehat{\\Theta}_n = \\operatorname{argmax}_{\\theta \\in \\mathbf{\\Theta}} l(\\theta)$$\n", "\n", "where $l(\\theta)$ is the log-likelihood function.\n", "\n", "The amazing thing about the MLE is the following set of properties:\n", "\n", "### 1. The MLE is *asymptotically consistent*\n", "\n", "$$\\boxed{\\widehat{\\Theta}_n \\overset{P}{\\rightarrow} \\theta^*}$$\n", "\n", "So when the number of observations (sample size $n$) goes to infinity, our MLE converges in probability to the true parameter $\\theta^* \\in \\mathbf{\\Theta}$.\n", "\n", "Interestingly, one can work out the details and find that the MLE $\\widehat{\\Theta}_n$, which is also a random variable based on $n$ IID samples that takes values in the parameter space $\\mathbf{\\Theta}$, is approximately normally distributed for large sample sizes (property 3 below).\n", "\n", "### 2. The MLE is *equivariant*\n", "\n", "$$\\boxed{\\text{If } \\ \\widehat{\\Theta}_n \\ \\text{ is the MLE of } \\ \\theta^* \\ \\text{ then } \\ g(\\widehat{\\Theta}_n) \\ \\text{ is the MLE of } \\ g(\\theta^*)}$$\n", "\n", "This is a very useful property, since the MLE of any function $g : \\mathbf{\\Theta} \\to \\mathbb{R}$ of interest is at our disposal by merely applying $g$ to the MLE. Often $g$ is some sort of reward that depends on the unknown parameter $\\theta^*$.\n", "\n", "### 3. 
The MLE is *asymptotically normal* \n", "\n", "$$\\boxed{ \\frac{\\left(\\widehat{\\Theta}_n - \\theta^*\\right)}{\\widehat{se}_n} \\overset{d}{\\rightarrow} Normal(0,1) }\n", "\\quad \\text{ or, informally for large } n, \\quad\n", "\\boxed{ \\widehat{\\Theta}_n \\overset{d}{\\rightarrow} Normal( \\theta^*, \\widehat{se}_n^2) }\n", "$$\n", "\n", "where $\\widehat{se}_n$ is the *estimated standard error* of the MLE:\n", "\n", "$$\\boxed{ \\widehat{se}_n \\ \\text{ is an estimate of the } \\ \\sqrt{V\\left(\\widehat{\\Theta}_n \\right)}}$$\n", "\n", "We can compute $\\widehat{se}_n$ with the following formula:\n", "\n", "$$\\boxed{\\widehat{se}_n = \\sqrt{\\frac{1}{ \\left. n E \\left(-\\frac{\\partial^2 \\log f(X;\\theta)}{\\partial \\theta^2} \\right) \\right\\vert_{\\theta=\\widehat{\\theta}_n} } }}$$\n", "\n", "where the expectation is called the *Fisher information* of one sample, or $I_1$:\n", "\n", "$$\\boxed{ I_1 := E \\left(-\\frac{\\partial^2 \\log f(X;\\theta)}{\\partial \\theta^2} \\right) = \n", "\\begin{cases}\n", "\\displaystyle{\\int{\\left(-\\frac{\\partial^2 \\log f(x;\\theta)}{\\partial \\theta^2} \\right) f(x; \\theta)} dx} & \\text{ for continuous RV } X\\\\\n", "\\displaystyle{\\sum_x{\\left(-\\frac{\\partial^2 \\log f(x;\\theta)}{\\partial \\theta^2} \\right) f(x; \\theta)}}& \\text{ for discrete RV } X\n", "\\end{cases}\n", "}\n", "$$\n", "\n", "Two other properties (not needed for this course) are:\n", "\n", "- *asymptotic efficiency*, i.e., among a class of well-behaved estimators, the MLE has the smallest variance, at least for large samples, and\n", "- *approximately Bayes*, i.e., the MLE is approximately the *Bayes estimator* (some of you may see Bayesian methods of estimation in advanced courses in statistical machine learning or in recent AI methods)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Confidence Interval and Set Estimation from MLE\n", "\n", "An immediate application of the asymptotic normality of the MLE, which informally states that the distribution of the MLE can be approximated by a Normal distribution, is the construction of confidence intervals for the unknown parameter $\\theta^*$.\n", "\n", "Recall that in set estimation, as opposed to point estimation, we estimate the unknown parameter using a random set based on the data (typically an interval in 1D) that \"traps\" the true parameter $\\theta^*$ with a high probability, say $0.95$. We typically express this probability as $1-\\alpha$, so the $95\\%$ confidence interval is a $1-\\alpha$ confidence interval with $\\alpha=0.05$. From the asymptotic normality of the MLE, we get the following confidence interval for the unknown $\\theta^*$:\n", "\n", "\n", "$$\n", "\\boxed{\\text{If } \\quad \n", "\\displaystyle{C_n := \\left( \\widehat{\\Theta}_n - z_{\\alpha/2} \\widehat{se}_n, \\, \\widehat{\\Theta}_n + z_{\\alpha/2} \\widehat{se}_n \\right)} \\quad \\text{ then } \\quad P \\left( \\{ \\theta^* \\in C_n \\} ; \\theta^* \\right) \\underset{n \\to \\infty}{\\longrightarrow} 1-\\alpha , \\quad \\text{ where } z_{\\alpha/2} = \\Phi^{[-1]}(1-\\alpha/2).\n", "}\n", "$$\n", "\n", "Recall that $P \\left( \\{ \\theta^* \\in C_n \\} ; \\theta^* \\right)$ is simply the probability of the event that $\\theta^*$ will be in $C_n$, the $1-\\alpha$ confidence interval, given that the data are distributed according to the model with true parameter $\\theta^*$.\n", "\n", "NOTE: $\\Phi^{[-1]}$ is the inverse of the distribution function (CDF) $\\Phi$ of the standard normal RV, so $z_{\\alpha/2} = \\Phi^{[-1]}(1-\\alpha/2)$ is the point with probability $\\alpha/2$ to its right under the $Normal(0,1)$ density. 
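The quantile `z_{alpha/2}` and the resulting interval are easy to compute. In this sketch, Python's `statistics.NormalDist.inv_cdf` plays the role of the inverse standard normal DF, and the helper names are hypothetical. The illustration uses made-up numbers (an MLE of 0.3 from 100 trials). It also uses the Bernoulli standard error derived in the next example.

```python
import math
from statistics import NormalDist


def z_alpha_over_2(alpha):
    # z_{alpha/2}: the inverse standard normal DF evaluated at 1 - alpha/2
    return NormalDist().inv_cdf(1 - alpha / 2)


def wald_interval(theta_hat, se_hat, alpha=0.05):
    # (theta_hat - z * se_hat, theta_hat + z * se_hat): the asymptotic 1-alpha CI
    z = z_alpha_over_2(alpha)
    return (theta_hat - z * se_hat, theta_hat + z * se_hat)


for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(z_alpha_over_2(alpha), 4))

# Illustration with made-up numbers: theta_hat = 0.3 from n = 100 Bernoulli
# trials, with se_hat = sqrt(theta_hat * (1 - theta_hat) / n)
n, theta_hat = 100, 0.3
se_hat = math.sqrt(theta_hat * (1 - theta_hat) / n)
print(wald_interval(theta_hat, se_hat))
```

For alpha = 0.05 the quantile is about 1.96, and the illustrative interval is roughly (0.210, 0.390).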
\n", "\n", "$$\n", "\\text{For } \\alpha=0.05, z_{\\alpha/2}=1.96 \\approxeq 2, \\text{ so: } \\quad \\boxed{\\widehat{\\Theta}_n \\pm 2 \\widehat{se}_n} \\quad \\text{ is an approximate 95% confidence interval.}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example of Confidence Interval for IID $Bernoulli(\\theta)$ Trials\n", "\n", "We already know that the MLE for the model with $n$ IID $Bernoulli(\\theta)$ Trials is the sample mean, i.e.,\n", "\n", "$$X_1,X_2,\\ldots, X_n \\overset{IID}{\\sim} Bernoulli(\\theta^*) \\implies \\widehat{\\Theta}_n = \\overline{X}_n$$\n", "\n", "Our task now is to obtain the $1-\\alpha$ confidence interval based on this MLE.\n", "\n", "To get the confidence interval we need to obtain $\\widehat{se}_n$ by computing the following:\n", "\n", "$$\n", "\\begin{array}{cc}\n", "\\widehat{se}_n &=& \\displaystyle{\\sqrt{\\frac{1}{ \\left. n E \\left(-\\frac{\\partial^2 \\log f(X;\\theta)}{\\partial \\theta^2} \\right) \\right\\vert_{\\theta=\\widehat{\\theta}_n} } }}\n", "\\end{array}\n", "$$\n", "$I_1 := E \\left(-\\frac{\\partial^2 \\log f(X;\\theta)}{\\partial \\theta^2} \\right)$ is called the Fisher Information of one sample.\n", "Since our IID samples are from a discrete distribution with \n", "\n", "$$\n", "\\begin{array}{cc}\n", "f(x; \\theta) = \\theta^x (1-\\theta)^{1-x} \n", "&\\implies& \\displaystyle{\\log \\left( f(x;\\theta) \\right) = x \\log(\\theta) +(1-x) \\log(1-\\theta)}\\\\\n", "&\\implies& \\displaystyle{\\frac{\\partial}{\\partial \\theta} \\left(\\log \\left( f(x;\\theta) \\right)\\right)} \n", "= \\displaystyle{\\frac{x}{\\theta} -\\frac{1-x}{1-\\theta}} \\\\\n", "&\\implies& \\displaystyle{\\frac{\\partial^2}{\\partial \\theta^2} \\left(\\log \\left( f(x;\\theta) \\right)\\right)} \n", "= \\displaystyle{-\\frac{x}{\\theta^2} - \\frac{1-x}{(1-\\theta)^2}}\\\\\n", "&\\implies& \\displaystyle{E \\left( - \\frac{\\partial^2}{\\partial \\theta^2} \\left(\\log \\left( f(x;\\theta) 
\\right)\\right) \\right)} \n", "= \\displaystyle{\\sum_{x\\in\\{0,1\\}} \\left( \\frac{x}{\\theta^2} + \\frac{1-x}{(1-\\theta)^2} \\right) f(x; \\theta) = \\frac{\\theta}{\\theta^2} + \\frac{1-\\theta}{(1-\\theta)^2} = \\frac{1}{\\theta(1-\\theta)}}\n", "\\end{array}\n", "$$\n", "\n", "Note that we have implicitly assumed that the $x$ values are only $0$ or $1$ by ignoring the indicator term $\\mathbf{1}_{\\{0,1\\}}(x)$ in $f(x;\\theta)$. But this is okay as we are carefully doing the sums over just $x \\in \\{0,1\\}$.\n", "\n", "Now, by using the formula for $\\widehat{se}_n$, we can obtain:\n", "\n", "$$\n", "\\begin{array}{cc}\n", "\\widehat{se}_n \n", "&=& \\displaystyle{\\sqrt{\\frac{1}{ \\left. n E \\left(-\\frac{\\partial^2 \\log f(X;\\theta)}{\\partial \\theta^2} \\right) \\right\\vert_{\\theta=\\widehat{\\theta}_n} } }}\\\\\n", "&=& \\displaystyle{\\sqrt{\\frac{1}{ \\left. n \\frac{1}{\\theta(1-\\theta)} \\right\\vert_{\\theta=\\widehat{\\theta}_n} } }}\\\\\n", "&=& \\displaystyle{\\sqrt{\\frac{\\widehat{\\theta}_n(1-\\widehat{\\theta}_n)}{n}}}\n", "\\end{array}\n", "$$\n", "\n", "Finally, we can complete our task by obtaining the 95% confidence interval for $\\theta^*$ as follows:\n", "\n", "$$\n", "\\displaystyle{ \\widehat{\\theta}_n \\pm 2 \\widehat{se}_n = \\widehat{\\theta}_n \\pm 2 \\sqrt{\\frac{\\widehat{\\theta}_n(1-\\widehat{\\theta}_n)}{n}} = \\overline{x}_n \\pm 2 \\sqrt{\\frac{\\overline{x}_n(1-\\overline{x}_n)}{n}} }\n", "$$" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "Graphics object consisting of 42 graphics primitives" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "nToGenerate = 100\n", "replicates = 20\n", "xvalues = range(1, nToGenerate+1,1)\n", "for i in range(replicates):\n", " redshade = 0.5*(replicates - 1 - i)/replicates # to get different colours for the lines\n", " bRunningMeans = 
bernoulliSecretThetaRunningMeans(nToGenerate)\n", " pts = zip(xvalues,bRunningMeans)\n", " if (i == 0):\n", " p = line(pts, rgbcolor = (redshade,0,1))\n", " else:\n", " p += line(pts, rgbcolor = (redshade,0,1))\n", " mle=bRunningMeans[nToGenerate-1]\n", " se95Correction=2.0*sqrt(mle*(1-mle)/nToGenerate)\n", " lower95CI = mle-se95Correction\n", " upper95CI = mle+se95Correction\n", " p += line([(nToGenerate+i,lower95CI),(nToGenerate+i,upper95CI)], rgbcolor = (redshade,0,1), thickness=0.5)\n", "p += line([(1,0.3),(nToGenerate+replicates,0.3)], rgbcolor='black', thickness=2)\n", "p += text('sample means up to n='+str(nToGenerate)+' and their 95% confidence intervals',(nToGenerate/1.5,1),fontsize=16)\n", "show(p, figsize=[10,6])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Sample Exam Problem 5\n", "\n", "Obtain the 95% Confidence Interval for $\\lambda^*$ from an experiment based on $n$ IID $Exponential(\\lambda^*)$ trials.\n", "\n", "Write down your answer by completing the function `SampleExamProblem5` in the next cell.\n", "Your function call `SampleExamProblem5(sampleWaitingTimes)` on the Orbiter waiting times data should return the 95% confidence interval for the unknown parameter $\\lambda^*$." 
] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name 'XXX' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 15\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 16\u001b[0m \u001b[0;31m# do NOT change anything below\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 17\u001b[0;31m \u001b[0mlowerCISampleExamProblem5\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mupperCISampleExamProblem5\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mSampleExamProblem5\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msampleWaitingTimes\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 18\u001b[0m \u001b[0mprint\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;34m\"The 95% CI for lambda in the Orbiter Waiting time experiment = \"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 19\u001b[0m \u001b[0mprint\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mlowerCISampleExamProblem5\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mupperCISampleExamProblem5\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m\u001b[0m in \u001b[0;36mSampleExamProblem5\u001b[0;34m(exponentialSamples)\u001b[0m\n\u001b[1;32m 7\u001b[0m '''return the 95% confidence interval as a 2-tuple for the unknown rate parameter lambda* \n\u001b[1;32m 8\u001b[0m from n IID Exponential(lambda*) trials in the input numpy array called exponentialSamples'''\n\u001b[0;32m----> 9\u001b[0;31m \u001b[0mXXX\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 10\u001b[0m 
\u001b[0mXXX\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 11\u001b[0m \u001b[0mXXX\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mNameError\u001b[0m: name 'XXX' is not defined" ] } ], "source": [ "# Sample Exam Problem 5\n", "# Only replace the XXX below, do not change the function names or parameters\n", "import numpy as np\n", "sampleWaitingTimes = np.array([8,3,7,18,18,3,7,9,9,25,0,0,25,6,10,0,10,8,16,9,1,5,16,6,4,1,3,21,0,28,3,8,6,6,11,\\\n", " 8,10,15,0,8,7,11,10,9,12,13,8,10,11,8,7,11,5,9,11,14,13,5,8,9,12,10,13,6,11,13,0,\\\n", " 0,11,1,9,5,14,16,2,10,21,1,14,2,10,24,6,1,14,14,0,14,4,11,15,0,10,2,13,2,22,10,5,\\\n", " 6,13,1,13,10,11,4,7,9,12,8,16,15,14,5,10,12,9,8,0,5,13,13,6,8,4,13,15,7,11,6,23,1])\n", "\n", "def SampleExamProblem5(exponentialSamples):\n", " '''return the 95% confidence interval as a 2-tuple for the unknown rate parameter lambda* \n", " from n IID Exponential(lambda*) trials in the input numpy array called exponentialSamples'''\n", " XXX\n", " XXX\n", " XXX\n", " lower95CI=XXX\n", " upper95CI=XXX\n", " return (lower95CI,upper95CI)\n", "\n", "# do NOT change anything below\n", "lowerCISampleExamProblem5,upperCISampleExamProblem5 = SampleExamProblem5(sampleWaitingTimes)\n", "print (\"The 95% CI for lambda in the Orbiter Waiting time experiment = \")\n", "print (lowerCISampleExamProblem5,upperCISampleExamProblem5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Sample Exam Problem 5 Solution\n", "\n", "We can obtain the 95% Confidence Interval for $\\lambda^*$ from the experiment based on $n$ IID $Exponential(\\lambda^*)$ trials by hand, by SageMath symbolic computation, or (typically) by both.\n", "\n", "Let $X_1,X_2,\\ldots,X_n \\overset{IID}{\\sim} Exponential(\\lambda^*)$. 
\n", "\n", "We saw that the ML estimator of $\\lambda^* \\in (0,\\infty)$ is $\\widehat{\\Lambda}_n = 1/\\, \\overline{X}_n$ and its ML estimate is $\\widehat{\\lambda}_n=1/\\, \\overline{x}_n$, where $x_1,x_2,\\ldots,x_n$ are our observed data.\n", "\n", "Let us obtain $I_1$, the Fisher Information of one sample, for this experiment to find the standard error:\n", "\n", "$$\n", "\\widehat{\\mathsf{se}}_n(\\widehat{\\Lambda}_n) = \\frac{1}{\\sqrt{n \\left. I_1 \\right\\vert_{\\lambda=\\widehat{\\lambda}_n}}}\n", "$$\n", "\n", "and construct an approximate $95\\%$ confidence interval for $\\lambda^*$ using the asymptotic normality of its ML estimator $\\widehat{\\Lambda}_n$.\n", "\n", "Since the probability density function is $f(x;\\lambda)=\\lambda e^{-\\lambda x}$ for $x\\in [0,\\infty)$, we have\n", "\n", "$$\n", "\\begin{align}\n", "I_1 &= - E \\left( \\frac{\\partial^2 \\log f(X;\\lambda)}{\\partial \\lambda^2} \\right) = - \\int_{x \\in [0,\\infty)} \\left( \\frac{\\partial^2 \\log \\left( \\lambda e^{-\\lambda x} \\right)}{\\partial \\lambda^2} \\right) \\lambda e^{-\\lambda x} \\ dx\n", "\\end{align}\n", "$$\n", "\n", "Let us compute the above integrand next.\n", "$$\n", "\\begin{align}\n", "\\frac{\\partial^2 \\log \\left( \\lambda e^{-\\lambda x} \\right)}{\\partial \\lambda^2}\n", "&:=\n", "\\frac{\\partial}{\\partial \\lambda} \\left( \\frac{\\partial}{\\partial \\lambda} \\left( \\log \\left( \\lambda e^{-\\lambda x} \\right) \\right) \\right)\n", "= \\frac{\\partial}{\\partial \\lambda} \\left( \\frac{\\partial}{\\partial \\lambda} \\left( \\log(\\lambda) + \\log(e^{-\\lambda x}) \\right) \\right) \\\\\n", "&= \\frac{\\partial}{\\partial \\lambda} \\left( \\frac{\\partial}{\\partial \\lambda} \\left( \\log(\\lambda) -\\lambda x \\right) \\right)\n", "= \\frac{\\partial}{\\partial \\lambda} \\left( {\\lambda}^{-1} - x \\right) = - \\lambda^{-2} - 0 = -\\frac{1}{\\lambda^2}\n", "\\end{align}\n", "$$\n", "Now, let us evaluate the integral by recalling that 
the expectation of the constant $1$ is 1 for any RV $X$ governed by some parameter, say $\\theta$. For instance, when $X$ is a continuous RV, $E_{\\theta}(1) = \\int_{x \\in \\mathbb{X}} 1 \\ f(x;\\theta) \\ dx = \\int_{x \\in \\mathbb{X}} f(x;\\theta) \\ dx = 1$. Therefore, the Fisher Information of one sample is\n", "$$\n", "\\begin{align}\n", "I_1(\\lambda) = - \\int_{x \\in \\mathbb{X} = [0,\\infty)} \\left( \\frac{\\partial^2 \\log \\left( \\lambda e^{-\\lambda x} \\right)}{\\partial \\lambda^2} \\right) \\lambda e^{-\\lambda x} \\ dx\n", " &= - \\int_{0}^{\\infty} \\left(-\\frac{1}{\\lambda^2} \\right) \\lambda e^{-\\lambda x} \\ dx \\\\\n", "& = - \\left(-\\frac{1}{\\lambda^2} \\right) \\int_{0}^{\\infty} \\lambda e^{-\\lambda x} \\ dx = \\frac{1}{\\lambda^2} \\ 1 = \\frac{1}{\\lambda^2}\n", "\\end{align}\n", "$$\n", "Now, we can compute the desired estimated standard error by substituting in the ML estimate $\\widehat{\\lambda}_n = 1/\\overline{x}_n = n / \\left( \\sum_{i=1}^n x_i \\right)$ of $\\lambda^*$, as follows:\n", "$$\n", "\\widehat{\\mathsf{se}}_n(\\widehat{\\Lambda}_n) \n", "= \\frac{1}{\\sqrt{n \\left. I_1 \\right\\vert_{\\lambda=\\widehat{\\lambda}_n}}}\n", "= \\frac{1}{\\sqrt{n \\frac{1}{\\widehat{\\lambda}_n^2} }}\n", "= \\frac{\\widehat{\\lambda}_n}{\\sqrt{n}}\n", "= \\frac{1}{\\sqrt{n} \\ \\overline{x}_n}\n", "$$\n", "Using $\\widehat{\\mathsf{se}}_n(\\widehat{\\Lambda}_n)$ we can construct an approximate $95\\%$ confidence interval $C_n$ for $\\lambda^*$, due to the asymptotic normality of the ML estimator of $\\lambda^*$, as follows:\n", "$$\n", "C_n\n", "= \\widehat{\\lambda}_n \\pm 2 \\frac{\\widehat{\\lambda}_n}{\\sqrt{n}}\n", "= \\frac{1}{\\overline{x}_n} \\pm 2 \\frac{1}{\\sqrt{n} \\ \\overline{x}_n} .\n", "$$\n", "Let us compute the ML estimate and the $95\\%$ confidence interval for the rate parameter for the waiting times at the Orbiter bus-stop. 
The sample mean is $\\overline{x}_{132}=9.0758$ and the ML estimate is:\n", "$$\\widehat{\\lambda}_{132}=1/\\,\\overline{x}_{132}=1/9.0758=0.1102 ,$$\n", "and the $95\\%$ confidence interval is:\n", "$$\n", "C_{132}\n", "= \\widehat{\\lambda}_{132} \\pm 2 \\frac{\\widehat{\\lambda}_{132}}{\\sqrt{132}}\n", "= \\frac{1}{\\overline{x}_{132}} \\pm 2 \\frac{1}{\\sqrt{132} \\, \\overline{x}_{132}} = 0.1102 \\pm 2 \\cdot 0.0096 = [0.091, 0.129] .\n", "$$\n" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "logfx = log(lam*e^(-lam*x))\n", "d2logfx = -1/lam^2\n", "FisherInformation1 = lam^(-2)\n", "StdErr = lam/sqrt(n)\n", "EstStdErr = 1/(sqrt(n)*sampMean)\n" ] }, { "data": { "text/plain": [ "(1/sampMean - 2/(sqrt(n)*sampMean), 1/sampMean + 2/(sqrt(n)*sampMean))" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Sample Exam Problem 5 Solution\n", "# solution is straightforward by following these steps symbolically\n", "# or you can do it by hand with pen/paper or do both to be safe\n", "\n", "## STEP 1 - define the variables you need\n", "lam,x,n = var('lam','x','n')\n", "\n", "## STEP 2 - get symbolic expression for the log-likelihood of one sample\n", "logfx = log(lam*exp(-lam*x)).full_simplify()\n", "print (\"logfx = \", logfx)\n", "\n", "## STEP 3 - find second derivative of expression from STEP 2 w.r.t. 
parameter\n", "d2logfx = logfx.diff(lam,2).full_simplify()\n", "print (\"d2logfx = \", d2logfx)\n", "\n", "## STEP 4 - get the Fisher Information of one sample\n", "## integrate d2logfx * f(x) over x in [0,Infinity) and negate, where f(x) is the PDF lam*exp(-lam*x)\n", "assume(lam>0) # integrate usually needs such assumptions to work - see the suggestions in its error messages\n", "FisherInformation1 = -integrate(d2logfx*lam*exp(-lam*x),x,0,Infinity)\n", "print (\"FisherInformation1 = \",FisherInformation1)\n", "\n", "## STEP 5 - get Standard Error from FisherInformation1\n", "StdErr = 1/sqrt(n*FisherInformation1)\n", "print (\"StdErr = \",StdErr)\n", "\n", "## STEP 6 - get the Estimated Standard Error by plugging the MLE lamHat into the Standard Error\n", "# lamHat = 1/xBar = 1/sampMean; known from before\n", "lamHat,sampMean = var('lamHat','sampMean')\n", "lamHat = 1/sampMean\n", "EstStdErr = StdErr.subs(lam=lamHat)\n", "print (\"EstStdErr = \",EstStdErr)\n", "\n", "## STEP 7 - Get lower and upper 95% CI\n", "(lamHat-2*EstStdErr, lamHat+2*EstStdErr)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The 95% CI for lambda in the Orbiter Waiting time experiment = \n", "0.09100312972775282 0.12936414907024382\n" ] } ], "source": [ "# Sample Exam Problem 5 Solution\n", "# Only replace the XXX below, do not change the function names or parameters\n", "import numpy as np\n", "sampleWaitingTimes = np.array([8,3,7,18,18,3,7,9,9,25,0,0,25,6,10,0,10,8,16,9,1,5,16,6,4,1,3,21,0,28,3,8,6,6,11,\\\n", " 8,10,15,0,8,7,11,10,9,12,13,8,10,11,8,7,11,5,9,11,14,13,5,8,9,12,10,13,6,11,13,0,\\\n", " 0,11,1,9,5,14,16,2,10,21,1,14,2,10,24,6,1,14,14,0,14,4,11,15,0,10,2,13,2,22,10,5,\\\n", " 6,13,1,13,10,11,4,7,9,12,8,16,15,14,5,10,12,9,8,0,5,13,13,6,8,4,13,15,7,11,6,23,1])\n", "\n", "def SampleExamProblem5(exponentialSamples):\n", " '''return the 95% confidence interval as a 2-tuple for the unknown rate parameter lambda* \n", " from n IID 
Exponential(lambda*) trials in the input numpy array called exponentialSamples'''\n", " sampleMean = exponentialSamples.mean()\n", " n=len(exponentialSamples)\n", " correction=RR(2/(sqrt(n)*sampleMean)) # RR (or float) turns the symbolic expression into a number; without it you get a symbolic expression\n", " lower95CI=1.0/sampleMean - correction\n", " upper95CI=1.0/sampleMean + correction\n", " return (lower95CI,upper95CI)\n", "\n", "# do NOT change anything below\n", "lowerCISampleExamProblem5,upperCISampleExamProblem5 = SampleExamProblem5(sampleWaitingTimes)\n", "print (\"The 95% CI for lambda in the Orbiter Waiting time experiment = \")\n", "print (lowerCISampleExamProblem5,upperCISampleExamProblem5)" ] }, { "cell_type": "markdown", "metadata": { "lx_assignment_number": "3", "lx_problem_cell_type": "PROBLEM" }, "source": [ "---\n", "## Assignment 3, PROBLEM 5\n", "Maximum Points = 3" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "lx_assignment_number": "3", "lx_assignment_type": "ASSIGNMENT", "lx_assignment_type2print": "Assignment", "lx_problem_cell_type": "PROBLEM", "lx_problem_number": "5", "lx_problem_points": "3" }, "source": [ "\n", "Obtain the 95% CI based on the asymptotic normality of the MLE for the mean parameter $\\lambda^*$ based on $n$ IID $Poisson(\\lambda^*)$ trials.\n", "\n", "Recall that a random variable $X \\sim Poisson(\\lambda)$ if its probability mass function is:\n", "\n", "$$\n", "f(x; \\lambda) = \\exp{(-\\lambda)} \\frac{\\lambda^x}{x!}, \\quad \\lambda > 0, \\quad x \\in \\{0,1,2,\\ldots\\}\n", "$$\n", "\n", "The MLE is $\\widehat{\\lambda}_n = \\overline{x}_n$, the sample mean.\n", "\n", "Work out your answer and express it in the next cell by replacing `XXX`s."
] }, { "cell_type": "code", "execution_count": 36, "metadata": { "deletable": false, "lx_assignment_number": "3", "lx_assignment_type": "ASSIGNMENT", "lx_assignment_type2print": "Assignment", "lx_problem_cell_type": "PROBLEM", "lx_problem_number": "5", "lx_problem_points": "3" }, "outputs": [ { "ename": "NameError", "evalue": "name 'XXX' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 15\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 16\u001b[0m \u001b[0;31m# do NOT change anything below\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 17\u001b[0;31m \u001b[0mlowerCISampleExamProblem5\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mupperCISampleExamProblem5\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mAssignment3Problem5\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msamplePoissonCounts\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 18\u001b[0m \u001b[0mprint\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;34m\"The 95% CI for lambda based on IID Poisson(lambda) data in samplePoissonCounts = \"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 19\u001b[0m \u001b[0mprint\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mlowerCISampleExamProblem5\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mupperCISampleExamProblem5\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m\u001b[0m in \u001b[0;36mAssignment3Problem5\u001b[0;34m(poissonSamples)\u001b[0m\n\u001b[1;32m 7\u001b[0m '''return the 95% confidence interval as a 2-tuple for the unknown parameter lambda* \n\u001b[1;32m 8\u001b[0m from n IID Poisson(lambda*) trials in the input numpy array called 
samplePoissonCounts'''\n\u001b[0;32m----> 9\u001b[0;31m \u001b[0mXXX\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 10\u001b[0m \u001b[0mXXX\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 11\u001b[0m \u001b[0mXXX\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mNameError\u001b[0m: name 'XXX' is not defined" ] } ], "source": [ "# Only replace the XXX below, do not change the function names or parameters\n", "import numpy as np\n", "samplePoissonCounts = np.array([0,5,11,5,6,8,9,0,1,14,2,4,4,11,2,12,10,5,6,1,7,9,8,0,5,7,11,6,0,1])\n", "\n", "def Assignment3Problem5(poissonSamples):\n", " '''return the 95% confidence interval as a 2-tuple for the unknown parameter lambda* \n", " from n IID Poisson(lambda*) trials in the input numpy array called poissonSamples'''\n", " XXX\n", " XXX\n", " XXX\n", " lower95CI=XXX\n", " upper95CI=XXX\n", " return (lower95CI,upper95CI)\n", "\n", "# do NOT change anything below\n", "lowerCISampleExamProblem5,upperCISampleExamProblem5 = Assignment3Problem5(samplePoissonCounts)\n", "print (\"The 95% CI for lambda based on IID Poisson(lambda) data in samplePoissonCounts = \")\n", "print (lowerCISampleExamProblem5,upperCISampleExamProblem5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Hypothesis Testing\n", "\n", "The subset of *all posable hypotheses* that have the property of *[falsifiability](https://en.wikipedia.org/wiki/Falsifiability)* constitutes the space of *scientific hypotheses*. \n", "Roughly, a falsifiable statistical hypothesis is one for which a statistical experiment can be designed to produce data or empirical observations that an experimenter can use to falsify or reject it. \n", "In the *statistical decision problem of hypothesis testing*, we are interested in empirically falsifying a scientific hypothesis, i.e., we attempt to reject a hypothesis on the basis of empirical observations or data. 
\n", "Thus, hypothesis testing has its roots in the *philosophy of science* and is based on *Karl Popper's falsifiability criterion for demarcating scientific hypotheses from the set of all posable hypotheses*.\n", "\n", "## Introduction\n", "Usually, the hypothesis we **attempt to reject or falsify** is called the **null hypothesis** or $H_0$ and its complement is called the **alternative hypothesis** or $H_1$. \n", "For example, consider the following two hypotheses:\n", "\n", "- $H_0$: The average waiting time at an Orbiter bus stop *is less than or equal to* $10$ minutes.\n", "- $H_1$: The average waiting time at an Orbiter bus stop *is more than* $10$ minutes.\n", "\n", "If the sample mean $\\overline{x}_n$ is much larger than $10$ minutes then we may be inclined to reject the null hypothesis that the average waiting time is less than or equal to $10$ minutes. \n", "\n", "Suppose we are interested in the following slightly different hypothesis test for the Orbiter bus stop problem:\n", "\n", "- $H_0$: The average waiting time at an Orbiter bus stop *is equal to* $10$ minutes.\n", "- $H_1$: The average waiting time at an Orbiter bus stop *is not* $10$ minutes.\n", "\n", "Once again we can use the sample mean as the test statistic, but this time we may be inclined to reject the null hypothesis if the sample mean $\\overline{x}_n$ is much larger than *or* much smaller than $10$ minutes. 
\n", "One procedure for rejecting such a null hypothesis is the **Wald test**, which we are about to see.\n", "\n", "More generally, suppose we have the following parametric experiment based on $n$ IID trials:\n", "$$\n", "X_1,X_2,\\ldots,X_n \\overset{IID}{\\sim} F(x_1;\\theta^*), \\quad \\text{ with an unknown (and fixed) } \\theta^* \\in \\mathbf{\\Theta} \\ .\n", "$$\n", "\n", "Let us partition the parameter space $\\mathbf{\\Theta}$ into $\\mathbf{\\Theta}_0$, the null parameter space, and $\\mathbf{\\Theta}_1$, the alternative parameter space, i.e.,\n", "$$\\mathbf{\\Theta}_0 \\cup \\mathbf{\\Theta}_1 = \\mathbf{\\Theta}, \\qquad \\text{and} \\qquad \\mathbf{\\Theta}_0 \\cap \\mathbf{\\Theta}_1 = \\emptyset \\ .$$\n", "\n", "Then, we can formalise testing the null hypothesis versus the alternative as follows:\n", "$$\n", "H_0 : \\theta^* \\in \\mathbf{\\Theta}_0 \\qquad \\text{versus} \\qquad H_1 : \\theta^* \\in \\mathbf{\\Theta}_1 \\ .\n", "$$\n", "\n", "The basic idea involves finding an appropriate **rejection region** $\\mathbb{X}_R$ within the **data space** $\\mathbb{X}$ and rejecting $H_0$ if the observed data $x:=(x_1,x_2,\\ldots,x_n)$ falls inside the rejection region $\\mathbb{X}_R$,\n", "$$\n", "\\text{If $x:=(x_1,x_2,\\ldots,x_n) \\in \\mathbb{X}_R \\subset \\mathbb{X}$, then reject $H_0$, else do not reject $H_0$.}\n", "$$\n", "Typically, the rejection region $\\mathbb{X}_R$ is of the form:\n", "$$\n", "\\mathbb{X}_R := \\{ x:=(x_1,x_2,\\ldots,x_n) : T(x) > c \\}\n", "$$\n", "where $T$ is the **test statistic** and $c$ is the **critical value**. Thus, the problem of finding $\\mathbb{X}_R$ boils down to finding an appropriate $T$ and $c$. 
Once the rejection region is defined, the possible outcomes of a hypothesis test are summarised in the following table.\n", "\n", "| 'true state of nature' | Do not reject $H_0$ | Reject $H_0$ |\n", "|:---|:---:|:---:|\n", "| $H_0$ is true | OK | Type I error |\n", "| $H_0$ is false | Type II error | OK |
\n", "\n", "So, intuitively speaking, we want a small probability of rejecting $H_0$ when $H_0$ is true (minimise the Type I error). Similarly, we want to minimise the probability of failing to reject $H_0$ when $H_0$ is false (Type II error). Let us formally see how to achieve these goals.\n", "\n", "## Power, Size and Level of a Test\n", "\n", "### Power Function \n", "\n", "The **power function** of a test with rejection region $\\mathbb{X}_R$ is\n", "$$\n", "\\boxed{\n", "\\beta(\\theta) := P_{\\theta}(x \\in \\mathbb{X}_R)\n", "}\n", "$$\n", "So $\\beta(\\theta)$ is the power of the test if the data were generated under the parameter value $\\theta$, i.e., the probability that the observed data $x$, sampled from the distribution specified by $\\theta$, falls in the rejection region $\\mathbb{X}_R$ and thereby leads to a rejection of the null hypothesis.\n", "\n", "### Size of a test\n", "The $\\mathsf{size}$ of a test with rejection region $\\mathbb{X}_R$ is the supremum of its power over the null parameter space, i.e., the supremum of the probability of rejecting the null hypothesis when the null hypothesis is true:\n", "$$\n", "\\boxed{\n", "\\mathsf{size} := \\sup_{\\theta \\in \\mathbf{\\Theta}_0} \\beta(\\theta) := \\sup_{\\theta \\in \\mathbf{\\Theta}_0} P_{\\theta}(x \\in \\mathbb{X}_R) \\ .\n", "}\n", "$$\n", "The $\\mathsf{size}$ of a test is often denoted by $\\alpha$. A test is said to have $\\mathsf{level}$ $\\alpha$ if its $\\mathsf{size}$ is less than or equal to $\\alpha$.\n", "\n", "\n", "## Wald test\n", "\n", "The Wald test is based on a direct relationship between the $1-\\alpha$ confidence interval and a $\\mathsf{size}$ $\\alpha$ test. 
It can be used for testing a simple null hypothesis about a scalar parameter.\n", "\n", "### Definition\n", "\n", "Let $\\widehat{\\Theta}_n$ be an asymptotically normal estimator of the fixed and possibly unknown parameter $\\theta^* \\in \\mathbf{\\Theta} \\subset \\mathbb{R}$ in the parametric IID experiment:\n", "\n", "$$\n", "X_1,X_2,\\ldots,X_n \\overset{IID}{\\sim} F(x_1;\\theta^*) \\enspace .\n", "$$\n", "\n", "Consider testing:\n", "\n", "$$\n", "\\boxed{H_0: \\theta^* = \\theta_0 \\qquad \\text{versus} \\qquad H_1: \\theta^* \\neq \\theta_0 \\enspace .}\n", "$$\n", "\n", "Suppose that the null hypothesis is true and the estimator $\\widehat{\\Theta}_n$ of $\\theta^*=\\theta_0$ is asymptotically normal:\n", "\n", "$$\n", "\\boxed{\n", "\\theta^*=\\theta_0, \\qquad \\frac{\\widehat{\\Theta}_n - \\theta_0}{\\widehat{\\mathsf{se}}_n} \\overset{d}{\\to} Normal(0,1) \\enspace .}\n", "$$\n", "\n", "Then, **the Wald test based on the test statistic $W$** is:\n", "$$\n", "\\boxed{\n", "\\text{Reject $H_0$ when $|W|>z_{\\alpha/2}$, where $W:=W((X_1,\\ldots,X_n))=\\frac{\\widehat{\\Theta}_n ((X_1,\\ldots,X_n)) - \\theta_0}{\\widehat{\\mathsf{se}}_n}$.}\n", "}\n", "$$\n", "The rejection region for the Wald test is:\n", "$$\n", "\\boxed{\n", "\\mathbb{X}_R = \\{ x:=(x_1,\\ldots,x_n) : |W (x_1,\\ldots,x_n) | > z_{\\alpha/2} \\} \\enspace .\n", "}\n", "$$\n", "\n", "### Asymptotic $\\mathsf{size}$ of a Wald test\n", "\n", "As the sample size $n$ approaches infinity, the $\\mathsf{size}$ of the Wald test approaches $\\alpha$:\n", "\n", "$$\n", "\\boxed{\n", "\\mathsf{size} = P_{\\theta_0} \\left( |W| > z_{\\alpha/2} \\right) \\to \\alpha \\enspace .}\n", "$$\n", "\n", "**Proof:** Let $Z \\sim Normal(0,1)$. 
The $\\mathsf{size}$ of the Wald test, i.e., the supremum of its power under $H_0$, is:\n", "\n", "$$\n", "\\begin{align}\n", "\\mathsf{size}\n", "& := \\sup_{\\theta \\in \\mathbf{\\Theta}_0} \\beta(\\theta) := \\sup_{\\theta \\in \\{\\theta_0\\}} P_{\\theta}(x \\in \\mathbb{X}_R) = P_{\\theta_0}(x \\in \\mathbb{X}_R) \\\\\n", "& = P_{\\theta_0} \\left( |W| > z_{\\alpha/2} \\right) = P_{\\theta_0} \\left( \\frac{|\\widehat{\\Theta}_n - \\theta_0|}{\\widehat{\\mathsf{se}}_n} > z_{\\alpha/2} \\right) \\\\\n", "& \\to P \\left( |Z| > z_{\\alpha/2} \\right)\\\\\n", "& = \\alpha \\enspace .\n", "\\end{align}\n", "$$\n", "\n", "Next, let us look at the power of the Wald test when the null hypothesis is false.\n", "\n", "### Asymptotic power of a Wald test\n", "\n", "Suppose $\\theta^* \\neq \\theta_0$. The power $\\beta(\\theta^*)$, which is the probability of correctly rejecting the null hypothesis, is approximately equal to:\n", "\n", "$$\n", "\\boxed{\n", "\\Phi \\left( \\frac{\\theta_0-\\theta^*}{\\widehat{\\mathsf{se}}_n} - z_{\\alpha/2} \\right) +\n", "\\left( 1- \\Phi \\left( \\frac{\\theta_0-\\theta^*}{\\widehat{\\mathsf{se}}_n} + z_{\\alpha/2} \\right) \\right) \\enspace ,\n", "}\n", "$$\n", "where $\\Phi$ is the DF of the $Normal(0,1)$ RV. Since $\\widehat{\\mathsf{se}}_n \\to 0$ as $n \\to \\infty$, the power increases with the sample size $n$. 
Also, the power increases as $|\\theta_0-\\theta^*|$ increases.\n", "\n", "Now, let us make the connection between the $\\mathsf{size}$ $\\alpha$ Wald test and the $1-\\alpha$ confidence interval explicit.\n", "\n", "### The $\\mathsf{size}$ $\\alpha$ Wald test\n", "\n", "The $\\mathsf{size}$ $\\alpha$ Wald test rejects:\n", "\n", "$$\n", "\\boxed{\n", "\\text{ $H_0: \\theta^*=\\theta_0$ versus $H_1: \\theta^* \\neq \\theta_0$ if and only if $\\theta_0 \\notin C_n := (\\widehat{\\theta}_n-{\\widehat{\\mathsf{se}}_n} z_{\\alpha/2}, \\widehat{\\theta}_n+{\\widehat{\\mathsf{se}}_n} z_{\\alpha/2})$.\n", "}}\n", "$$\n", "\n", "$$\\boxed{\\text{Therefore, testing the hypothesis is equivalent to verifying whether the null value $\\theta_0$ is in the confidence interval.}}$$\n", "\n", "\n", "### Example: Wald test for the mean waiting times at our Orbiter bus-stop\n", "\n", "Let us use the Wald test to attempt to reject the null hypothesis that the mean waiting time at our Orbiter bus-stop is $10$ minutes under an IID $Exponential(\\lambda^*)$ model. Let $\\alpha=0.05$ for this test. Since the mean of an $Exponential(\\lambda^*)$ RV is $1/\\lambda^*$, we can formulate this test as follows:\n", "$$\n", "H_0: \\lambda^* = \\lambda_0= \\frac{1}{10} \\quad \\text{versus} \\quad H_1: \\lambda^* \\neq \\frac{1}{10}, \\quad \\text{where} \\quad X_1,\\ldots,X_{132} \\overset{IID}{\\sim} Exponential(\\lambda^*) \\enspace .\n", "$$\n", "We already obtained the $95\\%$ confidence interval, based on the asymptotic normality of the MLE, to be $[0.0910, 0.1294]$. \n", "\n", "$$\\boxed{\\text{Since our null value $\\lambda_0=0.1$ belongs to this confidence interval, we fail to reject the null hypothesis from a $\\mathsf{size}$ $\\alpha=0.05$ Wald test.}}$$\n", "\n", "We will revisit this example in a more computationally explicit fashion below."
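, "\n", "\n", "The asymptotic power formula above can be evaluated by hand. As a worked sketch, take the IID $Bernoulli(\\theta^*)$ setting of the live example below. For illustration only, assume $\\theta_0=0.5$, a true value $\\theta^*=0.45$, and $\\alpha=0.05$ with $z_{\\alpha/2}\\approx 1.96$. Also approximate $\\widehat{\\mathsf{se}}_n$ by its value $\\sqrt{\\theta^*(1-\\theta^*)/n}$ at the true parameter:\n", "$$\n", "\\begin{align}\n", "n=20: \\quad & \\mathsf{se}_{20} = \\sqrt{0.45 \\cdot 0.55/20} \\approx 0.1112, \\qquad \\frac{\\theta_0-\\theta^*}{\\mathsf{se}_{20}} \\approx 0.449,\\\\\n", "& \\beta(\\theta^*) \\approx \\Phi(0.449-1.96) + \\left( 1-\\Phi(0.449+1.96) \\right) \\approx 0.065 + 0.008 \\approx 0.07 \\enspace ,\\\\\n", "n=1000: \\quad & \\mathsf{se}_{1000} = \\sqrt{0.45 \\cdot 0.55/1000} \\approx 0.0157, \\qquad \\frac{\\theta_0-\\theta^*}{\\mathsf{se}_{1000}} \\approx 3.18,\\\\\n", "& \\beta(\\theta^*) \\approx \\Phi(3.18-1.96) + \\left( 1-\\Phi(3.18+1.96) \\right) \\approx 0.89 + 0.00 \\approx 0.89 \\enspace .\n", "\\end{align}\n", "$$\n", "Under these assumptions, a $\\mathsf{size}$ $0.05$ Wald test would reject $H_0: \\theta^*=0.5$ only about $7\\%$ of the time with $n=20$ trials, but about $89\\%$ of the time with $n=1000$ trials."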
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A Live Example: Simulating Bernoulli Trials to understand Wald Tests\n", "\n", "Let's revisit the MLE for the $Bernoulli(\\theta^*)$ model with $n$ IID trials, which we have already seen, and test the null hypothesis that the unknown $\\theta^* = \\theta_0 = 0.5$.\n", "\n", "Thus, we are interested in the null hypothesis $H_0$ versus the alternative hypothesis $H_1$:\n", "\n", "$$\\displaystyle{H_0: \\theta^*=\\theta_0 \\quad \\text{ versus } \\quad H_1: \\theta^* \\neq \\theta_0, \\qquad \\text{ with }\\theta_0=0.5}$$\n", "\n", "We can test this hypothesis with an (asymptotic) Type I error probability of $\\alpha$ using the **size-$\\alpha$ Wald Test**, which builds on the asymptotic normality of the MLE under $H_0$, i.e., \n", "$$\\displaystyle{ \\frac{\\widehat{\\theta}_n - \\theta_0}{\\widehat{se}_n} \\overset{d}{\\to} Normal(0,1)}$$\n", "\n", "The size-$\\alpha$ Wald test is:\n", "\n", "$$\n", "\\boxed{\n", "\\text{Reject } \\ H_0 \\quad \\text{ when } |W| > z_{\\alpha/2}, \\quad \\text{ where, } \\quad W = \\frac{\\widehat{\\theta}_n - \\theta_0}{\\widehat{se}_n}\n", "}\n", "$$" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.4064216928154865\n" ] }, { "data": { "text/plain": [ "False" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "# do a live simulation ... 
to implement this test...\n", "# simulate n samples from Bernoulli(trueTheta)\n", "# compute the mle\n", "# construct Wald test\n", "# make a decision - i.e., decide if you will reject or fail to reject the H0: theta0=0.5\n", "trueTheta=0.45\n", "n=20\n", "myBernSamples=np.array([floor(random()+trueTheta) for i in range(0,n)])\n", "#myBernSamples\n", "mle=myBernSamples.mean() # MLE of theta is the sample mean for IID Bernoulli trials\n", "mle\n", "NullTheta=0.5\n", "se=sqrt(mle*(1.0-mle)/n)\n", "W=(mle-NullTheta)/se\n", "print (abs(W))\n", "alpha = 0.05\n", "abs(W) > 2 # alpha=0.05, so z_{alpha/2} =1.96 approx=2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Sample Exam Problem 6 \n", "\n", "Consider the following model for the parity (odd=1, even=0) of the first Lotto ball to pop out of the NZ Lotto machine. We had $n=1114$ IID trials:\n", "\n", "$$\\displaystyle{X_1,X_2,\\ldots,X_{1114} \\overset{IID}{\\sim} Bernoulli(\\theta^*)}$$\n", "\n", "and we know from this dataset that the number of odd balls is $546=\\sum_{i=1}^{1114} x_i$.\n", "\n", "Your task is to perform a Wald Test of size $\\alpha=0.05$ to try to reject the null hypothesis that the chance of seeing an odd ball out of the NZ Lotto machine is exactly $1/2$, i.e.,\n", "\n", "$$\\displaystyle{H_0: \\theta^*=\\theta_0 \\quad \\text{ versus } \\quad H_1: \\theta^* \\neq \\theta_0, \\qquad \\text{ with }\\theta_0=0.5}$$\n", "\n", "Show your work by replacing `XXX`s with the right expressions in the next cell."
] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name 'XXX' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;31m## STEP 1: get the MLE thetaHat\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0mthetaHat\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mXXX\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 5\u001b[0m \u001b[0mprint\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;34m\"mle thetaHat = \"\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mthetaHat\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mNameError\u001b[0m: name 'XXX' is not defined" ] } ], "source": [ "# Sample Exam Problem 6 Problem\n", "\n", "## STEP 1: get the MLE thetaHat\n", "thetaHat=XXX \n", "print (\"mle thetaHat = \",thetaHat)\n", "\n", "## STEP 2: get the NullTheta or theta0\n", "NullTheta=XXX\n", "print (\"Null value of theta under H0 = \", NullTheta)\n", "\n", "## STEP 3: get estimated standard error\n", "seTheta=XXX # for Bernoulli trials from earlier in 10.ipynb\n", "print (\"estimated standard error\",seTheta)\n", "\n", "# STEP 4: get Wald Statistic\n", "W=XXX\n", "print (\"Wald statistic = \",W)\n", "\n", "# STEP 5: conduct the size alpha=0.05 Wald test\n", "# do NOT change anything below\n", "rejectNullSampleExamProblem6 = abs(W) > 2.0 # alpha=0.05, so z_{alpha/2} =1.96 approx=2.0\n", "if (rejectNullSampleExamProblem6):\n", " print (\"we reject the null hypothesis that theta_0=0.5\")\n", "else:\n", " print (\"we 
fail to reject the null hypothesis that theta_0=0.5\")" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mle thetaHat = 273/557\n", "Null value of theta under H0 = 0.500000000000000\n", "estimated standard error 0.0149776163832414\n", "Wald statistic = -0.659272243178650\n", "we fail to reject the null hypothesis that theta_0=0.5\n" ] } ], "source": [ "# Sample Exam Problem 6 Solution\n", "\n", "## STEP 1: get the MLE thetaHat\n", "n=1114 # sample size\n", "thetaHat=546/n # MLE is sample mean for IID Bernoulli trials\n", "print (\"mle thetaHat = \",thetaHat)\n", "\n", "## STEP 2: get the NullTheta or theta0\n", "NullTheta=0.5\n", "print (\"Null value of theta under H0 = \", NullTheta)\n", "\n", "## STEP 3: get estimated standard error\n", "seTheta=sqrt(thetaHat*(1.0-thetaHat)/n) # for Bernoulli trials from earlier in 10.ipynb\n", "print (\"estimated standard error\",seTheta)\n", "\n", "# STEP 4: get Wald Statistic\n", "W=(thetaHat-NullTheta)/seTheta\n", "print (\"Wald statistic = \",W)\n", "\n", "# STEP 5: conduct the size alpha=0.05 Wald test\n", "rejectNullSampleExamProblem6 = abs(W) > 2.0 # alpha=0.05, so z_{alpha/2} =1.96 approx=2.0\n", "if (rejectNullSampleExamProblem6):\n", " print (\"we reject the null hypothesis that theta_0=0.5\")\n", "else:\n", " print (\"we fail to reject the null hypothesis that theta_0=0.5\")" ] }, { "cell_type": "markdown", "metadata": { "lx_assignment_number": "3", "lx_problem_cell_type": "PROBLEM" }, "source": [ "---\n", "## Assignment 3, PROBLEM 6\n", "Maximum Points = 3" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "lx_assignment_number": "3", "lx_assignment_type": "ASSIGNMENT", "lx_assignment_type2print": "Assignment", "lx_problem_cell_type": "PROBLEM", "lx_problem_number": "6", "lx_problem_points": "3" }, "source": [ "\n", "For the Orbiter waiting time problem, assume IID trials as follows: \n", "\n", 
"$$\\displaystyle{X_1,X_2,\\ldots,X_{n} \\overset{IID}{\\sim} Exponential(\\lambda^*)}$$\n", "\n", "Your task is to perform a Wald Test of size $\\alpha=0.05$ to try to reject the null hypothesis that the mean waiting time at the Orbiter bus-stop, i.e., the mean inter-arrival time between buses, is exactly $10$ minutes:\n", "\n", "$$\\displaystyle{H_0: \\lambda^*=\\lambda_0 \\quad \\text{ versus } \\quad H_1: \\lambda^* \\neq \\lambda_0, \\qquad \\text{ with }\\lambda_0=0.1}$$\n", "\n", "Show your work by replacing `XXX`s with the right expressions in the next cell." ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "deletable": false, "lx_assignment_number": "3", "lx_assignment_type": "ASSIGNMENT", "lx_assignment_type2print": "Assignment", "lx_problem_cell_type": "PROBLEM", "lx_problem_number": "6", "lx_problem_points": "3" }, "outputs": [ { "ename": "NameError", "evalue": "name 'XXX' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;31m#test H0: lambda=0.1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0;31m## STEP 1: get the MLE thetaHat\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0mlambdaHat\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mXXX\u001b[0m \u001b[0;31m# you need to use sampleWaitingTimes here!\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 8\u001b[0m \u001b[0mprint\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;34m\"mle lambdaHat = \"\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mlambdaHat\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 9\u001b[0m 
\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mNameError\u001b[0m: name 'XXX' is not defined" ] } ], "source": [ "import numpy as np\n", "sampleWaitingTimes = np.array([8,3,7,18,18,3,7,9,9,25,0,0,25,6,10,0,10,8,16,9,1,5,16,6,4,1,3,21,0,28,3,8,6,6,11,\\\n", " 8,10,15,0,8,7,11,10,9,12,13,8,10,11,8,7,11,5,9,11,14,13,5,8,9,12,10,13,6,11,13,0,\\\n", " 0,11,1,9,5,14,16,2,10,21,1,14,2,10,24,6,1,14,14,0,14,4,11,15,0,10,2,13,2,22,10,5,\\\n", " 6,13,1,13,10,11,4,7,9,12,8,16,15,14,5,10,12,9,8,0,5,13,13,6,8,4,13,15,7,11,6,23,1])\n", "\n", "#test H0: lambda=0.1\n", "## STEP 1: get the MLE thetaHat\n", "lambdaHat=XXX # you need to use sampleWaitingTimes here!\n", "print (\"mle lambdaHat = \",lambdaHat)\n", "\n", "## STEP 2: get the NullLambda or lambda0\n", "NullLambda=XXX\n", "print (\"Null value of lambda under H0 = \", NullLambda)\n", "\n", "## STEP 3: get estimated standard error\n", "seLambda=XXX # see Sample Exam Problem 5 in 10.ipynb\n", "print (\"estimated standard error\",seLambda)\n", "\n", "# STEP 4: get Wald Statistic\n", "W=XXX\n", "print (\"Wald statistic = \",W)\n", "\n", "# STEP 5: conduct the size alpha=0.05 Wald test\n", "# do NOT change anything below\n", "rejectNullAssignment3Problem6 = abs(W) > 2.0 # alpha=0.05, so z_{alpha/2} =1.96 approx=2.0\n", "if (rejectNullAssignment3Problem6):\n", " print (\"we reject the null hypothesis that lambda0=0.1\")\n", "else:\n", " print (\"we fail to reject the null hypothesis that lambda0=0.1\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## P-value\n", "\n", "It is desirable to have a more informative decision than simply reporting \"reject $H_0$\" or \"fail to reject $H_0$.\"\n", "\n", "For instance, we could ask whether the test rejects $H_0$ for each $\\mathsf{size}=\\alpha$. \n", "Typically, if the test rejects at $\\mathsf{size}$ $\\alpha$ it will also reject at a larger $\\mathsf{size}$ $\\alpha' > \\alpha$. 
\n", "Therefore, there is a smallest $\\mathsf{size}$ $\\alpha$ at which the test rejects $H_0$, and we call this $\\alpha$ the $\\text{p-value}$ of the test.\n", "\n", "$$\\boxed{\\text{The smallest $\\alpha$ at which a $\\mathsf{size}$ $\\alpha$ test rejects the null hypothesis $H_0$ is the $\\text{p-value}$.}}$$\n" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "p=text('Reject $H_0$?',(12,12)); p+=text('No',(30,10)); p+=text('Yes',(30,15)); p+=text('p-value',(70,10))\n", "p+=text('size',(65,4)); p+=text('$0$',(40,4)); p+=text('$1$',(90,4)); p+=points((59,5),rgbcolor='red',size=50)\n", "p+=line([(40,17),(40,5),(95,5)]); p+=line([(40,10),(59,10),(59,15),(90,15)]);\n", "p+=line([(68,9.5),(59.5,5.5)],rgbcolor='red'); p.show(axes=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Definition of p-value\n", "Suppose that for every $\\alpha \\in (0,1)$ we have a $\\mathsf{size}$ $\\alpha$ test with rejection region $\\mathbb{X}_{R,\\alpha}$ and test statistic $T$. Then,\n", "$$\n", "\\text{p-value} := \\inf \\{ \\alpha: T(X) \\in \\mathbb{X}_{R,\\alpha} \\} \\enspace .\n", "$$\n", "That is, the p-value is the smallest $\\alpha$ at which a $\\mathsf{size}$ $\\alpha$ test rejects the null hypothesis.\n", "\n", "### Understanding p-value\n", "If the evidence against $H_0$ is strong, then the p-value will be small. However, a large p-value is not strong evidence in favour of $H_0$, because a large p-value can occur for two reasons:\n", "\n", "- $H_0$ is true.\n", "- $H_0$ is false but the test has low power (i.e., high Type II error).\n", "\n", "Finally, it is important to realise that *the p-value is not the probability that the null hypothesis is true*, i.e., $\\text{p-value} \\, \\neq P(H_0|x)$, where $x$ is the data. 
The following scale, in terms of ranges of p-values, is commonly used to interpret the strength of the evidence against the null hypothesis $H_0$:\n", "\n", "- P-value $\\in (0.00, 0.01]$ $\\implies$ Very strong evidence against $H_0$\n", "- P-value $\\in (0.01, 0.05]$ $\\implies$ Strong evidence against $H_0$\n", "- P-value $\\in (0.05, 0.10]$ $\\implies$ Weak evidence against $H_0$\n", "- P-value $\\in (0.10, 1.00]$ $\\implies$ Little or no evidence against $H_0$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we will see a convenient expression for the p-value for certain tests.\n", "\n", "### The p-value of a hypothesis test\n", "\n", "Suppose that the $\\mathsf{size}$ $\\alpha$ test based on the test statistic $T$ and critical value $c_{\\alpha}$ is of the form:\n", "\n", "$$\n", "\\text{Reject $H_0$ if and only if $T:=T((X_1,\\ldots,X_n))> c_{\\alpha}$,}\n", "$$\n", "\n", "then\n", "\n", "$$\n", "\\boxed{\n", "\\text{p-value} \\, = \\sup_{\\theta \\in \\mathbf{\\Theta}_0} P_{\\theta}(T((X_1,\\ldots,X_n)) \\geq t:=T((x_1,\\ldots,x_n))) \\enspace ,}\n", "$$\n", "\n", "where $(x_1,\\ldots,x_n)$ is the observed data and $t$ is the observed value of the test statistic $T$. \n", "\n", "In words, **the p-value is the supremum, over all parameter values allowed by $H_0$, of the probability of observing a value of the test statistic at least as extreme as the one actually observed.**\n", "\n", "\n", "Let us revisit the Orbiter waiting times example from the p-value perspective.\n", "\n", "### Example: p-value for the parametric Orbiter bus waiting times experiment\n", "\n", "Let the waiting times at our bus-stop be $X_1,X_2,\\ldots,X_{132} \\overset{IID}{\\sim} Exponential(\\lambda^*)$. 
Consider the following testing problem:\n", "\n", "$$\n", "H_0: \\lambda^*=\\lambda_0=\\frac{1}{10} \\quad \\text{versus} \\quad H_1: \\lambda^* \\neq \\lambda_0 \\enspace .\n", "$$\n", "\n", "We already saw that the Wald test statistic is:\n", "\n", "$$\n", "W:=W(X_1,\\ldots,X_n)= \\frac{\\widehat{\\Lambda}_n-\\lambda_0}{\\widehat{\\mathsf{se}}_n(\\widehat{\\Lambda}_n)} = \\frac{\\frac{1}{\\overline{X}_n}-\\lambda_0}{\\frac{1}{\\sqrt{n}\\overline{X}_n}} \\enspace .\n", "$$\n", "\n", "The observed test statistic is:\n", "\n", "$$\n", "w=W(x_1,\\ldots,x_{132})=\n", "\\frac{\\frac{1}{\\overline{x}_{132}}-\\lambda_0}{\\frac{1}{\\sqrt{132}\\overline{x}_{132}}}\n", "= \\frac{\\frac{1}{9.0758}-\\frac{1}{10}}{\\frac{1}{\\sqrt{132} \\times 9.0758}} = 1.0618 \\enspace .\n", "$$\n", "Since $W \\overset{d}{\\to} Z \\sim Normal(0,1)$ under $H_0$, the p-value for this Wald test is:\n", "\n", "$$\n", "\\begin{align}\n", "\\text{p-value} \\, \n", "&= \\sup_{\\lambda \\in \\mathbf{\\Lambda}_0} P_{\\lambda} (|W|>|w|)= \\sup_{\\lambda \\in \\{\\lambda_0\\}} P_{\\lambda} (|W|>|w|) = P_{\\lambda_0} (|W|>|w|) \\\\\n", "& \\to P (|Z|>|w|)=2 \\Phi(-|w|)=2 \\Phi(-|1.0618|)=2 \\times 0.1442=0.2884 \\enspace .\n", "\\end{align}\n", "$$\n", "\n", "Therefore, there is little or no evidence against $H_0$, the hypothesis that the mean waiting time under an IID $Exponential$ model of inter-arrival times is exactly ten minutes.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preparation for Nonparametric Estimation and Testing\n", "### YouTry Later\n", "\n", "Python's `random` for sampling and sequence manipulation\n", "\n", "The Python `random` module, available in SageMath, provides a useful way of taking samples if you have already generated a 'population' to sample from, or of otherwise manipulating the elements of a sequence. See http://docs.python.org/library/random.html for more details. 
Here we will try a few of them.\n", "\n", "The aptly-named `sample` function allows us to take a sample of a specified size from a sequence. We will use a list as our sequence:" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[16, 95, 54, 24, 19, 51, 5, 50, 74, 70]" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "popltn = list(range(1, 101, 1)) # make a population: the list of integers 1 to 100\n", "sample(popltn, 10) # sample 10 elements from it at random" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Within a single call, `sample` samples *without* replacement: it never selects the element at any particular position in the list more than once. However, if there are duplicate elements in the list, such as with the list [1,2,4,2,5,3,1,3], then you may well get a repeated value in your sample, because each copy sits at a different position. Separate calls to `sample` are independent, so repeated calls may give you samples that share some elements." 
] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]\n" ] } ], "source": [ "popltnWithDuplicates = list(range(1, 11, 1))*4 # make a population with repeated elements\n", "print(popltnWithDuplicates)" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[7, 6, 3, 5, 8, 7, 4, 7, 1, 4]\n", "[1, 2, 8, 3, 5, 2, 4, 9, 1, 4]\n", "[8, 5, 9, 10, 6, 1, 7, 3, 2, 3]\n", "[9, 1, 10, 7, 8, 7, 5, 5, 3, 4]\n", "[7, 4, 5, 10, 1, 1, 10, 2, 2, 8]\n" ] } ], "source": [ "for i in range(5):\n", " print(sample(popltnWithDuplicates, 10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Try experimenting with `choice`, which selects one element at random from a sequence, and `shuffle`, which shuffles the sequence in place (i.e., the ordering of the sequence itself is changed, rather than you being given a re-ordered copy of the list). It is probably easiest to use lists for your sequences. See how `shuffle` creates permutations of the list. You could use `sample` and `shuffle` to emulate *permutations of k objects out of n* ...\n", "\n", "You may need to check the documentation to see how to use these functions." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#?sample" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#?shuffle" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#?choice" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "SageMath 9.1", "language": "sage", "name": "sagemath" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8" }, "lx_course_instance": "2020", "lx_course_name": "Introduction to Data Science: A Comp-Math-Stat Approach", "lx_course_number": "1MS041" }, "nbformat": 4, "nbformat_minor": 2 }