Introduction to Data Science: A Comp-Math-Stat Approach

1MS041, 2021

©2021 Raazesh Sainudiin, Benny Avelin. Attribution 4.0 International (CC BY 4.0)

08. Pseudo-Random Numbers, Simulating from Some Discrete and Continuous Random Variables

In the last notebook, we started to look at how we can produce realisations from the most elementary $Uniform(0,1)$ random variable.

i.e., how can we produce samples $(x_1, x_2, \ldots, x_n)$ from $X_1, X_2, \ldots, X_n$ $\overset{IID}{\thicksim}$ $Uniform(0,1)$?

What is SageMath doing when we ask for random()?

We looked at how Modular arithmetic and number theory gives us pseudo-random number generators.

We used linear congruential generators (LCG) as simple pseudo-random number generators.

Remember that "pseudo-random" means that the numbers are not really random. We saw that some linear congruential generators (LCG) have much shorter, more predictable, patterns than others and we learned what makes a good LCG.

We introduced the pseudo-random number generator (PRNG) called the Mersenne Twister that we will use for simulation purposes in this course. It is based on more sophisticated theory than that of LCG but the basic principles of recurrence relations are the same.

The $Uniform(0,1)$ Random Variable

Recall that the $Uniform(0,1)$ random variable is the fundamental model as we can transform it to any other random variable, random vector or random structure. The PDF $f$ and DF $F$ of $X \sim Uniform(0,1)$ are:

$f(x) = \begin{cases} 0 & \text{if} \ x \notin [0,1] \\ 1 & \text{if} \ x \in [0,1] \end{cases}$

$F(x) = \begin{cases} 0 & \text{if} \ x < 0 \\ 1 & \text{if} \ x > 1 \\ x & \text{if} \ x \in [0,1] \end{cases}$

We use the Mersenne twister pseudo-random number generator to mimic independent and identically distributed draws from the $uniform(0,1)$ RV.

In Sage, we use the python random module to generate pseudo-random numbers for us. (We have already used it: remember randint?)

random() will give us one simulation from the $Uniform(0,1)$ RV:

If we want a whole simulated sample we can use a list comprehension. We will be using this technique frequently so make sure you understand what is going on. "for i in range(3)" is acting like a counter to give us 3 simulated values in the list we are making

If we do this again, we will get a different sample:

Often is it useful to be able to replicate the same random sample. For example, if we were writing some code to do some simulations using samples from a PRNG, and we "improved" the way that we were doing it, how would we want to test our improvement? If we could replicate the same samples then we could show that our new code was equivalent to our old code, just more efficient.

Remember when we were using the LCGs, and we could set the seed $x_0$? More sophisticated PRNGs like the Mersenne Twister also have a seed. By setting this seed to a specified value we can make sure that we can replicate samples.

Now we can replicate the same sample again by setting the seed to the same value:

We can compare some samples visually by plotting them:

YouTry

Try looking at the more advanced documentation and play a bit.

(end of You Try)



Question:

What can we do with samples from a $Uniform(0,1)$ RV? Why bother?

Answer:

We can use them to sample or simulate from other, more complex, random variables.

The $Bernoulli(\theta)$ Random Variable

The $Bernoulli(\theta)$ RV $X$ with PMF $f(x;\theta)$ and DF $F(x;\theta)$ parameterised by some real $\theta\in [0,1]$ is a discrete random variable with only two possible outcomes.

$f(x;\theta)= \theta^x (1-\theta)^{1-x} \mathbf{1}_{\{0,1\}}(x) = \begin{cases} \theta & \text{if} \ x=1,\\ 1-\theta & \text{if} \ x=0,\\ 0 & \text{otherwise} \end{cases}$

$F(x;\theta) = \begin{cases} 1 & \text{if} \ 1 \leq x,\\ 1-\theta & \text{if} \ 0 \leq x < 1,\\ 0 & \text{otherwise} \end{cases}$

Here are some functions for the PMF and DF for a $Bernoulli$ RV along with various useful functions for us in the sequel. Let's take a quick look at them.

We can see the effect of varying $\theta$ interactively:

Don't worry about how these plots are done: you are not expected to be able to understand all of these details now.

Just use them to see the effect of varying $\theta$.

Simulating a sample from the $Bernoulli(\theta)$ RV

We can simulate a sample from a $Bernoulli$ distribution by transforming input from a $Uniform(0,1)$ distribution using the floor() function in Sage. In maths, $\lfloor x \rfloor$, the 'floor of $x$' is the largest integer that is smaller than or equal to $x$. For example, $\lfloor 3.8 \rfloor = 3$.

Using floor, we can do inversion sampling from the $Bernoulli(\theta)$ RV using the the $Uniform(0,1)$ random variable that we said is the fundamental model.

We will introduce inversion sampling more formally later. In general, inversion sampling means using the inverse of the CDF $F$, $F^{[-1]}$, to transform input from a $Uniform(0,1)$ distribution.

To simulate from the $Bernoulli(\theta)$, we can use the following algorithm:

Input:

Output:

$x \thicksim Bernoulli(\theta)$

Steps:

To make a number of simulations, we can use list comprehensions again:

To make modular reusable code we can package up what we have done as functions.

The function bernoulliFInverse(u, theta) codes the inverse of the CDF of a Bernoulli distribution parameterised by theta. The function bernoulliSample(n, theta) uses bernoulliFInverse(...) in a list comprehension to simulate n samples from a Bernoulli distribution parameterised by theta, i.e., the distribution of our $Bernoulli(\theta)$ RV.

Note that we are using a list comprehension and the built-in SageMath random() function to make a list of pseudo-random simulations from the $Uniform(0,1)$. The length of the list is determined by the value of n. Inside the body of the function we assign this list to a variable named us (i.e., u plural). We then use another list comprehension to make our simulated sample. This list comprehension works by calling our function bernoulliFInverse(...) and passing in values for theta together with each u in us in turn.

Let's try a small number of samples:

Now lets explore the effect of interactively varying n and $\theta$:

You can vary $\theta$ and $n$ on the interactive plot. You should be able to see that as $n$ increases, the empirical plots should get closer to the theoretical $f$ and $F$.

YouTry

Check that you understand what floor is doing. We have put some extra print statements into our demonstration of floor so that you can see what is going on in each step. Try evaluating this cell several times so that you see what happens with different values of u.

In the cell below we use floor to get 1's and 0's from the pseudo-random u's given by random(). It is effectively doing exactly the same thing as the functions above that we use to simulate a specified number of $Bernoulli(\theta)$ RVs, but the why that it is written may be easier to understand. If floor is doing what we want it to, then when n is sufficiently large, we'd expect our proportion of 1s to be close to theta (remember Kolmogorov's axiomatic motivations for probability!). Try changing the value assigned to the variable theta and re-evaluting the cell to check this.

The equi-probable $de~Moivre(\theta)$ Random Variable

The $de~Moivre(\theta_1,\theta_2,\ldots,\theta_k)$ RV is the natural generalisation of the $Bernoulli (\theta)$ RV to more than two outcomes. Take a die (i.e. one of a pair of dice): there are 6 possible outcomes from tossing a die if the die is a normal six-sided one (the outcome is which face is the on the top). To start with we can allow the possibility that the different faces could be loaded so that they have different probabilities of being the face on the top if we throw the die. In this case, k=6 and the parameters $\theta_1$, $\theta_2$, ...$\theta_6$ specify how the die is loaded, and the number on the upper-most face if the die is tossed is a $de\,Moivre$ random variable parameterised by $\theta_1,\theta_2,\ldots,\theta_6$.

If $\theta_1=\theta_2=\ldots=\theta_6= \frac{1}{6}$ then we have a fair die.

Here are some functions for the equi-probable $de\, Moivre$ PMF and CDF where we code the possible outcomes as the numbers on the faces of a k-sided die, i.e, 1,2,...k.

YouTry

Try changing the value of k in the above interact.

Simulating a sample from the equi-probable $de\,Moivre(k)$ random variable

We use floor ($\lfloor \, \rfloor$) again for simulating from the equi-probable $de \, Moivre(k)$ RV, but because we are defining our outcomes as 1, 2, ... k, we just add 1 to the result.

To simulate from the equi-probable $de\,Moivre(k)$, we can use the following algorithm:

Input:

Output:

Steps:

We can illustrate this with SageMath:

A small sample:

You should understand the deMoivreFInverse and deMoivreSample functions and be able to write something like them if you were asked to.

You are not expected to be to make the interactive plots below (but this is not too hard to do by syntactic mimicry and google searches!).

Now let's do some interactive sampling where you can vary $k$ and the sample size $n$:

Try changing $n$ and/or $k$. With $k = 40$ for example, you could be simulating the number on the first ball for $n$ Lotto draws.

YouTry

A useful counterpart to the floor of a number is the ceiling, denoted $\lceil \, \rceil$. In maths, $\lceil x \rceil$, the 'ceiling of $x$' is the smallest integer that is larger than or equal to $x$. For example, $\lceil 3.8 \rceil = 4$. We can use the ceil function to do this in Sage:

Try using ceil to check that you understand what it is doing. What would ceil(0) be?

Inversion Sampler for Continuous Random Variables

When we simulated from the discrete RVs above, the $Bernoulli(\theta)$ and the equi-probable $de\,Moivre(k)$, we transformed some $u \thicksim Uniform(0,1)$ into some value for the RV.

Now we will look at the formal idea of an inversion sampler for continuous random variables. Inversion sampling for continuous random variables is a way to simulate values for a continuous random variable $X$ using $u \thicksim Uniform(0,1)$.

The idea of the inversion sampler is to treat $u \thicksim Uniform(0,1)$ as some value taken by the CDF $F$ and find the value $x$ at which $F(X \le x) = u$.

To find x where $F(X \le x) = u$ we need to use the inverse of $F$, $F^{[-1]}$. This is why it is called an inversion sampler.

Formalising this,

Proposition

Let $F(x) := \int_{- \infty}^{x} f(y) \,d y : \mathbb{R} \rightarrow [0,1]$ be a continuous DF with density $f$, and let its inverse $F^{[-1]} $ be:

$$ F^{[-1]}(u) := \inf \{ x : F(x) = u \} : [0,1] \rightarrow \mathbb{R} $$

Then, $F^{[-1]}(U)$ has the distribution function $F$, provided $U \thicksim Uniform(0,1)$ ($U$ is a $Uniform(0,1)$ RV).

Note:

The infimum of a set A of real numbers, denoted by $\inf(A)$, is the greatest lower bound of every element of $A$.

Proof

The "one-line proof" of the proposition is due to the following equalities:

$$P(F^{[-1]}(U) \leq x) = P(\inf \{ y : F(y) = U)\} \leq x ) = P(U \leq F(x)) = F(x), \quad \text{for all } x \in \mathbb{R} . $$

Algorithm for Inversion Sampler

Input:

Output:

Algorithm steps:

The $Uniform(\theta_1, \theta_2)$RV

We have already met the $Uniform(\theta_1, \theta_2)$ RV.

Given two real parameters $\theta_1,\theta_2 \in \mathbb{R}$, such that $\theta_1 < \theta_2$, the PDF of the $Uniform(\theta_1,\theta_2)$ RV $X$ is:

$$f(x;\theta_1,\theta_2) = \begin{cases} \frac{1}{\theta_2 - \theta_1} & \text{if }\theta_1 \leq x \leq \theta_2\text{,}\\ 0 & \text{otherwise} \end{cases} $$

and its DF given by $F(x;\theta_1,\theta_2) = \int_{- \infty}^x f(y; \theta_1,\theta_2) \, dy$ is:

$$ F(x; \theta_1,\theta_2) = \begin{cases} 0 & \text{if }x < \theta_1 \\ \frac{x-\theta_1}{\theta_2-\theta_1} & \text{if}~\theta_1 \leq x \leq \theta_2,\\ 1 & \text{if} x > \theta_2 \end{cases} $$

For example, here are the PDF, CDF and inverse CDF for the $Uniform(-1,1)$:

As usual, we can make some SageMath functions for the PDF and CDF:

Using these functions in an interactive plot, we can see the effect of changing the distribution parameters $\theta_1$ and $\theta_2$.

Simulating from the $Uniform(\theta_1, \theta_2)$ RV

We can simulate from the $Uniform(\theta_1,\theta_2)$ using the inversion sampler, provided that we can get an expression for $F^{[-1]}$ that can be implemented as a procedure.

We can get this by solving for $x$ in terms of $u=F(x;\theta_1,\theta_2)$:

$$ u = \frac{x-\theta_1}{\theta_2-\theta_1} \quad \iff \quad x = (\theta_2-\theta_1)u+\theta_1 \quad \iff \quad F^{[-1]}(u;\theta_1,\theta_2) = \theta_1+(\theta_2-\theta_1)u $$

Algorithm for Inversion Sampler for the $Uniform(\theta_1, \theta_2)$ RV

Input:

Output:

Algorithm steps:

We can illustrate this with SageMath by writing a function to calculate the inverse of the CDF of a uniform distribution parameterised by theta1 and theta2. Given a value between 0 and 1 for the parameter u, it returns the height of the inverse CDF at this point, i.e. the value in the range theta1 to theta2 where the CDF evaluates to u.

This function transforms a single $u$ into a single simulated value from the $Uniform(\theta_1, \theta_2)$, for example:

Then we can use this function inside another function to generate a number of samples:

The basic strategy is the same as for simulating $Bernoulli$ and $de \, Moirve$ samples: we are using a list comprehension and the built-in SAGE random() function to make a list of pseudo-random simulations from the $Uniform(0,1)$. The length of the list is determined by the value of n. Inside the body of the function we assign this list to a variable named us (i.e., u plural). We then use another list comprehension to make our simulated sample. This list comprehension works by calling our function uniformFInverse(...) and passing in values for theta1 and theta2 together with each u in us in turn.

You should be able to write simple functions like uniformFinverse and uniformSample yourself.

Try this for a small sample:

Much more fun, we can make an interactive plot which uses the uniformSample(...) function to generate and plot while you choose the parameters and number to generate (you are not expected to be able to make interactive plots like this):

We can get a better idea of the distribution of our sample using a histogram (the minimum sample size has been set to 50 here because the automatic histogram generation does not do a very good job with small samples).

The $Exponential(\lambda)$ Random Variable

For a given $\lambda$ > 0, an $Exponential(\lambda)$ Random Variable has the following PDF $f$ and DF $F$:

$$ f(x;\lambda) =\begin{cases}\lambda e^{-\lambda x} & \text{if }x \ge 0\text{,}\\ 0 & \text{otherwise}\end{cases} $$$$ F(x;\lambda) =\begin{cases}1 - e^{-\lambda x} & \text{if }x \ge 0\text{,}\\ 0 & \text{otherwise}\end{cases} $$

An exponential distribution is useful because is can often be used to model inter-arrival times or making inter-event measurements (if you are familiar with the $Poisson$ distribution, a discrete distribution, you may have also met the $Exponential$ distribution as the time between $Poisson$ events). Here are some examples of random variables which are sometimes modelled with an exponential distribution:

time between the arrival of buses at a bus-stop distance between roadkills on a stretch of highway In SageMath, the we can use exp(x) to calculate $e^x$, for example:

We can code some functions for the PDF and DF of an $Exponential$ parameterised by lambda like this $\lambda$.

Note that we cannot or should not use the name lambda for the parameter because in SageMath (and Python), the term lambda has a special meaning. Do you recall lambda expressions?

You should be able to write simple functions like exponentialPDF and exponentialCDF yourself, but you are not expected to be able to make the interactive plots.

You can see the shapes of the PDF and CDF for different values of $\lambda$ using the interactive plot below.

We are going to write some functions to help us to do inversion sampling from the $Exponential(\lambda)$ RV.

As before, we need an expression for $F^{[-1]}$ that can be implemented as a procedure.

We can get this by solving for $x$ in terms of $u=F(x;\lambda)$

YouTry later

Show that

$$ F^{[-1]}(u;\lambda) =\frac{-1}{\lambda} \ln(1-u) $$

$\ln = \log_e$ is the natural logarthim.

(end of You try)



Simulating from the $Exponential(\lambda)$ RV

Algorithm for Inversion Sampler for the $Exponential(\lambda)$ RV

Input:

Output:

Algorithm steps:

The function exponentialFInverse(u, lam) codes the inverse of the CDF of an exponential distribution parameterised by lam. Given a value between 0 and 1 for the parameter u, it returns the height of the inverse CDF of the exponential distribution at this point, i.e. the value where the CDF evaluates to u. The function exponentialSample(n, lam) uses exponentialFInverse(...) to simulate n samples from an exponential distribution parameterised by lam.

We can have a look at a small sample:

You should be able to write simple functions like exponentialFinverse and exponentialSample yourself by now.

The best way to visualise the results is to use a histogram. With this interactive plot you can explore the effect of varying lambda and n:

The Standard $Cauchy$ Random Variable

A standard $Cauchy$ Random Variable has the following PDF $f$ and DF $F$:

$$ f(x) =\frac{1}{\pi(1+x^2)}\text{,}\,\, -\infty < x < \infty $$$$ F(x) = \frac{1}{\pi}\tan^{-1}(x) + 0.5 $$

The $Cauchy$ distribution is an interesting distribution because the expectation does not exist:

$$ \int \left|x\right|\,dF(x) = \frac{2}{\pi} \int_0^{\infty} \frac{x}{1+x^2}\,dx = \left(x \tan^{-1}(x) \right]_0^{\infty} - \int_0^{\infty} \tan^{-1}(x)\, dx = \infty \ . $$

In SageMath, we can use the arctan function for $tan^{-1}$, and pi for $\pi$ and code some functions for the PDF and DF of the standard Cauchy as follows.

You can see the shapes of the PDF and CDF using the plot below. Note from the PDF $f$ above is defined for $-\infty < x < \infty$. This means we should set some arbitrary limits on the minimum and maximum values to use for the x-axis on the plots. You can change these limits interactively.

Constructing a standard $Cauchy$ RVs

You can see that we are equally likely to get positive and negative values (the density function of the standard $Cauchy$ RV is symmetrical about 0) and whenever the spin angle is close to $\frac{\pi}{4}$ ($90^{\circ}$) or $\frac{3\pi}{2}$ ($270^{\circ}$), the intersections will be a long way out up or down the y-axis, i.e. very negative or very positive values. If the light sabre is exactly parallel to the y-axis there will be no intersection: a $Cauchy$ RV $X$ can take values $-\infty < x < \infty$

Simulating from the standard $Cauchy$

We can perform inversion sampling on the $Cauchy$ RV by transforming a $Uniform(0,1)$ random variable into a $Cauchy$ random variable using the inverse CDF.

We can get this by replacing $F(x)$ by $u$ in the expression for $F(x)$:

$$ \frac{1}{\pi}tan^{-1}(x) + 0.5 = u $$

and solving for $x$:

$$ \begin{array}{lcl} \frac{1}{\pi}tan^{-1}(x) + 0.5 = u & \iff & \frac{1}{\pi} tan^{-1}(x) = u - \frac{1}{2}\\ & \iff & tan^{-1}(x) = (u - \frac{1}{2})\pi\\ & \iff & tan(tan^{-1}(x)) = tan((u - \frac{1}{2})\pi)\\ & \iff & x = tan((u - \frac{1}{2})\pi) \end{array} $$

Inversion Sampler for the standard $Cauchy$ RV

Input:

Output:

Algorithm steps:

The function cauchyFInverse(u) codes the inverse of the CDF of the standard Cauchy distribution. Given a value between 0 and 1 for the parameter u, it returns the height of the inverse CDF of the standard $Cauchy$ at this point, i.e. the value where the CDF evaluates to u. The function cauchySample(n) uses cauchyFInverse(...) to simulate n samples from a standard Cauchy distribution.

And we can visualise these simulated samples with an interactive plot:

Notice how we can get some very extreme values This is because of the 'thick tails' of the density function of the $Cauchy$ RV. Think about this in relation to the double light sabre visualisation. We can see effect of the extreme values with a histogram visualisation as well. The interactive plot below will only use values between lower and upper in the histogram. Try increasing the sample size to something like 1000 and then gradually widening the limits:

Running means

When we introduced the $Cauchy$ distribution, we noted that the expectation of the $Cauchy$ RV does not exist. This means that attempts to estimate the mean of a $Cauchy$ RV by looking at a sample mean will not be successful: as you take larger and larger samples, the effect of the extreme values will still cause the sample mean to swing around wildly (we will cover estimation properly soon). You are going to investigate the sample mean of simulated $Cauchy$ samples of steadily increasing size and show how unstable this is. A convenient way of doing this is to look at a running mean. We will start by working through the process of calculating some running means for the $Uniform(0,10)$, which do stabilise. You will then do the same thing for the $Cauchy$ and be able to see the instability.

We will be using the pylab.cumsum function, so we make sure that we have it available. We then generate a sample from the $Uniform(0,10)$

We are going to treat this sample as though it is actually 10 samples of increasing size:

We know that a sample mean is the sum of the elements in the sample divided by the number of elements in the sample $n$:

$$ \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i $$

We can get the sum of the elements in each of our 10 samples with the cumulative sum of uSample.

We use cumsum to get the cumulative sum. This will be a pylab.array (or numpy.arrat) type, so we use the list function to turn it back into a list:

What we have now is effectively a list

$$\left[\displaystyle\sum_{i=1}^1x_i, \sum_{i=1}^2x_i, \sum_{i=1}^3x_i, \ldots, \sum_{i=1}^{10}x_i\right]$$

So all we have to do is divide each element in csUSample by the number of elements that were summed to make it, and we have a list of running means

$$\left[\frac{1}{1}\displaystyle\sum_{i=1}^1x_i, \frac{1}{2}\sum_{i=1}^2x_i, \frac{1}{3}\sum_{i=1}^3x_i, \ldots, \frac{1}{10}\sum_{i=1}^{10}x_i\right]$$

We can get the running sample sizes using the range function:

And we can do the division with list comprehension:

We could pull all of this together into a function which produced a list of running means for sample sizes 1 to $n$.

Have a look at the running means of 10 incrementally-sized samples:

Recall that the expectation $E_{(\theta_1, \theta_2)}(X)$ of a $X \thicksim Uniform(\theta_1, \theta_2) = \frac{(\theta_1 +\theta_2)}{2}$

In our simulations we are using $\theta_1 = 0$, $\theta_2 = 10$, so if $X \thicksim Uniform(0,10)$, $E(X) = 5$

To show that the running means of different simulations from a $Uniform$ distribution settle down to be close to the expectation, we can plot say 5 different groups of running means for sample sizes $1, \ldots, 1000$. We will use a line plot rather than plotting individual points.

YouTry!

Your task is to now do the same thing for some standard Cauchy running means.

To start with, do not put everything into a function, just put statements into the cell(s) below to:

Make variable for the number of running means to generate; assign it a small value like 10 at this stage Use the cauchySample function to generate the sample from the standard $Cauchy$; have a look at your sample Make a named list of cumulative sums of your $Cauchy$ sample using list and cumsum, as we did above; have a look at your cumulative sums Make a named list of sample sizes, as we did above Use a list comprehension to turn the cumulative sums and sample sizes into a list of running means, as we did above Have a look at your running means; do they make sense to you given the individual sample values? Add more cells as you need them.

When you are happy that you are doing the right things, write a function, parameterised by the number of running means to do, that returns a list of running means. Try to make your own function rather than copying and changing the one we used for the $Uniform$: you will learn more by trying to do it yourself. Please call your function cauchyRunningMeans, so that (if you have done everything else right), you'll be able to use some code we will supply you with to plot the results.

Try checking your function by using it to create a small list of running means. Check that the function does not report an error and gives you the kind of list you expect.

When you think that your function is working correctly, try evaluating the cell below: this will put the plot of 5 groups of $Uniform(0,10)$ running means beside a plot of 5 groups of standard $Cauchy$ running means produced by your function.

Replicable samples

Remember that we know how to set the seed of the PRNG used by random() with set_random_seed? If we wanted our sampling functions to give repeatable samples, we could also pass the functions the seed to use. Try making a new version of uniformSample which has a parameter for a value to use as the random number generator seed. Call your new version uniformSampleSeeded to distinguish it from the original one.

Try out your new uniformSampleSeeded function: if you generate two samples using the same seed they should be exactly the same. You could try using a large sample and checking on sample statistics such as the mean, min, max, variance etc, rather than comparing small samples by eye.

Recall that you can also give parameters default values in SageMath. Using a default value means that if no value is passed to the function for that parameter, the default value is used. Here is an example with a very simple function:

Note that parameters with default values need to come after parameters without default values when we define the function.

Now you can try the function - evaluate the following cells to see what you get:

Try making yet another version of the uniform sampler which takes a value to be used as a random number generator seed, but defaults to None if no value is supplied for that parameter. None is a special Python type.

Using set_random_seed(None) will mean that the random seed is actually reset to a new ('random') value. You can see this by testing what happens when you do this twice in succession and then check what seed is being used with initial_seed:

Do another version of the uniformSampleSeeded function with a default value for the seed of None.

Check your function again by testing with both when you supply a value for the seed and when you don't.

Quizz assignment 2

First read and understand the following simple simulation (originally written by Jenny Harlow). Then you will modify the simulation to find the solution to this problem.

A Simple Simulation

We could use the samplers we have made to do a very simple simulation. Suppose the inter-arrival times, in minutes, of Orbiter buses at an Orbiter stop in Christchurch follows an $Exponential(\lambda = 0.1)$ distribution. Also suppose that this is quite a popular bus stop, and the arrival of people is very predictable: one new person will arrive in each whole minute. This means that the longer another bus takes in coming, the more people arrive to join the queue. Also suppose that the number of free seats available on any bus follows a $de\, Moivre(k=40)$ distribution, i.e, there are equally like to to be 1, or 2, or 3 ... or 40 spare seats. If there are more spare seats than people in the queue, everyone can get onto the bus and nobody is left waiting, but if there are not enough spare seats some people will be left waiting for the next bus. As they wait, more people arrive to join the queue....

This is not very realistic - we would want a better model for how many people arrive at the stop at least, and for the number of spare seats there will be on the bus. However, we are just using this as a simple example that you can do using the random variables you already know how to simulate samples from.

Try to code this example yourself, using our suggested steps. We have put our version the code into a cell below, but you will get more out of this example by trying to do it yourself first.

Suggested steps:

YouTry

Quizz assignment 2

Here is our code to do the bus stop simulation. Yours may be different - maybe it will be better!

You are expected to find the needed functions from the latest notebook this assignment came from and be able to answer this question. Unless you can do it in your head.

Quizz assignment 2

We could do an interactive visualisation of this by evaluating the next cell. This will be showing the number of people able to board the bus and the number of people left waiting at the bus stop by the height of lines on the plot.

Quizz assignment 2

Very briefly explain the effect of varying one of the three parameters:

while holding the other two parameters fixed on:

by using the dropdown menus in the @interact above. Think if the simulation makes sense and explain why. Write your answer in the Assignment 2 Quizz on Studium.

Solution for CauchyRunningMeans