{
"cells": [
{
"cell_type": "markdown",
"id": "3be54fb3",
"metadata": {
"deletable": false
},
"source": [
"# [Introduction to Data Science: A Comp-Math-Stat Approach](http://datascience-intro.github.io/1MS041-2021/) \n",
"## 1MS041, 2021 \n",
"©2021 Raazesh Sainudiin, Benny Avelin. [Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"# 03. Map, Function, Collection, and Probability \n",
"\n",
"1. Maps and Functions\n",
"* Collections in Sage\n",
" - List\n",
" - Set\n",
" - Tuple\n",
" - Dictionary\n",
"* Probability"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"## Maps and Functions\n",
"\n",
"In the last notebook we understood sets. We can now understand functions which can be thought intuitively as a \"black box\" named $f$ that takes an input $x$ and returns an output $y=f(x)$.\n",
"\n",
"
\n",
" \n",
"  | \n",
"  | \n",
"
\n",
"
\n",
"\n",
"\n",
"More formally, a function is a [relation](https://en.wikipedia.org/wiki/Binary_relation) between a set of inputs and a set of permissible outputs with the property that each input is related to exactly one output.\n",
"\n",
"The two sets for inputs and outputs are traditionally called *domain* and *range* or *codomain*, respectively. \n",
"\n",
"The map or function associates each element in the domain with exactly one element in the range. \n",
"\n",
"So, more than one distinct element in the domain can be associated with the same element in the range, and not every element in the range needs to be mapped (i.e, not everything in the range needs to have something in the domain associated with it). \n",
"\n",
"Here is a map for some family relationships: \n",
"\n",
"- $Raaz$ has two daughters named $Anu$ and $Ashu$, \n",
"- $Jenny$ has a sister called $Cathy$ and the father of $Jenny$ and $Cathy$ is $Jonathan$, and\n",
"- $Fred$ is a man who has no daughter.\n",
"\n",
"We can map these daughters to their fathers with the $FindFathersMap$ shown below: The daughters are in the domain and the fathers are in the range made up of men. Each daughter maps to a father (there is no immaculate conceptions here!). More than one daughter can map to the same father (some fathers have more daughters!). But there can be men in the range who are not fathers (this is also natural, may be they only have sons or have no children at all).\n",
"\n",
"
\n",
"\n",
"The notation for this mapping would be:\n",
"\n",
"$$FindFathersMap: Daughters \\rightarrow Men$$\n",
"\n",
"The domain is the set:\n",
"\n",
"$$Daughters = \\{Anu, Ashu, Jenny, Cathy\\}$$ \n",
"\n",
"and the range or codomain is the set:\n",
"\n",
"$$Men = \\{Raaz, Jonathan, Fred\\}.$$\n",
"\n",
"The element in the range that an element in the domain maps to is called the image of that domain element. Sometimes range is used interchangably with image, and sometimes it refers to the codomain, depending on where you read. This is a good thing to keep in mind.\n",
"For example, $Raaz$ is the image of $Anu$ in the $FindFathersMap$. The notation for this is:\n",
"\n",
"$$FindFathersMap(Anu) = Raaz \\ .$$ \n",
"\n",
"Note that what we have just written is a function, just like the more familiar format of \n",
"\n",
"$$f(x) = y \\ .$$\n",
"\n",
"In computer science lingo each element in the domain is called a key and the images in the range are called values. The map or function associates each key with a value.\n",
"\n",
"The keys for the map are unique since the domain is a set, i.e., a collection of distinct elements. Lots of keys can map to the same image value ($Jenny$ and $Cathy$ have the same father, $Anu$ and $Ashu$ have the same father), but the idea is that we can uniquely identify the value we want if we know the key. This means that we can't have multiple identical keys. This makes sense when you look at how the map works. We use the map to find values from the key, as we did when we found $Anu$'s father above. If we had more than one $Anu$ in the set of keys, we could not uniquely identify the value that maps to the key $Anu$.\n",
"\n",
"We do not allow maps (functions) where one element in the domain maps to more than one element in the range. In computer science lingo, each key can have only one value associated with it. \n",
"\n",
"
\n",
"\n",
"Formalising this, a function $f: \\mathbb{X} \\rightarrow \\mathbb{Y}$ that maps each element $x \\in \\mathbb{X}$ to eaxctly one element $f(x) \\in \\mathbb{Y}$ is equivalent to the corresponding set of ordered pairs:\n",
"$$\\left\\{(x, f(x)): x \\in \\mathbb{X}, f(x) \\in \\mathbb{Y}\\right\\} \\ .$$\n",
"\n",
"Here, $\\left(x, f(x)\\right)$ is an ordered pair (the familiar key-value pair in computer science).\n",
"\n",
"The pre-image or inverse image of a function $f: \\mathbb{X} \\rightarrow \\mathbb{Y}$ is $f^{[-1]}$.\n",
"\n",
"The inverse image takes subsets in $\\mathbb{Y}$ and returns subsets of $\\mathbb{X}$.\n",
"\n",
"The pre-image or inverse image of $y$ is $f^{[-1]}(y) = \\left\\{x \\in \\mathbb{X} : f(x) = y \\right\\} \\subset \\mathbb{X}$.\n",
"\n",
"More generally, for any $B \\subset \\mathbb{Y}$:\n",
"\n",
"$f^{[-1]}(B) = \\{x \\in \\mathbb{X}: f(x) \\in B \\}$\n",
"\n",
"For example, for our FindFathersMap,\n",
"\n",
"- $FindFathersMap^{[-1]}(Raaz) = \\{Anu, Ashu\\}$ and \n",
"- $FindFathersMap^{[-1]}(Jonathan) = \\{Jenny, Cathy\\}$.\n",
"\n",
"Now lets take a more mathematical looking function $f(x) = x^2$.\n",
"\n",
"Version 1 of this function is going to have a finite domain (only five elements). \n",
"\n",
"The domain is the set $\\{-2, -1, 0, 1, 2\\}$ and the range is the set $\\{0, 1, 2, 3, 4\\}$. \n",
"\n",
"The mapping for version 1 is \n",
"$$f(x) = x^2\\,:\\,\\{-2, -1, 0, 1, 2\\} \\rightarrow \\{0, 1, 2, 3, 4\\} \\ .$$\n",
"\n",
"
\n",
"\n",
"We can also represent this mapping as the set of ordered pairs $\\{(-2,4), (-1,1), (0,0), (1,1), (2,4)\\}$.\n",
"\n",
"Note that the values 2 and 3 in the range have no pre-image in the domain. This is okay because not every element in the range needs to be mapped.\n",
"\n",
"Having a domain with only five elements in it is a bit restrictive: how about an infinite domain?\n",
"\n",
"Version 2 \n",
"$$f(x) = x^2\\,:\\,\\{\\ldots, -2, -1, 0, 1, 2, \\ldots\\} \\rightarrow \\{0, 1, 2, 3, 4, \\ldots\\}$$\n",
"\n",
"As ordered pairs, we now have $\\{\\ldots, (-2,4), (-1,1), (0,0), (1,1), (2,4), \\ldots\\}$, but it is impossible to write them all down since there are infinitely many elements in the domain.\n",
"\n",
"What if we wanted to use the function for the whole of $\\mathbb{R}$, the real line?\n",
"\n",
"Version 3 \n",
"$$f(x) = x^2 : \\mathbb{R} \\rightarrow \\mathbb{R} \\ . $$\n"
]
},
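{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"The ordered-pair view of version 1 can be tried out directly. The following is a minimal sketch in plain Python (it should also run in SageMath); the names `domain_v1`, `pairs_v1` and `is_function` are our own and are only for illustration. It builds the set of ordered pairs for $f(x) = x^2$ on the five-element domain and checks the defining property that every domain element is the first coordinate of exactly one pair.\n",
"\n",
"```python\n",
"# Version 1 of f(x) = x^2 as a set of ordered pairs (x, f(x))\n",
"domain_v1 = {-2, -1, 0, 1, 2}\n",
"pairs_v1 = {(x, x**2) for x in domain_v1}\n",
"\n",
"def is_function(pairs, domain):\n",
"    '''Return True if every element of domain is the first coordinate of exactly one pair.'''\n",
"    first_coords = [x for (x, y) in pairs]\n",
"    return all(first_coords.count(x) == 1 for x in domain)\n",
"\n",
"print(is_function(pairs_v1, domain_v1))             # True\n",
"print(is_function(pairs_v1 | {(2, 3)}, domain_v1))  # False: 2 would have two images\n",
"```\n",
"\n",
"Adding the pair $(2,3)$ breaks the property because the key $2$ would then be related to both $4$ and $3$."
]
},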
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def showURL(url, ht=500):\n",
" \"\"\"Return an IFrame of the url to show in notebook with height ht\"\"\"\n",
" from IPython.display import IFrame\n",
" return IFrame(url, width='95%', height=ht) \n",
"showURL('https://en.wikipedia.org/wiki/Function_(mathematics)#Introduction_and_examples',400)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"### Mathematical notation for functions\n",
"There are many notations for functions and it is a good idea to familiarize ourselves with some of them via examples. Scroll down the url function notation to become familiar with the following notations:\n",
"\n",
"- $f: \\mathbb{X} \\to \\mathbb{Y}$ or $\\mathbb{X} \\overset{f}{\\to} \\mathbb{Y}$ means $f$ is a function from $\\mathbb{X}$ to $\\mathbb{Y}$ \n",
"- and $y=f(x)$ means $y \\in \\mathbb{Y}$ is equal to $f(x)$, the image of the function evaluated at $x \\in \\mathbb{X}$\n",
"\n",
"Another convenient notation for a function is *maps to* denoted by $\\mapsto$. For example, we can know the domain and range of a function and how it *maps* an argument *to* its image as follows:\n",
"\n",
"- $f: \\mathbb{N} \\to \\mathbb{Z}$ can be read as \n",
" - \"$f$ is a function from $\\mathbb{N}$ (the set of natural numbers) to $\\mathbb{Z}$ (the set of integers)\" or\n",
" - \"$f$ is a $\\mathbb{Z}$-valued function of an $\\mathbb{N}$-valued variable\"\n",
"- $x \\mapsto 4-x$ can be read as \"$x$ maps to $4-x$\""
]
},
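{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"The *maps to* notation has a close analogue in Python's anonymous functions. As a small sketch (the names `f` and `images` are ours, chosen for illustration), $x \\mapsto 4-x$ can be written as a `lambda` and applied to a few natural numbers:\n",
"\n",
"```python\n",
"# x |-> 4 - x written as an anonymous (lambda) function\n",
"f = lambda x: 4 - x\n",
"\n",
"# images of the natural numbers 1, 2, ..., 7\n",
"images = [f(x) for x in range(1, 8)]\n",
"print(images)   # [3, 2, 1, 0, -1, -2, -3]\n",
"```\n",
"\n",
"Some of the images are negative, which is why the range here is taken to be $\\mathbb{Z}$ rather than $\\mathbb{N}$."
]
},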
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"showURL(\"https://en.wikipedia.org/wiki/Function_(mathematics)#Notation\",300)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"In a computer, maps or functions are encoded as (i) sets of ordered pairs or as (ii) procedures.\n",
"\n",
"### Example 1: Encoding a function or map as a set of ordered pairs\n",
"In SageMath we can encode a map as a set of ordered pairs as follows."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'Jenny': 'Jonathan', 'Cathy': 'Jonathan', 'Anu': 'Raaz', 'Ashu': 'Raaz'}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"findFathersMap = {'Jenny': 'Jonathan', 'Cathy': 'Jonathan', 'Anu': 'Raaz', 'Ashu': 'Raaz'}\n",
"findFathersMap"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"<class 'dict'>"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(findFathersMap) # find the type of findFathersMap"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"`findFathersMap` is a Python/SageMath built-in datatype called `dict` which is short for `dictionary`. We will see dictionaries in more detail soon (or if you are imaptient see [dict Python docs](https://docs.python.org/2/library/stdtypes.html#mapping-types-dict)). For now, we are taking a mathematical view and learning to implement functions or maps as a set of ordered pairs. "
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Given a key (child) we can use the map to find the father."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'Raaz'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"findFathersMap['Anu']"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The father of Anu is Raaz\n"
]
}
],
"source": [
"print(\"The father of Anu is\", findFathersMap['Anu']) # we can use a print statement to make it flow"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"We could also encode our *inverse map* of the `findFathersMap` as `findDaughtersMap`. We map from one father key (e.g. Raaz) to multiple values for daughters (for Raaz, it is Anu and Ashu)."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Jonathan has daughters:\n",
"\t Jenny\n",
"\t Cathy\n",
"Raaz has daughters:\n",
"\t Anu\n",
"\t Ashu\n"
]
}
],
"source": [
"findDaughtersMap = {'Jonathan': ['Jenny', 'Cathy'], 'Raaz': ['Anu', 'Ashu']}\n",
"\n",
"# don't worry about the way we have done the printing below, just check the output makes sense to you!\n",
"for fatherKey in findDaughtersMap:\n",
" print(fatherKey, \"has daughters:\")\n",
" for child in findDaughtersMap[fatherKey]:\n",
" print('\\t',child)"
]
},
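{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Instead of typing the inverse map by hand, we could compute pre-images from `findFathersMap` itself. The sketch below is one possible way to do this; the names `preimage` and `invertedMap` are hypothetical helpers introduced here, not part of SageMath.\n",
"\n",
"```python\n",
"findFathersMap = {'Jenny': 'Jonathan', 'Cathy': 'Jonathan', 'Anu': 'Raaz', 'Ashu': 'Raaz'}\n",
"\n",
"def preimage(mapping, y):\n",
"    '''Return the set of keys whose value under mapping is y.'''\n",
"    return {x for x in mapping if mapping[x] == y}\n",
"\n",
"print(sorted(preimage(findFathersMap, 'Raaz')))   # ['Anu', 'Ashu']\n",
"print(preimage(findFathersMap, 'Fred'))           # set(): Fred has no daughters\n",
"\n",
"# build the whole inverse map: each father key collects a list of daughters\n",
"invertedMap = {}\n",
"for daughter, father in findFathersMap.items():\n",
"    invertedMap.setdefault(father, []).append(daughter)\n",
"print(invertedMap)\n",
"```\n",
"\n",
"Note that the inverse map is a function from fathers to *lists* (or sets) of daughters, not from fathers to individual daughters, since one key may only have one value."
]
},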
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"We can also use a map for a numerical function \n",
"$$y(x) = x^2 + 2: \\{-2 ,-1 ,0, 1 ,2\\} \\to \\{2, 3, 4, 5, 6\\}$$ \n",
"by encoding this function as a set of ordered pairs \n",
"$$\\{ (-2,6), (-1,3), (0,2), (1,3), (2,6)\\}$$ \n",
"in SageMath as follows."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{-2: 6, -1: 3, 0: 2, 1: 3, 2: 6}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myFunctionMap = {-2: 6, -1: 3, 0: 2, 1: 3, 2: 6}\n",
"myFunctionMap"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Looking up the value or image of the keys in our domain $\\{-2,-1,0,1,2\\}$ works. Note the `KeyError` message when the key is outside the domain."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myFunctionMap[-2] # this return the value or image of the argument or key -2"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"ename": "KeyError",
"evalue": "-20",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mmyFunctionMap\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0mInteger\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m20\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;31m# KeyError: -20 since -20 is not in {-2,-1,0,1,2}\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mKeyError\u001b[0m: -20"
]
}
],
"source": [
"myFunctionMap[-20] # the error message is clear, isn't it? \"...KeyError: -20 since -20 is not in {-2,-1,0,1,2}\""
]
},
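{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"If we would rather not trigger a `KeyError`, we can first test whether the argument is in the domain (the keys). The following is a minimal sketch; `lookup` is a hypothetical helper name used only here, while the `in` operator and the `get` method are standard for Python dictionaries.\n",
"\n",
"```python\n",
"myFunctionMap = {-2: 6, -1: 3, 0: 2, 1: 3, 2: 6}\n",
"\n",
"def lookup(mapping, x):\n",
"    '''Return the image of x, or None if x is outside the domain (the keys).'''\n",
"    if x in mapping:\n",
"        return mapping[x]\n",
"    return None\n",
"\n",
"print(lookup(myFunctionMap, -2))    # 6\n",
"print(lookup(myFunctionMap, -20))   # None\n",
"print(myFunctionMap.get(-20))       # None: dict.get performs the same check for us\n",
"```"
]
},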
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"### Example 2: Encoding function as a procedure\n",
"\n",
"But it is clearly not a good way to try to specify a function mapping that can deal with lots of different $x$'s: in fact it would be impossible to try to specify a mapping like this if we want the domain of the function to have infinitely many elements (for example, if the domain is the set of all integers, or all rational numbers, or a segment of the real line, for example $f(x) = x^2 + 2 : \\mathbb{R} \\rightarrow \\mathbb{R}$).\n",
"\n",
"Instead, in SageMath we can just define our own *function as a procedure* that can evaluate our image value \"on-the-fly\" for any $x$ we want. We'll be doing more on functions in later labs so for the moment just think about how defining the function can be seen as a much more flexible way of specifying a mapping from the arguments in the domain of the function to values or images in its range.\n",
"\n",
"#### The basics steps in encoding a function as a procedure in SageMath/Python:\n",
"\n",
"1. The function named `myFunc` we want to define for $x \\mapsto x^2+2$ is preceeded by the keyword `def` for definition.\n",
"- The function name `myFunc` is succeeded by any input argument(s) to it within a pair of parentheses, for e.g., `(x)` in this case since there is only one argument, namely `x`.\n",
"- After we have defined the function by its name `myFunc` followed by its input argument `(x)` we end the line with a colon `:` before continuing to the next line.\n",
"- Now, we are ready to write the body of the function. It is a customary to leave 4 white spaces. The number of spaces before a line or the indentation is used to delineate a block of code in SageMath/Python.\n",
"- It is a matter of courteous programming practice to enclose the Docstring, i.e., comments on what the function does, inside triple quotes. The Docstring is returned when we ask SageMath for help on the function.\n",
"- Finally, we output the image of our function with the keyword `return` and the expression `x^2+2`."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def myFunc(x): # starting the definition\n",
" '''A function to return x^2 + 2''' # the docstring\n",
" return x^2+2 # the function body (only one line in this example)"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"myFunc?"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"When you evaluate the above cell you should see something like this, but with a different File path:\n",
"```\n",
"Signature: myFunc(x)\n",
"Docstring: A function to return x^2 + 2\n",
"Init docstring: Initialize self. See help(type(self)) for accurate signature.\n",
"File: ~/datascience-intro/raaz/1MS041/master/jp/\n",
"Type: function\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"2.01000000000000"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myFunc(0.1) # use the function to calculate 0.1^2 + 2 with argument as a mpfr_real Literal"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"### YouTry\n",
"\n",
"When you evaluate the cell below, SageMath will complain about an `IndentationError` that you can easily fix."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def myFunc(x):\n",
" '''A function to return x^2 + 2'''\n",
" return x^2+2"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"The command to plot is pretty simple. The four arguments to plot are the function to be plotted, the input argument that is varying along the x-axis, the lower-bound and upper-bound of the input argument."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"plot(myFunc(x),x, -20, 20) "
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"?plot # for help hit the DocString of the function"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"We can get a bit more control of the figure size (and other aspects) by first assigning the plot to a variable and using the show method as follows."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"myPlot = plot(myFunc(x),x, -20, 20)\n",
"myPlot.show(figsize=[6,3])"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"type(myPlot)"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"myPlot.show?"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"The simple plot command hides what is going on under the hood. Before we understand the fundamentals of plotting, let us get a better appreciation for the ordered pairs $(x,f(x))$ that make up the curve in this plot.\n",
"\n",
"We can use SageMath to plot functions and add a way for you to interact with the plot. When you have evaluated this cell you'll see a plot of our function between $x=-20$ and $x=20$. The point on the curve where $x=3$ is indicated in red. You can alter the position of this point by putting a new value into the box at the top of the display (you can put in real number, eg 4.45, but if your number is outside the range -20 to 20 it won't be shown. Just try changing the value of x to plot as a point on the curve and don't worry about the way the code looks - you aren't expected to do this yet."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "48435989bc5f4461bdc9490a3cdd3c10",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Interactive function with 1 widget\n",
" my_x: EvalText(value='3', description='$x$…"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Don't worry about this code and just interact with its output cell by changing the value in the box labelled x\n",
"@interact\n",
"def _(my_x=input_box(3, width=10, label=\"$x$\")):\n",
" myPt = (my_x, myFunc(my_x))\n",
" myLabel = \"(%.1f, %.1f)\" % myPt\n",
" p = plot(myFunc, (x,-20,20))\n",
" if (my_x >= -20 and my_x <= 20):\n",
" p += point(myPt,rgbcolor='red', pointsize=20)\n",
" p += text(myLabel, (my_x+4,myFunc(my_x)), rgbcolor='red')\n",
" p.show(figsize=[6,3])"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"#### A bit under plot's hood\n",
"\n",
"The function you are viewing above from the plot command is actually just interpolated by a whole lot of points connected by lines!\n",
"\n",
"To keep things more real let's expose the plot for what it really is next by only plotting a few points."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# plot 5 randomized points and leave them hanging\n",
"plot(myFunc,(-2,2), plot_points=5,marker='.',linestyle='', randomize=True, adaptive_recursion=0, figsize=[6,3])"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# plot 5 randomized points and join them\n",
"plot(myFunc,(-2,2), plot_points=5,marker='.',linestyle='-', randomize=True, adaptive_recursion=0, figsize=[6,3])"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# plot 100 randomized points and join them - now it looks continuous, doesn't it?\n",
"plot(myFunc,(-2,2), plot_points=100, marker='.',linestyle='-', randomize=True, adaptive_recursion=0, figsize=[6,3])"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# plot 100 randomized points and join them without marker for points - now it looks like default plot, doesn't it?\n",
"plot(myFunc,(-2,2), plot_points=100, marker='',linestyle='-', randomize=True, adaptive_recursion=0, figsize=[6,3])"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"We will play with point, lines and other such objects in the sequel. For now, just remember that what you see is sometimes not what is under the hood."
]
},
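{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"To make the idea of points joined by lines concrete, here is a rough sketch in plain Python that computes a handful of ordered pairs $(x, f(x))$ ourselves. It uses an equally spaced grid, which is a simplifying assumption of ours: `plot` chooses its points differently (for instance with randomization and adaptive refinement). The helper name `sample_pairs` is hypothetical.\n",
"\n",
"```python\n",
"def myFunc(x):\n",
"    '''A function to return x^2 + 2 (written with ** so that it also runs in plain Python)'''\n",
"    return x**2 + 2\n",
"\n",
"def sample_pairs(f, lower, upper, n):\n",
"    '''Return n ordered pairs (x, f(x)) with x equally spaced on [lower, upper].'''\n",
"    step = (upper - lower) / (n - 1)\n",
"    return [(lower + i * step, f(lower + i * step)) for i in range(n)]\n",
"\n",
"# five points on the curve between x = -2 and x = 2\n",
"pairs = sample_pairs(myFunc, -2.0, 2.0, 5)\n",
"print(pairs)\n",
"```\n",
"\n",
"In SageMath, a list of pairs like this can be handed to graphics functions such as `points` or `line` to draw the unjoined and joined versions of the picture."
]
},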
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"### You try\n",
"\n",
"Define a more complicated function with four input arguments next. The rules for defining such a function are as before with the additional caveat of declaring all four input arguments inside the pair of parenthesis following the name of the function. In the cell below you will have to uncomment one line and then evaluate the cell to define the function (remember that comments begin with the `#` character)."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Here is a quadratic function of x with three additional parameters a,b,c\n",
"def myQuadFunc(a,b,c,x):\n",
" '''A function to return a*x^2 + b*x + c'''\n",
" return a*x^2 + b*x + c"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Now try writing an expression to find out what `myQuadFunc` is for some values of `x` and coefficients `a=1`, `b=0`, `c=2`. \n",
"\n",
"We have put the expression that uses these coefficients and `x=10` into the cell below for you. \n",
"\n",
"Can you see how Sage interprets the expression using the order in which we specified `a, b, c, x` in the definition above? \n",
"\n",
"Try changing the expression to evaluate the function with same coefficients `(a, b, c)` but different values of `x`. "
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"myQuadFunc(1, 0, 2, 10) # a = 1, b = 0, c = 2, x = 10"
]
},
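{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"The role of argument order can be sketched as follows (plain Python, with `**` in place of `^` so that it runs outside SageMath too). Positional arguments are matched to `a, b, c, x` in the order of the definition, whereas keyword arguments are matched by name.\n",
"\n",
"```python\n",
"def myQuadFunc(a, b, c, x):\n",
"    '''A function to return a*x^2 + b*x + c'''\n",
"    return a*x**2 + b*x + c\n",
"\n",
"# positional call: values are matched to a, b, c, x in the order of the definition\n",
"print(myQuadFunc(1, 0, 2, 10))           # 102\n",
"# keyword call: the order no longer matters because each value is named\n",
"print(myQuadFunc(x=10, c=2, a=1, b=0))   # 102\n",
"# swapping two positional values silently evaluates a different quadratic\n",
"print(myQuadFunc(10, 0, 2, 1))           # 12, since now a=10 and x=1\n",
"```"
]
},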
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# we can make the same plot as before by letting a=1, b=0, c=2\n",
"plot(myQuadFunc(1,0,2,x),x, -20, 20, figsize=[6,3])"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"### Example 3: Polymorphism and Type Errors\n",
"You can call the same function `myFunc` with different `types` and the operations in the body of the function, such a s power (`^`) and addition (`+`), will be automatically evaluated for the input type. This can be quite convenient! \n",
"\n",
"When code is written without mention of any specific type and thus can be used transparently with any number of new types we are experiencing the concept of polymorphism (parametric plymorphism or generic programming)."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"2.01"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myFunc(float(0.1)) # use the function to calculate 0.1^2 + 2 with argument as a Python float"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"201/100"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myFunc(1/10) # use the function to calculate 0.1^2 + 2 with argument as a Sage Rational"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myFunc(2) # use the function to calculate 0.1^2 + 2 with argument as a Sage Integer"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myFunc(int(2)) # use the function to calculate 0.1^2 + 2 with argument as a Sage Rational"
]
},
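{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"The same behaviour can be observed outside SageMath. As a sketch in plain Python (using `**` for powering and the standard-library `Fraction` type as a stand-in for a Sage Rational, which is our substitution), one function body serves several numeric types:\n",
"\n",
"```python\n",
"from fractions import Fraction\n",
"\n",
"def myFunc(x):\n",
"    '''A function to return x^2 + 2 for any type that supports ** and +'''\n",
"    return x**2 + 2\n",
"\n",
"# the same body is evaluated with int, float, Fraction and complex arguments\n",
"for arg in [2, 0.5, Fraction(1, 10), complex(0, 1)]:\n",
"    result = myFunc(arg)\n",
"    print(type(arg).__name__, '->', repr(result))\n",
"```\n",
"\n",
"Each result has a type determined by the argument, e.g. the `Fraction` argument gives back an exact `Fraction`."
]
},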
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"ename": "TypeError",
"evalue": "unsupported operand type(s) for ** or pow(): 'str' and 'int'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mmyFunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'hello'\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# calling myFunc on an input string argument results in a TypeError\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m\u001b[0m in \u001b[0;36mmyFunc\u001b[0;34m(x)\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mmyFunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;31m# starting the definition\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;34m'''A function to return x^2 + 2'''\u001b[0m \u001b[0;31m# the docstring\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m**\u001b[0m\u001b[0mInteger\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m+\u001b[0m\u001b[0mInteger\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# the function body (only one line in this example\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m/ext/sage/sage-9.1/local/lib/python3.7/site-packages/sage/rings/integer.pyx\u001b[0m in \u001b[0;36msage.rings.integer.Integer.__pow__ (build/cythonized/sage/rings/integer.c:15017)\u001b[0;34m()\u001b[0m\n\u001b[1;32m 2224\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mcoercion_model\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbin_op\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mleft\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mright\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0moperator\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpow\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2225\u001b[0m \u001b[0;31m# left is a non-Element: do the powering with a Python int\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2226\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mleft\u001b[0m \u001b[0;34m**\u001b[0m \u001b[0mint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mright\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2227\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2228\u001b[0m \u001b[0mcpdef\u001b[0m \u001b[0m_pow_\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mother\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mTypeError\u001b[0m: unsupported operand type(s) for ** or pow(): 'str' and 'int'"
]
}
],
"source": [
"myFunc('hello') # calling myFunc on an input string argument results in a TypeError"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"We should see something like the following, **please become comfortable with reading error messages!** there is really no way out here :\n",
"```\n",
"/ext/sage/sage-9.1/local/lib/python3.7/site-packages/sage/rings/integer.pyx in sage.rings.integer.Integer.__pow__ (build/cythonized/sage/rings/integer.c:15017)()\n",
" 2224 return coercion_model.bin_op(left, right, operator.pow)\n",
" 2225 # left is a non-Element: do the powering with a Python int\n",
"-> 2226 return left ** int(right)\n",
" 2227 \n",
" 2228 cpdef _pow_(self, other):\n",
"TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'\n",
"\n",
"```\n",
"This is because in the body of `myFunc(x)` we have `x^2+2` where `x^2` is added to `2` using the `+` operator, where `2` is a SageMath type `Integer`. And when we pass in the string `hello` for `x` by evaluating `myFunc('hello')` we are running into the mentioned `TypeError` of unsupported operand types for `^` which is just the same as `**`.\n",
"\n",
"Interestingly, integer multiples of a string using the `*` operator is well defined for string replications as illustrated for `hello` below!"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'hellohello'"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"'hello'*2 # or 'hello'*2 = 'hellohello', i.e., 'hello' concatenated with itself two times\n"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'hihihi'"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"'hi'*3 # or 'hi'*3 = 'hihihi', i.e., 'hi' concatenated with itself three times"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Also, addition of two string is also defined as concatenation:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'hellohey'"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"'hello'+'hey'"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Just to illustrate this, we next write a function named `myFunc2` that returns `x*2+x` and will work for any input argument `x` for which the operations of `*2` and `+` are well-defined for their operand types."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def myFunc2(x):\n",
" '''square x and add x to it'''\n",
" return x*2+x"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.300000000000000"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myFunc2(0.1)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'hihihi'"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myFunc2('hi') # 'hi'*2 + 'hi' = 'hihi'+'hi' = 'hihihi'"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"**Remark:** such parametric polymorphism can be convenient but it can also have unintended consequences when you run the program. So be cautious! Some say this is bug and others say this isa feature, we note that this is reality in Pyhton/SageMath and get back to work."
]
},
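{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"One possible way to guard against such surprises is to check the type of the argument before computing. The following is only a minimal sketch in plain Python (it should also run in SageMath); the name `myFunc2NumbersOnly` is hypothetical and is not part of the original notes.\n",
"\n",
"```python\n",
"def myFunc2NumbersOnly(x):\n",
"    '''multiply x by 2 and add x to it, refusing string inputs (hypothetical helper)'''\n",
"    if isinstance(x, str):\n",
"        raise TypeError('myFunc2NumbersOnly expects a number, not a string')\n",
"    return x*2 + x\n",
"\n",
"print(myFunc2NumbersOnly(5))      # prints 15\n",
"try:\n",
"    myFunc2NumbersOnly('hi')      # the string is rejected instead of becoming 'hihihi'\n",
"except TypeError as err:\n",
"    print('TypeError:', err)\n",
"```\n",
"\n",
"Whether such a check is worth adding depends on how the function will be used; for quick exploratory work the unchecked version is often fine."
]
},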
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"### Example 4: Mathematical functions\n",
"\n",
"Many familiar mathematical functions such as $\\sin$, $\\cos$, $\\log$, $\\exp$ are also available directly in SageMath as *built-in* functions. They can be evaluated using parenthesis (brackets) as follows:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cos(0)\n"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sin(0)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sin(pi)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"sin(pi) prints as 0\n"
]
}
],
"source": [
"print('sin(pi) prints as', sin(pi))"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"cos(1/2) prints as cos(1/2)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"cos(1/2).n(digits=5) prints as 0.87758\n",
"float(cos(1/2)) prints as 0.8775825618903728\n",
"cos(0.5) prints as 0.877582561890373\n",
"exp(2*pi*e) prints as e^(2*pi*e)\n",
"log(10) prints as log(10)\n",
"log(10).n(digits=5) prints as 2.3026\n",
"float(log(10)) prints as 2.302585092994046\n",
"log(10.0) prints as 2.30258509299405\n"
]
}
],
"source": [
"# understand the output of each line and the symbolic/numeric expressions being evaluated and printed\n",
"print ('cos(1/2) prints as', cos(1/2))\n",
"print ('cos(1/2).n(digits=5) prints as', cos(1/2).n(digits=5))\n",
"print ('float(cos(1/2)) prints as', float(cos(1/2)))\n",
"print ('cos(0.5) prints as', cos(0.5))\n",
"\n",
"print ('exp(2*pi*e) prints as', exp(2*pi*e))\n",
"\n",
"print ('log(10) prints as', log(10))\n",
"print ('log(10).n(digits=5) prints as', log(10).n(digits=5))\n",
"print ('float(log(10)) prints as', float(log(10)))\n",
"print ('log(10.0) prints as', log(10.0))"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"#### You try\n",
"\n",
"You can find out what built-in functions are available for a particular variable by typing the variable name followed by a `.` and then pressing the TAB key."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"x=-2"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"x. # place the cursor after the . and press Tab"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Try the `abs` function that evaluates the absolute value of `x` that is one of the methods available for `x`.\n",
"\n",
"Here are two ways of calling `abs` method for `x`."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"x.abs() # "
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"abs(x)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Remember to ask for help! \n",
"\n",
"If you want to know what a built-in function does, type the function name prepended by a '?'."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"?abs"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"## Collections in Sage\n",
"\n",
"We have already talked about the SageMath and Python number types and a little about the string type. You have also met sets in SageMath with a brief mention of lists. A set and list in Sage are, loosely speaking, examples of collection types. Collections are a useful idea: grouping or collecting together some data (or variables) so that we can refer to it and use it collectively.\n",
"\n",
"SageMath provides quite a few collections. One that we will meet very often is the list, [a built-in sequence type in Python](https://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange) like the strings we have already seen. See [Python tutorial for built-in data structures](https://docs.python.org/2/tutorial/datastructures.html).\n",
"\n",
"### Example 1: Lists\n",
"\n",
"Technically, a *list* in SageMath/Python is a *sequence type*. This basically means that the order of the things in the list is imporant and we can use concepts related to ordering when we work with a list. More specifically, a list is a *mutable sequence type*. This just means that you can change or mutate the contents of the list.\n",
"\n",
"Don't worry about these details (unless you are interested in them of course and follow links below), but for now just look at the worksheet cells below to see how useful and flexible lists are.\n",
"\n",
"When you want a list in SageMath you put the things in the list inside `[` `]` called square brackets. You can put almost anything in a list (including having lists of lists, as we'll see later)."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"L1 = [] # an empty list\n",
"L1 # display L1\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['orange', 'apple', 'lemon']"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"L2 = ['orange', 'apple', 'lemon'] # a list of strings - remember that strings are within quote marks\n",
"L2 # display L2"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"['orange', 'apple', 'lemon', 'banana']"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"L2.append('banana') # append something to the end of a list\n",
"L2 # display L2"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"L3 = [10, 11, 12 ,13] # a list of integers"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(L3) # type of L3 is"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"There are various functions and methods we can use with lists. For a more exhaustive dive see [Python standard library docs](https://docs.python.org/2/library/index.html) specifically for the methods on lists that are:\n",
"\n",
"- [common to all sequence types](https://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange) and \n",
"- [common to mutable sequence types](https://docs.python.org/2/library/stdtypes.html#mutable-sequence-types).\n",
"\n",
"In the interest of time, we are going to only familiarize ourselves with the immediately useful methods and learn new ones as we need them.\n",
"\n",
"A very useful one is `len`, which gives us the length of the list, i.e., the number of elements in the list."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(L3)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"What about getting at the elements in a list once we have put them in? This is done by indexing into the list, or list indexing . Slightly confusingly, once you have a list, you index into it by using the `[ ]` brackets again."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"10"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"L3[0] # the first position in the list is"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Note that in SageMath the first position in the list is at position 0, or index [0] not at index [1]. In the list `L3`, which has 4 elements (`len(L3) = 4`), the indices are `[0]`, `[1]`, `[2]` and `[3]`. In the cell below you can check what you get when you ask for the element at the fourth position in the list (i.e., index `[3]` since the indexes start from `[0]`)."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"13"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"L3[3]"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"You will get an error message if the index you use is out of range (which means that you are trying to refer to something outside the range of the list). SageMath/Python \"knows\" that the list only has 4 elements, so asking for the element in the fifth position (index `[4]`) makes no sense and you get an `IndexError: list out of range`."
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false
},
"outputs": [
{
"ename": "IndexError",
"evalue": "list index out of range",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mL3\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mInteger\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m4\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mIndexError\u001b[0m: list index out of range"
]
}
],
"source": [
"L3[4]"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"We can also get at more than one element in the list with the indexing operator `[ ]`, by using `:` to indicate all elements from a position to another position. This *slicing* of a list is hard to explain in words but easy to see in action."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[10, 11]"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"L3[0:2] # elements in positions 0 to 2 in list L3 are"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"If you leave out the starting and ending positions and just use `[:]` you'll get the whole list. This is useful for making copies of whole lists."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[10, 11, 12, 13]"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"L4 = L3[:]\n",
"L4 # disclose L4"
]
},
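{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"To see why making a copy matters, the sketch below contrasts plain assignment, which only gives the same list a second name, with the `[:]` copy. The names `original`, `alias` and `copied` are illustrative and not part of the original notes.\n",
"\n",
"```python\n",
"original = [10, 11, 12, 13]\n",
"alias = original        # a second name for the SAME list\n",
"copied = original[:]    # a new list with the same elements\n",
"\n",
"alias[0] = 99           # changes the one list shared by original and alias\n",
"\n",
"print(original)         # [99, 11, 12, 13]\n",
"print(copied)           # [10, 11, 12, 13]\n",
"print(alias is original, copied is original)   # True False\n",
"```\n",
"\n",
"Note that `[:]` makes a *shallow* copy: if the list contains other lists, those inner lists are still shared between the original and the copy."
]
},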
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"SageMath also provides some helpful ways to make lists quickly. The one you'll use most often is range. Used in its most simple form, `range(n)` gives you a list of `n` integers from `0` to `n-1`."
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"range(0, 10)"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"L5 = range(10) # a quick way to make a list of 10 numbers starting from 0\n",
"L5"
]
},
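{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"The display `range(0, 10)` does not show the individual numbers. A minimal sketch of how to see them, using only the built-in `list` function (the name `L5asList` is illustrative):\n",
"\n",
"```python\n",
"L5 = range(10)\n",
"L5asList = list(L5)            # materialise the range as an actual list\n",
"\n",
"print(L5asList)                # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n",
"print(len(L5), L5[0], L5[9])   # 10 0 9  (len and indexing also work on the range itself)\n",
"```"
]
},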
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Note that the numbers you get start from `0` and the last one is `9`.\n",
"\n",
"You'll see that we can get even cleverer and use range to get a list that starts and stops at specified numbers with a specified step size between the numbers. Let's try this to get numbers in steps of `5` from `100` to `145`, in the cell below."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"range(100, 150, 5)"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"L6 = range(100, 150, 5) # get a list with a specified start, stop and step \n",
"L6"
]
},
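{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Wrapping the range in `list` shows its elements, and a membership test confirms which values are included. This is a small sketch using only built-in Python functions.\n",
"\n",
"```python\n",
"L6 = range(100, 150, 5)\n",
"\n",
"print(list(L6))     # [100, 105, 110, 115, 120, 125, 130, 135, 140, 145]\n",
"print(150 in L6)    # False: the stop value itself is never included\n",
"print(list(range(10)) == list(range(0, 10, 1)))   # True: default start is 0 and default step is 1\n",
"```"
]
},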
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Notice that again we don't go right up to the \"stop\" number (`200`) but to the last one below it taking into account our step size (`5`), ie `145`.\n",
"\n",
"When we just asked for `range(10)` SageMath assumed a default start of `0` and a default step of `1`. Thus, `range(10)` is equivalent to `range(0, 10, 1)`.\n",
"\n",
"### YouTry\n",
"\n",
"#### Start of this YouTry\n",
"\n",
"Find out more about list and range by evaluating the cells below and looking at the Python docs (see links above) or just via DocStrings."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"?range"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"help(list.append) # or use help for brief DocStrings"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Make yourself a list with some elements in it; you can choose how many elements to have and what type they are. \n",
"Assign the list to a variable named `myList`."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"myList = [ REPLACE_THIS_WITH_ELEMENTS_OF_LIST ]"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Add a new element to `myList` using the `.append` method."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Use the nice way of copying we showed you above (remember, `[:]`) to copy everything in `myList` to a new list called `myNewList`. Use the `len` function to check the length of your new list."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"myNewList = myList[ REPLACE_THIS_WITH_THE_RIGHT_CHARACTER ]"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"len(myNewList) "
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Use the indexing operator `[ ]` to find out what the first element in `myList` is (remember that the index of the first element will be `0`)."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"myList[ FILL_IN_HERE ]"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Use the indexing opertor `[ ]` to change the first element in `myNewList` to some different value."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"myNewList[0] = FILL_IN_HERE "
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Disclose the original list, `myList`, to check that nothing in that has changed."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print (myList)\n",
"print (myNewList)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Use range to make a list of the integer numbers between `4` and `16`, going up in steps of `4`. Assign this list to a variable named `rangeList`."
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"rangeList = range( FILL_IN_HERE )"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Disclose your list `rangeList` to check the contents. You should have the values `4`, `8`, `12`, `16`. If you don't, check what you asked for in the cell above and fix it up to give the values that you wanted. "
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"rangeList"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"#### END of YouTry"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"### Example 2: Tuples\n",
"\n",
"A `tuple` is another [built-in sequence type in Python](https://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange) like lists and strings. The values in a `tuple` are enclosed in curved parentheses `(` `)` and the values are separated by commas."
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(1, 2)"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myTuple1 = (1, 2) # assign the tuple (1,2) to variable nemed muTuple1\n",
"myTuple1"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(myTuple1)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(10, 11, 13)"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myTuple2 = (10, 11, 13)\n",
"myTuple2"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Tuples are *immutable*. In programming, an 'immutable' object is an object whose state cannot be modified after it has been created (Etymology: 'mutable comes from the Latin verb mutare, or 'to change' -- the same root we get 'mutate' from. So in-mutable, or immutable, means not capable of or susceptible to change). This means that although we can access the element at a particular position in a tuple by indexing ..."
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myTuple1[0] # disclose what is in the first position in the tuple"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"... we can't change what is in that particular position in the tuple."
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"collapsed": false
},
"outputs": [
{
"ename": "TypeError",
"evalue": "'tuple' object does not support item assignment",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mmyTuple1\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mInteger\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mInteger\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m10\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# try to assign a different value to the first position in the tuple\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: 'tuple' object does not support item assignment"
]
}
],
"source": [
"myTuple1[0] = 10 # try to assign a different value to the first position in the tuple"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myTuple1[0] # the first element in the tuple is immutably 1"
]
},
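{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Since a tuple cannot be changed in place, the usual approach is to build a new tuple from pieces of the old one. The sketch below shows this together with tuple unpacking; the names `myTuple3`, `first` and `second` are illustrative.\n",
"\n",
"```python\n",
"myTuple1 = (1, 2)\n",
"myTuple3 = (10,) + myTuple1[1:]   # a NEW tuple (10, 2); myTuple1 itself is untouched\n",
"first, second = myTuple1          # unpack the two elements into two variables\n",
"\n",
"print(myTuple1, myTuple3, first, second)   # (1, 2) (10, 2) 1 2\n",
"```\n",
"\n",
"Note the trailing comma in `(10,)`: it is what makes a tuple with a single element, since `(10)` is just the number `10` in parentheses."
]
},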
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"#### Useful things you can do with tuples\n",
"\n",
"Sage has a very useful `zip` function. Zip can be used to 'zip' sequences together, making a list of tuples out of the values at corresponding index positions in each list. Consider a simple example: Note that in general `zip` works with sequences, so it can be used to zip tuples as well as lists."
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"zip(['x', 'y', 'z'], [1, 2, 3])"
]
},
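{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"To see the tuples that `zip` produces, the zipped object can be turned into a list. This is a minimal sketch; the name `pairs` is illustrative.\n",
"\n",
"```python\n",
"pairs = list(zip(['x', 'y', 'z'], [1, 2, 3]))\n",
"print(pairs)    # [('x', 1), ('y', 2), ('z', 3)]\n",
"\n",
"# zip stops as soon as the shortest input is exhausted\n",
"print(list(zip(('a', 'b'), (10, 20, 30))))   # [('a', 10), ('b', 20)]\n",
"```"
]
},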
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"### Example 3: Sets\n",
"\n",
"We already created and assigned sets and operated with them. `Set`/`set` are another SageMath/Python collection. Remember that in a set, each element has to be unique. We can specify a set directly or make one out of a list."
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{100, 105, 110, 115, 120, 125, 130, 135, 140, 145}"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"L6 = range(100, 150, 5) # make sure we have a list L6\n",
"S = set(L6) # make the set S from the list L6\n",
"S # display the set s"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Sets are *unordered collections*. This makes sense when we think about what we know about sets: what matters about a set is the unique elements in it. \n",
"\n",
"The set $\\{1, 2, 3\\}$ is the same as the set $\\{1, 3, 2\\}$ is the same as the set $\\{2, 3, 1\\}$, etc. \n",
"\n",
"This means that it makes no sense to ask SageMath what's at some particular position in a set as we could with lists. Lists are sequnces and order matters. Sets are unordered - order makes no sense for a set. \n",
"\n",
"We cannot use the indexing operator `[ ]` with a set."
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": false
},
"outputs": [
{
"ename": "TypeError",
"evalue": "'set' object is not subscriptable",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mS\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mInteger\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;31m# will give an error message 'TypeError: 'set' object does not support indexing'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: 'set' object is not subscriptable"
]
}
],
"source": [
"S[0] # will give an error message 'TypeError: 'set' object does not support indexing'"
]
},
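{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Although a set cannot be indexed, it supports membership tests, and it can be turned into a sorted list when an order is needed. The sketch below uses Python's built-in `set`, `sorted` and `len` and illustrates that order and repetition do not matter for sets.\n",
"\n",
"```python\n",
"S = set(range(100, 150, 5))\n",
"\n",
"print(105 in S, 101 in S)          # True False\n",
"print(sorted(S)[0])                # 100  (smallest element, via a sorted list)\n",
"print({1, 2, 3} == {1, 3, 2})      # True: order does not matter\n",
"print(len(set([1, 2, 2, 3, 3])))   # 3: repeated elements are kept only once\n",
"```"
]
},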
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"S. # put the cursor after the . and hit Tab to see all the methods available on Python set"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"SS = Set(L6) # this is a SageMath set with upper-case Set"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"SS. # put the cursor after the . and hit Tab to see all the extra methods available on SageMath Set"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"### Example 4: Dictionaries\n",
"\n",
"When we created our maps or functions at the start of this worksheet, we actually used dictionaries: A SageMath/Python dictionary gives you a way of mapping from a key to a value. As we said earlier, the keys have to be unique (only one of each key) but more than one key can map to the same value. Remember the `FindFathersMap`? It is actually a [Python dictionary or simply `dict`](https://docs.python.org/2/library/stdtypes.html#dict).\n",
"\n",
"Although, we used the syntax for dictionaries to conceptually reinforce functions and maps, we revisit them here and contrast them with the other collections we have already seen like lists and sets."
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'Jenny': 'Jonathan', 'Cathy': 'Jonathan', 'Anu': 'Raaz', 'Ashu': 'Raaz'}"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"findFathersMap = {'Jenny': 'Jonathan', 'Cathy': 'Jonathan', 'Anu': 'Raaz', 'Ashu': 'Raaz'}\n",
"findFathersMap\n"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(findFathersMap)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"When we make a dictionary, we tell SageMath that it is a dictionary by using the curly brackets `{ }` and by giving each key value pair in the format `key: value`.\n",
"\n",
"A dictionary has something like the indexing operator we used for lists, but instead of specifying the position we want (like `[0]`) we specify the key we want, and SageMath returns the value that key maps to."
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'Raaz'"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"findFathersMap['Anu'] # who is Anu's father"
]
},
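{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"What happens when we ask for a key that is not in the dictionary? The sketch below shows three standard ways of dealing with this in Python: testing membership with `in`, using the `get` method with a default value, and catching the `KeyError`. The key `'Fred'` and the default `'unknown'` are chosen only for illustration.\n",
"\n",
"```python\n",
"findFathersMap = {'Jenny': 'Jonathan', 'Cathy': 'Jonathan', 'Anu': 'Raaz', 'Ashu': 'Raaz'}\n",
"\n",
"print('Fred' in findFathersMap)                # False: 'Fred' is not a key\n",
"print(findFathersMap.get('Fred', 'unknown'))   # unknown\n",
"try:\n",
"    findFathersMap['Fred']                     # direct lookup of a missing key\n",
"except KeyError as err:\n",
"    print('KeyError:', err)                    # KeyError: 'Fred'\n",
"```\n",
"\n",
"This mirrors the mathematical picture: the dictionary is only defined on its domain, the set of keys."
]
},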
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"#### Manipulating a dict\n",
"\n",
"In the cell below we have the start of a simple dictionary for phone numbers. "
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'Ben': 8888, 'Raaz': 3333}"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myPhoneDict = {'Ben': 8888, 'Raaz': 3333}\n",
"myPhoneDict # disclose the contents of the dictionary"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"In the cell below let us add `susy` with phone number `78987` to our dictionary. "
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"myPhoneDict['susy']=78987"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'Ben': 8888, 'Raaz': 3333, 'susy': 78987}"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myPhoneDict # disclose the current contents of our dict"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"`zip`-ping of tuples gives us a quick way to make a dictionary if we have separate lists or tuples which contain our keys and values. Note that the ordering in the key and value sequences has to be consistent -- the first key will be mapped to the first value, etc., etc."
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'Ben': 888, 'Raaz': 333, 'susy': 78987}"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myKeys = ('Ben', 'Raaz', 'susy')\n",
"myValues = (888, 333, 78987)\n",
"myPhoneDictByZip = dict(zip(myKeys, myValues))\n",
"myPhoneDictByZip"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"#### You try\n",
"Try adding to what we have to put in two more people, `Fred`, whose phone number is `1234`, and `Mary` whose phone number is `7777`. \n",
"Remember that for SageMath, the names `Fred` and `Mary` are strings and you must put them in quote marks, like the names that are already there."
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"myPhoneDictByZip['Fred']=1234\n",
"myPhoneDictByZip['Mary']=7777"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'Ben': 888, 'Raaz': 333, 'susy': 78987, 'Fred': 1234, 'Mary': 7777}"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myPhoneDictByZip"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Now try asking SageMath for Ben's phone number (ie, the phone number value associated with the key `Ben`)."
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"888"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myPhoneDictByZip['Ben']"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"There are also some useful methods of dictionaries that allow you to 'dissect' the dictionary and extract just the keys or just the values. "
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['Ben', 'Raaz', 'susy', 'Fred', 'Mary'])"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myPhoneDictByZip.keys()\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"myPhoneDictByZip.values()"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"And there is also an `.items()` method that gives you back your your (key, value) pairs again. You will see that is it a list of tuples."
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"dict_items([('Ben', 888), ('Raaz', 333), ('susy', 78987), ('Fred', 1234), ('Mary', 7777)])"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myPhoneDictByZip.items()"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"## Probability\n",
"\n",
"**(This is a partial type-setting of the notes scribed on the black board.)** There are minor difference in notation but it should be obvious from the context. For example probability is \\\\(P\\\\) instead of \\\\(\\mathbb{P}\\\\).\n",
"\n",
"The origins of probability can be traced back to the 17th century. It arose out of the study of gambling and games of chance. Many well-known names associated with probability worked on problems to do with gambling: people like Bernoulli and Pascal did quite a lot of work in this area ... even Newton was persuaded to set down some of his thoughts about a game involving dice (in a letter to a Samuel Pepys). \n",
"\n",
"Dive into the main wikipedia article [here](https://en.wikipedia.org/wiki/Probability) for more details. Here, we will take the shortest mathemtical path to understanding probability.\n",
"\n",
"Probability has a language of its own. We are going to introduce you to some of the essential terms:\n",
"\n",
"An **experiment** is an activity or procedure that produces distinct or well-defined outcomes. The *set* of such outcomes is called the **sample space** of the experiment. The sample space is usually denoted with the symbol $\\Omega$. Lets look at some examples of experiments.\n",
"\n",
"### Roll a dice experiment\n",
"\n",
"If our experiment is to roll a dice wth faces painted $red$, $green$, $yellow$, $pink$, $blue$ and $black$ and find what colour the top face is at the end of each roll, then the sample space $\\Omega = \\{red, green, yellow, pink, blue, black\\}$.\n",
"\n",
"### Flip a coin experiment\n",
"\n",
"If our experiment is to flip a coin where the two faces of the coin can be identified as 'heads' ($H$) and 'tails' ($T$), and we are interested in what face lands uppermost, then the sample space is $\\Omega = \\{H, T\\}$.\n",
"\n",
"### Draw a fruit from a fruit bowl\n",
"\n",
"Suppose we have a well-mixed fruit bowl that contains:\n",
"\n",
"- 2 oranges\n",
"- 3 apples\n",
"- 1 lemon\n",
"\n",
"If our experiment is to take a single fruit from the bowl and the outcome is the type of fruit we take then what is the sample space for this experiment? \n",
"\n",
"Recall that the sample space is the set of all possible outcomes of the experiment. If we take a single fruit we could get only one of the three fruits in each draw: an orange, or an apple or a lemon. The sample space $\\Omega = \\{orange, apple, lemon\\}$.\n",
"\n",
"\n",
"\n",
"An **event** is a *subset* of the sample space. For example, we could take the event $\\{orange, lemon\\} \\subset \\Omega$ in the fruit bowl experiment. \n",
"\n",
"**Probability** maps a set of events to a set of numbers in a certain axiomatic manner. Abstractly, probability is a function that assigns numbers in the range 0 to 1 to events\n",
"\n",
"$$P : \\text{set of events } \\rightarrow [0,1]$$\n",
"\n",
"which satisfies the following axioms:\n",
"\n",
"1. For any event $A$, we have $0 \\leq P(A) \\leq 1$.\n",
"- If $\\Omega$ is the sample space, $P(\\Omega) = 1$.\n",
"- If $A$ and $B$ are disjoint (i.e., $A \\cap B = \\emptyset$), then $P(A \\cup B) = P(A) + P(B)$.\n",
"- If $A_1, A_2, \\ldots$ is an infinite sequence of pair-wise disjoint events (i.e., $A_i \\cap A_j = \\emptyset$ when $i \\ne j$), then\n",
"$$\n",
"\\begin{array}{lcl}\n",
"\\underbrace{P\\left(\\bigcup_{i=1}^{\\infty}A_i\\right)} &=& \\underbrace{\\sum_{i=1}^{\\infty}P\\left(A_i\\right)} \\\\\n",
"A_1 \\cup A_2 \\cup A_3 \\dots &=& P(A_1) + P(A_2) + P(A_3) + \\ldots\n",
"\\end{array}\n",
"$$\n",
"\n",
"These axioms or assumptions are motivated by the frequency interpretation of probability. The frequency interpretation of probability says that if we repeat an experiment a very large number of times then the fraction of times that the event $A$ occurs will be close to $P(A)$. \n",
"\n",
"More precisely,\n",
"\n",
"$$\n",
"\\begin{array}{llcl}\n",
"\\mbox{let } & N(A, n) & = & \\mbox{ the number of times } A \\mbox{ occurs in the first } n \\mbox{ trials,} \\\\\n",
"\\mbox{then } & P(A) & = & \\lim_{n \\rightarrow \\infty} \\frac{N(A, n)}{n}\n",
"\\end{array}\n",
"$$\n",
"\n",
"To think about this, consider what $\\lim_{n \\rightarrow \\infty} \\frac{N(A, n)}{n}$ is: \n",
"\n",
"$$\n",
"\\begin{array}{c}\n",
"\\frac{N(A, 1)}{1},\\, \\frac{N(A, 2)}{2},\\,\\frac{N(A, 3)}{3},\\,\\ldots \\mbox{where is this fraction going?}\n",
"\\end{array}\n",
"$$\n",
"\n",
"Let's look at axioms 1, 2, and 3 above more closely.\n",
"\n",
"For any event $A$, with $0 \\leq P(A) \\leq 1$. \n",
"Well, clearly $0 \\leq \\frac{N(A, n)}{n} \\leq 1$.\n",
"\n",
"If $\\Omega$ is the sample space, $P(\\Omega) = 1$. This essentially says \"something must happen\". $P(\\Omega) = \\frac{N(\\Omega, n)}{n} = \\frac{n}{n} = 1$.\n",
"\n",
"If $A$ and $B$ are disjoint (i.e., $A \\cap B = \\emptyset$), then $N(A \\cup B, n) = N(A, n) + N(B, n)$ since $A \\cup B$ occurs if either $A$ or $B$ occurs but we know that it is impossible for both $A$ and $B$ to happen (the intersection is the empty set). This extends to infinitely many disjoint events.\n",
"\n",
"Axiom 4 is a bit more controversial, but here we assume it as part of our axiomatic definition of probability (without it the maths is much harder!). \n",
"\n",
"Lets do some probability examples.\n",
"\n",
"### Example 1: Tossing a fair coin\n",
"\n",
"The sample space and probabilties of this experiment are:\n",
"\n",
"$$\\Omega = \\{H, T\\}, \\ \\text{and} \\ P(\\{H\\}) = P(\\{T\\}) = \\frac{1}{2} \\ .$$\n",
"\n",
"We can represent our probability as the following function:\n",
"\n",
"- $P : \\{ \\{H\\} , \\{T\\}, \\{H,T\\}, \\{\\} \\} \\to \\{0,\\frac{1}{2},1\\}$, \n",
"- with $P(\\{H\\})=\\frac{1}{2}$, $P(\\{T\\})=\\frac{1}{2}$, $P(\\{H,T\\})=1$ and $P(\\{\\})=0$.\n",
"\n",
"*notational convenience:* The outcomes which are the elements in the the sample space are denoted without set brackets, for example: $P(\\{H\\})$ is denoted by $P(H)$ for brevity.\n",
"\n",
"Check that all our axioms are satisfied: \n",
"\n",
"1. yes! because: $0 \\le P(H) = P(T) = \\frac{1}{2} \\le 1$ and $0 \\le P(\\Omega) = 1 \\le 1$.\n",
"- yes! becasue: $P(\\Omega) = P(\\{H, T\\} = P(\\{H\\}) + P(\\{T\\}) = \\frac{1}{2} + \\frac{1}{2} = 1$.\n",
"- yes!, because: $P(\\{H, T\\} = P(\\{H\\}) + P(\\{T\\})$.\n",
"\n",
"See [this video about tossing a coin](https://www.youtube.com/watch?v=AYnJv68T3MM) by Persi Diaconis of Stanford's Statistics Department.\n",
"\n",
"### Example 2: Tossing an unfair coin\n",
"\n",
"$\\Omega = \\{H, T\\}$, $P(H) =\\frac{3}{4}$, $P(T) = \\frac{1}{4}$. So the coin lands heads 3 out of 4 times and lands tails only 1 out of 4 times.\n",
"\n",
"Check that all our axioms are satisfied: \n",
"\n",
"1. $0 \\le P(H) = \\frac{3}{4} \\le 1$, $0 \\le P(T) = \\frac{1}{4} \\le 1$ and $0 \\le P(\\Omega) = 1 \\le 1$.\n",
"- $P(\\Omega) = P(\\{H, T\\} = P(\\{H\\}) + P(\\{T\\}) = \\frac{3}{4} + \\frac{1}{4} = 1$.\n",
"- $P(\\{H, T\\} = P(\\{H\\}) + P(\\{T\\})$.\n",
"\n",
"Yes, all three axioms are satisfied by the probabiliy for this unfair coin experiment too.\n",
"\n",
"### Example 2': Tossing an unfair coin $m$ times to construct a random graph\n",
"\n",
"From Wikipedia:\n",
"\n",
"In the mathematical field of [graph theory](https://en.wikipedia.org/wiki/Graph_theory), the Erdős–Rényi model is either of two closely related models for generating [random graphs](https://en.wikipedia.org/wiki/Random_graph). They are named after mathematicians [Paul Erdős](https://en.wikipedia.org/wiki/Paul_Erd%C5%91s) and [Alfréd Rényi](https://en.wikipedia.org/wiki/Alfr%C3%A9d_R%C3%A9nyi), who first introduced one of the models in 1959,[1](https://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93R%C3%A9nyi_model#cite_note-er59-1)[2](https://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93R%C3%A9nyi_model#cite_note-b01-2) while [Edgar Gilbert](https://en.wikipedia.org/wiki/Edgar_Gilbert) introduced the other model contemporaneously and independently of Erdős and Rényi.[3](https://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93R%C3%A9nyi_model#cite_note-g59-3) \n",
"\n",
"In the model of Erdős and Rényi, **all graphs on a fixed vertex set with a fixed number of edges are equally likely**; in the model introduced by Gilbert, **each edge has a fixed probability of being present or absent, [independently](https://en.wikipedia.org/wiki/Statistical_independence) of the other edges**. \n",
"\n",
"#### Draw the construction on a graph with $n$ vertices and $m$ edges.\n",
"\n",
"What is exactly a Graph-valued random variable?\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"showURL(\"https://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93R%C3%A9nyi_model\",400)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"### Example 3: New Zealand Lotto\n",
"\n",
"In New Zealand Lotto, the balls drawn are numbered 1 to 40. The number on a ball is an outcome. \n",
"\n",
"$\\Omega = \\{1, 2, \\ldots,40\\}$, $P(\\omega) = \\frac{1}{40}$ for each $\\omega \\in \\Omega$ (i.e., $P(1) = P(2) = \\ldots = P(40) = \\frac{1}{40}$)\n",
"\n",
"Now, consider the event that the first ball is even? What is the probability of this event, $P(\\{2, 4, 6, \\ldots, 38, 40\\})$?\n",
"\n",
"$$\n",
"\\begin{array}{lcll} P(\\{2, 4, \\ldots, 38, 40\\}) & = & P \\left( \\{2\\} \\cup \\{4\\} \\cup \\cdots \\cup \\{38\\} \\cup \\{40\\} \\right) & \\mbox{(defn. of set union)}\\\\ & = & P( \\{2\\}) +P( \\{4\\}) + \\cdots + P(\\{38\\}) + P( \\{40\\}) & \\mbox{(extn. Axiom 3)} \\\\ & = & \\sum_{i \\in \\{2, 4, \\ldots, 40\\}} P(\\{i\\}) & \\\\ & = & 20 \\times \\frac{1}{40} & \\\\ & = & \\frac{1}{2} & \\end{array}\n",
"$$\n",
"\n",
"Similarly for the probability of an odd ball:\n",
"\n",
"$$\n",
"\\begin{array}{lcl} P(\\{1, 3, 5, \\ldots, 37, 39\\}) & = & P \\left( \\{1\\} \\cup \\{3\\} \\cup \\cdots \\cup \\{37\\} \\cup \\{39\\} \\right)\\\\ & = & P(\\{1\\}) +P( \\{3\\}) + \\cdots + P(\\{37\\}) + P( \\{39\\})\\\\ & = & \\sum_{i \\in \\{1, 3, \\ldots, 37, 39\\}} P(\\{i\\}) \\\\ & = & 20 \\times \\frac{1}{40} \\\\ & = & \\frac{1}{2} \\end{array}\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"*Aside:* The set of all possible subsets of $\\Omega$ is called the power set and is denoted by $2^{\\Omega}$. The power set contains all events of a sample space and is a natural domain for the probability function defined on a sample space with finitely many outcomes. We will see more on this later but if you are impatient see here.\n",
"\n",
"Now, having introduced a number of definitions, we will derive some basic logical consequences, (i.e., properties) of probabilities (axiomatically defined).\n",
"\n",
"### Property 1\n",
"\n",
"$P(A) = 1 - P(A^c)$, where $A^c = \\Omega \\setminus A$ is the complement of $A$.\n",
"\n",
"**Proof**\n",
"\n",
"$A \\cap A^c = \\emptyset$ and $A \\cup A^c = \\Omega$\n",
"\n",
"Recall that axiom 3 says that if $A_1 \\cap A_2 = \\emptyset$ then $P(A_1 \\cup A_2) = P(A_1) + P(A_2)$\n",
"\n",
"So this implies that $P(A) + P(A^c) = P(\\Omega) = 1$, by axiom 2 which says that $P(\\Omega) = 1$\n",
"\n",
"Subtracting $P(A^c)$ from both sides, we get $P(A) + P(A^c) - P(A^c) = 1 - P(A^c)$\n",
"\n",
"Cancelling out the two $P(A^c)$ terms on the right hand side, we get $P(A) = 1 - P(A^c)$ . \n",
"\n",
"\n",
"For example, in the coin tossing experiment, $\\Omega = \\{H, T\\}$\n",
"\n",
"$$P(H) = 1 - P(H^c) = 1 - P(\\Omega \\setminus H) = 1 - P(T)$$\n",
"\n",
"### Property 2\n",
"\n",
"For any two events $A$, $B$,\n",
"\n",
"$P(A \\cup B) = P(A) + P(B) - P(A \\cap B)$\n",
"\n",
"**Proof:**\n",
"\n",
"
\n",
"\n",
"This is an informal proof using the picture above. If we just add the probabilities of $A$ and $B$ we will double count the probabilties of the outcomes which are in both $A$ and $B$. We adjust for this double counting by subtracting $P(A \\cap B)$.\n",
"\n",
"Note that if $A \\cap B = \\emptyset$ then $P(A \\cap B) = 0$ and $P(A \\cup B) = P(A) + P(B)$."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"### Computer Representations of Mathematical Concepts and Objects\n",
"\n",
"So we have probabilities associated with events satisfying some axioms -- that sounds like a good use for a dictionary? That's basically what are going to use, but we 'wrap' up the dictionary in some extra code that does things like check that the probabilities add to 1, and that each element in the list of outcomes we have given is unique. If you are interested in programming, we have coded our own type, or class, called `ProbyMap` (it's given in the next cell, and you can simply evaluate it and ignore all the details completely!). Once the class is coded, we create a `ProbyMap` by providing the sample space and the probabilities. You don't have to worry about how the class is implemented, but note that you may often want to use the computer to create a **computerised representation of a mathematical concept**. Once the concept (a discrete probability map in our case) is implemenetd, then we can use the computer to automate the mundane tasks of large-scale computations.\n",
"\n",
"SageMath already has some such implementations that are more sophisticated and general: \n",
"\n",
"- [probability distributions](http://doc.sagemath.org/html/en/reference/probability/sage/probability/probability_distribution.html)\n",
"- [random variables](http://doc.sagemath.org/html/en/reference/probability/sage/probability/random_variable.html)\n",
"\n",
"Here we will roll our own for pedagogical reasons."
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# create a class for a probability map - if you are new to SageMath/Python just evaluate and skip this cell\n",
"# This was coded by Jenny Harlow\n",
"\n",
"import copy\n",
"class ProbyMap(object): # class definition\n",
" 'Probability map class'\n",
" def __init__(self, sspace, probs): # constructor\n",
" self.__probmap = {} # default probmap is empty\n",
" # make checks on the objects given as sspace and probs\n",
" try: \n",
" sspace_set = set(sspace) # check that we can make the sample space into a set\n",
" assert len(sspace_set) == len(sspace) # and not lose any elements\n",
" prob_list = list(probs) # and we can make the probs into a list\n",
" probsum = sum(prob_list) # and we can sum the probs\n",
" assert probsum == 1 # and the probs sum to 1\n",
" assert len(prob_list) == len(sspace_set) # and there is proby for each event\n",
" \n",
" self.__probmap = dict(zip(list(sspace),prob_list)) # map from sspace to probs\n",
" \n",
" except TypeError as diag: # if there any problems with types\n",
" init_error = 1\n",
" print (str(diag))\n",
" \n",
" except AssertionError as e:\n",
" init_error = 1\n",
" print (\"Check sample space and probabilities\")\n",
" \n",
" \n",
" def P(self, events):\n",
" '''Return the probability of an event or set of events.\n",
" \n",
" events is set of events in the sample space to calculate the probability for.'''\n",
" \n",
" retvalue = 0\n",
" try: \n",
" events_set = set(events) # check we can make a set out of the events\n",
" assert len(events_set) == len(events) # and not lose any events\n",
" assert events_set <= set(self.__probmap.keys()) # events subset of sample space\n",
" \n",
" for ev in events: # add each mapped probability to the return value\n",
" retvalue += self.__probmap[ev]\n",
" \n",
" except TypeError as diag:\n",
" print (str(diag)) \n",
" \n",
" except AssertionError:\n",
" print (\"Check your events\")\n",
" \n",
" return retvalue\n",
" \n",
" def __str__(self): # redefine printable string rep\n",
" 'Printable representation of the object.'\n",
" num_keys = len(self.__probmap.keys())\n",
" counter = 0\n",
" retval = '{'\n",
" for each_key in self.__probmap:\n",
" counter += 1\n",
" retval += str(each_key)\n",
" retval += ': '\n",
" retval += \"%.3f\" % self.__probmap[each_key]\n",
" if counter < num_keys:\n",
" retval += ', '\n",
" retval += '}' \n",
" \n",
" return retval\n",
" \n",
" __repr__ = __str__\n",
" \n",
" def get_probmap(self): # get a deep copy of the proby map\n",
" return copy.deepcopy(self.__probmap) # getter cannot alter object's map\n",
" \n",
" probmap = property(get_probmap) # allow read access via .probmap\n",
" \n",
" def get_ref_probmap(self): # get a reference to the real probmap\n",
" return self.__probmap # getter can alter the object's map\n",
" \n",
" \n",
" ref_probmap = property(get_ref_probmap) # allow access via .ref_probmap\n",
" \n",
" @staticmethod\n",
" def dictExp(big_map, small_map):\n",
" '''Internal helper function for __pow__(...).\n",
" \n",
" Takes two proby map dictionaries and returns one mult by other.'''\n",
" new_bl = {}\n",
" for sle in small_map:\n",
" for ble in big_map:\n",
" new_key = str(ble) + ' ' + str (sle)\n",
" new_bl[new_key] = big_map[ble]*small_map[sle]\n",
" return new_bl\n",
" \n",
" def __pow__(self, x):\n",
" '''probability map exponentiated.'''\n",
" try:\n",
" assert isinstance(x, Integer)\n",
" pmap = copy.deepcopy(self.__probmap) # copy the probability map dictionary\n",
" new_pmap = copy.deepcopy(self.__probmap) # and another copy\n",
" for i in range(x-1):\n",
" new_pmap = self.dictExp(new_pmap, pmap)\n",
" \n",
" return ProbyMap(new_pmap.keys(), new_pmap.values()) \n",
" \n",
" except AssertionError as e:\n",
" print (\"cannot raise to non-integer power\")\n",
" return None"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"#### Example 4: Experiments, outcomes, sample spaces, events, and the probability of events\n",
"\n",
"Let's go back to the well-mixed fruit bowl experiment. The fruit bowl contains:\n",
"\n",
"- 2 oranges\n",
"- 3 apples\n",
"- 1 lemon\n",
"\n",
"The experiment is to take one piece of fruit from the bowl and the outcome is the type of fruit we get. \n",
"\n",
"The sample space is $\\Omega = \\{orange, apple, lemon\\}$\n",
"\n",
"We can use the Sage list to create this sample space (a list is a bit easier to use than a set, but using a list means that we are responsible for making sure that each element contained in it is unique)."
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# sample space is the set of distinct type of fruits in the bowl\n",
"samplespace = ['orange', 'apple', 'lemon']"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"We can also use a list to specify what the probability of each outcome in the sample space is. The probabilities can be calculated by knowing how many fruits of each kind are there in the bowl. We say that the fruit bowl is 'well-stirred', which essentially means that when we pick a fruit it really is a 'random sample' from the bowl (for example, we have not carefully put all the apples on the top so that someone will almost certainly get an apple when they take a fruit). More on this later in the course! Note that the probabilities encoded by a list named probabilities are in the same order as the outcomes in the samplespace list. "
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# probabilities take into account the number of each type of fruit in the \"well-stirred\" fruit bowl\n",
"probabilities = [2/6, 3/6, 1/6]"
]
},
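{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"As a sketch of where these numbers come from, assume that each of the $2 + 3 + 1 = 6$ fruits in the bowl is equally likely to be drawn (this is our reading of 'well-stirred'). Then\n",
"\n",
"$$P(orange) = \\frac{2}{6} = \\frac{1}{3}, \\quad P(apple) = \\frac{3}{6} = \\frac{1}{2}, \\quad P(lemon) = \\frac{1}{6}, \\quad \\text{and} \\quad \\frac{2}{6} + \\frac{3}{6} + \\frac{1}{6} = 1 ,$$\n",
"\n",
"so the three probabilities sum to 1, which is one of the checks that `ProbyMap` makes when it is constructed."
]
},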
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{orange: 0.333, apple: 0.500, lemon: 0.167}"
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"probMapFruitbowl = ProbyMap(sspace = samplespace, probs=probabilities) # make our probability map\n",
"probMapFruitbowl # disclose our probability map"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"We can use our probability map to find the probability of a single outcome like this:"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1/6"
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Find the probability of outcome 'lemon'\n",
"probMapFruitbowl.P(['lemon'])"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"We can also use our probability map to find the probability of an event (set of outcomes)."
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1/2"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Find the probability of the event {lemon, orange}\n",
"probMapFruitbowl.P(['lemon','orange'])"
]
},
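{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"The arithmetic behind this result is axiom 3 applied to the disjoint events $\\{lemon\\}$ and $\\{orange\\}$, which is what the loop inside the `P` method adds up:\n",
"\n",
"$$P(\\{lemon, orange\\}) = P(lemon) + P(orange) = \\frac{1}{6} + \\frac{2}{6} = \\frac{1}{2} .$$\n",
"\n",
"Property 1 gives the same number, since the complement of $\\{lemon, orange\\}$ in $\\Omega$ is $\\{apple\\}$: $1 - P(apple) = 1 - \\frac{1}{2} = \\frac{1}{2}$."
]
},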
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Basically, the probability map implemenetd by `ProbyMap` is essentially a map or dictionary with some additional bells and whistles."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Basically, the probability map is essentially a map or dictionary.\n",
"\n",
"Next we will obtain the set of all events (the largest $\\sigma$-algebra or $\\sigma$-field in math lingo) from the outcomes in our sample space via the `Subset` function and find the probability of each event using our `ProbyMap` in a for loop."
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[{},\n",
" {'orange'},\n",
" {'apple'},\n",
" {'lemon'},\n",
" {'apple', 'orange'},\n",
" {'lemon', 'orange'},\n",
" {'apple', 'lemon'},\n",
" {'apple', 'lemon', 'orange'}]"
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# make the set of all possible events from the set of outcomes \n",
"setOfAllEvents = Subsets(samplespace) # Subsets(A) returns the set of all subsets of A\n",
"list(setOfAllEvents) # disclose the set of all events"
]
},
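{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"As a quick check on the list above: a sample space with $|\\Omega|$ outcomes has $2^{|\\Omega|}$ subsets, because each outcome is either in or out of a given event. For the fruit bowl this gives\n",
"\n",
"$$|2^{\\Omega}| = 2^{3} = 8 ,$$\n",
"\n",
"which matches the eight events listed, from the empty set up to $\\Omega$ itself."
]
},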
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"We have not done loops yet, but we will soon. Just as a foretaste, here we use a for loop to print out the computed probabilities for each event in the `setOfAllEvents`."
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"P( {} ) = 0\n",
"P( {'orange'} ) = 1/3\n",
"P( {'apple'} ) = 1/2\n",
"P( {'lemon'} ) = 1/6\n",
"P( {'apple', 'orange'} ) = 5/6\n",
"P( {'lemon', 'orange'} ) = 1/2\n",
"P( {'apple', 'lemon'} ) = 2/3\n",
"P( {'apple', 'lemon', 'orange'} ) = 1\n"
]
}
],
"source": [
"# loop through the set of all events and print the computed probability\n",
"for event in setOfAllEvents:\n",
" print (\"P(\", event, \") = \", probMapFruitbowl.P(event))"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"#### YouTry\n",
"\n",
"Try working through Example 6 below for yourself in the tutorial.\n",
"\n",
"### Example 5: Experiments with the English language\n",
"\n",
"In English language text there are 26 letters in the alphabet. The relative frequencies with which each letter appears is tabulated below:\n",
"\n",
" \n",
"\n",
"\n",
"E | \n",
"13.0% | \n",
"H | \n",
"3.5% | \n",
"W | \n",
"1.6% | \n",
"
\n",
"\n",
"T | \n",
"9.3% | \n",
"L | \n",
"3.5% | \n",
"V | \n",
"1.3% | \n",
"
\n",
"\n",
"N | \n",
"7.8% | \n",
"C | \n",
"3.0% | \n",
"B | \n",
"0.9% | \n",
"
\n",
"\n",
"R | \n",
"7.7% | \n",
"F | \n",
"2.8% | \n",
"X | \n",
"0.5% | \n",
"
\n",
"\n",
"O | \n",
"7.4% | \n",
"P | \n",
"2.7% | \n",
"K | \n",
"0.3% | \n",
"
\n",
"\n",
"I | \n",
"7.4% | \n",
"U | \n",
"2.7% | \n",
"Q | \n",
"0.3% | \n",
"
\n",
"\n",
"A | \n",
"7.3% | \n",
"M | \n",
"2.5% | \n",
"J | \n",
"0.2% | \n",
"
\n",
"\n",
"S | \n",
"6.3% | \n",
"Y | \n",
"1.9% | \n",
"Z | \n",
"0.1% | \n",
"
\n",
"\n",
"D | \n",
"4.4% | \n",
"G | \n",
"1.6% | \n",
" | \n",
" | \n",
"
\n",
"\n",
"
\n",
"\n",
"Using these relative frequencies as probabilities we can create a probability map for the letters in the English alphabet. We start by defining the sample space and the probabilities."
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"alphaspace = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q',\n",
" 'R','S','T','U','V','W','X','Y','Z']\n",
"alphaRelFreqs = [73/1000,9/1000,30/1000,44/1000,130/1000,28/1000,16/1000,35/1000,74/1000,\n",
" 2/1000,3/1000,35/1000, 25/1000,78/1000,74/1000,27/1000,3/1000,77/1000,63/1000,\n",
" 93/1000,27/1000,13/1000,16/1000,5/1000,19/1000,1/1000]"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Then we create the probability map, represented by a `ProbyMap` object."
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{A: 0.073, B: 0.009, C: 0.030, D: 0.044, E: 0.130, F: 0.028, G: 0.016, H: 0.035, I: 0.074, J: 0.002, K: 0.003, L: 0.035, M: 0.025, N: 0.078, O: 0.074, P: 0.027, Q: 0.003, R: 0.077, S: 0.063, T: 0.093, U: 0.027, V: 0.013, W: 0.016, X: 0.005, Y: 0.019, Z: 0.001}"
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"probMapLetters = ProbyMap(sspace = alphaspace, probs=alphaRelFreqs) # make our probability map\n",
"probMapLetters # disclose our probability map"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Please do NOT try to list the set of all subsets (events) of the 26 alphabet set (sample space): there are over 67 million events and the computer will probably crash by running out of memory! You can see how large a number we are talking about by evaluating the next cell which calculates $2^{26}$ for you."
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"67108864"
]
},
"execution_count": 78,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"2^26"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Instead of asking for the probability of each event (over 67 million of them to exhaustively march through!) we define some events of interest, say the vowels in the alphabet or the set of letters that make up a name."
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"vowels = ['A', 'E', 'I', 'O', 'U']"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"And we can get the probability that a letter drawn from a 'well-stirred' jumble of English letters is a vowel."
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"189/500"
]
},
"execution_count": 81,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"probMapLetters.P(vowels)"
]
},
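{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"To see where $\\frac{189}{500}$ comes from, the probabilities of the five (disjoint) vowel outcomes in the table are simply added up:\n",
"\n",
"$$P(\\{A, E, I, O, U\\}) = \\frac{73 + 130 + 74 + 74 + 27}{1000} = \\frac{378}{1000} = \\frac{189}{500} = 0.378 .$$"
]
},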
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"We can make ourselves another set of letters and find probabilities for that too. In the cell below, we go straight from a string to a set. The reason that we can do this is that a string or `str` is in fact another collection; `str` is a [sequence type](https://docs.python.org/2/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange) just like `list`. "
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'A', 'D', 'E', 'H', 'I', 'N', 'R', 'S', 'U', 'Z'}"
]
},
"execution_count": 82,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"NameOfRaaz=set(\"RAAZESHSAINUDIIN\") # make a set from a string\n",
"NameOfRaaz # disclose the set NameOfRaaz you have built"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"301/500"
]
},
"execution_count": 83,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"probMapLetters.P(NameOfRaaz)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"Try either adapting what we have above, or doing the same thing in some of cells below, to find the probabilities of other sets of letters yourself. For example, what is the probability of your name in the above sense?"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"The crucial point of the above exercise with our own implementation of a `Python class` for probability maps is to show how computers and mathematics can go hand in hand to quickly generalize operations over specific instances of a large class of mathematical notions (Example 4 and 5 for probability models on finite sample spaces in our case above)."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"### Example oo - A Double Pendulum Experiment\n",
"\n",
"See [http://lamastex.org/lmse/double-pendulum/](http://lamastex.org/lmse/double-pendulum/) for a more interesting statistical experiment. It is hanging from the window of Room 64108, Angstrom Laboratory, Uppsala Sweden. So one can actually collect data from it and experiment if interested.\n",
"\n",
"Think about the sample space $\\Omega$ for releasing the double pendulum from a given initial position for each arm.\n",
"\n",
"What is the Random Variable being measured for this experiment?\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"### A Coming Attraction! \n",
"\n",
"Recall how we downloaded *Pride and Prejudice* and processed it as a String and break it by Chapters. This is at our disposal - all we need to do is copy-paste the right set of cells from earlier here to have the string from that Book by fetching live from the Guthenberg Project again (especially if you want to process the strings in another book) or directly loading from `data/pride_and_prejudice.txt`.\n",
"\n",
"Think about what algorithmic constructs and methods one will need to `split` each sentence by `words` it contains and count the number of each distinct word (we have already done this when we counted the number of occurrences of `he` and `she` in the previous notebook - see if it makes more sense now that you know a bit more Python).\n",
"\n",
"To fully understand that example we will need to understand `for` loops, `list` comprehensions and anonymous `function`s, as well as methods on strings for splitting (which you can search by adding a `.` after a `str` and hitting the `Tab` button to look through existing methods), and more crucially the data-structure we just saw called the `dictionary`, as we will see shortly.\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "SageMath 9.1",
"language": "sagemath",
"metadata": {
"cocalc": {
"description": "Open-source mathematical software system",
"priority": 1,
"url": "https://www.sagemath.org/"
}
},
"name": "sage-9.1",
"resource_dir": "/ext/jupyter/kernels/sage-9.1"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
},
"lx_course_instance": "2021",
"lx_course_name": "Introduction to Data Science: A Comp-Math-Stat Approach",
"lx_course_number": "1MS041"
},
"nbformat": 4,
"nbformat_minor": 4
}