02. Numbers, Strings, Booleans and Sets

We will start by showing you some of the basic numeric capabilities of SageMath.

Numbers and Arithmetic Operations

A worksheet cell is the area enclosed by a gray rectangle.
You may type any expression you want to evaluate into a worksheet cell. We have already put some expressions into this worksheet.

When you are in a cell you can evaluate the expression in it by pressing Shift+Enter or just by clicking the evaluate button below the cell.

To start with, we are going to use SageMath like a hand-held calculator. Let's perform the basic arithmetic operations of addition, subtraction, multiplication, division, exponentiation, and remainder over the three standard number systems: Integers denoted by $\mathbb{Z}$, Rational Numbers denoted by $\mathbb{Q}$ and Real Numbers denoted by $\mathbb{R}$. Let us recall the real number line and the basics of number systems next.

Number sets within Complex Numbers

In [1]:
def showURL(url, ht=500):
    """Return an IFrame of the url to show in notebook with height ht """
    from IPython.display import IFrame
    return IFrame(url, width='95%', height=ht) 
showURL('https://en.wikipedia.org/wiki/Number',400)
Out[1]:

The most basic numbers are called natural numbers and they are denoted by $\mathbb{N} :=\{0, 1,2,3,\ldots\}$. See https://en.wikipedia.org/wiki/Natural_number.

The natural numbers are the basis from which many other number sets may be built by extension: the integers, by including (if not yet in) the neutral element 0 and an additive inverse (−n) for each nonzero natural number n; the rational numbers, by including a multiplicative inverse (1/n) for each nonzero integer n (and also the product of these inverses by integers); the real numbers by including with the rationals the limits of (converging) Cauchy sequences of rationals; the complex numbers, by including with the real numbers the unresolved square root of minus one (and also the sums and products thereof); and so on. These chains of extensions make the natural numbers canonically embedded (identified) in the other number systems.
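The chain of embeddings described above can be sketched in plain Python (not SageMath's own number types, which we meet below). This is only an illustration under the assumption that Python's int, fractions.Fraction, float and complex stand in for the integers, rationals, reals and complex numbers; the variable names are made up for this example.

```python
import numbers
from fractions import Fraction

n = 3                 # a natural number, stored as a Python int
q = Fraction(3, 1)    # the same value viewed as a rational number
r = 3.0               # the same value as a floating-point stand-in for a real
z = complex(3, 0)     # the same value viewed as a complex number

# the embedded copies of 3 all compare equal
all_equal = (n == q == r == z)
print(all_equal)

# Python's abstract numeric tower mirrors the chain of inclusions:
# an int counts as Integral, Rational, Real and Complex at once
tower = (isinstance(n, numbers.Integral), isinstance(n, numbers.Rational),
         isinstance(n, numbers.Real), isinstance(n, numbers.Complex))
print(tower)
```

The point of the sketch is only that the same number 3 lives inside each larger number system.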

In [2]:
showURL("https://en.wikipedia.org/wiki/Natural_number#Notation",300)
Out[2]:

Let us get our fingers dirty with some numerical operations in SageMath.

Note that anything after a '#' symbol is a comment - comments are ignored by SageMath but help programmers to know what's going on.

Example 1: Integer Arithmetic

Try evaluating the cell containing 1+2 below by placing the cursor in the cell and pressing Shift+Enter.

In [3]:
1+2 # one is being added to 2
Out[3]:
3

Now, modify the above expression and evaluate it again. Try 3+4, for instance.

In [4]:
3-4 # subtracting 4 from 3
Out[4]:
-1

The multiplication operator is *, the division operator is /.

In [5]:
2*6 # multiplying 2 by 6
Out[5]:
12
In [6]:
15/5 # dividing 15 by 5
Out[6]:
3
In [7]:
type(1)
Out[7]:
<class 'sage.rings.integer.Integer'>

The exponentiation operator is ^.

In [8]:
2^3 # exponentiating 2 by 3, i.e., raising 2 to the third power
Out[8]:
8

However, Python's exponentiation operator ** also works.

In [9]:
2**3
Out[9]:
8

Being able to find the remainder after a division is surprisingly useful in computer programming.

In [10]:
11%3 # remainder after 11 is divided by 3; i.e., 11=3*3+2
Out[10]:
2

Another way of referring to this is 11 modulo 3, which evaluates to 2. Here % is the modulus operator.
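As a small sketch of why the remainder is useful, the plain Python lines below recover both the quotient and the remainder of 11 divided by 3 and check the identity 11 = 3*3 + 2. The variable names are made up for this example, and // (floor division) is an operator we have not introduced above.

```python
dividend, divisor = 11, 3
quotient = dividend // divisor     # floor division: how many whole 3s fit into 11
remainder = dividend % divisor     # the modulus operator: what is left over
print(quotient, remainder)

# the defining identity: dividend = divisor*quotient + remainder
identity_holds = (divisor * quotient + remainder == dividend)
print(identity_holds)

# a typical use: a number is even exactly when its remainder modulo 2 is 0
is_even = (dividend % 2 == 0)
print(is_even)
```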

You try

Try typing in and evaluating some expressions of your own. You can get new cells above or below an existing cell by clicking 'Insert' in the menu above and then 'Insert Cell Above' or 'Insert Cell Below'. You can also place the cursor in an existing cell and click the + icon above to get a new cell below.

In [ ]:
 

What happens if you put spaces between the characters in your expression, like 1 + 2 instead of 1+2?

In [ ]:
 

Example 2: Operator Precedence for Evaluating Arithmetic Expressions

Sometimes we want to perform more than one arithmetic operation with some given integers.
Suppose, we want to

  • "divide 12 by 4 then add the product of 2 and 3 and finally subtract 1."

Perhaps this can be achieved by evaluating the expression "12/4+2*3-1"?

But could that also be interpreted as

  • "divide 12 by the sum of 4 and 2 and multiply the result by the difference of 3 and 1"?

In programming, there are rules for the order in which arithmetic operations are carried out. This is called the order of precedence.

The basic arithmetic operations are: +, -, *, %, /, ^.

The order in which operations are evaluated is as follows:

  • ^ Exponents are evaluated right to left
  • *, %, / Then multiplication, remainder and division operations are evaluated left to right
  • +, - Finally, addition and subtraction are evaluated left to right

When operators are at the same level in the list above, what matters is the evaluation order (right to left, or left to right).

Operator precedence can be forced using parentheses.
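The rules above can be checked with a few small expressions. The sketch below is plain Python, so it spells exponentiation as ** (which SageMath also accepts); the numbers and variable names are chosen only for illustration.

```python
right_to_left = 2 ** 3 ** 2      # exponents group right to left: 2 ** (3 ** 2)
forced_left = (2 ** 3) ** 2      # parentheses force the other grouping
same_level = 7 % 4 * 3           # same level, left to right: (7 % 4) * 3
forced_right = 7 % (4 * 3)       # parentheses force the multiplication first
mixed = 1 + 2 * 3 ** 2           # exponent, then product, then sum: 1 + (2 * (3 ** 2))
print(right_to_left, forced_left, same_level, forced_right, mixed)
```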

In [11]:
showURL("https://en.wikipedia.org/wiki/Order_of_operations", 300)
Out[11]:
In [12]:
(12/4) + (2*3) - 1 # divide 12 by 4 then add the product of 2 and 3 and finally subtract 1
Out[12]:
8
In [13]:
12/4+2*3-1 # due to operator precedence this expression evaluates identically to the parenthesized expression above
Out[13]:
8

Operator precedence can be forced using nested parentheses. When our expression has nested parentheses, i.e., one pair of parentheses inside another pair, the expression inside the inner-most pair of parentheses is evaluated first.

The following cell evaluates the mathematical expression:

$$\frac{12}{4+2} (3-1)$$
In [14]:
(12/(4+2)) * (3-1)  # divide 12 by the sum of 4 and 2 and multiply the result by the difference of 3 and 1
Out[14]:
4

You try

Try writing an expression which will subtract 3 from 5 and then raise the result to the power of 3.

In [ ]:
 

Find out for yourself what we mean by the precedence for exponentiation (^) being from right to left:

  • What do you think the expression 3^3^2 would evaluate to?
    • Is it the same as (3^3)^2, i.e., 27 squared, or
    • 3^(3^2), i.e., 3 raised to the power 9?

Try typing in the different expressions to find out:

In [ ]:
 
In [ ]:
 

Find an expression which will add the squares of four numbers together and then divide that sum of squares by 4.

In [ ]:
 

Find what the precedence is for the modulus operator % that we discussed above: try looking at the difference between the results for 10%2^2 and 10%2*2 (or 10^2+2). Can you see how SageMath is interpreting your expressions?

Note that when you have two operators at the same precedence level (like % and *), then what matters is the order - left to right or right to left. You will see this when you evaluate 10%2*2.

In [ ]:
 
In [ ]:
 

Does putting spaces in your expression make any difference?

In [ ]:
 

Using parentheses or white space can improve readability a lot! So be generous with them to evaluate the following expression:

$$10^2 + 2^8 -4$$
In [15]:
10^2+2^8-4
Out[15]:
352
In [16]:
10^2 + 2^8 -4
Out[16]:
352
In [17]:
(((10^2) + (2^8)) - 4)  # this may be overkill!
Out[17]:
352

The lesson to learn is that it is good practice to use parentheses: you make it clear to someone reading your code what you mean to happen, and you make sure that the computer actually does what you mean it to!

Try this 10-minute video to get some practice if you are really rusty with the order of operations:

Example 3: Rational Arithmetic

So far we have been dealing with integers. Integers are a type in SageMath. Algebraically speaking, the integers, the rational numbers and the real numbers each form a ring. This is something you will learn in detail in a maths course in Group Theory or Abstract Algebra, but let's take a quick peek at the definition of a ring.

In [18]:
showURL("https://en.wikipedia.org/wiki/Ring_(mathematics)#Definition_and_illustration",400)
Out[18]:
In [19]:
type(1) # find the data type of 1
Out[19]:
<class 'sage.rings.integer.Integer'>

The output above tells us that 1 is of type sage.rings.integer.Integer.

In [20]:
showURL("https://en.wikipedia.org/wiki/Integer",400)
Out[20]:

However, life with only integers denoted by $\mathbb{Z} := \{\ldots,-3,-2,-1,0,1,2,3,\ldots\}$ is a bit limited. What about values like $1/2$ or $\frac{1}{2}$?

This brings us to the rational numbers denoted by $\mathbb{Q}$.

In [21]:
showURL("https://en.wikipedia.org/wiki/Rational_number",400)
Out[21]:
In [22]:
type(1/2) # data type of 1/2 is a sage.rings.rational.Rational
Out[22]:
<class 'sage.rings.rational.Rational'>

Try evaluating the cell containing 1/2 + 2 below.

In [23]:
1/2 + 2 # add one half to 2 or four halves to obtain the rational number 5/2 or five halves
Out[23]:
5/2

SageMath seems to have done rational arithmetic for us when evaluating the above expression.

Next, modify the expression in the cell below and evaluate it again. Try 1/3+2/4, for instance.

In [24]:
1/2 + 1/3
Out[24]:
5/6

You can do arithmetic with rationals just as we did with integers.

In [25]:
3/4 - 1/4 # subtracting 1/4 from 3/4
Out[25]:
1/2
In [26]:
1/2 * 1/2 # multiplying 1/2 by 1/2
Out[26]:
1/4
In [27]:
(2/5) / (1/5) # dividing 2/5 by 1/5
Out[27]:
2
In [28]:
(1/2)^3 # exponentiating 1/2 by 3, i.e., raising 1/2 to the third power
Out[28]:
1/8
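For comparison, here is a sketch of the same kind of exact rational arithmetic in plain Python, outside SageMath. It assumes the standard library class fractions.Fraction as a stand-in for SageMath's Rational type; in plain Python the expression 1/2 is a floating-point number rather than a rational, which is why the explicit Fraction constructor is needed.

```python
from fractions import Fraction

half = Fraction(1, 2)
third = Fraction(1, 3)

total = half + third                          # exact sum
product = half * half                         # exact product
quotient = Fraction(2, 5) / Fraction(1, 5)    # exact quotient
cube = half ** 3                              # exact power
print(total, product, quotient, cube)

# in plain Python, / on integers gives a float, so this is only approximate
approximate = 1 / 2 + 1 / 3
print(approximate)
```

Inside SageMath none of this is necessary, because integer literals are already SageMath Integers and dividing them gives exact Rationals.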

You try

Write an expression which evaluates to 1 using the rationals 1/3 and 1/12, some integers, and some of the arithmetical operators - there are lots of different expressions you can choose, just try a few.

In [ ]:
 

What does SageMath do with something like 1/1/5? Can you see how this is being interpreted? What should we do if we really want to evaluate 1 divided by 1/5?

In [ ]:
 

Try adding some rationals and some integers together - what type is the result?

Example 4: Real Arithmetic (multi-precision floating-point arithmetic)

Recall that real numbers denoted by $\mathbb{R}$ include natural numbers ($\mathbb{N}$), integers ($\mathbb{Z}$), rational numbers ($\mathbb{Q}$) and various types of irrational numbers like $\sqrt{2}$ and $\pi$.

Real numbers can be thought of as all the numbers on the real line between negative infinity and positive infinity. Real numbers are represented in decimal format, e.g., 234.4677878.

In [29]:
showURL("https://en.wikipedia.org/wiki/Real_number#Definition",400)
Out[29]:

We cannot do exact real arithmetic in a computer, but we can do it approximately with the multiprecision floating-point numbers of MPFR (http://www.mpfr.org/), and we can combine them with integer and rational types in SageMath.

Technical note: Computers can be made to exactly compute in integer and rational arithmetic. But, because computers with finite memory (all computers today!) cannot represent the uncountably infinitely many real numbers, they can only mimic or approximate arithmetic over real numbers using finitely many computer-representable floating-point numbers.
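A minimal sketch of what this approximation means in practice, written in plain Python with its built-in 53-bit floats and, as an assumed stand-in for exact rationals, the standard library class fractions.Fraction. The decimal numbers 0.1, 0.2 and 0.3 have no exact binary floating-point representation, so a sum that is exact over the rationals is only approximately right over floats.

```python
from fractions import Fraction

float_sum = 0.1 + 0.2
floats_agree = (float_sum == 0.3)              # False: rounding errors do not cancel
print(floats_agree)

exact_sum = Fraction(1, 10) + Fraction(2, 10)
rationals_agree = (exact_sum == Fraction(3, 10))   # True: rational arithmetic is exact
print(rationals_agree)

# floats are still very close, so compare them up to a small tolerance instead
close_enough = abs(float_sum - 0.3) < 1e-12
print(close_enough)
```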

Let's take a peek at the von Neumann architecture of any typical computer today.

Von_Neumann_Architecture.svg

The punch-line is that the Memory unit or Random Access Memory (RAM), the Central Processing Unit, and the Input and Output Devices are all physical, with finite memory. Therefore we cannot exactly represent all the uncountably infinitely many real numbers in $\mathbb{R}$ and need to resort to their approximation using floating-point numbers.

In [30]:
showURL("https://en.wikipedia.org/wiki/Von_Neumann_architecture",400)
Out[30]:

See SageMath Quick Start on Numerical Analysis to understand SageMath's multiprecision real arithmetic.

For now, let's compare the results of evaluating the expressions below to the equivalent expressions using rational numbers above.

In [31]:
type(0.5) # data type of 0.5 is a sage.rings.real_mpfr.RealLiteral
Out[31]:
<class 'sage.rings.real_mpfr.RealLiteral'>
In [32]:
RR # Real Field with the default 53 bits of precision
Out[32]:
Real Field with 53 bits of precision
In [33]:
RR(0.5)  # RR(0.5) is the same as 0.5 in SageMath
Out[33]:
0.500000000000000
In [34]:
type(RR(0.5)) # RR(0.5) is the same type as 0.5 from before
Out[34]:
<class 'sage.rings.real_mpfr.RealLiteral'>
In [35]:
??RR
In [36]:
0.5 + 2 # one half as 0.5 is being added to 2 to obtain the real number 2.500..0 in SageMath
Out[36]:
2.50000000000000
In [37]:
0.75 - 0.25 # subtracting 0.25 from 0.75 is the same as subtracting 1/4 from 3/4
Out[37]:
0.500000000000000
In [38]:
0.5 * 0.5 # multiplying 0.5 by 0.5 is the same as 1/2 * 1/2
Out[38]:
0.250000000000000
In [39]:
(2 / 5.0) / 0.2 # dividing 2/5. by 0.2 is the same as (2/5) / (1/5)
Out[39]:
2.00000000000000
In [40]:
0.5^3.0 # exponentiating 0.5 by 3.0 is the same as (1/2)^3
Out[40]:
0.125000000000000

You try

Find the type of 1/2.

In [ ]:
 

Try a few different ways of getting the same result as typing ((((1/5) / (1/10)) * (0.1 * 2/5) + 4/100))*5/(3/5). This exact expression has already been put in for you in the cell below; you could try something just using floating-point numbers. Then see how important the parentheses are around rationals when you have an expression like this: try taking some of the parentheses out and just play with complex expressions like these to get familiar.

In [41]:
((((1/5) / (1/10)) * (0.1 * 2/5) + 4/100))*5/(3/5)
Out[41]:
1.00000000000000
In [42]:
((((1/5) / (1/10)) * (1/10 * 2/5) + 4/100))*5/(3/5)
Out[42]:
1
In [ ]:
 
In [ ]:
 

Example 5: Variables and assignments of numbers and expressions

Loosely speaking one can think of a variable as a way of referring to a memory location used by a computer program. A variable is a symbolic name for this physical location. This memory location contains values, like numbers, text or more complicated types and crucially what is contained in a variable can change based on operations we do to it.

In SageMath, the symbol = is the assignment operator. You can assign a numerical value to a variable in SageMath using the assignment operator. This is a good way to store values you want to use or modify later.

(If you have programmed before using a language like C or C++ or Java, you'll see that SageMath is a bit different because in SageMath you don't have to say what type of value is going to be assigned to the variable.)

Feel free to take a deeper dive into the computer science concept of assignment.

In [43]:
a = 1    # assign 1 to a variable named a
In [44]:
a        # disclose a  - you need to explicitly do this!
Out[44]:
1

Just typing the name of a variable to get the value works in the SageMath Notebook, but if you are writing a program and you want to output the value of a variable, you'll probably want to use something like the print command.

In [45]:
print(a)
1
In [46]:
b = 2
c = 3
print(a, b, c)  # print out the values of a and b and c
1 2 3
In [47]:
x=2^(1/2)
x
Out[47]:
sqrt(2)
In [48]:
type(x)   # x is a sage symbolic expression
Out[48]:
<class 'sage.symbolic.expression.Expression'>

Many of the commands in SageMath/Python are "methods" of objects.

That is, we access them by typing:

  • the name of the mathematical object,
  • a dot/period,
  • the name of the method, and
  • parentheses (possibly with an argument).

This is a huge advantage, once you get familiar with it, because it allows you to do only the things that are possible, and all such things. See SageMath programming guide for more details on this.
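A minimal sketch of the object.method(argument) pattern in plain Python, using a string and an integer instead of a SageMath expression. The variable names are made up for this example; dir() is a built-in that lists what Tab completion would show.

```python
word = "sage"

shouted = word.upper()        # object, dot, method name, empty parentheses
count_a = word.count("a")     # the same pattern with one argument
bits = (10).bit_length()      # integers have methods too: 10 is 1010 in binary
print(shouted, count_a, bits)

# list the public methods available on word
public_methods = [name for name in dir(word) if not name.startswith("_")]
print(public_methods[:5])
```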

Let's try to hit the Tab button after the . following x below to view all available methods for x which is currently sqrt(2).

In [49]:
#x.  # uncomment, place the cursor after the '.' following 'x' and hit the Tab button
In [ ]:
help(x.n)
# we can use ? after a method to get brief help
In [50]:
x.n(digits=10) # this gives a numerical approximation for x
Out[50]:
1.414213562
In [51]:
s = 1; t = 2; u = 3;
print(s + t + u)
6
In [52]:
f=(5-3)^(6/2)+3*(7-2) # assign the expression to f
f # disclose f
Out[52]:
23
In [53]:
type(f)
Out[53]:
<class 'sage.rings.rational.Rational'>

You try

Try assigning some values to some variables - you choose what values and you choose what variable names to use. See if you can print out the values you have assigned.

In [ ]:
 
In [ ]:
 

You can reassign different values to variable names. Using SageMath you can also change the type of the values assigned to the variable (not all programming languages allow you to do this).

In [54]:
a = 1
print("Right now, a =", a, "and is of type", type(a)) # using , and strings in double quotes print can be more flexible

a = 1/3 # reassign 1/3 to the variable a
print("Now, a =", a, "and is of type", type(a)) # note the change in type
Right now, a = 1 and is of type <class 'sage.rings.integer.Integer'>
Now, a = 1/3 and is of type <class 'sage.rings.rational.Rational'>

You try

Assign the value 2 to a variable named x.

On the next line down in the same cell, assign the value 3 to a variable named y.

Then (on a third line) put in an expression which will evaluate x + y

In [ ]:
 

Now try reassigning a different value to x and re-evaluating x + y

In [ ]:
 

Example 6: Strings

Variables can be strings (and not just numbers). Anything you put inside quote marks will be treated as a string by SageMath/Python.

In Python 3, str is the built-in sequence type for storing strings of unicode characters and operating over them; raw bytes are stored in the separate bytes type.

In [55]:
myStr = "this is a string"   # assign a string to the variable myStr
myStr                        # disclose myStr
Out[55]:
'this is a string'
In [56]:
type(myStr)                  # check the type for myStr
Out[56]:
<class 'str'>

You can also create a string by enclosing it in single quotes or three consecutive single quotes. In SageMath/Python a character (represented by the char type in languages like C/C++/Scala) is just a string made up of one character.

In [57]:
myStr = 'this is a string'   # assign a string to the variable myStr using single quotes
myStr                        # disclose myStr
Out[57]:
'this is a string'

You can assign values to more than one variable on the same line, by separating the assignment expressions with a semicolon ;. However, it is usually best not to do this because it makes your code harder to read (it is hard to spot the other assignments on a single line after the first one).

In [58]:
myStr = '''this is a string'''   # assign a string to the variable myStr using three consecutive single quotes
myStr                            # disclose myStr
Out[58]:
'this is a string'

Using triple single quotes is especially useful if your string has single or double quotes within it. Triple quotes are often used to create docstrings to document code in Python/SageMath.
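As a small sketch of how the three quoting styles fit together, the plain Python lines below build strings that contain quote characters and check that a single character is just a string of length one. The example strings and variable names are made up.

```python
with_apostrophe = "it's"                 # double quotes let us use ' inside
with_double = 'say "hi"'                 # single quotes let us use " inside
with_both = '''it's "both"'''            # triple quotes let us use both
letter = 'x'                             # a single character is still a str

print(with_apostrophe, with_double, with_both)
print(type(letter), len(letter))

# the quoting style does not change the value of the string
same_value = ("this is a string" == 'this is a string' == '''this is a string''')
print(same_value)
```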

In [59]:
myStrContainingQuotes = '''this string has "a double quoted sub-string" and some escaped characters: \\', - all OK!'''
myStrContainingQuotes
Out[59]:
'this string has "a double quoted sub-string" and some escaped characters: \\\', - all OK!'

Pride and Prejudice as unicode

We will explore frequencies of strings for the most downloaded book at Project Gutenberg, which publishes public domain books online. Currently, books published before 1923 are in the public domain - meaning anyone has the right to copy or use the text in any way.

Pride and Prejudice by Jane Austen had the highest number of downloads and is available from Project Gutenberg.

A quick exploration allows us to see the UTF-8-encoded text here.

For now, we will just show how to download the most popular book from the project and display its contents for processing down the road.

In [60]:
# this downloads the UTF-8-encoded text of the book from the url we found at Project Gutenberg,
# decodes it into a unicode string and assigns it to a variable named prideAndPrejudiceRaw
from urllib.request import urlopen
prideAndPrejudiceRaw = urlopen('http://www.gutenberg.org/files/1342/1342-0.txt').read().decode('utf-8')
prideAndPrejudiceRaw[0:1000] # just showing the first 1000 raw characters of the downloaded book as unicode
Out[60]:
'\ufeff\r\nThe Project Gutenberg EBook of Pride and Prejudice, by Jane Austen\r\n\r\nThis eBook is for the use of anyone anywhere at no cost and with\r\nalmost no restrictions whatsoever.  You may copy it, give it away or\r\nre-use it under the terms of the Project Gutenberg License included\r\nwith this eBook or online at www.gutenberg.org\r\n\r\n\r\nTitle: Pride and Prejudice\r\n\r\nAuthor: Jane Austen\r\n\r\nRelease Date: August 26, 2008 [EBook #1342]\r\nLast Updated: November 12, 2019\r\n\r\n\r\nLanguage: English\r\n\r\nCharacter set encoding: UTF-8\r\n\r\n*** START OF THIS PROJECT GUTENBERG EBOOK PRIDE AND PREJUDICE ***\r\n\r\n\r\n\r\n\r\nProduced by Anonymous Volunteers, and David Widger\r\n\r\nTHERE IS AN ILLUSTRATED EDITION OF THIS TITLE WHICH MAY VIEWED AT EBOOK\r\n[# 42671 ]\r\n\r\ncover\r\n\r\n\r\n\r\n\r\n      Pride and Prejudice\r\n\r\n      By Jane Austen\r\n\r\n        CONTENTS\r\n\r\n         Chapter 1\r\n\r\n         Chapter 2\r\n\r\n         Chapter 3\r\n\r\n         Chapter 4\r\n\r\n         Chapter 5\r\n\r\n         Chapter 6\r\n\r\n         Chapter 7\r\n\r\n         Chapter 8\r\n\r\n '
In [61]:
type(prideAndPrejudiceRaw) # after decoding from utf-8 this is a str, i.e., a sequence of unicode characters
Out[61]:
<class 'str'>
In [62]:
len(prideAndPrejudiceRaw) 
# the length of the unicode string is about 790 thousand unicode characters
Out[62]:
790335
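The difference between the bytes that are downloaded and the str obtained by .decode('utf-8') can be sketched on a tiny made-up string. The text below is hypothetical and is written with \u escapes so that the non-ASCII characters (a curly apostrophe and an accented e) are explicit.

```python
text = "Austen\u2019s caf\u00e9"       # a str: a sequence of unicode characters
data = text.encode('utf-8')            # bytes: the UTF-8 encoding of those characters
roundtrip = data.decode('utf-8')       # decoding the bytes gives back the same str

print(type(text), len(text))           # length counted in characters
print(type(data), len(data))           # length counted in bytes
print(roundtrip == text)
```

The byte count is larger than the character count because each non-ASCII character takes more than one byte in UTF-8.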

Next we will show how trivial it is to "read" all the chapters into SageMath/Python using these steps:

  • we use regular expressions via the re library to substitute all occurrences of white-space characters, like one or more consecutive end-of-line, tab or space characters, with a single space,
  • we split by 'Chapter ' into multiple chapters in a list
  • print the first 100 characters of each chapter in the list

(don't worry about the details now - we will revisit these in detail later)
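The steps above can be sketched on a tiny made-up text before applying them to the whole book. The string raw below is hypothetical and only imitates the layout of the downloaded file.

```python
import re

# a tiny made-up text standing in for the downloaded book
raw = "Contents\r\n\r\nChapter 1\r\nIt   was\ta day.\r\n\r\nChapter 2\r\nIt rained."

flat = re.sub(r'\s+', ' ', raw)         # replace each run of white space by one space
pieces = flat.split('Chapter ')[1:]     # split, dropping the text before the first 'Chapter '
for piece in pieces:
    print(piece[0:12])                  # first 12 characters of each piece
```

Each piece starts with the chapter number because the text 'Chapter ' itself is removed by split.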

In [63]:
myString = "strBlah"
myString
Out[63]:
'strBlah'
In [64]:
myString.split?
In [65]:
import re 
# make a list of chapters
chapterList = re.sub('\\s+', ' ',prideAndPrejudiceRaw).split('Chapter ')[1:]
for chapter in chapterList:
    print(chapter[0:100])
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
26 
27 
28 
29 
30 
31 
32 
33 
34 
35 
36 
37 
38 
39 
40 
41 
42 
43 
44 
45 
46 
47 
48 
49 
50 
51 
52 
53 
54 
55 
56 
57 
58 
59 
60 
61 
1 It is a truth universally acknowledged, that a single man in possession of a good fortune, must be
2 Mr. Bennet was among the earliest of those who waited on Mr. Bingley. He had always intended to vi
3 Not all that Mrs. Bennet, however, with the assistance of her five daughters, could ask on the sub
4 When Jane and Elizabeth were alone, the former, who had been cautious in her praise of Mr. Bingley
5 Within a short walk of Longbourn lived a family with whom the Bennets were particularly intimate. 
6 The ladies of Longbourn soon waited on those of Netherfield. The visit was soon returned in due fo
7 Mr. Bennet’s property consisted almost entirely in an estate of two thousand a year, which, unfort
8 At five o’clock the two ladies retired to dress, and at half-past six Elizabeth was summoned to di
9 Elizabeth passed the chief of the night in her sister’s room, and in the morning had the pleasure 
10 The day passed much as the day before had done. Mrs. Hurst and Miss Bingley had spent some hours 
11 When the ladies removed after dinner, Elizabeth ran up to her sister, and seeing her well guarded
12 In consequence of an agreement between the sisters, Elizabeth wrote the next morning to their mot
13 “I hope, my dear,” said Mr. Bennet to his wife, as they were at breakfast the next morning, “that
14 During dinner, Mr. Bennet scarcely spoke at all; but when the servants were withdrawn, he thought
15 Mr. Collins was not a sensible man, and the deficiency of nature had been but little assisted by 
16 As no objection was made to the young people’s engagement with their aunt, and all Mr. Collins’s 
17 Elizabeth related to Jane the next day what had passed between Mr. Wickham and herself. Jane list
18 Till Elizabeth entered the drawing-room at Netherfield, and looked in vain for Mr. Wickham among 
19 The next day opened a new scene at Longbourn. Mr. Collins made his declaration in form. Having re
20 Mr. Collins was not left long to the silent contemplation of his successful love; for Mrs. Bennet
21 The discussion of Mr. Collins’s offer was now nearly at an end, and Elizabeth had only to suffer 
22 The Bennets were engaged to dine with the Lucases and again during the chief of the day was Miss 
23 Elizabeth was sitting with her mother and sisters, reflecting on what she had heard, and doubting
24 Miss Bingley’s letter arrived, and put an end to doubt. The very first sentence conveyed the assu
25 After a week spent in professions of love and schemes of felicity, Mr. Collins was called from hi
26 Mrs. Gardiner’s caution to Elizabeth was punctually and kindly given on the first favourable oppo
27 With no greater events than these in the Longbourn family, and otherwise diversified by little be
28 Every object in the next day’s journey was new and interesting to Elizabeth; and her spirits were
29 Mr. Collins’s triumph, in consequence of this invitation, was complete. The power of displaying t
30 Sir William stayed only a week at Hunsford, but his visit was long enough to convince him of his 
31 Colonel Fitzwilliam’s manners were very much admired at the Parsonage, and the ladies all felt th
32 Elizabeth was sitting by herself the next morning, and writing to Jane while Mrs. Collins and Mar
33 More than once did Elizabeth, in her ramble within the park, unexpectedly meet Mr. Darcy. She fel
34 When they were gone, Elizabeth, as if intending to exasperate herself as much as possible against
35 Elizabeth awoke the next morning to the same thoughts and meditations which had at length closed 
36 If Elizabeth, when Mr. Darcy gave her the letter, did not expect it to contain a renewal of his o
37 The two gentlemen left Rosings the next morning, and Mr. Collins having been in waiting near the 
38 On Saturday morning Elizabeth and Mr. Collins met for breakfast a few minutes before the others a
39 It was the second week in May, in which the three young ladies set out together from Gracechurch 
40 Elizabeth’s impatience to acquaint Jane with what had happened could no longer be overcome; and a
41 The first week of their return was soon gone. The second began. It was the last of the regiment’s
42 Had Elizabeth’s opinion been all drawn from her own family, she could not have formed a very plea
43 Elizabeth, as they drove along, watched for the first appearance of Pemberley Woods with some per
44 Elizabeth had settled it that Mr. Darcy would bring his sister to visit her the very day after he
45 Convinced as Elizabeth now was that Miss Bingley’s dislike of her had originated in jealousy, she
46 Elizabeth had been a good deal disappointed in not finding a letter from Jane on their first arri
47 “I have been thinking it over again, Elizabeth,” said her uncle, as they drove from the town; “an
48 The whole party were in hopes of a letter from Mr. Bennet the next morning, but the post came in 
49 Two days after Mr. Bennet’s return, as Jane and Elizabeth were walking together in the shrubbery 
50 Mr. Bennet had very often wished before this period of his life that, instead of spending his who
51 Their sister’s wedding day arrived; and Jane and Elizabeth felt for her probably more than she fe
52 Elizabeth had the satisfaction of receiving an answer to her letter as soon as she possibly could
53 Mr. Wickham was so perfectly satisfied with this conversation that he never again distressed hims
54 As soon as they were gone, Elizabeth walked out to recover her spirits; or in other words, to dwe
55 A few days after this visit, Mr. Bingley called again, and alone. His friend had left him that mo
56 One morning, about a week after Bingley’s engagement with Jane had been formed, as he and the fem
57 The discomposure of spirits which this extraordinary visit threw Elizabeth into, could not be eas
58 Instead of receiving any such letter of excuse from his friend, as Elizabeth half expected Mr. Bi
59 “My dear Lizzy, where can you have been walking to?” was a question which Elizabeth received from
60 Elizabeth’s spirits soon rising to playfulness again, she wanted Mr. Darcy to account for his hav
61 Happy for all her maternal feelings was the day on which Mrs. Bennet got rid of her two most dese

As we learn more we will return to this popular book's unicode which is stored in our data directory as data\pride_and_prejudice.txt.

Let us motivate the Python methods we will see soon by using them below to plot the number of occurrences of he and she in each of the 61 chapters of the book.

In [66]:
import re 
heList = []
sheList = []
i = 0
# make a list of chapters, skipping the text before the first 'Chapter ' and the 61 table-of-contents entries
chapterList = re.sub('\\s+', ' ',prideAndPrejudiceRaw).split('Chapter ')[62:]
for chapter in chapterList:
    i = i+1 # increment chapter count
    content = chapter.lower() # get content as lower-case
    heCount = content.count('he') # count occurrences of the substring 'he' (this also matches inside words like 'the' and 'she')
    sheCount = content.count('she') # count occurrences of the substring 'she'
    heList.append((i,heCount)) # append to heList
    sheList.append((i,sheCount)) # append to sheList
In [67]:
p = points(heList, color='blue',legend_label='he')
p += line(heList, color='blue')
p += points(sheList, color='red',legend_label='she')
p += line(sheList, color='red')
p.axes_labels(['Chapter No.','Frequency'])
p.axes_labels_size(1.0)
p.show(figsize=[8,3])
In [68]:
#p. # uncomment and place the cursor after . and hit Tab button to see options for plotting
In [69]:
type(p) # what we did above is superimposing primitive geometric objects: points and lines
Out[69]:
<class 'sage.plot.graphics.Graphics'>
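One caveat about the counts plotted above: the string method count looks for substrings, so counting 'he' also counts the 'he' inside words such as 'the' and 'she'. The sketch below, on a made-up sentence, contrasts this with counting whole words only, using re.findall and the word-boundary pattern \b. This is an alternative shown for illustration, not the method used for the plot.

```python
import re

sentence = "She said he saw the cat; then she left and he stayed."
text = sentence.lower()

substring_he = text.count('he')                   # matches inside 'she', 'the' and 'then' too
word_he = len(re.findall(r'\bhe\b', text))        # whole word 'he' only
word_she = len(re.findall(r'\bshe\b', text))      # whole word 'she' only
print(substring_he, word_he, word_she)
```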

Assignment Gotcha!

Let's examine the three assignments in the cell below.

The first assignment of x=3 is standard: Python/SageMath chooses a memory location for x and saves the integer value 3 in it.

The second assignment of y=x is more interesting and Pythonic: instead of finding another location for the variable y and copying the value 3 into it, as C/C++ would, Python/SageMath lets y point to the memory location of x, since both variables have the same value after the assignment.

Finally, after the third assignment of y=2, x will NOT be changed to 2, because the behavior is not that of a C pointer. Since x and y no longer share the same value, y gets its own memory location containing 2, and x sticks to the originally assigned value 3.

In [70]:
x=3
print(x) # x is 3
y=x
print(x,y) # x is 3 and y is 3
y=2
print(x,y)
3
3 3
3 2

As every instance (object or variable) has an identity or id(), i.e., an integer which is unique among the objects that exist at the same time in the script or program, we can use id() to understand the above behavior of Python/SageMath assignments.

So, let's have a look at our previous example and see how the identities change with the assignments.

In [71]:
x = 3
print('x and its id() are:')
print(x,id(x))

y = x
print('\ny and its id() are:')
print(y,id(y))

y = 2
print('\nx, y and their id()s are:')
print(x,y,id(x),id(y))
x and its id() are:
3 140637400826480

y and its id() are:
3 140637400826480

x, y and their id()s are:
3 2 140637400826480 140637578236576
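The same behavior can be checked without reading the long id() integers, using the is operator, which tests whether two names refer to the very same object. This is a minimal sketch in plain Python; the actual id() values differ from run to run.

```python
x = 3
y = x
same_object_before = (x is y)          # y refers to the same object as x
same_id_before = (id(x) == id(y))      # so their identities agree

y = 2                                  # rebinding y does not touch x
same_object_after = (x is y)
print(same_object_before, same_id_before, same_object_after)
print(x, y)
```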

Example 7: Truth statements and Boolean values

Consider statements like "Today is Friday" or "2 is greater than 1" or "1 equals 1": statements which are either true or not true (i.e., false). SageMath has two values, True and False, which you'll meet in this situation. These values are called Boolean values, or values of the type Boolean.

In SageMath, we can express statements like "2 is greater than 1" or "1 equals 1" with relational operators, also known as value comparison operators. Have a look at the list below.

  • < Less than
  • > Greater than
  • <= Less than or equal to
  • >= Greater than or equal to
  • == Equal to
  • != Not equal to

Let's try some really simple truth statements.

In [72]:
1 < 1          # 1 is less than 1
Out[72]:
False

Let us evaluate the following statement.

In [73]:
1 <= 1  # 1 is less than or equal to 1
Out[73]:
True

We can use these operators on variables as well as on values. Again, try assigning different values to x and y, or try using different operators, if you want to.

In [74]:
x = 1          # assign the value 1 to x             
y = 2          # assign the value 2 to y 
x == y         # evaluate the truth statement "x is equal to y"
Out[74]:
False

Note that when we check if something equals something else, we use ==, a double equals sign. This is because =, a single equals sign, is the assignment operator we talked about above. Therefore, to test if x equals y we can't write x = y because this would assign y to x, instead we use the equality operator == and write x == y.

We can also assign a Boolean value to a variable.

In [75]:
# Using the same x and y as above
myBoolean = (x == y)   # assign the result of x == y to the variable myBoolean
myBoolean              # disclose myBoolean
Out[75]:
False
In [76]:
type(myBoolean)        # check the type of myBoolean
Out[76]:
<class 'bool'>

If we want to check if two things are not equal we use !=. As we would expect, it gives us the opposite of testing for equality:

In [77]:
x != y                 # evaluate the truth statement "x is not equal to y"
Out[77]:
True
In [78]:
print(x,y)             # Let's print x and y to make sure the above statement makes sense 
1 2

You try

Try assigning some values to two variables - you choose what values and you choose what variable names to use. Try some truth statements to check if they are equal, or one is less than the other.

In [ ]:
 

You try

Try some strings (we looked at strings briefly in Example 5 above). Can you check if two strings are equal? Can you check if one string is less than (<) another string? How do you think that Sage is ordering strings (try comparing "fred" and "freddy", for example)?

In [79]:
'raazb' <= 'raaza'
Out[79]:
False
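
One way to reason about such comparisons: Python compares strings character by character using each character's code point (available through the built-in ord), and a string that is a proper prefix of another counts as smaller. A small sketch using the strings from the text, plus 'Zebra' and 'apple' as made-up extras:

```python
# Strings are compared left to right, one character at a time.
print('fred' < 'freddy')      # True: 'fred' is a proper prefix of 'freddy'
print('raazb' <= 'raaza')     # False: the first difference is 'b' versus 'a'
print(ord('a'), ord('b'))     # 97 98: code points decide the order
print('Zebra' < 'apple')      # True: upper-case letters have smaller code points
```

The last line shows that this ordering is not the same as dictionary order when upper and lower case are mixed.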
In [80]:
x = [1]
y = x
y[0] = 5
print(x)
print(x,id(x),y,id(y))
[5]
[5] 140637401253512 [5] 140637401253512
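
The last cell shows the other side of assignment: a list is mutable, so changing it through y is visible through x, because both names refer to one object. If an independent list is wanted, a copy has to be requested explicitly. A minimal sketch (the name z is introduced here only for illustration):

```python
x = [1]
y = x            # y is another name for the same list object
z = list(x)      # z is a new list with the same contents (a shallow copy)

y[0] = 5         # mutate the shared list through y

print(x, y, z)           # x and y both show the change, z does not
print(y is x, z is x)    # identity check
```

Rebinding a name (y = 2) and mutating an object (y[0] = 5) are different operations, which is why the integer example earlier left x unchanged while this one does not.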

Example 7: Mathematical constants

Sage has reserved words that are defined as common mathematical constants. For example, pi and e behave as you expect. Numerical approximations can be obtained using the .n() method, as before.

In [81]:
print( pi, "~", pi.n())   # print a numerical approximation of the mathematical constant pi
print( e, "~", e.n())  # print a numerical approximation of the mathematical constant e
print( I, "~", I.n()) # print a numerical approximation of the imaginary number sqrt(-1)
pi ~ 3.14159265358979
e ~ 2.71828182845905
I ~ 1.00000000000000*I
In [82]:
(pi/e).n(digits=100)   # print the first 100 digits of pi/e
Out[82]:
1.155727349790921717910093183312696299120851023164415820499706535327288631840916939440188434235673559
In [83]:
e^(I*pi)+1          # Euler's identity symbolically - see https://en.wikipedia.org/wiki/Euler%27s_identity
Out[83]:
0
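
The result above is exactly 0 because Sage evaluates the expression symbolically. For contrast, here is a hedged sketch of the same computation with ordinary floating-point numbers, using Python's standard cmath module rather than Sage's symbolic constants; the answer is then only approximately zero.

```python
import cmath

# Floating-point version of e^(i*pi) + 1, using Python's built-in complex type
z = cmath.exp(complex(0, 1) * cmath.pi) + 1

print(z)                  # a tiny imaginary part is left over from rounding
print(abs(z) < 1e-15)     # very close to zero, which is the best floats can do
```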

Example 8: SageMath number types and Python number types

We showed how you can find the type of a number value and we demonstrated that by default, SageMath makes 'real' numbers like 3.1 into Sage real literals (sage.rings.real_mpfr.RealLiteral).

If you were just using Python (the programming language underlying most of SageMath) then a value like 3.1 would be a floating point number or float type. Python has some interesting extra operators that you can use with Python floating point numbers, which also work with the Sage rings integer type but not with Sage real literals.

In [84]:
X = 3.1 # convert to default Sage real literal 3.1 
type(X)
Out[84]:
<class 'sage.rings.real_mpfr.RealLiteral'>
In [85]:
X = float(3.1) # convert the default Sage real literal 3.1 to a float 3.1
type(X)
Out[85]:
<class 'float'>

Floor Division (//) - The division of operands where the result is the quotient rounded towards negative infinity (floored). Examples: 9//2 = 4, 9.0//2.0 = 4.0, -11//3 = -4 and -11.0//3 = -4.0.
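
A small sketch of this definition, using plain Python int and float values (float(...) is used so that no Sage real literals are involved). It also checks the identity a == (a // b) * b + (a % b), which ties floor division to the remainder operator; the names a and b are illustrative.

```python
print(9 // 2, -11 // 3)                        # 4 -4
print(float(9) // float(2), float(-11) // 3)   # 4.0 -4.0

# Floor division and remainder fit together: a == (a // b) * b + (a % b)
a, b = -11, 3
print(a // b, a % b, (a // b) * b + (a % b) == a)   # -4 1 True
```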

In [86]:
3 // 2 # floor division 
Out[86]:
1
In [87]:
3.3 // 2.0 # this will give error - floor division is undefined for Sage real literals
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/home/sage/sage/local/lib/python3.7/site-packages/sage/structure/element.pyx in sage.structure.element.Element._floordiv_ (build/cythonized/sage/structure/element.c:13425)()
   1849         try:
-> 1850             python_op = (<object>self)._floordiv_
   1851         except AttributeError:

/home/sage/sage/local/lib/python3.7/site-packages/sage/structure/element.pyx in sage.structure.element.Element.__getattr__ (build/cythonized/sage/structure/element.c:4609)()
    486         """
--> 487         return self.getattr_from_category(name)
    488 

/home/sage/sage/local/lib/python3.7/site-packages/sage/structure/element.pyx in sage.structure.element.Element.getattr_from_category (build/cythonized/sage/structure/element.c:4718)()
    499             cls = P._abstract_element_class
--> 500         return getattr_from_other_class(self, cls, name)
    501 

/home/sage/sage/local/lib/python3.7/site-packages/sage/cpython/getattr.pyx in sage.cpython.getattr.getattr_from_other_class (build/cythonized/sage/cpython/getattr.c:2614)()
    393         dummy_error_message.name = name
--> 394         raise AttributeError(dummy_error_message)
    395     attribute = <object>attr

AttributeError: 'sage.rings.real_mpfr.RealField_class' object has no attribute '__custom_name'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-87-28211c458c9c> in <module>()
----> 1 RealNumber('3.3') // RealNumber('2.0') # this will give error - floor division is undefined for Sage real literals

/home/sage/sage/local/lib/python3.7/site-packages/sage/structure/element.pyx in sage.structure.element.Element.__floordiv__ (build/cythonized/sage/structure/element.c:13204)()
   1815         cdef int cl = classify_elements(left, right)
   1816         if HAVE_SAME_PARENT(cl):
-> 1817             return (<Element>left)._floordiv_(right)
   1818         if BOTH_ARE_ELEMENT(cl):
   1819             return coercion_model.bin_op(left, right, floordiv)

/home/sage/sage/local/lib/python3.7/site-packages/sage/structure/element.pyx in sage.structure.element.Element._floordiv_ (build/cythonized/sage/structure/element.c:13497)()
   1850             python_op = (<object>self)._floordiv_
   1851         except AttributeError:
-> 1852             raise bin_op_exception('//', self, other)
   1853         else:
   1854             return python_op(other)

TypeError: unsupported operand parent(s) for //: 'Real Field with 53 bits of precision' and 'Real Field with 53 bits of precision'
In [ ]:
float(3.5) // float(2.0)

Similarly, we have the light-weight Python integer type int that we may want instead of SageMath integer type for non-mathematical operations.

In [88]:
type(3) # the default Sage rings integer type
Out[88]:
<class 'sage.rings.integer.Integer'>
In [89]:
X = int(3) # conversion to a plain Python integer type
type(X)
Out[89]:
<class 'int'>
In [90]:
3/2 # see the result you get when dividing one default Sage rings integer type by another
Out[90]:
3/2

One of the differences between SageMath rings integers and plain Python integers is that the result of dividing one SageMath rings integer by another is a rational. This probably seems very sensible, but it is not what happens with Python integers, where the / operator returns a float.

In [91]:
int(7)/int(2) # division using python integers is "float division" 
Out[91]:
3.5
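
For comparison, plain Python can keep exact ratios too, but only on request, through the standard-library fractions.Fraction type. The sketch below assumes ordinary Python int values, as produced by int(...) in the cell above; the names a and b are illustrative.

```python
from fractions import Fraction

a, b = int(7), int(2)
print(a / b)             # true division always returns a float: 3.5
print(a // b)            # floor division stays an int: 3
print(Fraction(a, b))    # exact rational, printed as 7/2
```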

We showed the .n() method. If X is some Sage real literal and we use X.n(20) we will be asking for 20 bits of precision, which is about how many bits in the computer's memory will be allocated to hold the number. If we ask for X.n(digits=20) we will be asking for 20 digits of precision, which is not the same thing. Also note that 20 digits of precision does not mean showing the number to 20 decimal places; it means all the digits, including those in front of the decimal point.
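
The two ways of specifying precision are related by a factor of about 3.32 bits per decimal digit (the base-2 logarithm of 10). The following back-of-the-envelope sketch uses only Python's math module; the helper names approx_digits and approx_bits are hypothetical, and Sage's own conversion may round slightly differently.

```python
import math

BITS_PER_DIGIT = math.log2(10)        # about 3.32 bits for each decimal digit

def approx_digits(bits):
    """Roughly how many decimal digits a given number of bits can hold."""
    return bits / BITS_PER_DIGIT

def approx_bits(digits):
    """Roughly how many bits are needed for a given number of decimal digits."""
    return math.ceil(digits * BITS_PER_DIGIT)

print(round(approx_digits(53), 2))    # the default 53 bits is about 15.95 digits
print(round(approx_digits(20), 2))    # 20 bits is only about 6.02 digits
print(approx_bits(20))                # 20 digits needs about 67 bits
```

So X.n(20) carries roughly six significant decimal digits, while X.n(digits=20) asks for more than three times as many bits.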

In [92]:
help(n) # always ask for help when you need it - or lookup in help menu above
Help on function numerical_approx in module sage.misc.functional:

numerical_approx(x, prec=None, digits=None, algorithm=None)
    Return a numerical approximation of ``self`` with ``prec`` bits
    (or decimal ``digits``) of precision.
    
    No guarantee is made about the accuracy of the result.
    
    .. NOTE::
    
        Lower case :func:`n` is an alias for :func:`numerical_approx`
        and may be used as a method.
    
    INPUT:
    
    - ``prec`` -- precision in bits
    
    - ``digits`` -- precision in decimal digits (only used if
      ``prec`` is not given)
    
    - ``algorithm`` -- which algorithm to use to compute this
      approximation (the accepted algorithms depend on the object)
    
    If neither ``prec`` nor ``digits`` is given, the default
    precision is 53 bits (roughly 16 digits).
    
    EXAMPLES::
    
        sage: numerical_approx(pi, 10)
        3.1
        sage: numerical_approx(pi, digits=10)
        3.141592654
        sage: numerical_approx(pi^2 + e, digits=20)
        12.587886229548403854
        sage: n(pi^2 + e)
        12.5878862295484
        sage: N(pi^2 + e)
        12.5878862295484
        sage: n(pi^2 + e, digits=50)
        12.587886229548403854194778471228813633070946500941
        sage: a = CC(-5).n(prec=40)
        sage: b = ComplexField(40)(-5)
        sage: a == b
        True
        sage: parent(a) is parent(b)
        True
        sage: numerical_approx(9)
        9.00000000000000
    
    You can also usually use method notation::
    
        sage: (pi^2 + e).n()
        12.5878862295484
        sage: (pi^2 + e).numerical_approx()
        12.5878862295484
    
    Vectors and matrices may also have their entries approximated::
    
        sage: v = vector(RDF, [1,2,3])
        sage: v.n()
        (1.00000000000000, 2.00000000000000, 3.00000000000000)
    
        sage: v = vector(CDF, [1,2,3])
        sage: v.n()
        (1.00000000000000, 2.00000000000000, 3.00000000000000)
        sage: _.parent()
        Vector space of dimension 3 over Complex Field with 53 bits of precision
        sage: v.n(prec=20)
        (1.0000, 2.0000, 3.0000)
    
        sage: u = vector(QQ, [1/2, 1/3, 1/4])
        sage: n(u, prec=15)
        (0.5000, 0.3333, 0.2500)
        sage: n(u, digits=5)
        (0.50000, 0.33333, 0.25000)
    
        sage: v = vector(QQ, [1/2, 0, 0, 1/3, 0, 0, 0, 1/4], sparse=True)
        sage: u = v.numerical_approx(digits=4)
        sage: u.is_sparse()
        True
        sage: u
        (0.5000, 0.0000, 0.0000, 0.3333, 0.0000, 0.0000, 0.0000, 0.2500)
    
        sage: A = matrix(QQ, 2, 3, range(6))
        sage: A.n()
        [0.000000000000000  1.00000000000000  2.00000000000000]
        [ 3.00000000000000  4.00000000000000  5.00000000000000]
    
        sage: B = matrix(Integers(12), 3, 8, srange(24))
        sage: N(B, digits=2)
        [0.00  1.0  2.0  3.0  4.0  5.0  6.0  7.0]
        [ 8.0  9.0  10.  11. 0.00  1.0  2.0  3.0]
        [ 4.0  5.0  6.0  7.0  8.0  9.0  10.  11.]
    
    Internally, numerical approximations of real numbers are stored in base-2.
    Therefore, numbers which look the same in their decimal expansion might be
    different::
    
        sage: x=N(pi, digits=3); x
        3.14
        sage: y=N(3.14, digits=3); y
        3.14
        sage: x==y
        False
        sage: x.str(base=2)
        '11.001001000100'
        sage: y.str(base=2)
        '11.001000111101'
    
    Increasing the precision of a floating point number is not allowed::
    
        sage: CC(-5).n(prec=100)
        Traceback (most recent call last):
        ...
        TypeError: cannot approximate to a precision of 100 bits, use at most 53 bits
        sage: n(1.3r, digits=20)
        Traceback (most recent call last):
        ...
        TypeError: cannot approximate to a precision of 70 bits, use at most 53 bits
        sage: RealField(24).pi().n()
        Traceback (most recent call last):
        ...
        TypeError: cannot approximate to a precision of 53 bits, use at most 24 bits
    
    As an exceptional case, ``digits=1`` usually leads to 2 digits (one
    significant) in the decimal output (see :trac:`11647`)::
    
        sage: N(pi, digits=1)
        3.2
        sage: N(pi, digits=2)
        3.1
        sage: N(100*pi, digits=1)
        320.
        sage: N(100*pi, digits=2)
        310.
    
    In the following example, ``pi`` and ``3`` are both approximated to two
    bits of precision and then subtracted, which kills two bits of precision::
    
        sage: N(pi, prec=2)
        3.0
        sage: N(3, prec=2)
        3.0
        sage: N(pi - 3, prec=2)
        0.00
    
    TESTS::
    
        sage: numerical_approx(I)
        1.00000000000000*I
        sage: x = QQ['x'].gen()
        sage: F.<k> = NumberField(x^2+2, embedding=sqrt(CC(2))*CC.0)
        sage: numerical_approx(k)
        1.41421356237309*I
    
        sage: type(numerical_approx(CC(1/2)))
        <type 'sage.rings.complex_number.ComplexNumber'>
    
    The following tests :trac:`10761`, in which ``n()`` would break when
    called on complex-valued algebraic numbers.  ::
    
        sage: E = matrix(3, [3,1,6,5,2,9,7,3,13]).eigenvalues(); E
        [18.16815365088822?, -0.08407682544410650? - 0.2190261484802906?*I, -0.08407682544410650? + 0.2190261484802906?*I]
        sage: E[1].parent()
        Algebraic Field
        sage: [a.n() for a in E]
        [18.1681536508882, -0.0840768254441065 - 0.219026148480291*I, -0.0840768254441065 + 0.219026148480291*I]
    
    Make sure we've rounded up log(10,2) enough to guarantee
    sufficient precision (:trac:`10164`)::
    
        sage: ks = 4*10**5, 10**6
        sage: check_str_length = lambda k: len(str(numerical_approx(1+10**-k,digits=k+1)))-1 >= k+1
        sage: check_precision = lambda k: numerical_approx(1+10**-k,digits=k+1)-1 > 0
        sage: all(check_str_length(k) and check_precision(k) for k in ks)
        True
    
    Testing we have sufficient precision for the golden ratio (:trac:`12163`), note
    that the decimal point adds 1 to the string length::
    
        sage: len(str(n(golden_ratio, digits=5000)))
        5001
        sage: len(str(n(golden_ratio, digits=5000000)))  # long time (4s on sage.math, 2012)
        5000001
    
    Check that :trac:`14778` is fixed::
    
        sage: n(0, algorithm='foo')
        0.000000000000000

In [93]:
X=3.55555555
X.n(digits = 3)
Out[93]:
3.56
In [94]:
X.n(3) # this will use 3 bits of precision
Out[94]:
3.5
In [95]:
round(X,3)
Out[95]:
3.556
In [96]:
#?round # uncomment and evaluate this to open a window with help information that can be closed

If you want to actually round a number to a specific number of decimal places, you can also use the round(...) function.
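
The distinction between significant digits and decimal places can also be seen with a plain Python float. In this sketch format(..., '.3g') keeps three significant digits while round(..., 3) keeps three decimal places; the value 355.555555 is an extra illustration not taken from the cells above.

```python
x = float(3.55555555)
print(format(x, '.3g'))    # 3 significant digits -> 3.56
print(round(x, 3))         # 3 decimal places     -> 3.556

big = float(355.555555)
print(format(big, '.3g'))  # still 3 significant digits -> 356
print(round(big, 3))       # 3 decimal places           -> 355.556
```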

For a deeper dive, see the documentation on Python Numeric Types and SageMath Numeric Types.

Sets

Set theory is at the very foundation of modern mathematics and is necessary to understand the mathematical notions of probability and statistics. We will take a practical mathematical tour of the essential concepts from set theory that a data scientist needs to understand and build probabilistic models from the data using statistical principles.

In [97]:
showURL("https://en.wikipedia.org/wiki/Set_(mathematics)",500)
Out[97]:

Essentials of Set Theory for Probability and Statistics

These are black-board lectures typeset here.

Let us learn or recall elementary set theory. Sets are perhaps the most fundamental concept in mathematics.

Definitions

A set is a collection of distinct elements.

We write a set by enclosing its elements with curly brackets. Let us see some examples next.

  • The collection of $\star$ and $\circ$ is $\{\star,\circ\}$.
  • We can name the set $\{\star,\circ\}$ by the letter $A$ and write $$A=\{\star,\circ\}.$$
  • Question: Is $\{\star,\star,\circ\}$ a set?
  • A set of letters and numbers that I like is $\{b,d,6,p,q,9\}$.
  • The set of the first five letters of the Greek alphabet is $\{\alpha,\beta,\gamma,\delta,\epsilon\}$.

The set that contains no elements is the empty set. It is denoted by $$\boxed{\emptyset = \{\}} \ .$$

We say an element belongs to or does not belong to a set with the binary operators

$$\boxed{\in \ \text{or} \ \notin} \ .$$

For example,

  • $\star \in \{\star,\circ\}$ but the element $\otimes \notin \{\star,\circ\}$
  • $b \in \{b,d,6,p,q,9\}$ but $8 \notin \{b,d,6,p,q,9\}$
  • Question: Is $9 \in \{3,4,1,5,2,8,6,7\}$?

We say a set $C$ is a subset of a set $D$ and write

$$\boxed{C \subset D}$$

if every element of $C$ is also an element of $D$. For example,

  • $\{\star\} \subset \{\star,\circ\}$
  • Question: Is $\{6,9\}\subset \{b,d,6,p,q,9\}$?

Set Operations

We can add distinct new elements to an existing set by the union operation, denoted by the $\cup$ symbol.

For example

  • $\{\circ, \bullet\} \cup \{\star\} = \{\circ,\bullet,\star\}$
  • Question: $\{\circ, \bullet\} \cup \{\bullet\} = \quad$?

More formally, we write the union of two sets $A$ and $B$ as $$\boxed{A \cup B = \{x: x \in A \ \text{or} \ x \in B \}} \ .$$

The symbols above are read as $A$ union $B$ is equal to the set of all $x$ such that $x$ belongs to $A$ or $x$ belongs to $B$ and simply means that $A$ union $B$ or $A \cup B$ is the set of elements that belong to $A$ or $B$.

Similarly, the intersection of two sets $A$ and $B$ written as $$\boxed{A \cap B = \{x: x \in A \ \text{and} \ x \in B \}} $$ means $A$ intersection $B$ is the set of elements that belong to both $A$ and $B$.

For example

  • $\{\circ, \bullet\} \cap \{\circ\} = \{\circ\}$
  • $\{\circ, \bullet\} \cap \{\bullet\} = \{\bullet\}$
  • $\{\circ\} \cap \{a,b,c,d\}=\emptyset$

The set difference of two sets $A$ and $B$ written as

$$\boxed{A \setminus B = \{x: x \in A \ \text{and} \ x \notin B \}} $$

means $A \setminus B$ is the set of elements that belong to $A$ and do not belong to $B$.

For example

  • $\{\circ, \bullet\} \setminus \{\circ\} = \{\bullet\}$
  • $\{\circ, \bullet\} \setminus \{\bullet\} = \{\circ\}$
  • $\{a,b,c,d\} \setminus \{a,b,c,d\}=\emptyset$

The equality of two sets $A$ and $B$ is defined in terms of subsets as follows: $$\boxed{A = B \quad \text{if and only if} \quad A \subset B \ \text{and} \ B \subset A} \ .$$

Two sets $A$ and $B$ are said to be disjoint if $$\boxed{ A \cap B = \emptyset} \ .$$

Given a universal set $\Omega$, we define the complement of a subset $A$ of the universal set by $$\boxed{A^c = \Omega \setminus A = \{x: x \in \Omega \ \text{and} \ x \notin A\}} \ .$$
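
The last three definitions (equality, disjointness and complement) translate directly into Python's built-in set type, which is introduced in the examples further below. The sketch is only illustrative: the universal set omega and the sets A and B are made-up examples.

```python
omega = set(range(10))          # a small universal set {0, 1, ..., 9}
A = set([1, 2, 3])
B = set([3, 2, 1])

# Equality: A = B if and only if A is a subset of B and B is a subset of A
print((A <= B and B <= A) == (A == B))

# Complement of A relative to omega
A_c = omega - A
print(sorted(A_c))

# A and its complement are disjoint, and together they give back omega
print(A & A_c == set(), A.isdisjoint(A_c), A | A_c == omega)
```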

An Interactive Venn Diagram

Let us gain more intuition by seeing the unions and intersections of sets interactively. The following interact is from the interact/misc page of the Sage Wiki.

In [98]:
# ignore this code for now and focus on the interact in the output cell
def f(s, braces=True): 
    t = ', '.join(sorted(list(s)))
    if braces: return '{' + t + '}'
    return t
def g(s): return set(str(s).replace(',',' ').split())

@interact
def _(X='1,2,3,a', Y='2,a,3,4,apple', Z='a,b,10,apple'):
    S = [g(X), g(Y), g(Z)]
    X,Y,Z = S
    XY = X & Y
    XZ = X & Z
    YZ = Y & Z
    XYZ = XY & Z
    pretty_print(html('<center>'))
    pretty_print(html("$X \\cap Y$ = %s"%f(XY)))
    pretty_print(html("$X \\cap Z$ = %s"%f(XZ)))
    pretty_print(html("$Y \\cap Z$ = %s"%f(YZ)))
    pretty_print(html("$X \\cap Y \\cap Z$ = %s"%f(XYZ)))
    pretty_print(html('</center>'))
    centers = [(cos(n*2*pi/3), sin(n*2*pi/3)) for n in [0,1,2]]
    scale = 1.7
    clr = ['yellow', 'blue', 'green']
    G = Graphics()
    for i in range(len(S)):
        G += circle(centers[i], scale, rgbcolor=clr[i], 
             fill=True, alpha=0.3)
    for i in range(len(S)):
        G += circle(centers[i], scale, rgbcolor='black')

    # Plot what is in one but neither other
    for i in range(len(S)):
        Z = set(S[i])
        for j in range(1,len(S)):
            Z = Z.difference(S[(i+j)%3])
        G += text(f(Z,braces=False), (1.5*centers[i][0],1.7*centers[i][1]), rgbcolor='black')


    # Plot pairs of intersections
    for i in range(len(S)):
        Z = (set(S[i]) & S[(i+1)%3]) - set(XYZ)
        C = (1.3*cos(i*2*pi/3 + pi/3), 1.3*sin(i*2*pi/3 + pi/3))
        G += text(f(Z,braces=False), C, rgbcolor='black')

    # Plot intersection of all three
    G += text(f(XYZ,braces=False), (0,0), rgbcolor='black')

    # Show it
    G.show(aspect_ratio=1, axes=False)

Create and manipulate sets in SageMath.

Example 0: Lists before Sets

A list is a sequential collection that we will revisit in detail soon. For now, we just need to know that we can create a list by separating items with commas and wrapping them in left and right square brackets: [ and ]. For example, the following is a list of 4 integers:

In [99]:
[1,2,3,4]
Out[99]:
[1, 2, 3, 4]
In [100]:
myList = [1,2,3,4]    # we can assign the list to a variable myList
print(myList)         # print myList 
type(myList)          # and ask for its type 
[1, 2, 3, 4]
Out[100]:
<class 'list'>

The list is one of the most primitive data structures and has a long history in a popular computer programming language called LISP, originally created as a practical mathematical notation for computer programs.

For now, we just use lists to create sets.

Example 1: Making sets

In SageMath, you do have to specifically say that you want a set when you make it.

Do the following in SageMath/Python to create the mathematical set:

$$X = \{1,2,3,4\}$$
In [101]:
X = set([1, 2, 3, 4])  # make the set X={1,2,3,4} from the List [1,2,3,4]
X                      # disclose X
Out[101]:
{1, 2, 3, 4}
In [102]:
type(X)                # what is the type of X
Out[102]:
<class 'set'>

This is a specialized datatype in Python and more details can be found in Python docs: https://docs.python.org/2/library/datatypes.html

Do the following in SageMath/Python to find out if:

$$4 \in X$$

i.e., if $4$ is an element of $X$, or equivalently, if $4$ is in $X$.

In [103]:
4 in X                 # 'is 4 in X?'
Out[103]:
True
In [104]:
5 in X                 # 'is 5 in X?'
Out[104]:
False
In [105]:
Y = set([1, 2])        # make the set Y={1,2}
Y                      # disclose Y
Out[105]:
{1, 2}
In [106]:
4 not in Y             # 'is 4 not in Y?'
Out[106]:
True
In [107]:
1 not in Y             # 'is 1 not in Y?'
Out[107]:
False

We can add new elements to a set.

In [108]:
X.add(5)     # add 5 to the set X
X
Out[108]:
{1, 2, 3, 4, 5}

But remember from the mathematical exposition above that sets contain distinct elements.

In [109]:
X.add(1)     # try adding another 1 to the set X
X
Out[109]:
{1, 2, 3, 4, 5}

You try

Try making the set $Z=\{4,5,6,7\}$ next. The instructions are in the two cells below.

In [110]:
# Write in the expression to make set Z ={4, 5, 6, 7} 
# (press ENTER at the end of this line to get a new line)
Z = set([4,5,6,7])
In [111]:
# Check if 4 is in Z 
4 in Z
# (press ENTER at the end of this line to get a new line)
Out[111]:
True

Make a set with the value 2/5 (as a rational) in it. Try adding 0.4 (as a floating point number) to the set.

Does SageMath do what you expect?

In [ ]:
 

Example 2: Subsets

In lectures we talked about subsets of sets.

Recall that Y is a subset of X if each element in Y is also in X.

In [115]:
print ("X is", X)
print ("Y is", Y)
print ("Is Y a subset of X?")
Y <= X                       # 'is Y a subset of X?'
X is {1, 2, 3, 4, 5}
Y is {1, 2}
Is Y a subset of X?
Out[115]:
True

If you have time: We say Y is a proper subset of X if all the elements in Y are also in X but there is at least one element in X that is not in Y. If X is a (proper) subset of Y, then we also say that Y is a (proper) superset of X.

In [120]:
X < X     # 'is X a proper subset of itself?'
Out[120]:
False
In [121]:
X > Y     # 'is X a proper superset of Y?'
Out[121]:
True
In [122]:
X > X     # 'is X a proper superset of itself?'
Out[122]:
False
In [123]:
X >= Y     # 'is X a superset of Y?' is the same as 'is Y a subset of X?'
Out[123]:
True
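
The comparison operators have method counterparts, and a proper subset is simply a subset that is not equal. A brief sketch, restating X and Y so that it runs on its own:

```python
X = set([1, 2, 3, 4, 5])
Y = set([1, 2])

print(Y.issubset(X), Y <= X)              # the same question asked two ways
print(X.issuperset(Y), X >= Y)
print((Y < X) == (Y <= X and Y != X))     # proper subset = subset and not equal
```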

Example 3: More set operations

Now let's have a look at the other set operations we talked about above: intersection, union, and difference.

Recall that the intersection of X and Y is the set of elements that are in both X and Y.

In [124]:
X & Y    # '&' is the intersection operator
Out[124]:
{1, 2}

The union of X and Y is the set of elements that are in either X or Y.

In [125]:
X | Y    # '|' is the union operator
Out[125]:
{1, 2, 3, 4, 5}

The set difference between X and Y is the set of elements in X that are not in Y.

In [126]:
X - Y    # '-' is the set difference operator
Out[126]:
{3, 4, 5}

You try

Try some more work with sets of strings below.

In [127]:
fruit = set(['orange', 'banana', 'apple'])
fruit
Out[127]:
{'apple', 'banana', 'orange'}
In [128]:
colours = set(['red', 'green', 'blue', 'orange'])
colours
Out[128]:
{'blue', 'green', 'orange', 'red'}

Fruit and colours are different to us as people, but to the computer, the string 'orange' is just the string 'orange' whether it is in a set called fruit or a set called colours.

In [131]:
print ("fruit intersection colours is", fruit & colours)
print ("fruit union colours is", fruit | colours)    
print ("fruit - colours is", fruit - colours)    
print ("colours - fruit is", colours - fruit)
fruit intersection colours is {'orange'}
fruit union colours is {'banana', 'red', 'blue', 'green', 'apple', 'orange'}
fruit - colours is {'banana', 'apple'}
colours - fruit is {'red', 'green', 'blue'}

Try a few other simple subset examples - make up your own sets and try some intersections, unions, and set difference operations. The best way to try new possible operations on a set such as X we just created is to type a period after X and hit the <TAB> key. This will bring up all the possible methods you can call on the set X.

In [132]:
mySet = set([1,2,3,4,5,6,7,8,9])
In [ ]:
#mySet.         # uncomment and try placing the cursor after the dot and hit <TAB> key
In [ ]:
 
In [ ]:
#?mySet.add     # uncomment and evaluate to get help on a method by prepending a question mark
In [ ]:
 

In fact, there are two ways to make sets in SageMath. We have so far used the python set to make a set.

However we can use the SageMath Set to make sets too. SageMath Set is more mathematically consistent. If you are interested in the SageMath Set go to the source and work through the SageMath reference on Sets.

But, first let us appreciate the difference between Python set and SageMath Set!

In [133]:
X = set([1, 2, 3, 4])  # make the set X={1,2,3,4} with python set
X                      # disclose X
Out[133]:
{1, 2, 3, 4}
In [134]:
type(X)                # this is the set in python
Out[134]:
<class 'set'>
In [135]:
anotherX = Set([1, 2, 3, 4])   # make the set anotherX={1,2,3,4} in SAGE Set
anotherX #disclose it
Out[135]:
{1, 2, 3, 4}
In [136]:
type(anotherX)                 # this is the set in SAGE and is more mathy
Out[136]:
<class 'sage.sets.set.Set_object_enumerated_with_category'>
In [137]:
#anotherX.      # uncomment, place the cursor after the dot and hit <TAB> to see what methods are available in SageMath Set

Example 4

Python also provides something called a frozenset, which you can't change like an ordinary set.

In [138]:
aFrozenSet = frozenset([2/5, 0.2, 1/7, 0.1])
aFrozenSet
Out[138]:
frozenset({0.100000000000000, 1/7, 0.200000000000000, 2/5})
In [139]:
aFrozenSet.add(0.3) # This should give an error
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-139-428bebd0abe4> in <module>()
----> 1 aFrozenSet.add(RealNumber('0.3')) # This should give an error

AttributeError: 'frozenset' object has no attribute 'add'
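
One practical reason to want an unchangeable set is that a frozenset is hashable, so it can itself be an element of another set, whereas an ordinary set cannot. A hedged sketch with made-up values (the names inner, family and nesting_failed are illustrative):

```python
inner = frozenset([1, 2])
family = set([inner, frozenset([3])])    # a set whose elements are frozensets
print(inner in family)

try:
    set([set([1, 2])])                   # ordinary sets are unhashable
    nesting_failed = False
except TypeError as err:
    nesting_failed = True
    print("TypeError:", err)
```

This is how a set of sets, a common object in probability, can be represented with the built-in types.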

Under the Set's Hood

The key is to remember that sets are unordered: a set {1, 2, 3} is the same as the set {2, 1, 3} is the same as the set {3, 1, 2} ... (soon we will talk about ordered collections like lists where order is important - but for the moment just remember that a set is about collections of unique values or objects). Remember also that the (lower case s) set is a python type (in contrast to the 'more mathy' SageMath Set with a capital S).

For many 'non-mathy' computing purposes, what matters about a set is the speed of being able to find things in the set without making it expensive (in computer power) to add and remove things, and the best way of doing this (invisible to the programmer actually using the set) is to base the set on a hash table. In practical terms what this means is that the ordering that Sage uses to actually display a set is related to the hash values it has given to the elements in the set which may be totally unrelated to what you or I would think of as any obvious ordering based on the magnitude of the values in the set, or the order in which we put them in, etc.

Remember, you don't need to know anything about hash values and the 'under-the-hood' construction of sets for this course - this explanation is just to give a bit more background.
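
Two consequences are worth seeing in code: equality of sets ignores the order of insertion, and when a predictable order is needed for display, sorted() returns a list in ascending order. A small sketch with made-up values:

```python
s1 = set([1, 2, 3])
s2 = set([3, 1, 2])
print(s1 == s2)                          # same elements, so the same set

words = set(['pear', 'apple', 'fig'])
print(sorted(words))                     # a sorted list, independent of hash order
print(hash('apple') == hash('apple'))    # equal values always hash equally
```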

Using sets to find the words with 15 or more characters in P&P

In [140]:
# This time we are loading the file from our data directory
with open('data/pride_and_prejudice.txt', 'r') as myfile:
  prideAndPrejudiceRaw = myfile.read()

import re 
bigWordsSet = set([]) # start an empty set
# make a list of chapters
chapterList = re.sub('\\s+', ' ',prideAndPrejudiceRaw).split('Chapter ')[1:]
for chapter in chapterList:
    content = (chapter[0:]).lower() # get content as lower-case
    bigWordsSet = bigWordsSet | \
                    set([x for x in content.split(' ') if (len(x)>14 and x.isalpha())])
    # union with list-comprehended set - soon you will get this!
In [141]:
bigWordsSet # set of big words with > 14 characters in any chapter of P&P
Out[141]:
{'accomplishments',
 'acknowledgments',
 'communicativeness',
 'condescendingly',
 'congratulations',
 'conscientiously',
 'disappointments',
 'discontentedness',
 'disinterestedness',
 'inconsistencies',
 'merchantibility',
 'misrepresentation',
 'recommendations',
 'representations',
 'superciliousness',
 'thoughtlessness',
 'uncompanionable',
 'unenforceability'}
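
The same idea can be tried without the data file. The sketch below runs the filter on a short made-up sentence (the text and the helper name long_words are illustrative assumptions) and uses a set comprehension in place of the explicit union.

```python
def long_words(text, min_len=15):
    """Return the set of purely alphabetic words with at least min_len letters."""
    return {w for w in text.lower().split() if len(w) >= min_len and w.isalpha()}

sample = ("Her Accomplishments drew congratulations and more congratulations "
          "but no acknowledgments of her disinterestedness.")

# 'disinterestedness.' is dropped because its trailing period fails isalpha()
print(sorted(long_words(sample)))
```

Because the result is a set, the repeated word appears only once, and, as in the cell above, words with attached punctuation are silently skipped.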


Note that there are much more convenient methods to read text files that have been structured into a certain form. You are purposely being kept much closer to the raw data here. Starting directly with convenient high-level methods is not a good way to appreciate wrangling data closer to the source. Furthermore, it is easier to pick up the high-level methods later on.