Introduction to Data Science

1MS041, 2023

©2023 Raazesh Sainudiin, Benny Avelin. Attribution 4.0 International (CC BY 4.0)

Other measurements of performance

Recall that in the logistic regression case our function $G(x) \in [0,1]$ represents the probability of the label being $1$. We then used the following rule to construct a decision function from $G$: $$ g(x) = \begin{cases} 1, & \text{if } G(x) > 1/2 \\ 0, & \text{otherwise.} \end{cases} $$ The threshold $1/2$ can be changed in order to trade off precision against recall.

Let's consider the function $$ g_\alpha(x) = \begin{cases} 1, & \text{if } G(x) > \alpha \\ 0, & \text{otherwise,} \end{cases} $$ where $\alpha \in [0,1]$. For each such $\alpha$ we get a precision and a recall, i.e. $$ \begin{aligned} \text{Precision:} \quad \text{Pr}(\alpha) &= P(Y = 1 \mid g_\alpha(X) = 1) \\ \text{Recall:} \quad \text{Re} (\alpha) &= P(g_\alpha(X) = 1 \mid Y = 1). \end{aligned} $$
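To make the definitions concrete, here is a minimal sketch (not part of the original notes) of how $\text{Pr}(\alpha)$ and $\text{Re}(\alpha)$ can be estimated at a single threshold, assuming a boolean numpy array y_true of labels and an array scores of predicted probabilities $G(x)$; the function name is just a placeholder.

import numpy as np

def precision_recall_at(alpha, y_true, scores):
    # y_true: boolean array of true labels; scores: predicted probabilities G(x)
    pred = scores > alpha              # g_alpha(X) on the sample
    precision = np.mean(y_true[pred])  # estimate of P(Y = 1 | g_alpha(X) = 1)
    recall = np.mean(pred[y_true])     # estimate of P(g_alpha(X) = 1 | Y = 1)
    return precision, recall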

These quantities can be plotted as functions of $\alpha$, as we will see below.

In [1]:
from sklearn.datasets import load_digits
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Load the handwritten-digits dataset
digits = load_digits()

# Binary task: is the digit 5 or larger?
labels = digits['target'] >= 5
X = digits['data']

X_train, X_test, Y_train, Y_test = train_test_split(X, labels)

# Linear SVM with probability estimates enabled (needed for predict_proba)
per = SVC(kernel='linear', probability=True)
per.fit(X_train, Y_train)
Out[1]:
SVC(kernel='linear', probability=True)
In [10]:
# Same setup but with an RBF kernel, for comparison
per_rbf = SVC(kernel='rbf', probability=True)
per_rbf.fit(X_train, Y_train)
Out[10]:
SVC(probability=True)
In [2]:
from Utils import classification_report_interval
In [3]:
print(classification_report_interval(Y_test,per.predict(X_test)))
            labels           precision             recall

             False  0.89 : [0.77,1.00] 0.86 : [0.74,0.99]
              True  0.87 : [0.74,0.99] 0.90 : [0.77,1.00]

          accuracy                                        0.88 : [0.79,0.97]

In [ ]:
# Predicted class probabilities on the test set; column 1 is P(Y = 1 | x)
per.predict_proba(X_test)
In [11]:
from sklearn.metrics import precision_recall_curve

# Precision and recall at every threshold, for both models
prec, rec, thresh = precision_recall_curve(Y_test, per.predict_proba(X_test)[:, 1])
prec_rbf, rec_rbf, thresh_rbf = precision_recall_curve(Y_test, per_rbf.predict_proba(X_test)[:, 1])

Tradeoff between precision and recall

In [8]:
# Precision and recall as functions of the decision threshold
# (prec and rec have one more entry than thresh, hence the [:-1])
plt.plot(thresh, prec[:-1])
plt.plot(thresh, rec[:-1])
[Figure: precision and recall plotted against the decision threshold]

Precision and recall curve

It is customary to plot precision as a function of recall, in the so-called precision-recall curve.

In [14]:
plt.plot(rec,prec)
plt.plot(rec_rbf,prec_rbf)
plt.xlim(0,1)
plt.ylim(0,1)
[Figure: precision-recall curves for the linear and RBF models]

Average precision

Define $$ p(r) = \text{Pr}(\text{Re}^{-1}(r)), $$

then $p(r)$ is the curve above, with recall $r$ on the $x$-axis and precision on the $y$-axis. Average precision is thus just $$ \text{AP} = \int_0^1 p(r)\, dr, $$ also called the Area Under the Precision-Recall Curve.
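Numerically this integral can be approximated by a step-function (Riemann) sum over the points returned by precision_recall_curve. A minimal sketch using the prec and rec arrays computed above (the result should be close to average_precision_score below):

import numpy as np

# rec is non-increasing, so -diff(rec) gives the positive recall increments;
# weighting each increment by the precision at that point approximates the integral
ap_approx = -np.sum(np.diff(rec) * prec[:-1])
print(ap_approx)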

Write $t(\alpha) = \mathbb{P}(g_\alpha(X) = 1)$ for the detection level, and note that $t(1) = 0$ and $t(0) = 1$. By Bayes' rule $$ p(t) = \text{Pr}(\alpha) = \mathbb{P}(Y=1 \mid g_\alpha(X)=1) = \text{Re}(\alpha) \frac{\mathbb{P}(Y = 1)}{\mathbb{P}(g_\alpha(X) = 1)} $$ and $$ r(t) = \text{Re}(\alpha), $$ so that $$ p(t) = \frac{\mathbb{P}(Y = 1)\, r(t)}{t}. $$

This equation captures the trade-off between precision and recall as functions of the detection level.
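As a sanity check of this identity, here is a small sketch (assuming the fitted model per and the test split from above; the threshold 0.3 is arbitrary) comparing the empirical precision with $\mathbb{P}(Y=1)\, r(t)/t$:

alpha = 0.3                      # arbitrary threshold, for illustration only
scores = per.predict_proba(X_test)[:, 1]
pred = scores > alpha            # g_alpha(X)

t = pred.mean()                  # detection level, P(g_alpha(X) = 1)
prior = Y_test.mean()            # P(Y = 1)
r = pred[Y_test].mean()          # Re(alpha)
p = Y_test[pred].mean()          # Pr(alpha)

print(p, prior * r / t)          # the two numbers should agree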

It may look like average precision is a strange quantity, and indeed it is. See for instance the review paper on the course website.

In [16]:
from sklearn.metrics import average_precision_score
average_precision_score(Y_test,per.predict_proba(X_test)[:,1])
Out[16]:
0.9481477402057775
In [17]:
average_precision_score(Y_test,per_rbf.predict_proba(X_test)[:,1])
Out[17]:
0.9950172870414039

Receiver Operating Characteristic (ROC)

Let's consider, for each threshold $\alpha$, how often each class is flagged as positive: $$ \text{FPR}(\alpha) = \mathbb{P}(g_\alpha(X)=1 \mid Y = 0), \quad \text{known as the false positive rate,} $$ $$ \text{Re}(\alpha) = \mathbb{P}(g_\alpha(X)=1 \mid Y = 1), \quad \text{known as the true positive rate.} $$
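In the same spirit as the precision/recall sketch earlier, these two rates can be estimated directly at a single threshold (again assuming per, X_test and Y_test from above; the threshold is arbitrary):

alpha = 0.3
pred = per.predict_proba(X_test)[:, 1] > alpha   # g_alpha(X)

fpr_alpha = pred[~Y_test].mean()                 # P(g_alpha(X) = 1 | Y = 0)
tpr_alpha = pred[Y_test].mean()                  # P(g_alpha(X) = 1 | Y = 1) = Re(alpha)
print(fpr_alpha, tpr_alpha)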

We can plot these using sklearn as follows

In [21]:
from sklearn.metrics import roc_curve

# False positive rate and true positive rate at every threshold, for both models
fpr, tpr, thresholds = roc_curve(Y_test, per.predict_proba(X_test)[:, 1])
fpr_rbf, tpr_rbf, thresholds_rbf = roc_curve(Y_test, per_rbf.predict_proba(X_test)[:, 1])
In [19]:
plt.plot(thresholds,fpr)
plt.plot(thresholds,tpr)
[Figure: FPR and TPR plotted against the thresholds]

However, it is more common to plot them against each other:

In [22]:
plt.plot(fpr,tpr)
plt.plot(fpr_rbf,tpr_rbf)
[Figure: ROC curves for the linear and RBF models]

This is the plot of $\text{Re}(\text{FPR}^{-1}(r))$.

There is also the AUC (Area Under the Curve), which is defined as $$ \int_0^1 \text{Re}(\text{FPR}^{-1}(r)) dr = -\int_0^1 \text{Re}(\alpha) \text{FPR}'(\alpha) d\alpha $$

Both the AP (Average Precision) and the AUC are used as single-number performance metrics for a classifier.
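For completeness, the ROC AUC for the two models above can be computed with sklearn's roc_auc_score; a short sketch (the exact values depend on the random train/test split):

from sklearn.metrics import roc_auc_score

print(roc_auc_score(Y_test, per.predict_proba(X_test)[:, 1]))
print(roc_auc_score(Y_test, per_rbf.predict_proba(X_test)[:, 1]))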

Let $Z = G(X)$, where $G$ is the predicted probability; then $Z$ has density $f_Z$. Let $f_{Z,Y}$ be the joint density of $Z$ and $Y$.

Let $$ \begin{aligned} f_1(z) &= f_{Z \mid Y=1}(z) \\ f_0(z) &= f_{Z \mid Y=0}(z), \end{aligned} $$ so that we simply get $$ \begin{aligned} \text{FPR}(\alpha) &= \int_{\alpha}^1 f_0(z)\,dz \\ \text{Re}(\alpha) &= \int_{\alpha}^1 f_1(z)\,dz. \end{aligned} $$ As such $$ \text{Re}'(\alpha) = -f_1(\alpha), \qquad \text{FPR}'(\alpha) = -f_0(\alpha), $$ and we can write $$ -\int_0^1 \text{Re}(\alpha) \text{FPR}'(\alpha)\, d\alpha = \int_0^1 \int_z^1 f_1(z') f_0(z)\, dz'\, dz. $$ Let $Z_1$ be a random variable with density $f_1$ and $Z_0$ one with density $f_0$. Then we can write the above as $$ \mathbb{P}(Z_1 > Z_0) = \int_0^1 \int_z^1 f_1(z') f_0(z)\, dz'\, dz. $$ It is useful to think about what this probability means: if we take a randomly chosen sample from the positive class and call it $X_1$, and do the same for class $0$ and call it $X_0$, then the AUC is the probability that $G(X_1) > G(X_0)$.
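To see this interpretation at work, here is a small sketch (using the fitted per from above) that compares the empirical $\mathbb{P}(G(X_1) > G(X_0))$, computed over all positive/negative pairs in the test set, with roc_auc_score; ties are counted as $1/2$, matching the usual convention.

from sklearn.metrics import roc_auc_score

scores = per.predict_proba(X_test)[:, 1]
z1 = scores[Y_test]        # scores G(X) for the positive class
z0 = scores[~Y_test]       # scores G(X) for the negative class

# Fraction of positive/negative pairs with the positive score larger (ties count 1/2)
pairwise = ((z1[:, None] > z0[None, :]).mean()
            + 0.5 * (z1[:, None] == z0[None, :]).mean())

print(pairwise, roc_auc_score(Y_test, scores))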