{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": false }, "source": [ "# [Introduction to Data Science](http://datascience-intro.github.io/1MS041-2023/) \n", "## 1MS041, 2023 \n", "©2023 Raazesh Sainudiin, Benny Avelin. [Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Other measurements of performance\n", "\n", "Recall that in the logistic regression case our function $G(x) \\in [0,1]$ represents the probability of the label being $1$. We then used the following rule to construct a decision function from $G$:\n", "$$\n", " g(x) = \n", " \\begin{cases}\n", " 1, & \\text{if } G(x) > 1/2 \\\\\n", " 0, & \\text{otherwise.}\n", " \\end{cases}\n", "$$\n", "The threshold $1/2$ can be changed in order to create a trade-off between precision and recall.\n", "\n", "Let us consider the function\n", "$$\n", " g_\\alpha(x) = \n", " \\begin{cases}\n", " 1, & \\text{if } G(x) > \\alpha \\\\\n", " 0, & \\text{otherwise,}\n", " \\end{cases}\n", "$$\n", "where $\\alpha \\in [0,1]$. For each such $\\alpha$ we get a precision and a recall, i.e.\n", "$$\n", " \\begin{aligned}\n", " \\text{Precision:} \\quad \\text{Pr}(\\alpha) = P(Y = 1 \\mid g_\\alpha(X) = 1) \\\\\n", " \\text{Recall:} \\quad \\text{Re} (\\alpha) = P(g_\\alpha(X) = 1 \\mid Y = 1).\n", " \\end{aligned}\n", "$$\n", "\n", "Both can be plotted as functions of $\\alpha$, as we do below." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
SVC(kernel='linear', probability=True)
" ], "text/plain": [ "SVC(kernel='linear', probability=True)" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.datasets import load_digits\n", "import matplotlib.pyplot as plt\n", "from sklearn.datasets import load_digits\n", "digits = load_digits()\n", "\n", "from sklearn.svm import SVC\n", "\n", "labels = digits['target'] >= 5\n", "\n", "X = digits['data']\n", "\n", "from sklearn.model_selection import train_test_split\n", "X_train, X_test, Y_train, Y_test = train_test_split(X,labels)\n", "\n", "per = SVC(kernel='linear',probability=True)\n", "\n", "per.fit(X_train,Y_train)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
SVC(probability=True)
" ], "text/plain": [ "SVC(probability=True)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "per_rbf = SVC(kernel='rbf',probability=True)\n", "\n", "per_rbf.fit(X_train,Y_train)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from Utils import classification_report_interval" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " labels precision recall\n", "\n", " False 0.89 : [0.77,1.00] 0.86 : [0.74,0.99]\n", " True 0.87 : [0.74,0.99] 0.90 : [0.77,1.00]\n", "\n", " accuracy 0.88 : [0.79,0.97]\n", "\n" ] } ], "source": [ "print(classification_report_interval(Y_test,per.predict(X_test)))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "per.predict_proba(X_test)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "from sklearn.metrics import precision_recall_curve\n", "prec,rec,thresh = precision_recall_curve(Y_test,per.predict_proba(X_test)[:,1])\n", "prec_rbf,rec_rbf,thresh_rbf = precision_recall_curve(Y_test,per_rbf.predict_proba(X_test)[:,1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tradeoff between precision and recall" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "plt.plot(thresh,prec[:-1])\n", "plt.plot(thresh,rec[:-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Precision and recall curve\n", "\n", "It is customary to plot the precision as a function of recall into the so called precision and recall curve" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0.0, 1.0)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.plot(rec,prec)\n", "plt.plot(rec_rbf,prec_rbf)\n", "plt.xlim(0,1)\n", "plt.ylim(0,1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Average precision\n", "$$\n", " p(r) = \\text{Pr}(\\text{Re}^{-1}(r))\n", "$$\n", "then $p(r)$ is the curve above where the $x$-axis is $r$ (recall) and the $y$ axis is precision.\n", "Average precision is thus just\n", "$$\n", " \\text{AP} = \\int_0^1 p(r) dr\n", "$$\n", "or also called Area Under the Precision Recall Curve." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we call $\\mathbb{P}(g_\\alpha(X) = 1) = t$, the detection level, where $t(1) = 0$ and $t(0) = 1$. Then\n", "$$\n", " p(t) = \\text{Pr}(\\alpha) = \\mathbb{P}(Y=1 \\mid g_\\alpha(X)=1) = Re(\\alpha) \\frac{\\mathbb{P}(Y = 1)}{\\mathbb{P}(g_\\alpha(X) = 1)}\n", "$$\n", "and\n", "$$\n", " r(t) = \\text{Re}(\\alpha)\n", "$$\n", "then we get\n", "$$\n", " p(t) = \\frac{\\mathbb{P}(Y = 1) r(t)}{t}\n", "$$\n", "\n", "This equation captures the trade-off between precision and recall as functions of the detection level.\n", "\n", "It may look like average precision is a strange quantity, and indeed it is. See for instance the review paper on the course website." 
] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.9481477402057775" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.metrics import average_precision_score\n", "average_precision_score(Y_test,per.predict_proba(X_test)[:,1])" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.9950172870414039" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "average_precision_score(Y_test,per_rbf.predict_proba(X_test)[:,1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Receiver Operating Characteristic (ROC)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us compare the false positive rate, which is one minus the recall for the $0$ class, with the recall for the $1$ class:\n", "$$\n", " \\text{FPR}(\\alpha) = \\mathbb{P}(g_\\alpha(X)=1 \\mid Y = 0) \\quad \\text{also goes by the name false positive rate}\n", "$$\n", "$$\n", " \\text{Re}(\\alpha) = \\mathbb{P}(g_\\alpha(X)=1 \\mid Y = 1) \\quad \\text{also goes by the name true positive rate}\n", "$$\n", "\n", "We can compute and plot these using sklearn as follows:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "from sklearn.metrics import roc_curve\n", "fpr,tpr,thresholds = roc_curve(Y_test,per.predict_proba(X_test)[:,1])\n", "fpr_rbf,tpr_rbf,thresholds_rbf = roc_curve(Y_test,per_rbf.predict_proba(X_test)[:,1])" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.plot(thresholds,fpr)\n", "plt.plot(thresholds,tpr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However more common is to consider plotting them against eachother" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.plot(fpr,tpr)\n", "plt.plot(fpr_rbf,tpr_rbf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the plot of $\\text{Re}(\\text{FPR}^{-1}(r))$.\n", "\n", "There is also the AUC, which is Area Under the Curve, which is defined as\n", "$$\n", " \\int_0^1 \\text{Re}(\\text{FPR}^{-1}(r)) dr = -\\int_0^1 \\text{Re}(\\alpha) \\text{FPR}'(\\alpha) d\\alpha\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Both the AP (Average Precision) and the (AUC) is used as a single performance metric of a classifier." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let $Z = G(X)$, where $G$ is the predicted probability, then $Z$ has density $F_Z$. Let $F_{Z,Y}$ be the joint distribution of $Z$ and $Y$\n", "\n", "Let\n", "$$\n", " \\begin{aligned}\n", " f_1(z) &= f_{Z \\mid Y=1} \\\\\n", " f_0(z) &= f_{Z \\mid Y=0}\n", " \\end{aligned}\n", "$$\n", "thus we simply get\n", "$$\n", " \\begin{aligned}\n", " \\text{FPR}(\\alpha) = \\int_{\\alpha}^1 f_0(z)dz \\\\\n", " \\text{Re}(\\alpha) = \\int_{\\alpha}^1 f_1(z)dz\n", " \\end{aligned}\n", "$$\n", "As such \n", "$$\n", " \\text{Re}'(\\alpha) = -f_1(\\alpha)\n", "$$\n", "and we can write\n", "$$\n", " -\\int_0^1 \\text{Re}(\\alpha) \\text{FPR}'(\\alpha) d\\alpha = \\int_0^1 \\int_z^1 f_1(z') f_0(z) dz dz'\n", "$$\n", "Consider $Z_1$ be a random variable sampled from $f_1$ and $Z_0$ be sampled from $f_0$. Then we can write the above as\n", "$$\n", " \\mathbb{P}(Z_1 > Z_0) = \\int_0^1 \\int_z^1 f_1(z') f_0(z) dz dz'\n", "$$\n", "It is useful to think about what this probability means. That is, if we take a randomly chosen sample from the positive class and call it $X_1$ and do the same with class $0$ and call that $X_0$, then the AUC is the probability that $G(X_1) > G(X_0)$." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" }, "lx_course_instance": "2023", "lx_course_name": "Introduction to Data Science", "lx_course_number": "1MS041" }, "nbformat": 4, "nbformat_minor": 5 }