{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": false }, "source": [ "# [Introduction to Data Science](http://datascience-intro.github.io/1MS041-2023/) \n", "## 1MS041, 2023 \n", "©2023 Raazesh Sainudiin, Benny Avelin. [Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import load_diabetes" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "dataset = load_diabetes()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ".. _diabetes_dataset:\n", "\n", "Diabetes dataset\n", "----------------\n", "\n", "Ten baseline variables, age, sex, body mass index, average blood\n", "pressure, and six blood serum measurements were obtained for each of n =\n", "442 diabetes patients, as well as the response of interest, a\n", "quantitative measure of disease progression one year after baseline.\n", "\n", "**Data Set Characteristics:**\n", "\n", " :Number of Instances: 442\n", "\n", " :Number of Attributes: First 10 columns are numeric predictive values\n", "\n", " :Target: Column 11 is a quantitative measure of disease progression one year after baseline\n", "\n", " :Attribute Information:\n", " - age age in years\n", " - sex\n", " - bmi body mass index\n", " - bp average blood pressure\n", " - s1 tc, total serum cholesterol\n", " - s2 ldl, low-density lipoproteins\n", " - s3 hdl, high-density lipoproteins\n", " - s4 tch, total cholesterol / HDL\n", " - s5 ltg, possibly log of serum triglycerides level\n", " - s6 glu, blood sugar level\n", "\n", "Note: Each of these 10 feature variables have been mean centered and scaled by the standard deviation times the square root of `n_samples` (i.e. 
the sum of squares of each column totals 1).\n", "\n", "Source URL:\n", "https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html\n", "\n", "For more information see:\n", "Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) \"Least Angle Regression,\" Annals of Statistics (with discussion), 407-499.\n", "(https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf)\n", "\n" ] } ], "source": [ "print(dataset.DESCR)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "X,Y = load_diabetes(return_X_y=True)\n", "X_train,X_test,Y_train,Y_test = train_test_split(X,Y,random_state=0)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(331, 10) (111, 10) (331,) (111,)\n" ] } ], "source": [ "print(X_train.shape,X_test.shape,Y_train.shape,Y_test.shape)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
LinearRegression()
\n", " | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 | 1 |
1 | 7.8 | 0.88 | 0.00 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.20 | 0.68 | 9.8 | 5 | 1 |
2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.9970 | 3.26 | 0.65 | 9.8 | 5 | 1 |
3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.9980 | 3.16 | 0.58 | 9.8 | 6 | 1 |
4 | 7.4 | 0.70 | 0.00 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 | 1 | \n", "
LinearRegression()