{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": false }, "source": [ "# [Introduction to Data Science](http://datascience-intro.github.io/1MS041-2023/) \n", "## 1MS041, 2023 \n", "©2023 Raazesh Sainudiin, Benny Avelin. [Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# CountVectorizer and TFIDFVectorizer" ] }, { "cell_type": "code", "execution_count": 115, "metadata": {}, "outputs": [], "source": [ "from sklearn.feature_extraction.text import CountVectorizer" ] }, { "cell_type": "code", "execution_count": 116, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "X_test = np.array(['test of stuff','something of test','stuff of something'])" ] }, { "cell_type": "code", "execution_count": 117, "metadata": {}, "outputs": [], "source": [ "cv = CountVectorizer()" ] }, { "cell_type": "code", "execution_count": 118, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
CountVectorizer()
TfidfVectorizer()
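For reference, scikit-learn's `TfidfVectorizer` defaults are `smooth_idf=True`, `sublinear_tf=False`, and `norm='l2'`. Under these defaults, the weight of term $t$ in document $d$, from a corpus of $n$ documents, is computed as follows. Here $\mathrm{tf}(t,d)$ is the raw count of $t$ in $d$ and $\mathrm{df}(t)$ is the number of documents that contain $t$. This reflects the library defaults only, not necessarily the settings used elsewhere in this notebook.

$$
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1,
\qquad
w(t,d) = \frac{\mathrm{tf}(t,d)\,\mathrm{idf}(t)}{\sqrt{\sum_{t'} \big(\mathrm{tf}(t',d)\,\mathrm{idf}(t')\big)^2}}
$$

For `X_test`, $n = 3$:

- `of` appears in all three documents, so $\mathrm{idf} = \ln(4/4) + 1 = 1$.
- `something`, `stuff`, and `test` each appear in two documents, so $\mathrm{idf} = \ln(4/3) + 1 \approx 1.2877$.

In `test of stuff`, the row norm is therefore $\sqrt{1 + 2 \cdot 1.2877^2} \approx 2.0776$. That gives weights of about $0.481$ for `of` and $0.620$ for `stuff` and `test`. The word shared by every document is down-weighted relative to the rarer ones.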
TfidfVectorizer()
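A small check, written as a sketch under scikit-learn's default `TfidfVectorizer` settings. It recomputes the tf-idf matrix for `X_test` by hand from raw counts (smoothed idf, then L2 row normalization) and compares the result with the library output. The names `weights`, `manual`, and `df` are illustrative only.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

X_test = np.array(['test of stuff', 'something of test', 'stuff of something'])

tfidf = TfidfVectorizer()  # defaults: smooth_idf=True, sublinear_tf=False, norm='l2'
weights = tfidf.fit_transform(X_test).toarray()

# Recompute by hand from raw term counts (same default tokenizer, same vocabulary order)
counts = CountVectorizer().fit_transform(X_test).toarray()
n = counts.shape[0]                          # number of documents
df = (counts > 0).sum(axis=0)                # document frequency of each term
idf = np.log((1 + n) / (1 + df)) + 1         # smoothed idf
manual = counts * idf                        # tf * idf, broadcast over rows
manual = manual / np.linalg.norm(manual, axis=1, keepdims=True)  # unit L2 norm per row

print(np.round(tfidf.idf_, 4))     # idf is 1.0 for 'of' and about 1.2877 for the other terms
print(np.allclose(weights, manual))  # True
```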
LogisticRegression()
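The notebook cell that uses `LogisticRegression` is not visible here. This is only a hedged sketch of one common pattern: fitting the classifier on `TfidfVectorizer` features by chaining the two in a scikit-learn pipeline. The labels `y` are hypothetical and exist purely for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X_test = np.array(['test of stuff', 'something of test', 'stuff of something'])
y = np.array([1, 1, 0])  # hypothetical binary labels, not from the notebook

# The pipeline learns the vocabulary and idf weights from the training documents,
# then fits the classifier on the resulting tf-idf matrix
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(X_test, y)

# New text is transformed with the already-fitted vocabulary before prediction
proba = model.predict_proba(np.array(['stuff of test']))
print(proba)  # one row of class probabilities for classes 0 and 1
```

Bundling the vectorizer with the model keeps the vocabulary and idf weights tied to the training data. Unseen documents are then only transformed, never refitted.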
| | mpg | cylinders | displacement | horsepower | weight | acceleration | model-year |
|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 |
| 1 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 |
| 2 | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 70 |
| 3 | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 70 |
| 4 | 17.0 | 8 | 302.0 | 140.0 | 3449 | 10.5 | 70 |