{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "CvmiakWDcuyf" }, "source": [ "# Lab 2\n", "## for the course \"Artificial Intelligence Systems\"\n", "\n", "In this lab you will work with a dataset that contains the technical specifications of mobile phones and their price range.\n", "The goal of the lab is to study the theoretical foundations of machine learning methods.\n", "\n", "The dataset contains a set of mobile phone characteristics, including battery power, camera specifications, network support, memory, screen dimensions, and other attributes. The `price_range` column classifies the phones into price ranges (this is the column to predict)." ] }, { "cell_type": "markdown", "metadata": { "id": "YEZ0T1uwj34v" }, "source": [ "### Task 1\n", "\n", "Load the data from the dataset. Examine the columns and check for missing values. Build the correlation matrix between the features and the target variable. Draw conclusions about what this matrix shows." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "91NHysjQj26f" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " battery_power blue clock_speed dual_sim fc four_g int_memory m_dep \\\n", "0 842 0 2.2 0 1 0 7 0.6 \n", "1 1021 1 0.5 1 0 1 53 0.7 \n", "2 563 1 0.5 1 2 1 41 0.9 \n", "3 615 1 2.5 0 0 0 10 0.8 \n", "4 1821 1 1.2 0 13 1 44 0.6 \n", "\n", " mobile_wt n_cores ... px_height px_width ram sc_h sc_w talk_time \\\n", "0 188 2 ... 20 756 2549 9 7 19 \n", "1 136 3 ... 905 1988 2631 17 3 7 \n", "2 145 5 ... 1263 1716 2603 11 2 9 \n", "3 131 6 ... 1216 1786 2769 16 8 11 \n", "4 141 2 ... 1208 1212 1411 8 2 15 \n", "\n", " three_g touch_screen wifi price_range \n", "0 0 0 1 1 \n", "1 1 1 0 2 \n", "2 1 1 0 2 \n", "3 1 0 0 2 \n", "4 1 1 0 1 \n", "\n", "[5 rows x 21 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "df = pd.read_csv('AIS2.csv')\n", "\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "battery_power 0\n", "blue 0\n", "clock_speed 0\n", "dual_sim 0\n", "fc 0\n", "four_g 0\n", "int_memory 0\n", "m_dep 0\n", "mobile_wt 0\n", "n_cores 0\n", "pc 0\n", "px_height 0\n", "px_width 0\n", "ram 0\n", "sc_h 0\n", "sc_w 0\n", "talk_time 0\n", "three_g 0\n", "touch_screen 0\n", "wifi 0\n", "price_range 0\n", "dtype: int64" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "battery_power 0\n", "blue 0\n", "clock_speed 0\n", "dual_sim 0\n", "fc 0\n", "four_g 0\n", "int_memory 0\n", "m_dep 0\n", "mobile_wt 0\n", "n_cores 0\n", "pc 0\n", "px_height 0\n", "px_width 0\n", "ram 0\n", "sc_h 0\n", "sc_w 0\n", "talk_time 0\n", "three_g 0\n", "touch_screen 0\n", "wifi 0\n", "price_range 0\n", "dtype: int64" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "battery_power 0\n", "blue 1010\n", "clock_speed 0\n", "dual_sim 981\n", "fc 474\n", "four_g 957\n", "int_memory 0\n", "m_dep 0\n", "mobile_wt 0\n", "n_cores 0\n", "pc 101\n", "px_height 2\n", 
"px_width 0\n", "ram 0\n", "sc_h 0\n", "sc_w 180\n", "talk_time 0\n", "three_g 477\n", "touch_screen 994\n", "wifi 986\n", "price_range 500\n", "dtype: int64" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(df.isnull().sum())\n", "display(df.isna().sum())\n", "display(df.eq(0).sum())" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "price_range 1.000000\n", "ram 0.917046\n", "battery_power 0.200723\n", "px_width 0.165818\n", "px_height 0.148858\n", "int_memory 0.044435\n", "sc_w 0.038711\n", "pc 0.033599\n", "three_g 0.023611\n", "sc_h 0.022986\n", "fc 0.021998\n", "talk_time 0.021859\n", "blue 0.020573\n", "wifi 0.018785\n", "dual_sim 0.017444\n", "four_g 0.014772\n", "n_cores 0.004399\n", "m_dep 0.000853\n", "clock_speed -0.006606\n", "mobile_wt -0.030302\n", "touch_screen -0.030411\n", "Name: price_range, dtype: float64" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.corr()['price_range'].sort_values(ascending=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Ch5WytHwlGpd" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " battery_power blue clock_speed dual_sim fc four_g int_memory m_dep \\\n", "0 842 0 2.2 0 1 0 7 0.6 \n", "1 1021 1 0.5 1 0 1 53 0.7 \n", "2 563 1 0.5 1 2 1 41 0.9 \n", "3 615 1 2.5 0 0 0 10 0.8 \n", "4 1821 1 1.2 0 13 1 44 0.6 \n", "\n", " mobile_wt n_cores pc px_height px_width ram sc_h sc_w talk_time \\\n", "0 188 2 2 20 756 2549 9 7 19 \n", "1 136 3 6 905 1988 2631 17 3 7 \n", "2 145 5 6 1263 1716 2603 11 2 9 \n", "3 131 6 9 1216 1786 2769 16 8 11 \n", "4 141 2 14 1208 1212 1411 8 2 15 \n", "\n", " three_g touch_screen wifi \n", "0 0 0 1 \n", "1 1 1 0 \n", "2 1 1 0 \n", "3 1 0 0 \n", "4 1 1 0 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "array([[ 1.38919326, 1.011314 , -1.25918898, ..., -1.78870765,\n", " -1.00878862, -1.02532046],\n", " [ 0.07840603, 1.011314 , -1.25918898, ..., -1.78870765,\n", " -1.00878862, 0.97530483],\n", " [-1.02457347, -0.98881258, 1.18747125, ..., 0.55906285,\n", " 0.99128795, 0.97530483],\n", " ...,\n", " [ 1.10374308, -0.98881258, 0.5758062 , ..., 0.55906285,\n", " 0.99128795, 0.97530483],\n", " [-1.21867959, 1.011314 , 1.43213728, ..., 0.55906285,\n", " -1.00878862, -1.02532046],\n", " [-0.39429947, -0.98881258, 1.6768033 , ..., 0.55906285,\n", " -1.00878862, -1.02532046]], shape=(1600, 20))" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from sklearn.model_selection import train_test_split\n", "from sklearn.preprocessing import StandardScaler\n", "\n", "X = df.drop('price_range', axis=1)\n", "y = df['price_range']\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(\n", " X, y, test_size=0.2, random_state=42, stratify=y\n", ")\n", "\n", "display(X.head())\n", "\n", "scaler = StandardScaler() # С minmax хуже (x - mean) / std\n", "X_train_scaled = scaler.fit_transform(X_train)\n", "X_test_scaled = scaler.transform(X_test)\n", "\n", "display(X_train_scaled)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Задание 2\n", "\n", 
"Реализуйте с алгоритм логистической регрессии для многоклассовой классификации." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
LogisticRegression(max_iter=1000, random_state=42)
" ], "text/plain": [ "LogisticRegression(max_iter=1000, random_state=42)" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.linear_model import LogisticRegression\n", "\n", "model = LogisticRegression(\n", " max_iter=1000,\n", " random_state=42\n", ")\n", "\n", "model.fit(X_train_scaled, y_train)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "id": "nwDuPoSHlKDP" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Точность модели: 0.965\n", "[[98 2 0 0]\n", " [ 1 96 3 0]\n", " [ 0 2 94 4]\n", " [ 0 0 2 98]]\n" ] } ], "source": [ "from sklearn.metrics import confusion_matrix, accuracy_score\n", "\n", "y_pred = model.predict(X_test_scaled)\n", "\n", "print(\"Точность модели:\", accuracy_score(y_test, y_pred))\n", "print(confusion_matrix(y_test, y_pred))" ] }, { "cell_type": "markdown", "metadata": { "id": "eWc9D163lKPB" }, "source": [ "### Задание 3\n", "\n", "Реализуйте алгоритм Наивный Байес." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "id": "LRuuj9PDli5A" }, "outputs": [ { "data": { "text/html": [ "
GaussianNB()
" ], "text/plain": [ "GaussianNB()" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.naive_bayes import GaussianNB\n", "\n", "model = GaussianNB()\n", "\n", "model.fit(X_train, y_train)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "id": "amg6aqULlovg" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Точность модели: 0.8100\n", "[[90 10 0 0]\n", " [ 7 69 24 0]\n", " [ 0 18 73 9]\n", " [ 0 0 8 92]]\n" ] } ], "source": [ "y_pred = model.predict(X_test)\n", "\n", "accuracy = accuracy_score(y_test, y_pred)\n", "print(f\"Точность модели: {accuracy:.4f}\")\n", "\n", "print(confusion_matrix(y_test, y_pred))" ] }, { "cell_type": "markdown", "metadata": { "id": "RYFhErkHlmFV" }, "source": [ "### Задание 4\n", "\n", "Реализуйте алгоритм kNN." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "id": "fUQ70y9Plq9u" }, "outputs": [ { "data": { "text/html": [ "
KNeighborsClassifier(n_neighbors=16)
" ], "text/plain": [ "KNeighborsClassifier(n_neighbors=16)" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.neighbors import KNeighborsClassifier\n", "\n", "knn = KNeighborsClassifier(n_neighbors=16)\n", "knn.fit(X_train_scaled, y_train)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy: 0.56\n", "[[72 28 0 0]\n", " [31 44 25 0]\n", " [ 5 27 47 21]\n", " [ 0 5 34 61]]\n" ] } ], "source": [ "y_pred = knn.predict(X_test_scaled)\n", "\n", "print(\"Accuracy:\", accuracy_score(y_test, y_pred))\n", "print(confusion_matrix(y_test, y_pred))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": { "id": "ssFzfn1Pl4AI" }, "source": [ "### Задание 5\n", "\n", "Сделайте выводы о результатах обучения." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Логистическая регрессия хорошо подходит для линейно разделимых данных, что, вероятно, имело место в данном случае. Наивный Байес хуже справился с задачей из-за нарушения предположения о независимости признаков. Например, такие параметры, как оперативная память и размер экрана, могут быть коррелированы, что снижает качество прогноза. Метод KNN, вероятно, не подходит для данной задачи из-за большой разности в корреляции цены в зависимости от параметра." ] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.6" } }, "nbformat": 4, "nbformat_minor": 0 }