{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "CvmiakWDcuyf" }, "source": [ "# Lab Work No. 2\n", "## Course: \"Artificial Intelligence Systems\"\n", "\n", "In this lab you will work with a dataset that contains the technical specifications of mobile phones and their price ranges.\n", "The goal of the lab is to study the theoretical foundations of machine learning methods.\n", "\n", "The dataset contains a set of mobile phone characteristics, including battery power, camera specifications, network support, memory, screen dimensions, and other attributes. The `price_range` column assigns each phone to a price range (this is the column to be predicted)." ] }, { "cell_type": "markdown", "metadata": { "id": "YEZ0T1uwj34v" }, "source": [ "### Task 1\n", "\n", "Load the data from the dataset. Examine the columns and check for missing values. Build a correlation matrix between the features and the target variable. Draw conclusions about what this matrix shows." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "91NHysjQj26f" }, "outputs": [ { "data": { "text/html": [ "

" ], "text/plain": [ " battery_power blue clock_speed dual_sim fc four_g int_memory m_dep \\\n", "0 842 0 2.2 0 1 0 7 0.6 \n", "1 1021 1 0.5 1 0 1 53 0.7 \n", "2 563 1 0.5 1 2 1 41 0.9 \n", "3 615 1 2.5 0 0 0 10 0.8 \n", "4 1821 1 1.2 0 13 1 44 0.6 \n", "\n", " mobile_wt n_cores ... px_height px_width ram sc_h sc_w talk_time \\\n", "0 188 2 ... 20 756 2549 9 7 19 \n", "1 136 3 ... 905 1988 2631 17 3 7 \n", "2 145 5 ... 1263 1716 2603 11 2 9 \n", "3 131 6 ... 1216 1786 2769 16 8 11 \n", "4 141 2 ... 1208 1212 1411 8 2 15 \n", "\n", " three_g touch_screen wifi price_range \n", "0 0 0 1 1 \n", "1 1 1 0 2 \n", "2 1 1 0 2 \n", "3 1 0 0 2 \n", "4 1 1 0 1 \n", "\n", "[5 rows x 21 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "df = pd.read_csv('AIS2.csv')\n", "\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "battery_power 0\n", "blue 0\n", "clock_speed 0\n", "dual_sim 0\n", "fc 0\n", "four_g 0\n", "int_memory 0\n", "m_dep 0\n", "mobile_wt 0\n", "n_cores 0\n", "pc 0\n", "px_height 0\n", "px_width 0\n", "ram 0\n", "sc_h 0\n", "sc_w 0\n", "talk_time 0\n", "three_g 0\n", "touch_screen 0\n", "wifi 0\n", "price_range 0\n", "dtype: int64" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "battery_power 0\n", "blue 1010\n", "clock_speed 0\n", "dual_sim 981\n", "fc 474\n", "four_g 957\n", "int_memory 0\n", "m_dep 0\n", "mobile_wt 0\n", "n_cores 0\n", "pc 101\n", "px_height 2\n", 
"px_width 0\n", "ram 0\n", "sc_h 0\n", "sc_w 180\n", "talk_time 0\n", "three_g 477\n", "touch_screen 994\n", "wifi 986\n", "price_range 500\n", "dtype: int64" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# isna() is an alias of isnull(), so one call is enough to count missing values\n", "display(df.isnull().sum())\n", "\n", "# Count zeros per column: expected for binary flags (blue, wifi, ...) and camera\n", "# megapixels (fc, pc), but suspicious for physical sizes such as px_height and sc_w\n", "display(df.eq(0).sum())" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "price_range 1.000000\n", "ram 0.917046\n", "battery_power 0.200723\n", "px_width 0.165818\n", "px_height 0.148858\n", "int_memory 0.044435\n", "sc_w 0.038711\n", "pc 0.033599\n", "three_g 0.023611\n", "sc_h 0.022986\n", "fc 0.021998\n", "talk_time 0.021859\n", "blue 0.020573\n", "wifi 0.018785\n", "dual_sim 0.017444\n", "four_g 0.014772\n", "n_cores 0.004399\n", "m_dep 0.000853\n", "clock_speed -0.006606\n", "mobile_wt -0.030302\n", "touch_screen -0.030411\n", "Name: price_range, dtype: float64" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Correlation of every feature with the target, sorted in descending order\n", "df.corr()['price_range'].sort_values(ascending=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Ch5WytHwlGpd" }, "outputs": [ { "data": { "text/html": [ "

" ], "text/plain": [ " battery_power blue clock_speed dual_sim fc four_g int_memory m_dep \\\n", "0 842 0 2.2 0 1 0 7 0.6 \n", "1 1021 1 0.5 1 0 1 53 0.7 \n", "2 563 1 0.5 1 2 1 41 0.9 \n", "3 615 1 2.5 0 0 0 10 0.8 \n", "4 1821 1 1.2 0 13 1 44 0.6 \n", "\n", " mobile_wt n_cores pc px_height px_width ram sc_h sc_w talk_time \\\n", "0 188 2 2 20 756 2549 9 7 19 \n", "1 136 3 6 905 1988 2631 17 3 7 \n", "2 145 5 6 1263 1716 2603 11 2 9 \n", "3 131 6 9 1216 1786 2769 16 8 11 \n", "4 141 2 14 1208 1212 1411 8 2 15 \n", "\n", " three_g touch_screen wifi \n", "0 0 0 1 \n", "1 1 1 0 \n", "2 1 1 0 \n", "3 1 0 0 \n", "4 1 1 0 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "array([[ 1.38919326, 1.011314 , -1.25918898, ..., -1.78870765,\n", " -1.00878862, -1.02532046],\n", " [ 0.07840603, 1.011314 , -1.25918898, ..., -1.78870765,\n", " -1.00878862, 0.97530483],\n", " [-1.02457347, -0.98881258, 1.18747125, ..., 0.55906285,\n", " 0.99128795, 0.97530483],\n", " ...,\n", " [ 1.10374308, -0.98881258, 0.5758062 , ..., 0.55906285,\n", " 0.99128795, 0.97530483],\n", " [-1.21867959, 1.011314 , 1.43213728, ..., 0.55906285,\n", " -1.00878862, -1.02532046],\n", " [-0.39429947, -0.98881258, 1.6768033 , ..., 0.55906285,\n", " -1.00878862, -1.02532046]], shape=(1600, 20))" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from sklearn.model_selection import train_test_split\n", "from sklearn.preprocessing import StandardScaler\n", "\n", "X = df.drop('price_range', axis=1)\n", "y = df['price_range']\n", "\n", "# 80/20 split; stratify=y keeps the class proportions the same in both parts\n", "X_train, X_test, y_train, y_test = train_test_split(\n", " X, y, test_size=0.2, random_state=42, stratify=y\n", ")\n", "\n", "display(X.head())\n", "\n", "# Standardization: (x - mean) / std; MinMaxScaler gave worse results.\n", "# The scaler is fit on the training set only and then applied to the test set.\n", "scaler = StandardScaler()\n", "X_train_scaled = scaler.fit_transform(X_train)\n", "X_test_scaled = scaler.transform(X_test)\n", "\n", "display(X_train_scaled)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task 2\n", "\n", 
"Implement the logistic regression algorithm for multiclass classification." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

" ], "text/plain": [ "LogisticRegression(max_iter=1000, random_state=42)" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.linear_model import LogisticRegression\n", "\n", "# max_iter is raised from the default of 100 to give the lbfgs solver enough iterations to converge\n", "model = LogisticRegression(\n", " max_iter=1000,\n", " random_state=42\n", ")\n", "\n", "model.fit(X_train_scaled, y_train)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "id": "nwDuPoSHlKDP" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model accuracy: 0.965\n", "[[98 2 0 0]\n", " [ 1 96 3 0]\n", " [ 0 2 94 4]\n", " [ 0 0 2 98]]\n" ] } ], "source": [ "from sklearn.metrics import confusion_matrix, accuracy_score\n", "\n", "y_pred = model.predict(X_test_scaled)\n", "\n", "print(\"Model accuracy:\", accuracy_score(y_test, y_pred))\n", "# Confusion matrix: rows are true classes, columns are predicted classes\n", "print(confusion_matrix(y_test, y_pred))" ] }, { "cell_type": "markdown", "metadata": { "id": "eWc9D163lKPB" }, "source": [ "### Task 3\n", "\n", "Implement the Naive Bayes algorithm." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "id": "LRuuj9PDli5A" }, "outputs": [ { "data": { "text/html": [ "

" ], "text/plain": [ "GaussianNB()" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.naive_bayes import GaussianNB\n", "\n", "# Unscaled features are fine here: standardizing a feature rescales every class's Gaussian\n", "# density for it by the same factor, so the predicted class does not change\n", "# (apart from the tiny var_smoothing term)\n", "nb = GaussianNB()\n", "\n", "nb.fit(X_train, y_train)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "id": "amg6aqULlovg" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model accuracy: 0.8100\n", "[[90 10 0 0]\n", " [ 7 69 24 0]\n", " [ 0 18 73 9]\n", " [ 0 0 8 92]]\n" ] } ], "source": [ "y_pred = nb.predict(X_test)\n", "\n", "accuracy = accuracy_score(y_test, y_pred)\n", "print(f\"Model accuracy: {accuracy:.4f}\")\n", "\n", "print(confusion_matrix(y_test, y_pred))" ] }, { "cell_type": "markdown", "metadata": { "id": "RYFhErkHlmFV" }, "source": [ "### Task 4\n", "\n", "Implement the kNN algorithm." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "id": "fUQ70y9Plq9u" }, "outputs": [ { "data": { "text/html": [ "

" ], "text/plain": [ "KNeighborsClassifier(n_neighbors=16)" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.neighbors import KNeighborsClassifier\n", "\n", "# kNN compares Euclidean distances, so it is trained on the standardized features\n", "knn = KNeighborsClassifier(n_neighbors=16)\n", "knn.fit(X_train_scaled, y_train)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy: 0.56\n", "[[72 28 0 0]\n", " [31 44 25 0]\n", " [ 5 27 47 21]\n", " [ 0 5 34 61]]\n" ] } ], "source": [ "y_pred = knn.predict(X_test_scaled)\n", "\n", "print(\"Accuracy:\", accuracy_score(y_test, y_pred))\n", "print(confusion_matrix(y_test, y_pred))" ] }, { "cell_type": "markdown", "metadata": { "id": "ssFzfn1Pl4AI" }, "source": [ "### Task 5\n", "\n", "Draw conclusions about the training results." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Logistic regression achieved the highest test accuracy (0.965). It is well suited to classes that are close to linearly separable, which was probably the case here: `price_range` is driven mostly by `ram` (correlation 0.92 with the target), and all of the model's errors fall between neighboring price ranges. Naive Bayes performed worse (0.81) because its assumption that the features are independent given the class is violated. For example, parameters such as RAM and screen size may be correlated with each other, which degrades the prediction. In addition, `GaussianNB` models every feature as normally distributed within a class, which does not hold for the binary features (`blue`, `dual_sim`, `wifi`, and others). kNN performed worst (0.56), probably because the features differ greatly in how strongly they are related to the price: the Euclidean distance weighs all 20 standardized features equally, so the many features with near-zero correlation to `price_range` add noise to the distance, while the one strongly informative feature, `ram`, contributes only one of 20 dimensions." ] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.6" } }, "nbformat": 4, "nbformat_minor": 0 }
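Task 2 asks you to implement logistic regression. The notebook instead calls sklearn's `LogisticRegression`, whose lbfgs solver fits a multinomial (softmax) model when there are more than two classes. The sketch below illustrates that idea under several assumptions. It uses plain batch gradient descent on the mean cross-entropy loss. It applies no L2 penalty, whereas sklearn adds one by default (`C=1.0`). It uses pure-Python lists instead of NumPy arrays. The function names, learning rate, epoch count, and toy data are hypothetical and chosen only for illustration.

```python
import math


def softmax(z):
    # Subtract the largest logit before exponentiating for numerical stability
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(W, b, x):
    # One linear score per class: z_k = w_k . x + b_k
    return [sum(w_j * x_j for w_j, x_j in zip(W[k], x)) + b[k] for k in range(len(W))]


def fit_softmax_regression(X, y, n_classes, lr=0.5, epochs=500):
    """Batch gradient descent on the mean multiclass cross-entropy (no regularization).

    X: list of (already standardized) feature vectors; y: labels 0..n_classes-1.
    Returns the weight matrix W[k][j] and the bias vector b[k].
    """
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        grad_W = [[0.0] * d for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, label in zip(X, y):
            probs = softmax(logits_for(W, b, x))
            for k in range(n_classes):
                # d(loss)/d(z_k) = p_k - 1[k == label]
                err = probs[k] - (1.0 if k == label else 0.0)
                grad_b[k] += err
                for j in range(d):
                    grad_W[k][j] += err * x[j]
        for k in range(n_classes):
            b[k] -= lr * grad_b[k] / n
            for j in range(d):
                W[k][j] -= lr * grad_W[k][j] / n
    return W, b


def predict_softmax_regression(W, b, X):
    # The class with the largest logit also has the largest softmax probability
    preds = []
    for x in X:
        z = logits_for(W, b, x)
        preds.append(max(range(len(z)), key=lambda k: z[k]))
    return preds


# Hypothetical toy data: four classes in the four quadrants of a 2-D feature space
X_toy = [[-1.0, -1.2], [-1.2, -0.9], [-1.1, 1.0], [-0.9, 1.2],
         [1.0, -1.1], [1.2, -0.9], [1.1, 1.0], [0.9, 1.2]]
y_toy = [0, 0, 1, 1, 2, 2, 3, 3]

W, b = fit_softmax_regression(X_toy, y_toy, n_classes=4)
print(predict_softmax_regression(W, b, X_toy))
```

On the lab data, the same functions would take the rows of `X_train_scaled` and `y_train`. Standardization matters for this sketch because one learning rate is shared by all weights, and features on very different scales make gradient descent converge slowly.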
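Task 3 relies on sklearn's `GaussianNB`. Below is a minimal pure-Python sketch of the same kind of model. It assumes that each feature follows a normal distribution within each class. The sketch shows where the "naive" independence assumption enters: the class score is the log prior plus a sum of per-feature Gaussian log-densities, with no terms that link features together. The variance floor follows the idea of sklearn's `var_smoothing` (a fraction of the largest feature variance). The function names and toy data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate log priors and per-class, per-feature means and variances."""
    n, d = len(X), len(X[0])
    rows_by_class = defaultdict(list)
    for x, label in zip(X, y):
        rows_by_class[label].append(x)

    # Variance floor: var_smoothing times the largest per-feature variance of X
    # (same idea as sklearn's var_smoothing), so a constant feature does not divide by zero
    overall_means = [sum(x[j] for x in X) / n for j in range(d)]
    max_var = max(sum((x[j] - overall_means[j]) ** 2 for x in X) / n for j in range(d))
    eps = var_smoothing * max_var

    model = {}
    for label, rows in rows_by_class.items():
        m = len(rows)
        means = [sum(r[j] for r in rows) / m for j in range(d)]
        variances = [sum((r[j] - means[j]) ** 2 for r in rows) / m + eps for j in range(d)]
        model[label] = (math.log(m / n), means, variances)
    return model


def predict_gaussian_nb(model, X):
    preds = []
    for x in X:
        best_label, best_score = None, -math.inf
        for label, (log_prior, means, variances) in model.items():
            # Naive assumption: features are independent given the class, so the
            # log-likelihood is a sum of one Gaussian log-density per feature
            score = log_prior
            for x_j, mu, var in zip(x, means, variances):
                score += -0.5 * math.log(2 * math.pi * var) - (x_j - mu) ** 2 / (2 * var)
            if score > best_score:
                best_label, best_score = label, score
        preds.append(best_label)
    return preds


# Hypothetical toy data: two classes that differ in both features
X_toy = [[1.0, 5.0], [1.2, 5.5], [0.8, 4.5], [3.0, 1.0], [3.2, 1.5], [2.8, 0.5]]
y_toy = [0, 0, 0, 1, 1, 1]

nb_model = fit_gaussian_nb(X_toy, y_toy)
print(predict_gaussian_nb(nb_model, [[1.1, 5.1], [2.9, 0.9]]))  # -> [0, 1]
```

Because the score is a plain sum over features, two strongly correlated features effectively count the same evidence twice. This is the failure mode that the conclusions attribute to Naive Bayes.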
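The conclusions attribute kNN's low accuracy (0.56) to the uneven relevance of the features. A small pure-Python sketch can illustrate the mechanism; the data and the `keep_columns` helper are hypothetical. In the toy data, column 0 separates the two classes, while column 1 is on the same scale but carries no class information. With both columns in the Euclidean distance, class 1 neighbors outvote the query's true class. Restricting the distance to column 0 restores the expected answer. This notebook does not include a comparable experiment on the lab data (for example, refitting `KNeighborsClassifier` on only the columns most correlated with `price_range`), so no result is claimed for it.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, X_query, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    preds = []
    for q in X_query:
        # Every feature contributes equally to the distance
        neighbors = sorted((math.dist(q, x), label) for x, label in zip(X_train, y_train))
        votes = Counter(label for _, label in neighbors[:k])
        preds.append(votes.most_common(1)[0][0])
    return preds


def keep_columns(rows, cols):
    # Hypothetical helper: project each row onto the selected column indices
    return [tuple(row[c] for c in cols) for row in rows]


# Hypothetical toy data: column 0 separates the classes, column 1 is unrelated noise
X_toy = [(0.0, 0.0), (0.1, 0.3), (0.2, 0.1),     # class 0
         (0.8, 0.95), (0.9, 0.6), (1.0, 0.9)]    # class 1
y_toy = [0, 0, 0, 1, 1, 1]
query = [(0.3, 0.95)]  # column 0 places this point in class 0

print(knn_predict(X_toy, y_toy, query))  # both columns -> [1]
print(knn_predict(keep_columns(X_toy, [0]), y_toy, keep_columns(query, [0])))  # column 0 only -> [0]
```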