{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "CvmiakWDcuyf" }, "source": [ "# Lab 3\n", "## for the course \"Artificial Intelligence Systems\"\n", "\n", "The goal of this lab is to study regularization methods.\n" ] }, { "cell_type": "markdown", "metadata": { "id": "YEZ0T1uwj34v" }, "source": [ "### Task 1\n", "\n", "Load the data from the dataset. Inspect the columns and check for missing values. Build the correlation matrix between the features and the target variable. Draw conclusions about what this matrix shows." ] }, { "cell_type": "code", "execution_count": 128, "metadata": { "id": "91NHysjQj26f" }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " brand processor_brand processor_name processor_gnrtn ram_gb ram_type \\\n", "0 ASUS Intel Core i3 10th 4 GB DDR4 \n", "1 Lenovo Intel Core i3 10th 4 GB DDR4 \n", "2 Lenovo Intel Core i3 10th 4 GB DDR4 \n", "3 ASUS Intel Core i5 10th 8 GB DDR4 \n", "4 ASUS Intel Celeron Dual Not Available 4 GB DDR4 \n", "\n", " ssd hdd os os_bit graphic_card_gb weight warranty \\\n", "0 0 GB 1024 GB Windows 64-bit 0 GB Casual No warranty \n", "1 0 GB 1024 GB Windows 64-bit 0 GB Casual No warranty \n", "2 0 GB 1024 GB Windows 64-bit 0 GB Casual No warranty \n", "3 512 GB 0 GB Windows 32-bit 2 GB Casual No warranty \n", "4 0 GB 512 GB Windows 64-bit 0 GB Casual No warranty \n", "\n", " Touchscreen msoffice Price rating Number of Ratings Number of Reviews \n", "0 No No 34649 2 stars 3 0 \n", "1 No No 38999 3 stars 65 5 \n", "2 No No 39999 3 stars 8 1 \n", "3 No No 69990 3 stars 0 0 \n", "4 No No 26990 3 stars 0 0 " ] }, "execution_count": 128, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "df = pd.read_csv('AISP2.csv')\n", "\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 118, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "brand 0\n", "processor_brand 0\n", "processor_name 0\n", "processor_gnrtn 0\n", "ram_gb 0\n", "ram_type 0\n", "ssd 0\n", "hdd 0\n", "os 0\n", "os_bit 0\n", "graphic_card_gb 0\n", "weight 0\n", "warranty 0\n", "Touchscreen 0\n", "msoffice 0\n", "Price 0\n", "rating 0\n", "Number of Ratings 0\n", "Number of Reviews 0\n", "dtype: int64" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(df.isnull().sum())" ] }, { "cell_type": "code", "execution_count": 119, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Price 1.000000\n", "ssd 0.628272\n", "ram_gb 0.518323\n", "graphic_card_gb 0.459986\n", "processor_name_Core i7 0.377777\n", "processor_name_Core i9 0.359096\n", "brand_APPLE 0.312112\n", "os_Mac 0.312112\n", "processor_name_M1 0.274581\n", "processor_brand_M1 0.274581\n", "processor_name_Ryzen 9 0.253506\n", "weight_Casual 0.247878\n", "processor_gnrtn_12th 0.219060\n", "Touchscreen_Yes 0.189126\n", "ram_type_LPDDR3 0.181314\n", "ram_type_LPDDR4X 0.173809\n", "ram_type_DDR5 0.168689\n", "processor_gnrtn_10th 0.164034\n", "os_DOS 0.140780\n", "brand_MSI 0.123952\n", "msoffice_No 0.105752\n", "warranty_3 years 0.080610\n", "processor_name_Ryzen 7 0.061872\n", "ram_type_DDR3 0.042006\n", "processor_gnrtn_8th 0.040292\n", "warranty_1 year 0.033312\n", "brand_ASUS 0.032036\n", "ram_type_LPDDR4 0.028034\n", "processor_gnrtn_9th 0.021192\n", "os_bit_32-bit 0.018458\n", "processor_brand_AMD -0.001583\n", "weight_Gaming -0.012524\n", "os_bit_64-bit -0.018458\n", "processor_gnrtn_4th -0.018769\n", "processor_name_Core i5 -0.023218\n", "brand_acer -0.024663\n", "warranty_2 years -0.029339\n", "brand_HP -0.030649\n", "rating -0.033528\n", "brand_Avita -0.033819\n", "brand_Lenovo -0.039079\n", "warranty_No warranty -0.045241\n", "processor_gnrtn_7th -0.045656\n", "processor_gnrtn_11th -0.085683\n", "processor_brand_Intel -0.103966\n", "processor_gnrtn_Not Available -0.105722\n", "msoffice_Yes -0.105752\n", "processor_name_Pentium Quad -0.111755\n", "processor_name_Ryzen 5 -0.114138\n", "Number of Ratings -0.140392\n", "Number of Reviews -0.148738\n", "processor_name_Ryzen 3 -0.150211\n", "brand_DELL -0.166272\n", "Touchscreen_No -0.189126\n", "processor_name_Celeron Dual -0.200490\n", "weight_ThinNlight -0.250425\n", "hdd -0.252699\n", "ram_type_DDR4 -0.270184\n", 
"os_Windows -0.337929\n", "processor_name_Core i3 -0.377232\n", "Name: Price, dtype: float64" ] }, "execution_count": 119, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['ram_gb'] = df['ram_gb'].str.replace(' GB', '').astype(float)\n", "df['ssd'] = df['ssd'].str.replace(' GB', '').astype(float)\n", "df['hdd'] = df['hdd'].str.replace(' GB', '').astype(float)\n", "df['graphic_card_gb'] = df['graphic_card_gb'].str.replace(' GB', '').astype(float)\n", "df['rating'] = df['rating'].str.replace(' stars', '').str.replace(' star', '').astype(float)\n", "\n", "df = pd.get_dummies(df, \n", " columns=['brand', 'processor_brand', 'processor_name', 'ram_type', \n", " 'os', 'os_bit', 'Touchscreen', 'msoffice', 'warranty', 'processor_gnrtn', 'weight'])\n", "\n", "df.corr(numeric_only=True)['Price'].sort_values(ascending=False)" ] }, { "cell_type": "code", "execution_count": 120, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ram_gb float64\n", "ssd float64\n", "hdd float64\n", "graphic_card_gb float64\n", "Price int64\n", "rating float64\n", "Number of Ratings int64\n", "Number of Reviews int64\n", "brand_APPLE bool\n", "brand_ASUS bool\n", "brand_Avita bool\n", "brand_DELL bool\n", "brand_HP bool\n", "brand_Lenovo bool\n", "brand_MSI bool\n", "brand_acer bool\n", "processor_brand_AMD bool\n", "processor_brand_Intel bool\n", "processor_brand_M1 bool\n", "processor_name_Celeron Dual bool\n", "processor_name_Core i3 bool\n", "processor_name_Core i5 bool\n", "processor_name_Core i7 bool\n", "processor_name_Core i9 bool\n", "processor_name_M1 bool\n", "processor_name_Pentium Quad bool\n", "processor_name_Ryzen 3 bool\n", "processor_name_Ryzen 5 bool\n", "processor_name_Ryzen 7 bool\n", "processor_name_Ryzen 9 bool\n", "ram_type_DDR3 bool\n", "ram_type_DDR4 bool\n", "ram_type_DDR5 bool\n", "ram_type_LPDDR3 bool\n", "ram_type_LPDDR4 bool\n", "ram_type_LPDDR4X bool\n", "os_DOS bool\n", "os_Mac bool\n", "os_Windows bool\n", "os_bit_32-bit bool\n", 
"os_bit_64-bit bool\n", "Touchscreen_No bool\n", "Touchscreen_Yes bool\n", "msoffice_No bool\n", "msoffice_Yes bool\n", "warranty_1 year bool\n", "warranty_2 years bool\n", "warranty_3 years bool\n", "warranty_No warranty bool\n", "processor_gnrtn_10th bool\n", "processor_gnrtn_11th bool\n", "processor_gnrtn_12th bool\n", "processor_gnrtn_4th bool\n", "processor_gnrtn_7th bool\n", "processor_gnrtn_8th bool\n", "processor_gnrtn_9th bool\n", "processor_gnrtn_Not Available bool\n", "weight_Casual bool\n", "weight_Gaming bool\n", "weight_ThinNlight bool\n", "dtype: object" ] }, "execution_count": 120, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes\n" ] }, { "cell_type": "code", "execution_count": 121, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "\n", "y = df['Price']\n", "\n", "X = df.drop('Price', axis=1)\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)" ] }, { "cell_type": "markdown", "metadata": { "id": "hgqsngyck7xl" }, "source": [ "### Task 2\n", "\n", "Implement linear regression without regularization." ] }, { "cell_type": "code", "execution_count": 122, "metadata": { "id": "Ch5WytHwlGpd" }, "outputs": [ { "data": { "text/html": [ "
LinearRegression()
" ], "text/plain": [ "LinearRegression()" ] }, "execution_count": 122, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.linear_model import LinearRegression\n", "\n", "model = LinearRegression()\n", "model.fit(X_train, y_train)" ] }, { "cell_type": "code", "execution_count": 123, "metadata": { "id": "nwDuPoSHlKDP" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MSE: 638671150.10\n", "RMSE: 25271.94\n", "R²: 0.6723\n" ] } ], "source": [ "import numpy as np\n", "\n", "from sklearn.metrics import mean_squared_error, r2_score\n", "\n", "y_pred = model.predict(X_test)\n", "\n", "mse = mean_squared_error(y_test, y_pred)\n", "r2 = r2_score(y_test, y_pred)\n", "\n", "print(f'MSE: {mse:.2f}')\n", "print(f\"RMSE: {np.sqrt(mse):.2f}\") # rubles\n", "print(f'R²: {r2:.4f}')" ] }, { "cell_type": "markdown", "metadata": { "id": "eWc9D163lKPB" }, "source": [ "### Task 3\n", "\n", "Implement linear regression with L1 regularization." ] }, { "cell_type": "code", "execution_count": 124, "metadata": { "id": "LRuuj9PDli5A" }, "outputs": [ { "data": { "text/html": [ "
Pipeline(steps=[('standardscaler', StandardScaler()),\n",
              "                ('lasso', Lasso(alpha=100, random_state=42))])
" ], "text/plain": [ "Pipeline(steps=[('standardscaler', StandardScaler()),\n", " ('lasso', Lasso(alpha=100, random_state=42))])" ] }, "execution_count": 124, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.linear_model import Lasso\n", "from sklearn.pipeline import make_pipeline\n", "from sklearn.preprocessing import StandardScaler\n", "\n", "lasso = make_pipeline(StandardScaler(), Lasso(alpha=100, random_state=42))\n", "lasso.fit(X_train, y_train)" ] }, { "cell_type": "code", "execution_count": 125, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MSE: 608702817.47\n", "RMSE: 24671.90\n", "R²: 0.6877\n" ] } ], "source": [ "y_pred = lasso.predict(X_test)\n", "mse = mean_squared_error(y_test, y_pred)\n", "r2 = r2_score(y_test, y_pred)\n", "\n", "print(f\"MSE: {mse:.2f}\")\n", "print(f\"RMSE: {np.sqrt(mse):.2f}\")\n", "print(f\"R²: {r2:.4f}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "RYFhErkHlmFV" }, "source": [ "### Task 4\n", "\n", "Implement linear regression with L2 regularization." ] }, { "cell_type": "code", "execution_count": 126, "metadata": { "id": "fUQ70y9Plq9u" }, "outputs": [ { "data": { "text/html": [ "
Pipeline(steps=[('standardscaler', StandardScaler()),\n",
              "                ('ridge', Ridge(alpha=100, random_state=42))])
" ], "text/plain": [ "Pipeline(steps=[('standardscaler', StandardScaler()),\n", " ('ridge', Ridge(alpha=100, random_state=42))])" ] }, "execution_count": 126, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.linear_model import Ridge\n", "\n", "ridge = make_pipeline(StandardScaler(), Ridge(alpha=100, random_state=42))\n", "ridge.fit(X_train, y_train)" ] }, { "cell_type": "code", "execution_count": 127, "metadata": { "id": "78PA6hmwl-1p" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MSE: 576791993.95\n", "RMSE: 24016.49\n", "R²: 0.7040\n" ] } ], "source": [ "y_pred = ridge.predict(X_test)\n", "\n", "mse = mean_squared_error(y_test, y_pred)\n", "rmse = np.sqrt(mse)\n", "r2 = r2_score(y_test, y_pred)\n", "\n", "print(f\"MSE: {mse:.2f}\")\n", "print(f\"RMSE: {rmse:.2f}\")\n", "print(f\"R²: {r2:.4f}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "ssFzfn1Pl4AI" }, "source": [ "### Task 5\n", "\n", "Draw conclusions about the training results." ] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.6" } }, "nbformat": 4, "nbformat_minor": 0 }