NaN test score when trying to evaluate a decision tree regressor model

Post author: admin
Published: January 18, 2022
Post category: Programming questions

#python #decision-tree #scikit-learn-pipeline

Question:

I am trying to evaluate the accuracy of a decision tree model using both numerical and categorical features from the Ames housing dataset. To preprocess the numerical features I used SimpleImputer and StandardScaler. For the categorical features I used a one-hot encoder (OneHotEncoder). I tried to evaluate the model (a DecisionTreeRegressor) with 10-fold cross-validation, but I get NaN for the test score. This is my code:

 import pandas as pd
ames_housing = pd.read_csv("../datasets/house_prices.csv", na_values="?")
target_name = "SalePrice"
data = ames_housing.drop(columns=target_name)
target = ames_housing[target_name]

numerical_features = [
"LotFrontage", "LotArea", "MasVnrArea", "BsmtFinSF1", "BsmtFinSF2",
"BsmtUnfSF", "TotalBsmtSF", "1stFlrSF", "2ndFlrSF", "LowQualFinSF",
"GrLivArea", "BedroomAbvGr", "KitchenAbvGr", "TotRmsAbvGrd", "Fireplaces",
"GarageCars", "GarageArea", "WoodDeckSF", "OpenPorchSF", "EnclosedPorch",
"3SsnPorch", "ScreenPorch", "PoolArea", "MiscVal",]

data_numerical = data[numerical_features]

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.compose import make_column_selector as selector
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

categorical_columns = selector(dtype_include=object)(data)
numerical_columns = selector(dtype_exclude=object)(data)

preprocessor = make_column_transformer(
(OneHotEncoder(handle_unknown="ignore"), categorical_columns),
(StandardScaler(), SimpleImputer(), numerical_columns),
)

model = make_pipeline(preprocessor, DecisionTreeRegressor())

cv_results = cross_validate(
model, data, target, cv=10, return_estimator=True, n_jobs=2,
)

scores = cv_results["test_score"]
print(f"Accuracy score by cross-validation "
  f"search:\n{scores.mean():.3f} +/- {scores.std():.3f}")

This is what I get for the test score:

Accuracy score by cross-validation search:
nan +/- nan

To find the source of the problem, I passed error_score="raise" as a parameter to cross_validate. That revealed the following error:

  ValueError: No valid specification of the columns. Only a scalar, list or slice of all integers 
  or all strings, or boolean mask is allowed
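
For reference, a minimal sketch of that debugging step. It assumes a small made-up frame in place of the Ames data (the names `toy_data`, `n1`, `n2`, and `c1` are illustrative and not part of the original code) and repeats the same three-element tuple as in the question:

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Made-up stand-in for the housing data (an assumption, not the real dataset).
rng = np.random.default_rng(0)
toy_data = pd.DataFrame({
    "n1": rng.normal(size=40),
    "n2": rng.normal(size=40),
    "c1": rng.choice(["A", "B"], size=40),
})
toy_target = rng.normal(size=40)

# Same mistake as in the question: two transformers inside one tuple.
faulty_preprocessor = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["c1"]),
    (StandardScaler(), SimpleImputer(), ["n1", "n2"]),
)
faulty_model = make_pipeline(faulty_preprocessor, DecisionTreeRegressor())

caught = None
try:
    # error_score="raise" re-raises the failure from fitting instead of
    # silently recording NaN as the score for that fold.
    cross_validate(faulty_model, toy_data, toy_target, cv=3, error_score="raise")
except ValueError as exc:
    caught = exc
    print(f"{type(exc).__name__}: {exc}")
```

Running it serially (without n_jobs) keeps the traceback short, which makes the failing step easier to spot.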

How do I solve this problem? Any help would be greatly appreciated. Thanks 🙂

This is what my model looks like:

 Pipeline(steps=[('columntransformer',
             ColumnTransformer(transformers=[('onehotencoder',
                                              OneHotEncoder(handle_unknown='ignore'),
                                              ['MSZoning', 'Street',
                                               'Alley', 'LotShape',
                                               'LandContour', 'Utilities',
                                               'LotConfig', 'LandSlope',
                                               'Neighborhood', 'Condition1',
                                               'Condition2', 'BldgType',
                                               'HouseStyle', 'RoofStyle',
                                               'RoofMatl', 'Exterior1st',
                                               'Exterior2nd', 'MasVnrType',
                                               'ExterQual', 'ExterCond',
                                               'Foundation', 'BsmtQual',
                                               'BsmtCond', 'BsmtExposure',
                                               'BsmtFinType1',
                                               'BsmtFinType2', 'Heating',
                                               'HeatingQC', 'CentralAir',
                                               'Electrical', ...]),
                                             ('standardscaler',
                                              StandardScaler(),
                                              SimpleImputer())])),
            ('decisiontreeregressor', DecisionTreeRegressor())])

Data:

 <class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 80 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             1460 non-null   int64  
 1   MSSubClass     1460 non-null   int64  
 2   MSZoning       1460 non-null   object 
 3   LotFrontage    1201 non-null   float64
 4   LotArea        1460 non-null   int64  
 5   Street         1460 non-null   object 
 6   Alley          91 non-null     object 
 7   LotShape       1460 non-null   object 
 8   LandContour    1460 non-null   object 
 9   Utilities      1460 non-null   object 
 10  LotConfig      1460 non-null   object 
 11  LandSlope      1460 non-null   object 
 12  Neighborhood   1460 non-null   object 
 13  Condition1     1460 non-null   object 
 14  Condition2     1460 non-null   object 
 15  BldgType       1460 non-null   object 
 16  HouseStyle     1460 non-null   object 
 17  OverallQual    1460 non-null   int64  
 18  OverallCond    1460 non-null   int64  
 19  YearBuilt      1460 non-null   int64  
 20  YearRemodAdd   1460 non-null   int64  
 21  RoofStyle      1460 non-null   object 
 22  RoofMatl       1460 non-null   object 
 23  Exterior1st    1460 non-null   object 
 24  Exterior2nd    1460 non-null   object 
 25  MasVnrType     1452 non-null   object 
 26  MasVnrArea     1452 non-null   float64
 27  ExterQual      1460 non-null   object 
 28  ExterCond      1460 non-null   object 
 29  Foundation     1460 non-null   object 
 30  BsmtQual       1423 non-null   object 
 31  BsmtCond       1423 non-null   object 
 32  BsmtExposure   1422 non-null   object 
 33  BsmtFinType1   1423 non-null   object 
 34  BsmtFinSF1     1460 non-null   int64  
 35  BsmtFinType2   1422 non-null   object 
 36  BsmtFinSF2     1460 non-null   int64  
 37  BsmtUnfSF      1460 non-null   int64  
 38  TotalBsmtSF    1460 non-null   int64  
 39  Heating        1460 non-null   object 
 40  HeatingQC      1460 non-null   object 
 41  CentralAir     1460 non-null   object 
 42  Electrical     1459 non-null   object 
 43  1stFlrSF       1460 non-null   int64  
 44  2ndFlrSF       1460 non-null   int64  
 45  LowQualFinSF   1460 non-null   int64  
 46  GrLivArea      1460 non-null   int64  
 47  BsmtFullBath   1460 non-null   int64  
 48  BsmtHalfBath   1460 non-null   int64  
 49  FullBath       1460 non-null   int64  
 50  HalfBath       1460 non-null   int64  
 51  BedroomAbvGr   1460 non-null   int64  
 52  KitchenAbvGr   1460 non-null   int64  
 53  KitchenQual    1460 non-null   object 
 54  TotRmsAbvGrd   1460 non-null   int64  
 55  Functional     1460 non-null   object 
 56  Fireplaces     1460 non-null   int64  
 57  FireplaceQu    770 non-null    object 
 58  GarageType     1379 non-null   object 
 59  GarageYrBlt    1379 non-null   float64
 60  GarageFinish   1379 non-null   object 
 61  GarageCars     1460 non-null   int64  
 62  GarageArea     1460 non-null   int64  
 63  GarageQual     1379 non-null   object 
 64  GarageCond     1379 non-null   object 
 65  PavedDrive     1460 non-null   object 
 66  WoodDeckSF     1460 non-null   int64  
 67  OpenPorchSF    1460 non-null   int64  
 68  EnclosedPorch  1460 non-null   int64  
 69  3SsnPorch      1460 non-null   int64  
 70  ScreenPorch    1460 non-null   int64  
 71  PoolArea       1460 non-null   int64  
 72  PoolQC         7 non-null      object 
 73  Fence          281 non-null    object 
 74  MiscFeature    54 non-null     object 
 75  MiscVal        1460 non-null   int64  
 76  MoSold         1460 non-null   int64  
 77  YrSold         1460 non-null   int64  
 78  SaleType       1460 non-null   object 
 79  SaleCondition  1460 non-null   object 
dtypes: float64(3), int64(34), object(43)
memory usage: 912.6  KB

Target:

 0       208500
1       181500
2       223500
3       140000
4       250000
         ...  
1455    175000
1456    210000
1457    266500
1458    142125
1459    147500
Name: SalePrice, Length: 1460, dtype: int64

Comments:

1. Can you provide a sample of what model, data, and target look like when printed? It looks like you are mixing data types in one of them.

2. Hi, I have added a sample of the model, data, and target to my post. I can't post it here because I hit the character limit in the comments section.

Answer #1:

If one of your transformers consists of more than one estimator, as here for the numerical columns where you have StandardScaler(), SimpleImputer(), you need to wrap them in a Pipeline, for example:

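import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
# The remaining names (selector, StandardScaler, SimpleImputer,
# OneHotEncoder, make_column_transformer) are imported as in the question.

# Synthetic data: three numeric columns and one categorical column.
np.random.seed(111)
data = pd.DataFrame(np.random.uniform(0,1,(100,3)),columns=['n1','n2','n3'])
data['c1'] = np.random.choice(['A','B',],100)
target = np.random.normal(0,1,100)

cat_columns = selector(dtype_include=object)(data)
num_columns = selector(dtype_exclude=object)(data)

# Chain both numeric steps into a single transformer.
num_transformer = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('imputer', SimpleImputer())
    ])

# Each entry is now a (transformer, columns) pair.
preprocessor = make_column_transformer(
(OneHotEncoder(handle_unknown="ignore"), cat_columns),
(num_transformer, num_columns),
)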

Just test it on the dataset; it works:

 preprocessor.fit_transform(data)[:2]

array([[ 0.        ,  1.        ,  0.42472149, -1.23160187, -0.1782728 ],
       [ 0.        ,  1.        ,  0.95749076, -0.79751471, -1.12825996]])
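
The synthetic data above contains no missing values, so the imputer is never exercised. A small sketch, assuming made-up data with a few NaNs injected (all `check_*` names are illustrative), that checks the wrapped pipeline leaves no NaN in the output. It keeps the same step order as above, which works because StandardScaler disregards NaNs when fitting and passes them through when transforming:

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up data in the same shape as the answer's example (assumption).
rng = np.random.default_rng(111)
check_data = pd.DataFrame(rng.uniform(0, 1, (100, 3)), columns=["n1", "n2", "n3"])
check_data["c1"] = rng.choice(["A", "B"], 100)
# Knock out every tenth value of n1 so the imputer has work to do.
check_data.loc[::10, "n1"] = np.nan

check_num_transformer = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("imputer", SimpleImputer()),
])
# Columns are listed by name here instead of using the dtype selector.
check_preprocessor = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["c1"]),
    (check_num_transformer, ["n1", "n2", "n3"]),
)

transformed = check_preprocessor.fit_transform(check_data)
# ColumnTransformer may return a sparse matrix; densify before checking.
if hasattr(transformed, "toarray"):
    transformed = transformed.toarray()
print(transformed.shape, np.isnan(transformed).any())
```

Placing the imputer before the scaler is the more common order; either order runs here.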

Then run everything:

 model = make_pipeline(preprocessor, DecisionTreeRegressor())

cv_results = cross_validate(
model, data, target, cv=10, return_estimator=True, n_jobs=2,
)

scores = cv_results["test_score"]

array([-2.45981423, -7.88563769, -1.15523361, -0.56772717, -0.84663734,
       -0.61938564, -3.1854688 , -1.44865232, -0.41933732, -3.13719368])
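
As for why the original call fails: make_column_transformer expects (transformer, columns) pairs. The printed pipeline in the question suggests that with a three-element tuple the second estimator ends up in the columns slot and the real column list is dropped. A hedged sketch, using made-up column names, that inspects this directly:

```python
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative column names (assumption); no data is needed for this check.
faulty = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["c1"]),
    (StandardScaler(), SimpleImputer(), ["n1", "n2"]),  # three elements: the bug
)

# Each stored entry is a (name, transformer, columns) triple.
name, transformer, columns = faulty.transformers[1]
print(name, type(transformer).__name__, type(columns).__name__)
```

If the last printed name is SimpleImputer, the transformer was asked to select columns using an estimator object, which matches the ValueError in the question.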
