How do I fix the error "Found input variables with inconsistent numbers of samples: [100, 50]"?

#python #arrays #numpy #scikit-learn #jupyter-notebook


Question:

I am getting this error and don't know how to resolve it. I want two x variables in my regression, so I combined them in the code. However, I then get the error below, and I don't know how to change my array to fix it.

 from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score,mean_squared_error

X = maindf[['Graduate Degree','Asian American Population']].values.reshape(-1,1)
Y = maindf["Democrats 2016"].values.reshape(-1,1)
x_train, x_test, y_train, y_test, = train_test_split(X, Y,train_size=49, random_state=np.random)
DecisionTreeRegModel = DecisionTreeRegressor(max_depth=3).fit(x_train, y_train)
y_pred = DecisionTreeRegModel.predict(x_test)
from sklearn import tree

Here is the error.

 ValueError                                Traceback (most recent call last)
<ipython-input-85-9aaccff5b23d> in <module>
      5 X = maindf[['Graduate Degree','Asian American Population']].values.reshape(-1,1)
      6 Y = maindf["Democrats 2016"].values.reshape(-1,1)
----> 7 x_train, x_test, y_train, y_test, = train_test_split(X, Y,train_size=49, random_state=np.random)
      8 DecisionTreeRegModel = DecisionTreeRegressor(max_depth=3).fit(x_train, y_train)
      9 y_pred = DecisionTreeRegModel.predict(x_test)

~\anaconda3\lib\site-packages\sklearn\model_selection\_split.py in train_test_split(*arrays, **options)
   2125         raise TypeError("Invalid parameters passed: %s" % str(options))
   2126 
-> 2127     arrays = indexable(*arrays)
   2128 
   2129     n_samples = _num_samples(arrays[0])

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in indexable(*iterables)
    291     """
    292     result = [_make_indexable(X) for X in iterables]
--> 293     check_consistent_length(*result)
    294     return result
    295 

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
    254     uniques = np.unique(lengths)
    255     if len(uniques) > 1:
--> 256         raise ValueError("Found input variables with inconsistent numbers of"
    257                          " samples: %r" % [int(l) for l in lengths])
    258 

ValueError: Found input variables with inconsistent numbers of samples: [100, 50]

1. Have you checked X.shape and Y.shape to see why one input to train_test_split has 100 rows while the other has only 50?

2. As an aside: .to_numpy() is recommended over .values. See the docs.
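The shape check suggested in the first comment can be sketched as follows. This is only an illustration, assuming a frame with 50 rows (the size implied by the `[100, 50]` in the error message); the random `maindf` built here is a hypothetical stand-in for the asker's data, and `cols`, `X_bad`, and `X_ok` are names introduced for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for maindf: 50 rows, matching the 50 in the error message.
rng = np.random.default_rng(0)
maindf = pd.DataFrame({
    "Graduate Degree": rng.integers(0, 2, 50),
    "Asian American Population": rng.integers(0, 2, 50),
    "Democrats 2016": rng.integers(0, 2, 50),
})

cols = ["Graduate Degree", "Asian American Population"]

# reshape(-1, 1) stacks the (50, 2) block into one column of 100 values.
X_bad = maindf[cols].to_numpy().reshape(-1, 1)
Y = maindf["Democrats 2016"].to_numpy().reshape(-1, 1)
print(X_bad.shape, Y.shape)  # (100, 1) (50, 1)

# Without the reshape, X keeps one row per sample and matches Y.
X_ok = maindf[cols].to_numpy()
print(X_ok.shape, Y.shape)   # (50, 2) (50, 1)
```

The first dimension is what train_test_split compares, so 100 versus 50 is exactly the mismatch reported in the traceback.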

Answer #1:

You don't need to reshape your predictors. Calling reshape(-1,1) flattens the two-column matrix into a single column with twice as many rows, so instead of:

 X = maindf[['Graduate Degree','Asian American Population']].values.reshape(-1,1)

Use:

 X = maindf[['Graduate Degree','Asian American Population']]

Below is your code with an example dataset:

 import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score,mean_squared_error

maindf = pd.DataFrame({'Graduate Degree':np.random.choice([0,1],100),
                      'Asian American Population':np.random.choice([0,1],100),
                      "Democrats 2016":np.random.choice([0,1],100)})

X = maindf[['Graduate Degree','Asian American Population']]
Y = maindf["Democrats 2016"].values.reshape(-1,1)
x_train, x_test, y_train, y_test, = train_test_split(X, Y,train_size=49, random_state=np.random)
DecisionTreeRegModel = DecisionTreeRegressor(max_depth=3).fit(x_train, y_train)
y_pred = DecisionTreeRegModel.predict(x_test)
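
The example imports r2_score and mean_squared_error but never calls them. A minimal sketch of how the fitted model could be scored on the held-out rows is shown here, assuming the same kind of random toy data (so the scores themselves carry no meaning). The fixed `random_state=0`, the 1-D target `y`, and the name `model` are choices made for this sketch, not part of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score, mean_squared_error

# Toy data only: random 0/1 columns, so the scores carry no meaning.
rng = np.random.default_rng(0)
maindf = pd.DataFrame({
    "Graduate Degree": rng.integers(0, 2, 100),
    "Asian American Population": rng.integers(0, 2, 100),
    "Democrats 2016": rng.integers(0, 2, 100),
})

X = maindf[["Graduate Degree", "Asian American Population"]]  # shape (100, 2)
y = maindf["Democrats 2016"]                                   # shape (100,)

# random_state=0 is an arbitrary choice made for reproducibility in this sketch.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, train_size=49, random_state=0
)

model = DecisionTreeRegressor(max_depth=3).fit(x_train, y_train)
y_pred = model.predict(x_test)

print(x_train.shape, x_test.shape)  # (49, 2) (51, 2)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R^2:", r2_score(y_test, y_pred))
```

With train_size=49 and 100 rows, the remaining 51 rows form the test set, so y_test and y_pred have the same length and both metrics can be computed.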