Kaggle Ames Housing Prices Regression Best Predictor Variables

First, I train my training data with the random forest algorithm. This is an algorithm with proven success. To start, I simply try it and look at the results.

from sklearn.model_selection import train_test_split

X = train.drop('SalePrice', axis=1)
y = train['SalePrice']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

Then I train on the "X" data with the "y" labels and take the predictions from the "X_test" data, which contains the held-out test features.

from sklearn.ensemble import RandomForestRegressor

regr = RandomForestRegressor(max_depth=2, random_state=0)
regr.fit(X_train, y_train)
predictions = regr.predict(X_test)

The result is: 2220031963.926703

from sklearn.metrics import mean_squared_error

mean_squared_error(predictions, y_test)

That seems very high. However, because SalePrice is in raw dollars, a log transformation of the target brings the error down to a much lower scale.
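
The post does not show the transformation itself, so here is a minimal sketch of what it could look like, assuming the target is compressed with np.log1p and the model is trained and scored on the log scale (the variable names here are my own):

import numpy as np

# Hypothetical sketch: log-transform SalePrice, then fit and score on the log scale.
# Because log1p squeezes prices from hundreds of thousands down to roughly 11-13,
# the squared error comes out on a much smaller scale.
y_log = np.log1p(y)
X_train_l, X_test_l, y_train_l, y_test_l = train_test_split(X, y_log, test_size=0.33, random_state=42)
regr_log = RandomForestRegressor(max_depth=2, random_state=0)
regr_log.fit(X_train_l, y_train_l)
mean_squared_error(regr_log.predict(X_test_l), y_test_l)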

Then I apply a randomized search to it. However, I deleted the code that actually runs the search because it takes so much time on the kernel. The grid code below is taken from another Medium post.

import numpy as np

# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start=200, stop=2000, num=10)]
# Number of features to consider at every split
max_features = ['auto', 'sqrt']
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(10, 110, num=11)]
max_depth.append(None)
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 4]
# Method of selecting samples for training each tree
bootstrap = [True, False]
# Create the random grid
random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'bootstrap': bootstrap}
print(random_grid)

This random grid is used for the random forest algorithm. You can build a similar one for any other machine learning algorithm if you want.
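
Since the code that actually runs the search was removed from the kernel, here is a minimal sketch of how this grid could be plugged into scikit-learn's RandomizedSearchCV; the iteration count, fold count, and other settings below are my assumptions, not the original call. (Note that recent scikit-learn versions no longer accept 'auto' for max_features, so that entry may need to become 'sqrt'.)

from sklearn.model_selection import RandomizedSearchCV

# Sample 100 parameter combinations from random_grid, score each with
# 3-fold cross-validation, and keep the best-performing estimator.
rf = RandomForestRegressor()
rf_random = RandomizedSearchCV(estimator=rf, param_distributions=random_grid,
                               n_iter=100, cv=3, random_state=42, n_jobs=-1)
rf_random.fit(X_train, y_train)
print(rf_random.best_params_)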

Next, I extract PCA features with a PCA analysis. The total number of components is three.

          from sklearn.decomposition import PCA
pca = PCA(n_components=3)
principalComponents_train = pca.fit_transform(X)
principalComponents_test = pca.fit_transform(test)
sum(pca.explained_variance_ratio_)
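
The sum above reports the variance captured by all three components together. To see how much each component contributes on its own, the per-component ratios can also be printed (a small illustrative addition of mine, not part of the original kernel):

# Variance captured by each of the three principal components individually.
for i, ratio in enumerate(pca.explained_variance_ratio_, start=1):
    print(f'component_{i}: {ratio:.4f}')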

Then, I load these features into the "train" and "test" dataframes.

          train['component_1'] = [i[0] for i in principalComponents_train]
train['component_2'] = [i[1] for i in principalComponents_train]
train['component_3'] = [i[2] for i in principalComponents_train]
test['component_1'] = [i[0] for i in principalComponents_test]
test['component_2'] = [i[1] for i in principalComponents_test]
test['component_3'] = [i[2] for i in principalComponents_test]

Once more, some steps for the random forest algorithm.

X = train.drop('SalePrice', axis=1)
y = train['SalePrice']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
regr = RandomForestRegressor(n_estimators=400, min_samples_split=2, min_samples_leaf=1,
                             max_features='sqrt', max_depth=None, bootstrap=False)
regr.fit(X, y)
predictions = regr.predict(X)
mean_squared_error(predictions, y)

The error is 23.29888698630137. That is lower than before.
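
Note that this last fit and prediction both use the full "X" and "y", so the number above is a training error rather than a held-out score. A minimal sketch of scoring on the split created earlier (my addition, not part of the original kernel) could look like this:

# Fit on the training split only and score on the held-out split.
regr_holdout = RandomForestRegressor(n_estimators=400, min_samples_split=2, min_samples_leaf=1,
                                     max_features='sqrt', max_depth=None, bootstrap=False)
regr_holdout.fit(X_train, y_train)
mean_squared_error(regr_holdout.predict(X_test), y_test)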

Ensemble Learning

This method is similar to ensemble learning. At the end I use a bagging algorithm, which I had simply used as-is; for details you can look up the function and library on Google. I used seven different regressors to build a table of predictions to use for ensemble learning.

from sklearn import linear_model, svm, tree
from sklearn.neighbors import KNeighborsRegressor
import xgboost as xgb

model_1 = RandomForestRegressor(n_estimators=400, min_samples_split=2, min_samples_leaf=1,
                                max_features='sqrt', max_depth=None, bootstrap=False)
model_1.fit(X, y)
predict_1 = model_1.predict(X)
model_2 = linear_model.Ridge()
model_2.fit(X, y)
predict_2 = model_2.predict(X)
model_3 = KNeighborsRegressor(10, weights='uniform')
model_3.fit(X, y)
predict_3 = model_3.predict(X)
model_4 = linear_model.BayesianRidge()
model_4.fit(X, y)
predict_4 = model_4.predict(X)
model_5 = tree.DecisionTreeRegressor(max_depth=1)
model_5.fit(X, y)
predict_5 = model_5.predict(X)
model_6 = svm.SVR(C=1.0, epsilon=0.2)
model_6.fit(X, y)
predict_6 = model_6.predict(X)
model_7 = xgb.XGBRegressor()
model_7.fit(X, y)
predict_7 = model_7.predict(X)

Then, I collect them in another dataframe.

          final_df = pd.DataFrame()
final_df['SalePrice'] = y
final_df['RandomForest'] = predict_1
final_df['Ridge'] = predict_2
final_df['Kneighboors'] = predict_3
final_df['BayesianRidge'] = predict_4
final_df['DecisionTreeRegressor'] = predict_5
final_df['Svm'] = predict_6
final_df['XGBoost'] = predict_7

I loaded the predictions into this dataframe. Next, I will use a bagging algorithm on these predictions.

Once more, if you print the errors on this data, the most accurate is the random forest.

          print(mean_squared_error(final_df['SalePrice'], predict_1))
print(mean_squared_error(final_df['SalePrice'], predict_2))
print(mean_squared_error(final_df['SalePrice'], predict_3))
print(mean_squared_error(final_df['SalePrice'], predict_4))
print(mean_squared_error(final_df['SalePrice'], predict_5))
print(mean_squared_error(final_df['SalePrice'], predict_6))
print(mean_squared_error(final_df['SalePrice'], predict_7))

After that, I take the features and label from this final dataframe and train a BaggingRegressor on it.

from sklearn.ensemble import BaggingRegressor

X_final = final_df.drop('SalePrice', axis=1)
y_final = final_df['SalePrice']
model_last = RandomForestRegressor()
model_last.fit(X_final, y_final)
predict_final = model_last.predict(X_final)
final_dt = RandomForestRegressor()
model_last = BaggingRegressor(base_estimator=final_dt, n_estimators=40, random_state=1, oob_score=True)
model_last.fit(X_final, y_final)
predict_final = model_last.predict(X_final)
acc_oob = model_last.oob_score_
print(acc_oob)

The error is 8578886.582733957.

          mean_squared_error(predict_final, y_final)        

That is very high. However, the error of the plain random forest with the PCA values was very low. I select that approach over this complicated one. Sometimes, even if the ideas seem good, the results may not be pleasant in machine learning research. This area is fuzzy.

Even though the results are not delightful, I will explain the remaining steps. This methodology could work with some changes and could be improved.
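
One possible change along those lines (my suggestion, not something the original kernel does) would be to let scikit-learn's StackingRegressor combine the base models: it trains the final estimator on cross-validated predictions instead of on predictions the base models make for data they have already seen.

from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Ridge

# Sketch: stack two of the base models with a ridge meta-learner.
# cv=5 means the meta-learner only ever sees out-of-fold predictions.
stack = StackingRegressor(
    estimators=[('rf', RandomForestRegressor(n_estimators=400)),
                ('ridge', Ridge())],
    final_estimator=Ridge(),
    cv=5)
stack.fit(X_train, y_train)
print(mean_squared_error(stack.predict(X_test), y_test))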

Test Case

We predict on the test dataframe with the previously trained models.

test_predictions_1 = model_1.predict(test)
test_predictions_2 = model_2.predict(test)
test_predictions_3 = model_3.predict(test)
test_predictions_4 = model_4.predict(test)
test_predictions_5 = model_5.predict(test)
test_predictions_6 = model_6.predict(test)
test_predictions_7 = model_7.predict(test)

Next, I create another dataframe for test results.

test_final_df = pd.DataFrame()
test_final_df['RandomForest'] = test_predictions_1
test_final_df['Ridge'] = test_predictions_2
test_final_df['Kneighboors'] = test_predictions_3
test_final_df['BayesianRidge'] = test_predictions_4
test_final_df['DecisionTreeRegressor'] = test_predictions_5
test_final_df['Svm'] = test_predictions_6
test_final_df['XGBoost'] = test_predictions_7

Finally, I predict on this last dataframe with the last trained model.

          last_predictions = model_last.predict(test_final_df)        

Then, I load the submission CSV.

          submission = pd.read_csv('/kaggle/input/house-prices-advanced-regression-techniques/sample_submission.csv')        

Then I match the right values with the correct indexes.

          submission['SalePrice'] = last_predictions        

I replaced this last_predictions with the "test_predictions_1" variable, as sketched below. Finally, I write the CSV file on the Kaggle platform. That is it. Then, you should find the output and submit it.
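
In code, that swap presumably amounts to overwriting the SalePrice column before writing the file (a small sketch of my own, since the post does not show this line):

# Use the plain random-forest test predictions instead of the ensemble output.
submission['SalePrice'] = test_predictions_1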

submission.to_csv('submission.csv', index=False)

Cheers for reading. Have a nice week.

Source: https://becominghuman.ai/house-prices-advanced-regression-techniques-ad9341385712
