Kaggle Ames Housing Prices Regression Best Predictor Variables
First, I train on my training data with the random forest algorithm, which is a proven and successful algorithm. At the outset, I simply run it to see some baseline results.
from sklearn.model_selection import train_test_split
X = train.drop('SalePrice', axis=1)
y = train['SalePrice']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
Then I train on the "X" data with the "y" label and take predictions on "X_test", which holds the test-split features.
from sklearn.ensemble import RandomForestRegressor
regr = RandomForestRegressor(max_depth=2, random_state=0)
regr.fit(X_train, y_train)
predictions = regr.predict(X_test)
The result is: 2220031963.926703
from sklearn.metrics import mean_squared_error
mean_squared_error(predictions, y_test)
That seems very high. However, a log transformation of the target brings it down to a very low value.
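The log transformation itself is not shown in the original code; a minimal sketch of what it could look like, assuming np.log1p is applied to the target before splitting, is:
import numpy as np
# Assumption: log-transform the target before splitting and fit the same model.
y_log = np.log1p(y)
X_train, X_test, y_train_log, y_test_log = train_test_split(X, y_log, test_size=0.33, random_state=42)
regr_log = RandomForestRegressor(max_depth=2, random_state=0)
regr_log.fit(X_train, y_train_log)
print(mean_squared_error(regr_log.predict(X_test), y_test_log))  # MSE on the log scale
Note that an error measured on the log scale is not directly comparable to one measured on raw SalePrice values, which is why the number drops so sharply.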
Then I apply randomized search to tune it. However, I deleted the code that actually runs the search because it takes so much time on the kernel. The code below is adapted from another Medium post.
# Number of trees in the random forest
n_estimators = [int(x) for x in np.linspace(start=200, stop=2000, num=10)]
# Number of features to consider at every split
max_features = ['auto', 'sqrt']
# Maximum number of levels in the tree
max_depth = [int(x) for x in np.linspace(10, 110, num=11)]
max_depth.append(None)
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 4]
# Method of selecting samples for training each tree
bootstrap = [True, False]
# Create the random grid
random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'bootstrap': bootstrap}
print(random_grid)
This random grid is used for the random forest algorithm; you can build a similar one for any other machine learning algorithm if you want. A sketch of how the deleted search code might have been run is shown below.
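A minimal sketch of running the search with this grid (the n_iter, cv, and verbose values here are assumptions, not the settings from the deleted code):
from sklearn.model_selection import RandomizedSearchCV
# Sample random parameter combinations from random_grid and keep the best
# estimator according to 3-fold cross-validation.
rf = RandomForestRegressor(random_state=42)
rf_random = RandomizedSearchCV(estimator=rf, param_distributions=random_grid,
                               n_iter=100, cv=3, n_jobs=-1, random_state=42, verbose=2)
rf_random.fit(X_train, y_train)
print(rf_random.best_params_)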
Next, I extract PCA features with a PCA analysis. The total number of components is three.
from sklearn.decomposition import PCA
pca = PCA(n_components=3)
principalComponents_train = pca.fit_transform(X)
# Reuse the PCA fitted on the training features so the test components match.
principalComponents_test = pca.transform(test)
sum(pca.explained_variance_ratio_)
Then, I load these features into the "train" and "test" dataframes.
train['component_1'] = [i[0] for i in principalComponents_train]
train['component_2'] = [i[1] for i in principalComponents_train]
train['component_3'] = [i[2] for i in principalComponents_train]
test['component_1'] = [i[0] for i in principalComponents_test]
test['component_2'] = [i[1] for i in principalComponents_test]
test['component_3'] = [i[2] for i in principalComponents_test]
Once more, the same steps for the random forest algorithm.
X = train.drop('SalePrice', axis=1)
y = train['SalePrice']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
regr = RandomForestRegressor(n_estimators=400, min_samples_split=2, min_samples_leaf=1, max_features='sqrt', max_depth=None, bootstrap=False)
regr.fit(X, y)
predictions = regr.predict(X)
mean_squared_error(predictions, y)
The error is 23.29888698630137, which is lower than before.
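Note that this error is computed on the same data the model was fitted on; a minimal sketch of evaluating on the held-out split created just above would be:
# Fit on the training split and evaluate on the held-out split instead.
regr.fit(X_train, y_train)
holdout_predictions = regr.predict(X_test)
mean_squared_error(holdout_predictions, y_test)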
Ensemble Learning
This method is similar to ensemble learning. I use a bagging algorithm at the end; for details you can look up the function and library on Google. I use seven different regressors to build a table of predictions that serves as the input to the ensemble.
from sklearn import linear_model, svm, tree
from sklearn.neighbors import KNeighborsRegressor
import xgboost as xgb

model_1 = RandomForestRegressor(n_estimators=400, min_samples_split=2, min_samples_leaf=1, max_features='sqrt', max_depth=None, bootstrap=False)
model_1.fit(X, y)
predict_1 = model_1.predict(X)
model_2 = linear_model.Ridge()
model_2.fit(X, y)
predict_2 = model_2.predict(X)
model_3 = KNeighborsRegressor(10, weights='uniform')
model_3.fit(X, y)
predict_3 = model_3.predict(X)
model_4 = linear_model.BayesianRidge()
model_4.fit(X, y)
predict_4 = model_4.predict(X)
model_5 = tree.DecisionTreeRegressor(max_depth=1)
model_5.fit(X, y)
predict_5 = model_5.predict(X)
model_6 = svm.SVR(C=1.0, epsilon=0.2)
model_6.fit(X, y)
predict_6 = model_6.predict(X)
model_7 = xgb.XGBRegressor()
model_7.fit(X, y)
predict_7 = model_7.predict(X)
Then, I collect them in another dataframe.
final_df = pd.DataFrame()
final_df['SalePrice'] = y
final_df['RandomForest'] = predict_1
final_df['Ridge'] = predict_2
final_df['Kneighboors'] = predict_3
final_df['BayesianRidge'] = predict_4
final_df['DecisionTreeRegressor'] = predict_5
final_df['Svm'] = predict_6
final_df['XGBoost'] = predict_7
I loaded the predictions into this dataframe. Next, I will use a bagging algorithm on these predictions.
Once more, if you print the errors on this data, the most accurate model is the random forest.
print(mean_squared_error(final_df['SalePrice'], predict_1))
print(mean_squared_error(final_df['SalePrice'], predict_2))
print(mean_squared_error(final_df['SalePrice'], predict_3))
print(mean_squared_error(final_df['SalePrice'], predict_4))
print(mean_squared_error(final_df['SalePrice'], predict_5))
print(mean_squared_error(final_df['SalePrice'], predict_6))
print(mean_squared_error(final_df['SalePrice'], predict_7))
After that, I take the features and the label from this final dataframe and train a BaggingRegressor on it.
from sklearn.ensemble import BaggingRegressor
X_final = final_df.drop('SalePrice', axis=1)
y_final = final_df['SalePrice']
model_last = RandomForestRegressor()
model_last.fit(X_final, y_final)
predict_final = model_last.predict(X_final)
final_dt = RandomForestRegressor()
model_last = BaggingRegressor(base_estimator=final_dt, n_estimators=40, random_state=1, oob_score=True)
model_last.fit(X_final, y_final)
predict_final = model_last.predict(X_final)
acc_oob = model_last.oob_score_
print(acc_oob)
The error is 8578886.582733957.
mean_squared_error(predict_final, y_final)
That is very high. However, the plain random forest with the PCA values was very low, so I select that approach over this more complicated one. Sometimes, even if an idea seems good, the results of a machine learning experiment may not be pleasant. This area is fuzzy.
Although the results are not delightful, I will explain the remaining steps. This methodology could work with some changes and could be improved.
Test Case
We run the previously trained models on the test dataframe.
test_predictions_1 = model_1.predict(test)
test_predictions_2 = model_2.predict(test)
test_predictions_3 = model_3.predict(test)
test_predictions_4 = model_4.predict(test)
test_predictions_5 = model_5.predict(test)
test_predictions_6 = model_6.predict(test)
test_predictions_7 = model_7.predict(test)
Next, I create another dataframe for test results.
test_final_df = pd.DataFrame()
test_final_df['RandomForest'] = test_predictions_1
test_final_df['Ridge'] = test_predictions_2
test_final_df['Kneighboors'] = test_predictions_3
test_final_df['BayesianRidge'] = test_predictions_4
test_final_df['DecisionTreeRegressor'] = test_predictions_5
test_final_df['Svm'] = test_predictions_6
test_final_df['XGBoost'] = test_predictions_7
Finally, I run the last trained model on this test dataframe.
last_predictions = model_last.predict(test_final_df)
Then, I load the sample submission CSV:
submission = pd.read_csv('/kaggle/input/house-prices-advanced-regression-techniques/sample_submission.csv')
Then I match the right values with the correct indices.
submission['SalePrice'] = last_predictions
I later replaced last_predictions with the "test_predictions_1" variable. Finally, I write the CSV file on the Kaggle platform. That is it. Then you should find the output and submit it.
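For reference, that swap amounts to overwriting the column before writing the file:
# Use the plain random-forest predictions instead of the ensemble output.
submission['SalePrice'] = test_predictions_1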
submission.to_csv('submission.csv', index=False)
Thanks for reading. Have a nice week.
Source: https://becominghuman.ai/house-prices-advanced-regression-techniques-ad9341385712